Sample records for performance computing infrastructure

  1. [Design and study of parallel computing environment of Monte Carlo simulation for particle therapy planning using a public cloud-computing infrastructure].

    PubMed

    Yokohama, Noriya

    2013-07-01

    This report was aimed at structuring the design of architectures and studying performance measurement of a parallel computing environment using a Monte Carlo simulation for particle therapy using a high performance computing (HPC) instance within a public cloud-computing infrastructure. Performance measurements showed an approximately 28 times faster speed than seen with single-thread architecture, combined with improved stability. A study of methods of optimizing the system operations also indicated lower cost.

  2. The High-Performance Computing and Communications program, the national information infrastructure and health care.

    PubMed Central

    Lindberg, D A; Humphreys, B L

    1995-01-01

    The High-Performance Computing and Communications (HPCC) program is a multiagency federal effort to advance the state of computing and communications and to provide the technologic platform on which the National Information Infrastructure (NII) can be built. The HPCC program supports the development of high-speed computers, high-speed telecommunications, related software and algorithms, education and training, and information infrastructure technology and applications. The vision of the NII is to extend access to high-performance computing and communications to virtually every U.S. citizen so that the technology can be used to improve the civil infrastructure, lifelong learning, energy management, health care, etc. Development of the NII will require resolution of complex economic and social issues, including information privacy. Health-related applications supported under the HPCC program and NII initiatives include connection of health care institutions to the Internet; enhanced access to gene sequence data; the "Visible Human" Project; and test-bed projects in telemedicine, electronic patient records, shared informatics tool development, and image systems. PMID:7614116

  3. Using Cloud Computing infrastructure with CloudBioLinux, CloudMan and Galaxy

    PubMed Central

    Afgan, Enis; Chapman, Brad; Jadan, Margita; Franke, Vedran; Taylor, James

    2012-01-01

    Cloud computing has revolutionized availability and access to computing and storage resources; making it possible to provision a large computational infrastructure with only a few clicks in a web browser. However, those resources are typically provided in the form of low-level infrastructure components that need to be procured and configured before use. In this protocol, we demonstrate how to utilize cloud computing resources to perform open-ended bioinformatics analyses, with fully automated management of the underlying cloud infrastructure. By combining three projects, CloudBioLinux, CloudMan, and Galaxy into a cohesive unit, we have enabled researchers to gain access to more than 100 preconfigured bioinformatics tools and gigabytes of reference genomes on top of the flexible cloud computing infrastructure. The protocol demonstrates how to setup the available infrastructure and how to use the tools via a graphical desktop interface, a parallel command line interface, and the web-based Galaxy interface. PMID:22700313

  4. Using cloud computing infrastructure with CloudBioLinux, CloudMan, and Galaxy.

    PubMed

    Afgan, Enis; Chapman, Brad; Jadan, Margita; Franke, Vedran; Taylor, James

    2012-06-01

    Cloud computing has revolutionized availability and access to computing and storage resources, making it possible to provision a large computational infrastructure with only a few clicks in a Web browser. However, those resources are typically provided in the form of low-level infrastructure components that need to be procured and configured before use. In this unit, we demonstrate how to utilize cloud computing resources to perform open-ended bioinformatic analyses, with fully automated management of the underlying cloud infrastructure. By combining three projects, CloudBioLinux, CloudMan, and Galaxy, into a cohesive unit, we have enabled researchers to gain access to more than 100 preconfigured bioinformatics tools and gigabytes of reference genomes on top of the flexible cloud computing infrastructure. The protocol demonstrates how to set up the available infrastructure and how to use the tools via a graphical desktop interface, a parallel command-line interface, and the Web-based Galaxy interface.

  5. Performance measurement and modeling of component applications in a high performance computing environment : a case study.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Armstrong, Robert C.; Ray, Jaideep; Malony, A.

    2003-11-01

    We present a case study of performance measurement and modeling of a CCA (Common Component Architecture) component-based application in a high performance computing environment. We explore issues peculiar to component-based HPC applications and propose a performance measurement infrastructure for HPC based loosely on recent work done for Grid environments. A prototypical implementation of the infrastructure is used to collect data for a three components in a scientific application and construct performance models for two of them. Both computational and message-passing performance are addressed.

  6. Distributed Accounting on the Grid

    NASA Technical Reports Server (NTRS)

    Thigpen, William; Hacker, Thomas J.; McGinnis, Laura F.; Athey, Brian D.

    2001-01-01

    By the late 1990s, the Internet was adequately equipped to move vast amounts of data between HPC (High Performance Computing) systems, and efforts were initiated to link together the national infrastructure of high performance computational and data storage resources together into a general computational utility 'grid', analogous to the national electrical power grid infrastructure. The purpose of the Computational grid is to provide dependable, consistent, pervasive, and inexpensive access to computational resources for the computing community in the form of a computing utility. This paper presents a fully distributed view of Grid usage accounting and a methodology for allocating Grid computational resources for use on a Grid computing system.

  7. Modeling the Cloud to Enhance Capabilities for Crises and Catastrophe Management

    DTIC Science & Technology

    2016-11-16

    order for cloud computing infrastructures to be successfully deployed in real world scenarios as tools for crisis and catastrophe management, where...Statement of the Problem Studied As cloud computing becomes the dominant computational infrastructure[1] and cloud technologies make a transition to hosting...1. Formulate rigorous mathematical models representing technological capabilities and resources in cloud computing for performance modeling and

  8. Elastic Cloud Computing Infrastructures in the Open Cirrus Testbed Implemented via Eucalyptus

    NASA Astrophysics Data System (ADS)

    Baun, Christian; Kunze, Marcel

    Cloud computing realizes the advantages and overcomes some restrictionsof the grid computing paradigm. Elastic infrastructures can easily be createdand managed by cloud users. In order to accelerate the research ondata center management and cloud services the OpenCirrusTM researchtestbed has been started by HP, Intel and Yahoo!. Although commercialcloud offerings are proprietary, Open Source solutions exist in the field ofIaaS with Eucalyptus, PaaS with AppScale and at the applications layerwith Hadoop MapReduce. This paper examines the I/O performance ofcloud computing infrastructures implemented with Eucalyptus in contrastto Amazon S3.

  9. RAPPORT: running scientific high-performance computing applications on the cloud.

    PubMed

    Cohen, Jeremy; Filippis, Ioannis; Woodbridge, Mark; Bauer, Daniela; Hong, Neil Chue; Jackson, Mike; Butcher, Sarah; Colling, David; Darlington, John; Fuchs, Brian; Harvey, Matt

    2013-01-28

    Cloud computing infrastructure is now widely used in many domains, but one area where there has been more limited adoption is research computing, in particular for running scientific high-performance computing (HPC) software. The Robust Application Porting for HPC in the Cloud (RAPPORT) project took advantage of existing links between computing researchers and application scientists in the fields of bioinformatics, high-energy physics (HEP) and digital humanities, to investigate running a set of scientific HPC applications from these domains on cloud infrastructure. In this paper, we focus on the bioinformatics and HEP domains, describing the applications and target cloud platforms. We conclude that, while there are many factors that need consideration, there is no fundamental impediment to the use of cloud infrastructure for running many types of HPC applications and, in some cases, there is potential for researchers to benefit significantly from the flexibility offered by cloud platforms.

  10. Infrastructures for Distributed Computing: the case of BESIII

    NASA Astrophysics Data System (ADS)

    Pellegrino, J.

    2018-05-01

    The BESIII is an electron-positron collision experiment hosted at BEPCII in Beijing and aimed to investigate Tau-Charm physics. Now BESIII has been running for several years and gathered more than 1PB raw data. In order to analyze these data and perform massive Monte Carlo simulations, a large amount of computing and storage resources is needed. The distributed computing system is based up on DIRAC and it is in production since 2012. It integrates computing and storage resources from different institutes and a variety of resource types such as cluster, grid, cloud or volunteer computing. About 15 sites from BESIII Collaboration from all over the world joined this distributed computing infrastructure, giving a significant contribution to the IHEP computing facility. Nowadays cloud computing is playing a key role in the HEP computing field, due to its scalability and elasticity. Cloud infrastructures take advantages of several tools, such as VMDirac, to manage virtual machines through cloud managers according to the job requirements. With the virtually unlimited resources from commercial clouds, the computing capacity could scale accordingly in order to deal with any burst demands. General computing models have been discussed in the talk and are addressed herewith, with particular focus on the BESIII infrastructure. Moreover new computing tools and upcoming infrastructures will be addressed.

  11. Defense of Cyber Infrastructures Against Cyber-Physical Attacks Using Game-Theoretic Models

    DOE PAGES

    Rao, Nageswara S. V.; Poole, Stephen W.; Ma, Chris Y. T.; ...

    2015-04-06

    The operation of cyber infrastructures relies on both cyber and physical components, which are subject to incidental and intentional degradations of different kinds. Within the context of network and computing infrastructures, we study the strategic interactions between an attacker and a defender using game-theoretic models that take into account both cyber and physical components. The attacker and defender optimize their individual utilities expressed as sums of cost and system terms. First, we consider a Boolean attack-defense model, wherein the cyber and physical sub-infrastructures may be attacked and reinforced as individual units. Second, we consider a component attack-defense model wherein theirmore » components may be attacked and defended, and the infrastructure requires minimum numbers of both to function. We show that the Nash equilibrium under uniform costs in both cases is computable in polynomial time, and it provides high-level deterministic conditions for the infrastructure survival. When probabilities of successful attack and defense, and of incidental failures are incorporated into the models, the results favor the attacker but otherwise remain qualitatively similar. This approach has been motivated and validated by our experiences with UltraScience Net infrastructure, which was built to support high-performance network experiments. In conclusion, the analytical results, however, are more general, and we apply them to simplified models of cloud and high-performance computing infrastructures.« less

  12. Defense of Cyber Infrastructures Against Cyber-Physical Attacks Using Game-Theoretic Models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rao, Nageswara S. V.; Poole, Stephen W.; Ma, Chris Y. T.

    The operation of cyber infrastructures relies on both cyber and physical components, which are subject to incidental and intentional degradations of different kinds. Within the context of network and computing infrastructures, we study the strategic interactions between an attacker and a defender using game-theoretic models that take into account both cyber and physical components. The attacker and defender optimize their individual utilities expressed as sums of cost and system terms. First, we consider a Boolean attack-defense model, wherein the cyber and physical sub-infrastructures may be attacked and reinforced as individual units. Second, we consider a component attack-defense model wherein theirmore » components may be attacked and defended, and the infrastructure requires minimum numbers of both to function. We show that the Nash equilibrium under uniform costs in both cases is computable in polynomial time, and it provides high-level deterministic conditions for the infrastructure survival. When probabilities of successful attack and defense, and of incidental failures are incorporated into the models, the results favor the attacker but otherwise remain qualitatively similar. This approach has been motivated and validated by our experiences with UltraScience Net infrastructure, which was built to support high-performance network experiments. In conclusion, the analytical results, however, are more general, and we apply them to simplified models of cloud and high-performance computing infrastructures.« less

  13. Cooperative high-performance storage in the accelerated strategic computing initiative

    NASA Technical Reports Server (NTRS)

    Gary, Mark; Howard, Barry; Louis, Steve; Minuzzo, Kim; Seager, Mark

    1996-01-01

    The use and acceptance of new high-performance, parallel computing platforms will be impeded by the absence of an infrastructure capable of supporting orders-of-magnitude improvement in hierarchical storage and high-speed I/O (Input/Output). The distribution of these high-performance platforms and supporting infrastructures across a wide-area network further compounds this problem. We describe an architectural design and phased implementation plan for a distributed, Cooperative Storage Environment (CSE) to achieve the necessary performance, user transparency, site autonomy, communication, and security features needed to support the Accelerated Strategic Computing Initiative (ASCI). ASCI is a Department of Energy (DOE) program attempting to apply terascale platforms and Problem-Solving Environments (PSEs) toward real-world computational modeling and simulation problems. The ASCI mission must be carried out through a unified, multilaboratory effort, and will require highly secure, efficient access to vast amounts of data. The CSE provides a logically simple, geographically distributed, storage infrastructure of semi-autonomous cooperating sites to meet the strategic ASCI PSE goal of highperformance data storage and access at the user desktop.

  14. Defense of Cyber Infrastructures Against Cyber-Physical Attacks Using Game-Theoretic Models.

    PubMed

    Rao, Nageswara S V; Poole, Stephen W; Ma, Chris Y T; He, Fei; Zhuang, Jun; Yau, David K Y

    2016-04-01

    The operation of cyber infrastructures relies on both cyber and physical components, which are subject to incidental and intentional degradations of different kinds. Within the context of network and computing infrastructures, we study the strategic interactions between an attacker and a defender using game-theoretic models that take into account both cyber and physical components. The attacker and defender optimize their individual utilities, expressed as sums of cost and system terms. First, we consider a Boolean attack-defense model, wherein the cyber and physical subinfrastructures may be attacked and reinforced as individual units. Second, we consider a component attack-defense model wherein their components may be attacked and defended, and the infrastructure requires minimum numbers of both to function. We show that the Nash equilibrium under uniform costs in both cases is computable in polynomial time, and it provides high-level deterministic conditions for the infrastructure survival. When probabilities of successful attack and defense, and of incidental failures, are incorporated into the models, the results favor the attacker but otherwise remain qualitatively similar. This approach has been motivated and validated by our experiences with UltraScience Net infrastructure, which was built to support high-performance network experiments. The analytical results, however, are more general, and we apply them to simplified models of cloud and high-performance computing infrastructures. © 2015 Society for Risk Analysis.

  15. Galaxy CloudMan: delivering cloud compute clusters.

    PubMed

    Afgan, Enis; Baker, Dannon; Coraor, Nate; Chapman, Brad; Nekrutenko, Anton; Taylor, James

    2010-12-21

    Widespread adoption of high-throughput sequencing has greatly increased the scale and sophistication of computational infrastructure needed to perform genomic research. An alternative to building and maintaining local infrastructure is "cloud computing", which, in principle, offers on demand access to flexible computational infrastructure. However, cloud computing resources are not yet suitable for immediate "as is" use by experimental biologists. We present a cloud resource management system that makes it possible for individual researchers to compose and control an arbitrarily sized compute cluster on Amazon's EC2 cloud infrastructure without any informatics requirements. Within this system, an entire suite of biological tools packaged by the NERC Bio-Linux team (http://nebc.nerc.ac.uk/tools/bio-linux) is available for immediate consumption. The provided solution makes it possible, using only a web browser, to create a completely configured compute cluster ready to perform analysis in less than five minutes. Moreover, we provide an automated method for building custom deployments of cloud resources. This approach promotes reproducibility of results and, if desired, allows individuals and labs to add or customize an otherwise available cloud system to better meet their needs. The expected knowledge and associated effort with deploying a compute cluster in the Amazon EC2 cloud is not trivial. The solution presented in this paper eliminates these barriers, making it possible for researchers to deploy exactly the amount of computing power they need, combined with a wealth of existing analysis software, to handle the ongoing data deluge.

  16. Scientific Services on the Cloud

    NASA Astrophysics Data System (ADS)

    Chapman, David; Joshi, Karuna P.; Yesha, Yelena; Halem, Milt; Yesha, Yaacov; Nguyen, Phuong

    Scientific Computing was one of the first every applications for parallel and distributed computation. To this date, scientific applications remain some of the most compute intensive, and have inspired creation of petaflop compute infrastructure such as the Oak Ridge Jaguar and Los Alamos RoadRunner. Large dedicated hardware infrastructure has become both a blessing and a curse to the scientific community. Scientists are interested in cloud computing for much the same reason as businesses and other professionals. The hardware is provided, maintained, and administrated by a third party. Software abstraction and virtualization provide reliability, and fault tolerance. Graduated fees allow for multi-scale prototyping and execution. Cloud computing resources are only a few clicks away, and by far the easiest high performance distributed platform to gain access to. There may still be dedicated infrastructure for ultra-scale science, but the cloud can easily play a major part of the scientific computing initiative.

  17. Galaxy CloudMan: delivering cloud compute clusters

    PubMed Central

    2010-01-01

    Background Widespread adoption of high-throughput sequencing has greatly increased the scale and sophistication of computational infrastructure needed to perform genomic research. An alternative to building and maintaining local infrastructure is “cloud computing”, which, in principle, offers on demand access to flexible computational infrastructure. However, cloud computing resources are not yet suitable for immediate “as is” use by experimental biologists. Results We present a cloud resource management system that makes it possible for individual researchers to compose and control an arbitrarily sized compute cluster on Amazon’s EC2 cloud infrastructure without any informatics requirements. Within this system, an entire suite of biological tools packaged by the NERC Bio-Linux team (http://nebc.nerc.ac.uk/tools/bio-linux) is available for immediate consumption. The provided solution makes it possible, using only a web browser, to create a completely configured compute cluster ready to perform analysis in less than five minutes. Moreover, we provide an automated method for building custom deployments of cloud resources. This approach promotes reproducibility of results and, if desired, allows individuals and labs to add or customize an otherwise available cloud system to better meet their needs. Conclusions The expected knowledge and associated effort with deploying a compute cluster in the Amazon EC2 cloud is not trivial. The solution presented in this paper eliminates these barriers, making it possible for researchers to deploy exactly the amount of computing power they need, combined with a wealth of existing analysis software, to handle the ongoing data deluge. PMID:21210983

  18. Analysis of Pervasive Mobile Ad Hoc Routing Protocols

    NASA Astrophysics Data System (ADS)

    Qadri, Nadia N.; Liotta, Antonio

    Mobile ad hoc networks (MANETs) are a fundamental element of pervasive networks and therefore, of pervasive systems that truly support pervasive computing, where user can communicate anywhere, anytime and on-the-fly. In fact, future advances in pervasive computing rely on advancements in mobile communication, which includes both infrastructure-based wireless networks and non-infrastructure-based MANETs. MANETs introduce a new communication paradigm, which does not require a fixed infrastructure - they rely on wireless terminals for routing and transport services. Due to highly dynamic topology, absence of established infrastructure for centralized administration, bandwidth constrained wireless links, and limited resources in MANETs, it is challenging to design an efficient and reliable routing protocol. This chapter reviews the key studies carried out so far on the performance of mobile ad hoc routing protocols. We discuss performance issues and metrics required for the evaluation of ad hoc routing protocols. This leads to a survey of existing work, which captures the performance of ad hoc routing algorithms and their behaviour from different perspectives and highlights avenues for future research.

  19. NCI's High Performance Computing (HPC) and High Performance Data (HPD) Computing Platform for Environmental and Earth System Data Science

    NASA Astrophysics Data System (ADS)

    Evans, Ben; Allen, Chris; Antony, Joseph; Bastrakova, Irina; Gohar, Kashif; Porter, David; Pugh, Tim; Santana, Fabiana; Smillie, Jon; Trenham, Claire; Wang, Jingbo; Wyborn, Lesley

    2015-04-01

    The National Computational Infrastructure (NCI) has established a powerful and flexible in-situ petascale computational environment to enable both high performance computing and Data-intensive Science across a wide spectrum of national environmental and earth science data collections - in particular climate, observational data and geoscientific assets. This paper examines 1) the computational environments that supports the modelling and data processing pipelines, 2) the analysis environments and methods to support data analysis, and 3) the progress so far to harmonise the underlying data collections for future interdisciplinary research across these large volume data collections. NCI has established 10+ PBytes of major national and international data collections from both the government and research sectors based on six themes: 1) weather, climate, and earth system science model simulations, 2) marine and earth observations, 3) geosciences, 4) terrestrial ecosystems, 5) water and hydrology, and 6) astronomy, social and biosciences. Collectively they span the lithosphere, crust, biosphere, hydrosphere, troposphere, and stratosphere. The data is largely sourced from NCI's partners (which include the custodians of many of the major Australian national-scale scientific collections), leading research communities, and collaborating overseas organisations. New infrastructures created at NCI mean the data collections are now accessible within an integrated High Performance Computing and Data (HPC-HPD) environment - a 1.2 PFlop supercomputer (Raijin), a HPC class 3000 core OpenStack cloud system and several highly connected large-scale high-bandwidth Lustre filesystems. The hardware was designed at inception to ensure that it would allow the layered software environment to flexibly accommodate the advancement of future data science. New approaches to software technology and data models have also had to be developed to enable access to these large and exponentially increasing data volumes at NCI. Traditional HPC and data environments are still made available in a way that flexibly provides the tools, services and supporting software systems on these new petascale infrastructures. But to enable the research to take place at this scale, the data, metadata and software now need to evolve together - creating a new integrated high performance infrastructure. The new infrastructure at NCI currently supports a catalogue of integrated, reusable software and workflows from earth system and ecosystem modelling, weather research, satellite and other observed data processing and analysis. One of the challenges for NCI has been to support existing techniques and methods, while carefully preparing the underlying infrastructure for the transition needed for the next class of Data-intensive Science. In doing so, a flexible range of techniques and software can be made available for application across the corpus of data collections available, and to provide a new infrastructure for future interdisciplinary research.

  20. Towards Portable Large-Scale Image Processing with High-Performance Computing.

    PubMed

    Huo, Yuankai; Blaber, Justin; Damon, Stephen M; Boyd, Brian D; Bao, Shunxing; Parvathaneni, Prasanna; Noguera, Camilo Bermudez; Chaganti, Shikha; Nath, Vishwesh; Greer, Jasmine M; Lyu, Ilwoo; French, William R; Newton, Allen T; Rogers, Baxter P; Landman, Bennett A

    2018-05-03

    High-throughput, large-scale medical image computing demands tight integration of high-performance computing (HPC) infrastructure for data storage, job distribution, and image processing. The Vanderbilt University Institute for Imaging Science (VUIIS) Center for Computational Imaging (CCI) has constructed a large-scale image storage and processing infrastructure that is composed of (1) a large-scale image database using the eXtensible Neuroimaging Archive Toolkit (XNAT), (2) a content-aware job scheduling platform using the Distributed Automation for XNAT pipeline automation tool (DAX), and (3) a wide variety of encapsulated image processing pipelines called "spiders." The VUIIS CCI medical image data storage and processing infrastructure have housed and processed nearly half-million medical image volumes with Vanderbilt Advanced Computing Center for Research and Education (ACCRE), which is the HPC facility at the Vanderbilt University. The initial deployment was natively deployed (i.e., direct installations on a bare-metal server) within the ACCRE hardware and software environments, which lead to issues of portability and sustainability. First, it could be laborious to deploy the entire VUIIS CCI medical image data storage and processing infrastructure to another HPC center with varying hardware infrastructure, library availability, and software permission policies. Second, the spiders were not developed in an isolated manner, which has led to software dependency issues during system upgrades or remote software installation. To address such issues, herein, we describe recent innovations using containerization techniques with XNAT/DAX which are used to isolate the VUIIS CCI medical image data storage and processing infrastructure from the underlying hardware and software environments. The newly presented XNAT/DAX solution has the following new features: (1) multi-level portability from system level to the application level, (2) flexible and dynamic software development and expansion, and (3) scalable spider deployment compatible with HPC clusters and local workstations.

  1. NASA's Participation in the National Computational Grid

    NASA Technical Reports Server (NTRS)

    Feiereisen, William J.; Zornetzer, Steve F. (Technical Monitor)

    1998-01-01

    Over the last several years it has become evident that the character of NASA's supercomputing needs has changed. One of the major missions of the agency is to support the design and manufacture of aero- and space-vehicles with technologies that will significantly reduce their cost. It is becoming clear that improvements in the process of aerospace design and manufacturing will require a high performance information infrastructure that allows geographically dispersed teams to draw upon resources that are broader than traditional supercomputing. A computational grid draws together our information resources into one system. We can foresee the time when a Grid will allow engineers and scientists to use the tools of supercomputers, databases and on line experimental devices in a virtual environment to collaborate with distant colleagues. The concept of a computational grid has been spoken of for many years, but several events in recent times are conspiring to allow us to actually build one. In late 1997 the National Science Foundation initiated the Partnerships for Advanced Computational Infrastructure (PACI) which is built around the idea of distributed high performance computing. The Alliance lead, by the National Computational Science Alliance (NCSA), and the National Partnership for Advanced Computational Infrastructure (NPACI), lead by the San Diego Supercomputing Center, have been instrumental in drawing together the "Grid Community" to identify the technology bottlenecks and propose a research agenda to address them. During the same period NASA has begun to reformulate parts of two major high performance computing research programs to concentrate on distributed high performance computing and has banded together with the PACI centers to address the research agenda in common.

  2. VMEbus based computer and real-time UNIX as infrastructure of DAQ

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yasu, Y.; Fujii, H.; Nomachi, M.

    1994-12-31

    This paper describes what the authors have constructed as the infrastructure of data acquisition system (DAQ). The paper reports recent developments concerned with HP VME board computer with LynxOS (HP742rt/HP-RT) and Alpha/OSF1 with VMEbus adapter. The paper also reports current status of developing a Benchmark Suite for Data Acquisition (DAQBENCH) for measuring not only the performance of VME/CAMAC access but also that of the context switching, the inter-process communications and so on, for various computers including Workstation-based systems and VME board computers.

  3. Parallel high-performance grid computing: capabilities and opportunities of a novel demanding service and business class allowing highest resource efficiency.

    PubMed

    Kepper, Nick; Ettig, Ramona; Dickmann, Frank; Stehr, Rene; Grosveld, Frank G; Wedemann, Gero; Knoch, Tobias A

    2010-01-01

    Especially in the life-science and the health-care sectors the huge IT requirements are imminent due to the large and complex systems to be analysed and simulated. Grid infrastructures play here a rapidly increasing role for research, diagnostics, and treatment, since they provide the necessary large-scale resources efficiently. Whereas grids were first used for huge number crunching of trivially parallelizable problems, increasingly parallel high-performance computing is required. Here, we show for the prime example of molecular dynamic simulations how the presence of large grid clusters including very fast network interconnects within grid infrastructures allows now parallel high-performance grid computing efficiently and thus combines the benefits of dedicated super-computing centres and grid infrastructures. The demands for this service class are the highest since the user group has very heterogeneous requirements: i) two to many thousands of CPUs, ii) different memory architectures, iii) huge storage capabilities, and iv) fast communication via network interconnects, are all needed in different combinations and must be considered in a highly dedicated manner to reach highest performance efficiency. Beyond, advanced and dedicated i) interaction with users, ii) the management of jobs, iii) accounting, and iv) billing, not only combines classic with parallel high-performance grid usage, but more importantly is also able to increase the efficiency of IT resource providers. Consequently, the mere "yes-we-can" becomes a huge opportunity like e.g. the life-science and health-care sectors as well as grid infrastructures by reaching higher level of resource efficiency.

  4. Exploring Cloud Computing for Large-scale Scientific Applications

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lin, Guang; Han, Binh; Yin, Jian

    This paper explores cloud computing for large-scale data-intensive scientific applications. Cloud computing is attractive because it provides hardware and software resources on-demand, which relieves the burden of acquiring and maintaining a huge amount of resources that may be used only once by a scientific application. However, unlike typical commercial applications that often just requires a moderate amount of ordinary resources, large-scale scientific applications often need to process enormous amount of data in the terabyte or even petabyte range and require special high performance hardware with low latency connections to complete computation in a reasonable amount of time. To address thesemore » challenges, we build an infrastructure that can dynamically select high performance computing hardware across institutions and dynamically adapt the computation to the selected resources to achieve high performance. We have also demonstrated the effectiveness of our infrastructure by building a system biology application and an uncertainty quantification application for carbon sequestration, which can efficiently utilize data and computation resources across several institutions.« less

  5. A cyber infrastructure for the SKA Telescope Manager

    NASA Astrophysics Data System (ADS)

    Barbosa, Domingos; Barraca, João. P.; Carvalho, Bruno; Maia, Dalmiro; Gupta, Yashwant; Natarajan, Swaminathan; Le Roux, Gerhard; Swart, Paul

    2016-07-01

    The Square Kilometre Array Telescope Manager (SKA TM) will be responsible for assisting the SKA Operations and Observation Management, carrying out System diagnosis and collecting Monitoring and Control data from the SKA subsystems and components. To provide adequate compute resources, scalability, operation continuity and high availability, as well as strict Quality of Service, the TM cyber-infrastructure (embodied in the Local Infrastructure - LINFRA) consists of COTS hardware and infrastructural software (for example: server monitoring software, host operating system, virtualization software, device firmware), providing a specially tailored Infrastructure as a Service (IaaS) and Platform as a Service (PaaS) solution. The TM infrastructure provides services in the form of computational power, software defined networking, power, storage abstractions, and high level, state of the art IaaS and PaaS management interfaces. This cyber platform will be tailored to each of the two SKA Phase 1 telescopes (SKA_MID in South Africa and SKA_LOW in Australia) instances, each presenting different computational and storage infrastructures and conditioned by location. This cyber platform will provide a compute model enabling TM to manage the deployment and execution of its multiple components (observation scheduler, proposal submission tools, MandC components, Forensic tools and several Databases, etc). In this sense, the TM LINFRA is primarily focused towards the provision of isolated instances, mostly resorting to virtualization technologies, while defaulting to bare hardware if specifically required due to performance, security, availability, or other requirement.

  6. Large-scale parallel genome assembler over cloud computing environment.

    PubMed

    Das, Arghya Kusum; Koppa, Praveen Kumar; Goswami, Sayan; Platania, Richard; Park, Seung-Jong

    2017-06-01

    The size of high throughput DNA sequencing data has already reached the terabyte scale. To manage this huge volume of data, many downstream sequencing applications started using locality-based computing over different cloud infrastructures to take advantage of elastic (pay as you go) resources at a lower cost. However, the locality-based programming model (e.g. MapReduce) is relatively new. Consequently, developing scalable data-intensive bioinformatics applications using this model and understanding the hardware environment that these applications require for good performance, both require further research. In this paper, we present a de Bruijn graph oriented Parallel Giraph-based Genome Assembler (GiGA), as well as the hardware platform required for its optimal performance. GiGA uses the power of Hadoop (MapReduce) and Giraph (large-scale graph analysis) to achieve high scalability over hundreds of compute nodes by collocating the computation and data. GiGA achieves significantly higher scalability with competitive assembly quality compared to contemporary parallel assemblers (e.g. ABySS and Contrail) over traditional HPC cluster. Moreover, we show that the performance of GiGA is significantly improved by using an SSD-based private cloud infrastructure over traditional HPC cluster. We observe that the performance of GiGA on 256 cores of this SSD-based cloud infrastructure closely matches that of 512 cores of traditional HPC cluster.

  7. High-performance computing with quantum processing units

    DOE PAGES

    Britt, Keith A.; Oak Ridge National Lab.; Humble, Travis S.; ...

    2017-03-01

    The prospects of quantum computing have driven efforts to realize fully functional quantum processing units (QPUs). Recent success in developing proof-of-principle QPUs has prompted the question of how to integrate these emerging processors into modern high-performance computing (HPC) systems. We examine how QPUs can be integrated into current and future HPC system architectures by accounting for func- tional and physical design requirements. We identify two integration pathways that are differentiated by infrastructure constraints on the QPU and the use cases expected for the HPC system. This includes a tight integration that assumes infrastructure bottlenecks can be overcome as well asmore » a loose integration that as- sumes they cannot. We find that the performance of both approaches is likely to depend on the quantum interconnect that serves to entangle multiple QPUs. As a result, we also identify several challenges in assessing QPU performance for HPC, and we consider new metrics that capture the interplay between system architecture and the quantum parallelism underlying computational performance.« less

  8. High-performance computing with quantum processing units

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Britt, Keith A.; Oak Ridge National Lab.; Humble, Travis S.

    The prospects of quantum computing have driven efforts to realize fully functional quantum processing units (QPUs). Recent success in developing proof-of-principle QPUs has prompted the question of how to integrate these emerging processors into modern high-performance computing (HPC) systems. We examine how QPUs can be integrated into current and future HPC system architectures by accounting for func- tional and physical design requirements. We identify two integration pathways that are differentiated by infrastructure constraints on the QPU and the use cases expected for the HPC system. This includes a tight integration that assumes infrastructure bottlenecks can be overcome as well asmore » a loose integration that as- sumes they cannot. We find that the performance of both approaches is likely to depend on the quantum interconnect that serves to entangle multiple QPUs. As a result, we also identify several challenges in assessing QPU performance for HPC, and we consider new metrics that capture the interplay between system architecture and the quantum parallelism underlying computational performance.« less

  9. High-performance integrated virtual environment (HIVE): a robust infrastructure for next-generation sequence data analysis

    PubMed Central

    Simonyan, Vahan; Chumakov, Konstantin; Dingerdissen, Hayley; Faison, William; Goldweber, Scott; Golikov, Anton; Gulzar, Naila; Karagiannis, Konstantinos; Vinh Nguyen Lam, Phuc; Maudru, Thomas; Muravitskaja, Olesja; Osipova, Ekaterina; Pan, Yang; Pschenichnov, Alexey; Rostovtsev, Alexandre; Santana-Quintero, Luis; Smith, Krista; Thompson, Elaine E.; Tkachenko, Valery; Torcivia-Rodriguez, John; Wan, Quan; Wang, Jing; Wu, Tsung-Jung; Wilson, Carolyn; Mazumder, Raja

    2016-01-01

    The High-performance Integrated Virtual Environment (HIVE) is a distributed storage and compute environment designed primarily to handle next-generation sequencing (NGS) data. This multicomponent cloud infrastructure provides secure web access for authorized users to deposit, retrieve, annotate and compute on NGS data, and to analyse the outcomes using web interface visual environments appropriately built in collaboration with research and regulatory scientists and other end users. Unlike many massively parallel computing environments, HIVE uses a cloud control server which virtualizes services, not processes. It is both very robust and flexible due to the abstraction layer introduced between computational requests and operating system processes. The novel paradigm of moving computations to the data, instead of moving data to computational nodes, has proven to be significantly less taxing for both hardware and network infrastructure. The honeycomb data model developed for HIVE integrates metadata into an object-oriented model. Its distinction from other object-oriented databases is in the additional implementation of a unified application program interface to search, view and manipulate data of all types. This model simplifies the introduction of new data types, thereby minimizing the need for database restructuring and streamlining the development of new integrated information systems. The honeycomb model employs a highly secure hierarchical access control and permission system, allowing determination of data access privileges in a finely granular manner without flooding the security subsystem with a multiplicity of rules. HIVE infrastructure will allow engineers and scientists to perform NGS analysis in a manner that is both efficient and secure. HIVE is actively supported in public and private domains, and project collaborations are welcomed. Database URL: https://hive.biochemistry.gwu.edu PMID:26989153

  10. High-performance integrated virtual environment (HIVE): a robust infrastructure for next-generation sequence data analysis.

    PubMed

    Simonyan, Vahan; Chumakov, Konstantin; Dingerdissen, Hayley; Faison, William; Goldweber, Scott; Golikov, Anton; Gulzar, Naila; Karagiannis, Konstantinos; Vinh Nguyen Lam, Phuc; Maudru, Thomas; Muravitskaja, Olesja; Osipova, Ekaterina; Pan, Yang; Pschenichnov, Alexey; Rostovtsev, Alexandre; Santana-Quintero, Luis; Smith, Krista; Thompson, Elaine E; Tkachenko, Valery; Torcivia-Rodriguez, John; Voskanian, Alin; Wan, Quan; Wang, Jing; Wu, Tsung-Jung; Wilson, Carolyn; Mazumder, Raja

    2016-01-01

    The High-performance Integrated Virtual Environment (HIVE) is a distributed storage and compute environment designed primarily to handle next-generation sequencing (NGS) data. This multicomponent cloud infrastructure provides secure web access for authorized users to deposit, retrieve, annotate and compute on NGS data, and to analyse the outcomes using web interface visual environments appropriately built in collaboration with research and regulatory scientists and other end users. Unlike many massively parallel computing environments, HIVE uses a cloud control server which virtualizes services, not processes. It is both very robust and flexible due to the abstraction layer introduced between computational requests and operating system processes. The novel paradigm of moving computations to the data, instead of moving data to computational nodes, has proven to be significantly less taxing for both hardware and network infrastructure.The honeycomb data model developed for HIVE integrates metadata into an object-oriented model. Its distinction from other object-oriented databases is in the additional implementation of a unified application program interface to search, view and manipulate data of all types. This model simplifies the introduction of new data types, thereby minimizing the need for database restructuring and streamlining the development of new integrated information systems. The honeycomb model employs a highly secure hierarchical access control and permission system, allowing determination of data access privileges in a finely granular manner without flooding the security subsystem with a multiplicity of rules. HIVE infrastructure will allow engineers and scientists to perform NGS analysis in a manner that is both efficient and secure. HIVE is actively supported in public and private domains, and project collaborations are welcomed. Database URL: https://hive.biochemistry.gwu.edu. © The Author(s) 2016. Published by Oxford University Press.

  11. Benchmarking infrastructure for mutation text mining

    PubMed Central

    2014-01-01

    Background Experimental research on the automatic extraction of information about mutations from texts is greatly hindered by the lack of consensus evaluation infrastructure for the testing and benchmarking of mutation text mining systems. Results We propose a community-oriented annotation and benchmarking infrastructure to support development, testing, benchmarking, and comparison of mutation text mining systems. The design is based on semantic standards, where RDF is used to represent annotations, an OWL ontology provides an extensible schema for the data and SPARQL is used to compute various performance metrics, so that in many cases no programming is needed to analyze results from a text mining system. While large benchmark corpora for biological entity and relation extraction are focused mostly on genes, proteins, diseases, and species, our benchmarking infrastructure fills the gap for mutation information. The core infrastructure comprises (1) an ontology for modelling annotations, (2) SPARQL queries for computing performance metrics, and (3) a sizeable collection of manually curated documents, that can support mutation grounding and mutation impact extraction experiments. Conclusion We have developed the principal infrastructure for the benchmarking of mutation text mining tasks. The use of RDF and OWL as the representation for corpora ensures extensibility. The infrastructure is suitable for out-of-the-box use in several important scenarios and is ready, in its current state, for initial community adoption. PMID:24568600

  12. Benchmarking infrastructure for mutation text mining.

    PubMed

    Klein, Artjom; Riazanov, Alexandre; Hindle, Matthew M; Baker, Christopher Jo

    2014-02-25

    Experimental research on the automatic extraction of information about mutations from texts is greatly hindered by the lack of consensus evaluation infrastructure for the testing and benchmarking of mutation text mining systems. We propose a community-oriented annotation and benchmarking infrastructure to support development, testing, benchmarking, and comparison of mutation text mining systems. The design is based on semantic standards, where RDF is used to represent annotations, an OWL ontology provides an extensible schema for the data and SPARQL is used to compute various performance metrics, so that in many cases no programming is needed to analyze results from a text mining system. While large benchmark corpora for biological entity and relation extraction are focused mostly on genes, proteins, diseases, and species, our benchmarking infrastructure fills the gap for mutation information. The core infrastructure comprises (1) an ontology for modelling annotations, (2) SPARQL queries for computing performance metrics, and (3) a sizeable collection of manually curated documents, that can support mutation grounding and mutation impact extraction experiments. We have developed the principal infrastructure for the benchmarking of mutation text mining tasks. The use of RDF and OWL as the representation for corpora ensures extensibility. The infrastructure is suitable for out-of-the-box use in several important scenarios and is ready, in its current state, for initial community adoption.

  13. Parallel, distributed and GPU computing technologies in single-particle electron microscopy

    PubMed Central

    Schmeisser, Martin; Heisen, Burkhard C.; Luettich, Mario; Busche, Boris; Hauer, Florian; Koske, Tobias; Knauber, Karl-Heinz; Stark, Holger

    2009-01-01

    Most known methods for the determination of the structure of macromolecular complexes are limited or at least restricted at some point by their computational demands. Recent developments in information technology such as multicore, parallel and GPU processing can be used to overcome these limitations. In particular, graphics processing units (GPUs), which were originally developed for rendering real-time effects in computer games, are now ubiquitous and provide unprecedented computational power for scientific applications. Each parallel-processing paradigm alone can improve overall performance; the increased computational performance obtained by combining all paradigms, unleashing the full power of today’s technology, makes certain applications feasible that were previously virtually impossible. In this article, state-of-the-art paradigms are introduced, the tools and infrastructure needed to apply these paradigms are presented and a state-of-the-art infrastructure and solution strategy for moving scientific applications to the next generation of computer hardware is outlined. PMID:19564686

  14. Parallel, distributed and GPU computing technologies in single-particle electron microscopy.

    PubMed

    Schmeisser, Martin; Heisen, Burkhard C; Luettich, Mario; Busche, Boris; Hauer, Florian; Koske, Tobias; Knauber, Karl-Heinz; Stark, Holger

    2009-07-01

    Most known methods for the determination of the structure of macromolecular complexes are limited or at least restricted at some point by their computational demands. Recent developments in information technology such as multicore, parallel and GPU processing can be used to overcome these limitations. In particular, graphics processing units (GPUs), which were originally developed for rendering real-time effects in computer games, are now ubiquitous and provide unprecedented computational power for scientific applications. Each parallel-processing paradigm alone can improve overall performance; the increased computational performance obtained by combining all paradigms, unleashing the full power of today's technology, makes certain applications feasible that were previously virtually impossible. In this article, state-of-the-art paradigms are introduced, the tools and infrastructure needed to apply these paradigms are presented and a state-of-the-art infrastructure and solution strategy for moving scientific applications to the next generation of computer hardware is outlined.

  15. Testing of SWMM Model’s LID Modules

    EPA Science Inventory

    EPA’s Storm Water Management Model (SWMM) is a computational code heavily relied upon by industry for the simulation of wastewater and stormwater infrastructure performance to . design and build multi-billion-dollar, multi-decade infrastructure upgrades. Since the 1970’s, EPA a...

  16. Monitoring performance of a highly distributed and complex computing infrastructure in LHCb

    NASA Astrophysics Data System (ADS)

    Mathe, Z.; Haen, C.; Stagni, F.

    2017-10-01

    In order to ensure an optimal performance of the LHCb Distributed Computing, based on LHCbDIRAC, it is necessary to be able to inspect the behavior over time of many components: firstly the agents and services on which the infrastructure is built, but also all the computing tasks and data transfers that are managed by this infrastructure. This consists of recording and then analyzing time series of a large number of observables, for which the usage of SQL relational databases is far from optimal. Therefore within DIRAC we have been studying novel possibilities based on NoSQL databases (ElasticSearch, OpenTSDB and InfluxDB) as a result of this study we developed a new monitoring system based on ElasticSearch. It has been deployed on the LHCb Distributed Computing infrastructure for which it collects data from all the components (agents, services, jobs) and allows creating reports through Kibana and a web user interface, which is based on the DIRAC web framework. In this paper we describe this new implementation of the DIRAC monitoring system. We give details on the ElasticSearch implementation within the DIRAC general framework, as well as an overview of the advantages of the pipeline aggregation used for creating a dynamic bucketing of the time series. We present the advantages of using the ElasticSearch DSL high-level library for creating and running queries. Finally we shall present the performances of that system.

  17. A Grid Infrastructure for Supporting Space-based Science Operations

    NASA Technical Reports Server (NTRS)

    Bradford, Robert N.; Redman, Sandra H.; McNair, Ann R. (Technical Monitor)

    2002-01-01

    Emerging technologies for computational grid infrastructures have the potential for revolutionizing the way computers are used in all aspects of our lives. Computational grids are currently being implemented to provide a large-scale, dynamic, and secure research and engineering environments based on standards and next-generation reusable software, enabling greater science and engineering productivity through shared resources and distributed computing for less cost than traditional architectures. Combined with the emerging technologies of high-performance networks, grids provide researchers, scientists and engineers the first real opportunity for an effective distributed collaborative environment with access to resources such as computational and storage systems, instruments, and software tools and services for the most computationally challenging applications.

  18. Big-BOE: Fusing Spanish Official Gazette with Big Data Technology.

    PubMed

    Basanta-Val, Pablo; Sánchez-Fernández, Luis

    2018-06-01

    The proliferation of new data sources, stemmed from the adoption of open-data schemes, in combination with an increasing computing capacity causes the inception of new type of analytics that process Internet of things with low-cost engines to speed up data processing using parallel computing. In this context, the article presents an initiative, called BIG-Boletín Oficial del Estado (BOE), designed to process the Spanish official government gazette (BOE) with state-of-the-art processing engines, to reduce computation time and to offer additional speed up for big data analysts. The goal of including a big data infrastructure is to be able to process different BOE documents in parallel with specific analytics, to search for several issues in different documents. The application infrastructure processing engine is described from an architectural perspective and from performance, showing evidence on how this type of infrastructure improves the performance of different types of simple analytics as several machines cooperate.

  19. User-level framework for performance monitoring of HPC applications

    NASA Astrophysics Data System (ADS)

    Hristova, R.; Goranov, G.

    2013-10-01

    HP-SEE is an infrastructure that links the existing HPC facilities in South East Europe in a common infrastructure. The analysis of the performance monitoring of the High-Performance Computing (HPC) applications in the infrastructure can be useful for the end user as diagnostic for the overall performance of his applications. The existing monitoring tools for HP-SEE provide to the end user only aggregated information for all applications. Usually, the user does not have permissions to select only the relevant information for him and for his applications. In this article we present a framework for performance monitoring of the HPC applications in the HP-SEE infrastructure. The framework provides standardized performance metrics, which every user can use in order to monitor his applications. Furthermore as a part of the framework a program interface is developed. The interface allows the user to publish metrics data from his application and to read and analyze gathered information. Publishing and reading through the framework is possible only with grid certificate valid for the infrastructure. Therefore the user is authorized to access only the data for his applications.

  20. Development of a SaaS application probe to the physical properties of the Earth's interior: An attempt at moving HPC to the cloud

    NASA Astrophysics Data System (ADS)

    Huang, Qian

    2014-09-01

    Scientific computing often requires the availability of a massive number of computers for performing large-scale simulations, and computing in mineral physics is no exception. In order to investigate physical properties of minerals at extreme conditions in computational mineral physics, parallel computing technology is used to speed up the performance by utilizing multiple computer resources to process a computational task simultaneously thereby greatly reducing computation time. Traditionally, parallel computing has been addressed by using High Performance Computing (HPC) solutions and installed facilities such as clusters and super computers. Today, it has been seen that there is a tremendous growth in cloud computing. Infrastructure as a Service (IaaS), the on-demand and pay-as-you-go model, creates a flexible and cost-effective mean to access computing resources. In this paper, a feasibility report of HPC on a cloud infrastructure is presented. It is found that current cloud services in IaaS layer still need to improve performance to be useful to research projects. On the other hand, Software as a Service (SaaS), another type of cloud computing, is introduced into an HPC system for computing in mineral physics, and an application of which is developed. In this paper, an overall description of this SaaS application is presented. This contribution can promote cloud application development in computational mineral physics, and cross-disciplinary studies.

  1. Cloud computing applications for biomedical science: A perspective.

    PubMed

    Navale, Vivek; Bourne, Philip E

    2018-06-01

    Biomedical research has become a digital data-intensive endeavor, relying on secure and scalable computing, storage, and network infrastructure, which has traditionally been purchased, supported, and maintained locally. For certain types of biomedical applications, cloud computing has emerged as an alternative to locally maintained traditional computing approaches. Cloud computing offers users pay-as-you-go access to services such as hardware infrastructure, platforms, and software for solving common biomedical computational problems. Cloud computing services offer secure on-demand storage and analysis and are differentiated from traditional high-performance computing by their rapid availability and scalability of services. As such, cloud services are engineered to address big data problems and enhance the likelihood of data and analytics sharing, reproducibility, and reuse. Here, we provide an introductory perspective on cloud computing to help the reader determine its value to their own research.

  2. Cloud computing applications for biomedical science: A perspective

    PubMed Central

    2018-01-01

    Biomedical research has become a digital data–intensive endeavor, relying on secure and scalable computing, storage, and network infrastructure, which has traditionally been purchased, supported, and maintained locally. For certain types of biomedical applications, cloud computing has emerged as an alternative to locally maintained traditional computing approaches. Cloud computing offers users pay-as-you-go access to services such as hardware infrastructure, platforms, and software for solving common biomedical computational problems. Cloud computing services offer secure on-demand storage and analysis and are differentiated from traditional high-performance computing by their rapid availability and scalability of services. As such, cloud services are engineered to address big data problems and enhance the likelihood of data and analytics sharing, reproducibility, and reuse. Here, we provide an introductory perspective on cloud computing to help the reader determine its value to their own research. PMID:29902176

  3. ATLAS computing on Swiss Cloud SWITCHengines

    NASA Astrophysics Data System (ADS)

    Haug, S.; Sciacca, F. G.; ATLAS Collaboration

    2017-10-01

    Consolidation towards more computing at flat budgets beyond what pure chip technology can offer, is a requirement for the full scientific exploitation of the future data from the Large Hadron Collider at CERN in Geneva. One consolidation measure is to exploit cloud infrastructures whenever they are financially competitive. We report on the technical solutions and the performances used and achieved running simulation tasks for the ATLAS experiment on SWITCHengines. SWITCHengines is a new infrastructure as a service offered to Swiss academia by the National Research and Education Network SWITCH. While solutions and performances are general, financial considerations and policies, on which we also report, are country specific.

  4. Big data computing: Building a vision for ARS information management

    USDA-ARS?s Scientific Manuscript database

    Improvements are needed within the ARS to increase scientific capacity and keep pace with new developments in computer technologies that support data acquisition and analysis. Enhancements in computing power and IT infrastructure are needed to provide scientists better access to high performance com...

  5. Challenges and opportunities of cloud computing for atmospheric sciences

    NASA Astrophysics Data System (ADS)

    Pérez Montes, Diego A.; Añel, Juan A.; Pena, Tomás F.; Wallom, David C. H.

    2016-04-01

    Cloud computing is an emerging technological solution widely used in many fields. Initially developed as a flexible way of managing peak demand it has began to make its way in scientific research. One of the greatest advantages of cloud computing for scientific research is independence of having access to a large cyberinfrastructure to fund or perform a research project. Cloud computing can avoid maintenance expenses for large supercomputers and has the potential to 'democratize' the access to high-performance computing, giving flexibility to funding bodies for allocating budgets for the computational costs associated with a project. Two of the most challenging problems in atmospheric sciences are computational cost and uncertainty in meteorological forecasting and climate projections. Both problems are closely related. Usually uncertainty can be reduced with the availability of computational resources to better reproduce a phenomenon or to perform a larger number of experiments. Here we expose results of the application of cloud computing resources for climate modeling using cloud computing infrastructures of three major vendors and two climate models. We show how the cloud infrastructure compares in performance to traditional supercomputers and how it provides the capability to complete experiments in shorter periods of time. The monetary cost associated is also analyzed. Finally we discuss the future potential of this technology for meteorological and climatological applications, both from the point of view of operational use and research.

  6. International Symposium on Grids and Clouds (ISGC) 2016

    NASA Astrophysics Data System (ADS)

    The International Symposium on Grids and Clouds (ISGC) 2016 will be held at Academia Sinica in Taipei, Taiwan from 13-18 March 2016, with co-located events and workshops. The conference is hosted by the Academia Sinica Grid Computing Centre (ASGC). The theme of ISGC 2016 focuses on“Ubiquitous e-infrastructures and Applications”. Contemporary research is impossible without a strong IT component - researchers rely on the existence of stable and widely available e-infrastructures and their higher level functions and properties. As a result of these expectations, e-Infrastructures are becoming ubiquitous, providing an environment that supports large scale collaborations that deal with global challenges as well as smaller and temporal research communities focusing on particular scientific problems. To support those diversified communities and their needs, the e-Infrastructures themselves are becoming more layered and multifaceted, supporting larger groups of applications. Following the call for the last year conference, ISGC 2016 continues its aim to bring together users and application developers with those responsible for the development and operation of multi-purpose ubiquitous e-Infrastructures. Topics of discussion include Physics (including HEP) and Engineering Applications, Biomedicine & Life Sciences Applications, Earth & Environmental Sciences & Biodiversity Applications, Humanities, Arts, and Social Sciences (HASS) Applications, Virtual Research Environment (including Middleware, tools, services, workflow, etc.), Data Management, Big Data, Networking & Security, Infrastructure & Operations, Infrastructure Clouds and Virtualisation, Interoperability, Business Models & Sustainability, Highly Distributed Computing Systems, and High Performance & Technical Computing (HPTC), etc.

  7. Scalable Analysis Methods and In Situ Infrastructure for Extreme Scale Knowledge Discovery

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Duque, Earl P.N.; Whitlock, Brad J.

    High performance computers have for many years been on a trajectory that gives them extraordinary compute power with the addition of more and more compute cores. At the same time, other system parameters such as the amount of memory per core and bandwidth to storage have remained constant or have barely increased. This creates an imbalance in the computer, giving it the ability to compute a lot of data that it cannot reasonably save out due to time and storage constraints. While technologies have been invented to mitigate this problem (burst buffers, etc.), software has been adapting to employ inmore » situ libraries which perform data analysis and visualization on simulation data while it is still resident in memory. This avoids the need to ever have to pay the costs of writing many terabytes of data files. Instead, in situ enables the creation of more concentrated data products such as statistics, plots, and data extracts, which are all far smaller than the full-sized volume data. With the increasing popularity of in situ, multiple in situ infrastructures have been created, each with its own mechanism for integrating with a simulation. To make it easier to instrument a simulation with multiple in situ infrastructures and include custom analysis algorithms, this project created the SENSEI framework.« less

  8. Using high-performance networks to enable computational aerosciences applications

    NASA Technical Reports Server (NTRS)

    Johnson, Marjory J.

    1992-01-01

    One component of the U.S. Federal High Performance Computing and Communications Program (HPCCP) is the establishment of a gigabit network to provide a communications infrastructure for researchers across the nation. This gigabit network will provide new services and capabilities, in addition to increased bandwidth, to enable future applications. An understanding of these applications is necessary to guide the development of the gigabit network and other high-performance networks of the future. In this paper we focus on computational aerosciences applications run remotely using the Numerical Aerodynamic Simulation (NAS) facility located at NASA Ames Research Center. We characterize these applications in terms of network-related parameters and relate user experiences that reveal limitations imposed by the current wide-area networking infrastructure. Then we investigate how the development of a nationwide gigabit network would enable users of the NAS facility to work in new, more productive ways.

  9. Perm State University HPC-hardware and software services: capabilities for aircraft engine aeroacoustics problems solving

    NASA Astrophysics Data System (ADS)

    Demenev, A. G.

    2018-02-01

    The present work is devoted to analyze high-performance computing (HPC) infrastructure capabilities for aircraft engine aeroacoustics problems solving at Perm State University. We explore here the ability to develop new computational aeroacoustics methods/solvers for computer-aided engineering (CAE) systems to handle complicated industrial problems of engine noise prediction. Leading aircraft engine engineering company, including “UEC-Aviadvigatel” JSC (our industrial partners in Perm, Russia), require that methods/solvers to optimize geometry of aircraft engine for fan noise reduction. We analysed Perm State University HPC-hardware resources and software services to use efficiently. The performed results demonstrate that Perm State University HPC-infrastructure are mature enough to face out industrial-like problems of development CAE-system with HPC-method and CFD-solvers.

  10. In Situ Methods, Infrastructures, and Applications on High Performance Computing Platforms, a State-of-the-art (STAR) Report

    DOE PAGES

    Bethel, EW; Bauer, A; Abbasi, H; ...

    2016-06-10

    The considerable interest in the high performance computing (HPC) community regarding analyzing and visualization data without first writing to disk, i.e., in situ processing, is due to several factors. First is an I/O cost savings, where data is analyzed /visualized while being generated, without first storing to a filesystem. Second is the potential for increased accuracy, where fine temporal sampling of transient analysis might expose some complex behavior missed in coarse temporal sampling. Third is the ability to use all available resources, CPU’s and accelerators, in the computation of analysis products. This STAR paper brings together researchers, developers and practitionersmore » using in situ methods in extreme-scale HPC with the goal to present existing methods, infrastructures, and a range of computational science and engineering applications using in situ analysis and visualization.« less

  11. Building a Community Infrastructure for Scalable On-Line Performance Analysis Tools around Open|Speedshop

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Miller, Barton

    2014-06-30

    Peta-scale computing environments pose significant challenges for both system and application developers and addressing them required more than simply scaling up existing tera-scale solutions. Performance analysis tools play an important role in gaining this understanding, but previous monolithic tools with fixed feature sets have not sufficed. Instead, this project worked on the design, implementation, and evaluation of a general, flexible tool infrastructure supporting the construction of performance tools as “pipelines” of high-quality tool building blocks. These tool building blocks provide common performance tool functionality, and are designed for scalability, lightweight data acquisition and analysis, and interoperability. For this project, wemore » built on Open|SpeedShop, a modular and extensible open source performance analysis tool set. The design and implementation of such a general and reusable infrastructure targeted for petascale systems required us to address several challenging research issues. All components needed to be designed for scale, a task made more difficult by the need to provide general modules. The infrastructure needed to support online data aggregation to cope with the large amounts of performance and debugging data. We needed to be able to map any combination of tool components to each target architecture. And we needed to design interoperable tool APIs and workflows that were concrete enough to support the required functionality, yet provide the necessary flexibility to address a wide range of tools. A major result of this project is the ability to use this scalable infrastructure to quickly create tools that match with a machine architecture and a performance problem that needs to be understood. Another benefit is the ability for application engineers to use the highly scalable, interoperable version of Open|SpeedShop, which are reassembled from the tool building blocks into a flexible, multi-user interface set of tools. This set of tools targeted at Office of Science Leadership Class computer systems and selected Office of Science application codes. We describe the contributions made by the team at the University of Wisconsin. The project built on the efforts in Open|SpeedShop funded by DOE/NNSA and the DOE/NNSA Tri-Lab community, extended Open|Speedshop to the Office of Science Leadership Class Computing Facilities, and addressed new challenges found on these cutting edge systems. Work done under this project at Wisconsin can be divided into two categories, new algorithms and techniques for debugging, and foundation infrastructure work on our Dyninst binary analysis and instrumentation toolkits and MRNet scalability infrastructure.« less

  12. Signal and image processing algorithm performance in a virtual and elastic computing environment

    NASA Astrophysics Data System (ADS)

    Bennett, Kelly W.; Robertson, James

    2013-05-01

    The U.S. Army Research Laboratory (ARL) supports the development of classification, detection, tracking, and localization algorithms using multiple sensing modalities including acoustic, seismic, E-field, magnetic field, PIR, and visual and IR imaging. Multimodal sensors collect large amounts of data in support of algorithm development. The resulting large amount of data, and their associated high-performance computing needs, increases and challenges existing computing infrastructures. Purchasing computer power as a commodity using a Cloud service offers low-cost, pay-as-you-go pricing models, scalability, and elasticity that may provide solutions to develop and optimize algorithms without having to procure additional hardware and resources. This paper provides a detailed look at using a commercial cloud service provider, such as Amazon Web Services (AWS), to develop and deploy simple signal and image processing algorithms in a cloud and run the algorithms on a large set of data archived in the ARL Multimodal Signatures Database (MMSDB). Analytical results will provide performance comparisons with existing infrastructure. A discussion on using cloud computing with government data will discuss best security practices that exist within cloud services, such as AWS.

  13. Business Models of High Performance Computing Centres in Higher Education in Europe

    ERIC Educational Resources Information Center

    Eurich, Markus; Calleja, Paul; Boutellier, Roman

    2013-01-01

    High performance computing (HPC) service centres are a vital part of the academic infrastructure of higher education organisations. However, despite their importance for research and the necessary high capital expenditures, business research on HPC service centres is mostly missing. From a business perspective, it is important to find an answer to…

  14. VERA 3.6 Release Notes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Williamson, Richard L.; Kochunas, Brendan; Adams, Brian M.

    The Virtual Environment for Reactor Applications components included in this distribution include selected computational tools and supporting infrastructure that solve neutronics, thermal-hydraulics, fuel performance, and coupled neutronics-thermal hydraulics problems. The infrastructure components provide a simplified common user input capability and provide for the physics integration with data transfer and coupled-physics iterative solution algorithms.

  15. Cloud Computing for Complex Performance Codes.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Appel, Gordon John; Hadgu, Teklu; Klein, Brandon Thorin

    This report describes the use of cloud computing services for running complex public domain performance assessment problems. The work consisted of two phases: Phase 1 was to demonstrate complex codes, on several differently configured servers, could run and compute trivial small scale problems in a commercial cloud infrastructure. Phase 2 focused on proving non-trivial large scale problems could be computed in the commercial cloud environment. The cloud computing effort was successfully applied using codes of interest to the geohydrology and nuclear waste disposal modeling community.

  16. Grids, virtualization, and clouds at Fermilab

    DOE PAGES

    Timm, S.; Chadwick, K.; Garzoglio, G.; ...

    2014-06-11

    Fermilab supports a scientific program that includes experiments and scientists located across the globe. To better serve this community, in 2004, the (then) Computing Division undertook the strategy of placing all of the High Throughput Computing (HTC) resources in a Campus Grid known as FermiGrid, supported by common shared services. In 2007, the FermiGrid Services group deployed a service infrastructure that utilized Xen virtualization, LVS network routing and MySQL circular replication to deliver highly available services that offered significant performance, reliability and serviceability improvements. This deployment was further enhanced through the deployment of a distributed redundant network core architecture andmore » the physical distribution of the systems that host the virtual machines across multiple buildings on the Fermilab Campus. In 2010, building on the experience pioneered by FermiGrid in delivering production services in a virtual infrastructure, the Computing Sector commissioned the FermiCloud, General Physics Computing Facility and Virtual Services projects to serve as platforms for support of scientific computing (FermiCloud 6 GPCF) and core computing (Virtual Services). Lastly, this work will present the evolution of the Fermilab Campus Grid, Virtualization and Cloud Computing infrastructure together with plans for the future.« less

  17. Grids, virtualization, and clouds at Fermilab

    NASA Astrophysics Data System (ADS)

    Timm, S.; Chadwick, K.; Garzoglio, G.; Noh, S.

    2014-06-01

    Fermilab supports a scientific program that includes experiments and scientists located across the globe. To better serve this community, in 2004, the (then) Computing Division undertook the strategy of placing all of the High Throughput Computing (HTC) resources in a Campus Grid known as FermiGrid, supported by common shared services. In 2007, the FermiGrid Services group deployed a service infrastructure that utilized Xen virtualization, LVS network routing and MySQL circular replication to deliver highly available services that offered significant performance, reliability and serviceability improvements. This deployment was further enhanced through the deployment of a distributed redundant network core architecture and the physical distribution of the systems that host the virtual machines across multiple buildings on the Fermilab Campus. In 2010, building on the experience pioneered by FermiGrid in delivering production services in a virtual infrastructure, the Computing Sector commissioned the FermiCloud, General Physics Computing Facility and Virtual Services projects to serve as platforms for support of scientific computing (FermiCloud 6 GPCF) and core computing (Virtual Services). This work will present the evolution of the Fermilab Campus Grid, Virtualization and Cloud Computing infrastructure together with plans for the future.

  18. Infrastructure sensing.

    PubMed

    Soga, Kenichi; Schooling, Jennifer

    2016-08-06

    Design, construction, maintenance and upgrading of civil engineering infrastructure requires fresh thinking to minimize use of materials, energy and labour. This can only be achieved by understanding the performance of the infrastructure, both during its construction and throughout its design life, through innovative monitoring. Advances in sensor systems offer intriguing possibilities to radically alter methods of condition assessment and monitoring of infrastructure. In this paper, it is hypothesized that the future of infrastructure relies on smarter information; the rich information obtained from embedded sensors within infrastructure will act as a catalyst for new design, construction, operation and maintenance processes for integrated infrastructure systems linked directly with user behaviour patterns. Some examples of emerging sensor technologies for infrastructure sensing are given. They include distributed fibre-optics sensors, computer vision, wireless sensor networks, low-power micro-electromechanical systems, energy harvesting and citizens as sensors.

  19. Infrastructure sensing

    PubMed Central

    Soga, Kenichi; Schooling, Jennifer

    2016-01-01

    Design, construction, maintenance and upgrading of civil engineering infrastructure requires fresh thinking to minimize use of materials, energy and labour. This can only be achieved by understanding the performance of the infrastructure, both during its construction and throughout its design life, through innovative monitoring. Advances in sensor systems offer intriguing possibilities to radically alter methods of condition assessment and monitoring of infrastructure. In this paper, it is hypothesized that the future of infrastructure relies on smarter information; the rich information obtained from embedded sensors within infrastructure will act as a catalyst for new design, construction, operation and maintenance processes for integrated infrastructure systems linked directly with user behaviour patterns. Some examples of emerging sensor technologies for infrastructure sensing are given. They include distributed fibre-optics sensors, computer vision, wireless sensor networks, low-power micro-electromechanical systems, energy harvesting and citizens as sensors. PMID:27499845

  20. Computational Infrastructure for Engine Structural Performance Simulation

    NASA Technical Reports Server (NTRS)

    Chamis, Christos C.

    1997-01-01

    Select computer codes developed over the years to simulate specific aspects of engine structures are described. These codes include blade impact integrated multidisciplinary analysis and optimization, progressive structural fracture, quantification of uncertainties for structural reliability and risk, benefits estimation of new technology insertion and hierarchical simulation of engine structures made from metal matrix and ceramic matrix composites. Collectively these codes constitute a unique infrastructure readiness to credibly evaluate new and future engine structural concepts throughout the development cycle from initial concept, to design and fabrication, to service performance and maintenance and repairs, and to retirement for cause and even to possible recycling. Stated differently, they provide 'virtual' concurrent engineering for engine structures total-life-cycle-cost.

  1. Autonomic Management of Application Workflows on Hybrid Computing Infrastructure

    DOE PAGES

    Kim, Hyunjoo; el-Khamra, Yaakoub; Rodero, Ivan; ...

    2011-01-01

    In this paper, we present a programming and runtime framework that enables the autonomic management of complex application workflows on hybrid computing infrastructures. The framework is designed to address system and application heterogeneity and dynamics to ensure that application objectives and constraints are satisfied. The need for such autonomic system and application management is becoming critical as computing infrastructures become increasingly heterogeneous, integrating different classes of resources from high-end HPC systems to commodity clusters and clouds. For example, the framework presented in this paper can be used to provision the appropriate mix of resources based on application requirements and constraints.more » The framework also monitors the system/application state and adapts the application and/or resources to respond to changing requirements or environment. To demonstrate the operation of the framework and to evaluate its ability, we employ a workflow used to characterize an oil reservoir executing on a hybrid infrastructure composed of TeraGrid nodes and Amazon EC2 instances of various types. Specifically, we show how different applications objectives such as acceleration, conservation and resilience can be effectively achieved while satisfying deadline and budget constraints, using an appropriate mix of dynamically provisioned resources. Our evaluations also demonstrate that public clouds can be used to complement and reinforce the scheduling and usage of traditional high performance computing infrastructure.« less

  2. Ubiquitous Green Computing Techniques for High Demand Applications in Smart Environments

    PubMed Central

    Zapater, Marina; Sanchez, Cesar; Ayala, Jose L.; Moya, Jose M.; Risco-Martín, José L.

    2012-01-01

    Ubiquitous sensor network deployments, such as the ones found in Smart cities and Ambient intelligence applications, require constantly increasing high computational demands in order to process data and offer services to users. The nature of these applications imply the usage of data centers. Research has paid much attention to the energy consumption of the sensor nodes in WSNs infrastructures. However, supercomputing facilities are the ones presenting a higher economic and environmental impact due to their very high power consumption. The latter problem, however, has been disregarded in the field of smart environment services. This paper proposes an energy-minimization workload assignment technique, based on heterogeneity and application-awareness, that redistributes low-demand computational tasks from high-performance facilities to idle nodes with low and medium resources in the WSN infrastructure. These non-optimal allocation policies reduce the energy consumed by the whole infrastructure and the total execution time. PMID:23112621

  3. GSDC: A Unique Data Center in Korea for HEP research

    NASA Astrophysics Data System (ADS)

    Ahn, Sang-Un

    2017-04-01

    Global Science experimental Data hub Center (GSDC) at Korea Institute of Science and Technology Information (KISTI) is a unique data center in South Korea established for promoting the fundamental research fields by supporting them with the expertise on Information and Communication Technology (ICT) and the infrastructure for High Performance Computing (HPC), High Throughput Computing (HTC) and Networking. GSDC has supported various research fields in South Korea dealing with the large scale of data, e.g. RENO experiment for neutrino research, LIGO experiment for gravitational wave detection, Genome sequencing project for bio-medical, and HEP experiments such as CDF at FNAL, Belle at KEK, and STAR at BNL. In particular, GSDC has run a Tier-1 center for ALICE experiment using the LHC at CERN since 2013. In this talk, we present the overview on computing infrastructure that GSDC runs for the research fields and we discuss on the data center infrastructure management system deployed at GSDC.

  4. First results from a combined analysis of CERN computing infrastructure metrics

    NASA Astrophysics Data System (ADS)

    Duellmann, Dirk; Nieke, Christian

    2017-10-01

    The IT Analysis Working Group (AWG) has been formed at CERN across individual computing units and the experiments to attempt a cross cutting analysis of computing infrastructure and application metrics. In this presentation we will describe the first results obtained using medium/long term data (1 months — 1 year) correlating box level metrics, job level metrics from LSF and HTCondor, IO metrics from the physics analysis disk pools (EOS) and networking and application level metrics from the experiment dashboards. We will cover in particular the measurement of hardware performance and prediction of job duration, the latency sensitivity of different job types and a search for bottlenecks with the production job mix in the current infrastructure. The presentation will conclude with the proposal of a small set of metrics to simplify drawing conclusions also in the more constrained environment of public cloud deployments.

  5. Ubiquitous green computing techniques for high demand applications in Smart environments.

    PubMed

    Zapater, Marina; Sanchez, Cesar; Ayala, Jose L; Moya, Jose M; Risco-Martín, José L

    2012-01-01

    Ubiquitous sensor network deployments, such as the ones found in Smart cities and Ambient intelligence applications, require constantly increasing high computational demands in order to process data and offer services to users. The nature of these applications imply the usage of data centers. Research has paid much attention to the energy consumption of the sensor nodes in WSNs infrastructures. However, supercomputing facilities are the ones presenting a higher economic and environmental impact due to their very high power consumption. The latter problem, however, has been disregarded in the field of smart environment services. This paper proposes an energy-minimization workload assignment technique, based on heterogeneity and application-awareness, that redistributes low-demand computational tasks from high-performance facilities to idle nodes with low and medium resources in the WSN infrastructure. These non-optimal allocation policies reduce the energy consumed by the whole infrastructure and the total execution time.

  6. SAME4HPC: A Promising Approach in Building a Scalable and Mobile Environment for High-Performance Computing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Karthik, Rajasekar

    2014-01-01

    In this paper, an architecture for building Scalable And Mobile Environment For High-Performance Computing with spatial capabilities called SAME4HPC is described using cutting-edge technologies and standards such as Node.js, HTML5, ECMAScript 6, and PostgreSQL 9.4. Mobile devices are increasingly becoming powerful enough to run high-performance apps. At the same time, there exist a significant number of low-end and older devices that rely heavily on the server or the cloud infrastructure to do the heavy lifting. Our architecture aims to support both of these types of devices to provide high-performance and rich user experience. A cloud infrastructure consisting of OpenStack withmore » Ubuntu, GeoServer, and high-performance JavaScript frameworks are some of the key open-source and industry standard practices that has been adopted in this architecture.« less

  7. New project to support scientific collaboration electronically

    NASA Astrophysics Data System (ADS)

    Clauer, C. R.; Rasmussen, C. E.; Niciejewski, R. J.; Killeen, T. L.; Kelly, J. D.; Zambre, Y.; Rosenberg, T. J.; Stauning, P.; Friis-Christensen, E.; Mende, S. B.; Weymouth, T. E.; Prakash, A.; McDaniel, S. E.; Olson, G. M.; Finholt, T. A.; Atkins, D. E.

    A new multidisciplinary effort is linking research in the upper atmospheric and space, computer, and behavioral sciences to develop a prototype electronic environment for conducting team science worldwide. A real-world electronic collaboration testbed has been established to support scientific work centered around the experimental operations being conducted with instruments from the Sondrestrom Upper Atmospheric Research Facility in Kangerlussuaq, Greenland. Such group computing environments will become an important component of the National Information Infrastructure initiative, which is envisioned as the high-performance communications infrastructure to support national scientific research.

  8. Architectural Principles and Experimentation of Distributed High Performance Virtual Clusters

    ERIC Educational Resources Information Center

    Younge, Andrew J.

    2016-01-01

    With the advent of virtualization and Infrastructure-as-a-Service (IaaS), the broader scientific computing community is considering the use of clouds for their scientific computing needs. This is due to the relative scalability, ease of use, advanced user environment customization abilities, and the many novel computing paradigms available for…

  9. The iPlant collaborative: cyberinfrastructure for enabling data to discovery for the life sciences

    USDA-ARS?s Scientific Manuscript database

    The iPlant Collaborative provides life science research communities access to comprehensive, scalable, and cohesive computational infrastructure for data management; identify management; collaboration tools; and cloud, high-performance, high-throughput computing. iPlant provides training, learning m...

  10. The Computing and Data Grid Approach: Infrastructure for Distributed Science Applications

    NASA Technical Reports Server (NTRS)

    Johnston, William E.

    2002-01-01

    With the advent of Grids - infrastructure for using and managing widely distributed computing and data resources in the science environment - there is now an opportunity to provide a standard, large-scale, computing, data, instrument, and collaboration environment for science that spans many different projects and provides the required infrastructure and services in a relatively uniform and supportable way. Grid technology has evolved over the past several years to provide the services and infrastructure needed for building 'virtual' systems and organizations. We argue that Grid technology provides an excellent basis for the creation of the integrated environments that can combine the resources needed to support the large- scale science projects located at multiple laboratories and universities. We present some science case studies that indicate that a paradigm shift in the process of science will come about as a result of Grids providing transparent and secure access to advanced and integrated information and technologies infrastructure: powerful computing systems, large-scale data archives, scientific instruments, and collaboration tools. These changes will be in the form of services that can be integrated with the user's work environment, and that enable uniform and highly capable access to these computers, data, and instruments, regardless of the location or exact nature of these resources. These services will integrate transient-use resources like computing systems, scientific instruments, and data caches (e.g., as they are needed to perform a simulation or analyze data from a single experiment); persistent-use resources. such as databases, data catalogues, and archives, and; collaborators, whose involvement will continue for the lifetime of a project or longer. While we largely address large-scale science in this paper, Grids, particularly when combined with Web Services, will address a broad spectrum of science scenarios. both large and small scale.

  11. A Framework for Debugging Geoscience Projects in a High Performance Computing Environment

    NASA Astrophysics Data System (ADS)

    Baxter, C.; Matott, L.

    2012-12-01

    High performance computing (HPC) infrastructure has become ubiquitous in today's world with the emergence of commercial cloud computing and academic supercomputing centers. Teams of geoscientists, hydrologists and engineers can take advantage of this infrastructure to undertake large research projects - for example, linking one or more site-specific environmental models with soft computing algorithms, such as heuristic global search procedures, to perform parameter estimation and predictive uncertainty analysis, and/or design least-cost remediation systems. However, the size, complexity and distributed nature of these projects can make identifying failures in the associated numerical experiments using conventional ad-hoc approaches both time- consuming and ineffective. To address these problems a multi-tiered debugging framework has been developed. The framework allows for quickly isolating and remedying a number of potential experimental failures, including: failures in the HPC scheduler; bugs in the soft computing code; bugs in the modeling code; and permissions and access control errors. The utility of the framework is demonstrated via application to a series of over 200,000 numerical experiments involving a suite of 5 heuristic global search algorithms and 15 mathematical test functions serving as cheap analogues for the simulation-based optimization of pump-and-treat subsurface remediation systems.

  12. Security and Cloud Outsourcing Framework for Economic Dispatch

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sarker, Mushfiqur R.; Wang, Jianhui; Li, Zuyi

    The computational complexity and problem sizes of power grid applications have increased significantly with the advent of renewable resources and smart grid technologies. The current paradigm of solving these issues consist of inhouse high performance computing infrastructures, which have drawbacks of high capital expenditures, maintenance, and limited scalability. Cloud computing is an ideal alternative due to its powerful computational capacity, rapid scalability, and high cost-effectiveness. A major challenge, however, remains in that the highly confidential grid data is susceptible for potential cyberattacks when outsourced to the cloud. In this work, a security and cloud outsourcing framework is developed for themore » Economic Dispatch (ED) linear programming application. As a result, the security framework transforms the ED linear program into a confidentiality-preserving linear program, that masks both the data and problem structure, thus enabling secure outsourcing to the cloud. Results show that for large grid test cases the performance gain and costs outperforms the in-house infrastructure.« less

  13. Security and Cloud Outsourcing Framework for Economic Dispatch

    DOE PAGES

    Sarker, Mushfiqur R.; Wang, Jianhui; Li, Zuyi; ...

    2017-04-24

    The computational complexity and problem sizes of power grid applications have increased significantly with the advent of renewable resources and smart grid technologies. The current paradigm of solving these issues consist of inhouse high performance computing infrastructures, which have drawbacks of high capital expenditures, maintenance, and limited scalability. Cloud computing is an ideal alternative due to its powerful computational capacity, rapid scalability, and high cost-effectiveness. A major challenge, however, remains in that the highly confidential grid data is susceptible for potential cyberattacks when outsourced to the cloud. In this work, a security and cloud outsourcing framework is developed for themore » Economic Dispatch (ED) linear programming application. As a result, the security framework transforms the ED linear program into a confidentiality-preserving linear program, that masks both the data and problem structure, thus enabling secure outsourcing to the cloud. Results show that for large grid test cases the performance gain and costs outperforms the in-house infrastructure.« less

  14. A scalable infrastructure for CMS data analysis based on OpenStack Cloud and Gluster file system

    NASA Astrophysics Data System (ADS)

    Toor, S.; Osmani, L.; Eerola, P.; Kraemer, O.; Lindén, T.; Tarkoma, S.; White, J.

    2014-06-01

    The challenge of providing a resilient and scalable computational and data management solution for massive scale research environments requires continuous exploration of new technologies and techniques. In this project the aim has been to design a scalable and resilient infrastructure for CERN HEP data analysis. The infrastructure is based on OpenStack components for structuring a private Cloud with the Gluster File System. We integrate the state-of-the-art Cloud technologies with the traditional Grid middleware infrastructure. Our test results show that the adopted approach provides a scalable and resilient solution for managing resources without compromising on performance and high availability.

  15. Quantifying the Digital Divide: A Scientific Overview of Network Connectivity and Grid Infrastructure in South Asian Countries

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Khan, Shahryar Muhammad; /SLAC /NUST, Rawalpindi; Cottrell, R.Les

    2007-10-30

    The future of Computing in High Energy Physics (HEP) applications depends on both the Network and Grid infrastructure. South Asian countries such as India and Pakistan are making significant progress by building clusters as well as improving their network infrastructure However to facilitate the use of these resources, they need to manage the issues of network connectivity to be among the leading participants in Computing for HEP experiments. In this paper we classify the connectivity for academic and research institutions of South Asia. The quantitative measurements are carried out using the PingER methodology; an approach that induces minimal ICMP trafficmore » to gather active end-to-end network statistics. The PingER project has been measuring the Internet performance for the last decade. Currently the measurement infrastructure comprises of over 700 hosts in more than 130 countries which collectively represents approximately 99% of the world's Internet-connected population. Thus, we are well positioned to characterize the world's connectivity. Here we present the current state of the National Research and Educational Networks (NRENs) and Grid Infrastructure in the South Asian countries and identify the areas of concern. We also present comparisons between South Asia and other developing as well as developed regions. We show that there is a strong correlation between the Network performance and several Human Development indices.« less

  16. Model-as-a-service (MaaS) using the cloud service innovation platform (CSIP)

    USDA-ARS?s Scientific Manuscript database

    Cloud infrastructures for modelling activities such as data processing, performing environmental simulations, or conducting model calibrations/optimizations provide a cost effective alternative to traditional high performance computing approaches. Cloud-based modelling examples emerged into the more...

  17. Grid computing technology for hydrological applications

    NASA Astrophysics Data System (ADS)

    Lecca, G.; Petitdidier, M.; Hluchy, L.; Ivanovic, M.; Kussul, N.; Ray, N.; Thieron, V.

    2011-06-01

    SummaryAdvances in e-Infrastructure promise to revolutionize sensing systems and the way in which data are collected and assimilated, and complex water systems are simulated and visualized. According to the EU Infrastructure 2010 work-programme, data and compute infrastructures and their underlying technologies, either oriented to tackle scientific challenges or complex problem solving in engineering, are expected to converge together into the so-called knowledge infrastructures, leading to a more effective research, education and innovation in the next decade and beyond. Grid technology is recognized as a fundamental component of e-Infrastructures. Nevertheless, this emerging paradigm highlights several topics, including data management, algorithm optimization, security, performance (speed, throughput, bandwidth, etc.), and scientific cooperation and collaboration issues that require further examination to fully exploit it and to better inform future research policies. The paper illustrates the results of six different surface and subsurface hydrology applications that have been deployed on the Grid. All the applications aim to answer to strong requirements from the Civil Society at large, relatively to natural and anthropogenic risks. Grid technology has been successfully tested to improve flood prediction, groundwater resources management and Black Sea hydrological survey, by providing large computing resources. It is also shown that Grid technology facilitates e-cooperation among partners by means of services for authentication and authorization, seamless access to distributed data sources, data protection and access right, and standardization.

  18. A parallel-processing approach to computing for the geographic sciences; applications and systems enhancements

    USGS Publications Warehouse

    Crane, Michael; Steinwand, Dan; Beckmann, Tim; Krpan, Greg; Liu, Shu-Guang; Nichols, Erin; Haga, Jim; Maddox, Brian; Bilderback, Chris; Feller, Mark; Homer, George

    2001-01-01

    The overarching goal of this project is to build a spatially distributed infrastructure for information science research by forming a team of information science researchers and providing them with similar hardware and software tools to perform collaborative research. Four geographically distributed Centers of the U.S. Geological Survey (USGS) are developing their own clusters of low-cost, personal computers into parallel computing environments that provide a costeffective way for the USGS to increase participation in the high-performance computing community. Referred to as Beowulf clusters, these hybrid systems provide the robust computing power required for conducting information science research into parallel computing systems and applications.

  19. Software Infrastructure for Computer-aided Drug Discovery and Development, a Practical Example with Guidelines.

    PubMed

    Moretti, Loris; Sartori, Luca

    2016-09-01

    In the field of Computer-Aided Drug Discovery and Development (CADDD) the proper software infrastructure is essential for everyday investigations. The creation of such an environment should be carefully planned and implemented with certain features in order to be productive and efficient. Here we describe a solution to integrate standard computational services into a functional unit that empowers modelling applications for drug discovery. This system allows users with various level of expertise to run in silico experiments automatically and without the burden of file formatting for different software, managing the actual computation, keeping track of the activities and graphical rendering of the structural outcomes. To showcase the potential of this approach, performances of five different docking programs on an Hiv-1 protease test set are presented. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  20. Integrating multiple scientific computing needs via a Private Cloud infrastructure

    NASA Astrophysics Data System (ADS)

    Bagnasco, S.; Berzano, D.; Brunetti, R.; Lusso, S.; Vallero, S.

    2014-06-01

    In a typical scientific computing centre, diverse applications coexist and share a single physical infrastructure. An underlying Private Cloud facility eases the management and maintenance of heterogeneous use cases such as multipurpose or application-specific batch farms, Grid sites catering to different communities, parallel interactive data analysis facilities and others. It allows to dynamically and efficiently allocate resources to any application and to tailor the virtual machines according to the applications' requirements. Furthermore, the maintenance of large deployments of complex and rapidly evolving middleware and application software is eased by the use of virtual images and contextualization techniques; for example, rolling updates can be performed easily and minimizing the downtime. In this contribution we describe the Private Cloud infrastructure at the INFN-Torino Computer Centre, that hosts a full-fledged WLCG Tier-2 site and a dynamically expandable PROOF-based Interactive Analysis Facility for the ALICE experiment at the CERN LHC and several smaller scientific computing applications. The Private Cloud building blocks include the OpenNebula software stack, the GlusterFS filesystem (used in two different configurations for worker- and service-class hypervisors) and the OpenWRT Linux distribution (used for network virtualization). A future integration into a federated higher-level infrastructure is made possible by exposing commonly used APIs like EC2 and by using mainstream contextualization tools like CloudInit.

  1. UNH Data Cooperative: A Cyber Infrastructure for Earth System Studies

    NASA Astrophysics Data System (ADS)

    Braswell, B. H.; Fekete, B. M.; Prusevich, A.; Gliden, S.; Magill, A.; Vorosmarty, C. J.

    2007-12-01

    Earth system scientists and managers have a continuously growing demand for a wide array of earth observations derived from various data sources including (a) modern satellite retrievals, (b) "in-situ" records, (c) various simulation outputs, and (d) assimilated data products combining model results with observational records. The sheer quantity of data, and formatting inconsistencies make it difficult for users to take full advantage of this important information resource. Thus the system could benefit from a thorough retooling of our current data processing procedures and infrastructure. Emerging technologies, like OPeNDAP and OGC map services, open standard data formats (NetCDF, HDF) data cataloging systems (NASA-Echo, Global Change Master Directory, etc.) are providing the basis for a new approach in data management and processing, where web- services are increasingly designed to serve computer-to-computer communications without human interactions and complex analysis can be carried out over distributed computer resources interconnected via cyber infrastructure. The UNH Earth System Data Collaborative is designed to utilize the aforementioned emerging web technologies to offer new means of access to earth system data. While the UNH Data Collaborative serves a wide array of data ranging from weather station data (Climate Portal) to ocean buoy records and ship tracks (Portsmouth Harbor Initiative) to land cover characteristics, etc. the underlaying data architecture shares common components for data mining and data dissemination via web-services. Perhaps the most unique element of the UNH Data Cooperative's IT infrastructure is its prototype modeling environment for regional ecosystem surveillance over the Northeast corridor, which allows the integration of complex earth system model components with the Cooperative's data services. While the complexity of the IT infrastructure to perform complex computations is continuously increasing, scientists are often forced to spend considerable amount of time to solve basic data management and preprocessing tasks and deal with low level computational design problems like parallelization of model codes. Our modeling infrastructure is designed to take care the bulk of the common tasks found in complex earth system models like I/O handling, computational domain and time management, parallel execution of the modeling tasks, etc. The modeling infrastructure allows scientists to focus on the numerical implementation of the physical processes on a single computational objects(typically grid cells) while the framework takes care of the preprocessing of input data, establishing of the data exchange between computation objects and the execution of the science code. In our presentation, we will discuss the key concepts of our modeling infrastructure. We will demonstrate integration of our modeling framework with data services offered by the UNH Earth System Data Collaborative via web interfaces. We will layout the road map to turn our prototype modeling environment into a truly community framework for wide range of earth system scientists and environmental managers.

  2. Dynamic Collaboration Infrastructure for Hydrologic Science

    NASA Astrophysics Data System (ADS)

    Tarboton, D. G.; Idaszak, R.; Castillo, C.; Yi, H.; Jiang, F.; Jones, N.; Goodall, J. L.

    2016-12-01

    Data and modeling infrastructure is becoming increasingly accessible to water scientists. HydroShare is a collaborative environment that currently offers water scientists the ability to access modeling and data infrastructure in support of data intensive modeling and analysis. It supports the sharing of and collaboration around "resources" which are social objects defined to include both data and models in a structured standardized format. Users collaborate around these objects via comments, ratings, and groups. HydroShare also supports web services and cloud based computation for the execution of hydrologic models and analysis and visualization of hydrologic data. However, the quantity and variety of data and modeling infrastructure available that can be accessed from environments like HydroShare is increasing. Storage infrastructure can range from one's local PC to campus or organizational storage to storage in the cloud. Modeling or computing infrastructure can range from one's desktop to departmental clusters to national HPC resources to grid and cloud computing resources. How does one orchestrate this vast number of data and computing infrastructure without needing to correspondingly learn each new system? A common limitation across these systems is the lack of efficient integration between data transport mechanisms and the corresponding high-level services to support large distributed data and compute operations. A scientist running a hydrology model from their desktop may require processing a large collection of files across the aforementioned storage and compute resources and various national databases. To address these community challenges a proof-of-concept prototype was created integrating HydroShare with RADII (Resource Aware Data-centric collaboration Infrastructure) to provide software infrastructure to enable the comprehensive and rapid dynamic deployment of what we refer to as "collaborative infrastructure." In this presentation we discuss the results of this proof-of-concept prototype which enabled HydroShare users to readily instantiate virtual infrastructure marshaling arbitrary combinations, varieties, and quantities of distributed data and computing infrastructure in addressing big problems in hydrology.

  3. A Cross-Platform Infrastructure for Scalable Runtime Application Performance Analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jack Dongarra; Shirley Moore; Bart Miller, Jeffrey Hollingsworth

    2005-03-15

    The purpose of this project was to build an extensible cross-platform infrastructure to facilitate the development of accurate and portable performance analysis tools for current and future high performance computing (HPC) architectures. Major accomplishments include tools and techniques for multidimensional performance analysis, as well as improved support for dynamic performance monitoring of multithreaded and multiprocess applications. Previous performance tool development has been limited by the burden of having to re-write a platform-dependent low-level substrate for each architecture/operating system pair in order to obtain the necessary performance data from the system. Manual interpretation of performance data is not scalable for large-scalemore » long-running applications. The infrastructure developed by this project provides a foundation for building portable and scalable performance analysis tools, with the end goal being to provide application developers with the information they need to analyze, understand, and tune the performance of terascale applications on HPC architectures. The backend portion of the infrastructure provides runtime instrumentation capability and access to hardware performance counters, with thread-safety for shared memory environments and a communication substrate to support instrumentation of multiprocess and distributed programs. Front end interfaces provides tool developers with a well-defined, platform-independent set of calls for requesting performance data. End-user tools have been developed that demonstrate runtime data collection, on-line and off-line analysis of performance data, and multidimensional performance analysis. The infrastructure is based on two underlying performance instrumentation technologies. These technologies are the PAPI cross-platform library interface to hardware performance counters and the cross-platform Dyninst library interface for runtime modification of executable images. The Paradyn and KOJAK projects have made use of this infrastructure to build performance measurement and analysis tools that scale to long-running programs on large parallel and distributed systems and that automate much of the search for performance bottlenecks.« less

  4. Characterizing Crowd Participation and Productivity of Foldit Through Web Scraping

    DTIC Science & Technology

    2016-03-01

    Berkeley Open Infrastructure for Network Computing CDF Cumulative Distribution Function CPU Central Processing Unit CSSG Crowdsourced Serious Game...computers at once can create a similar capacity. According to Anderson [6], principal investigator for the Berkeley Open Infrastructure for Network...extraterrestrial life. From this project, a software-based distributed computing platform called the Berkeley Open Infrastructure for Network Computing

  5. Complete distributed computing environment for a HEP experiment: experience with ARC-connected infrastructure for ATLAS

    NASA Astrophysics Data System (ADS)

    Read, A.; Taga, A.; O-Saada, F.; Pajchel, K.; Samset, B. H.; Cameron, D.

    2008-07-01

    Computing and storage resources connected by the Nordugrid ARC middleware in the Nordic countries, Switzerland and Slovenia are a part of the ATLAS computing Grid. This infrastructure is being commissioned with the ongoing ATLAS Monte Carlo simulation production in preparation for the commencement of data taking in 2008. The unique non-intrusive architecture of ARC, its straightforward interplay with the ATLAS Production System via the Dulcinea executor, and its performance during the commissioning exercise is described. ARC support for flexible and powerful end-user analysis within the GANGA distributed analysis framework is also shown. Whereas the storage solution for this Grid was earlier based on a large, distributed collection of GridFTP-servers, the ATLAS computing design includes a structured SRM-based system with a limited number of storage endpoints. The characteristics, integration and performance of the old and new storage solutions are presented. Although the hardware resources in this Grid are quite modest, it has provided more than double the agreed contribution to the ATLAS production with an efficiency above 95% during long periods of stable operation.

  6. A Screen Space GPGPU Surface LIC Algorithm for Distributed Memory Data Parallel Sort Last Rendering Infrastructures

    NASA Astrophysics Data System (ADS)

    Loring, B.; Karimabadi, H.; Rortershteyn, V.

    2015-10-01

    The surface line integral convolution(LIC) visualization technique produces dense visualization of vector fields on arbitrary surfaces. We present a screen space surface LIC algorithm for use in distributed memory data parallel sort last rendering infrastructures. The motivations for our work are to support analysis of datasets that are too large to fit in the main memory of a single computer and compatibility with prevalent parallel scientific visualization tools such as ParaView and VisIt. By working in screen space using OpenGL we can leverage the computational power of GPUs when they are available and run without them when they are not. We address efficiency and performance issues that arise from the transformation of data from physical to screen space by selecting an alternate screen space domain decomposition. We analyze the algorithm's scaling behavior with and without GPUs on two high performance computing systems using data from turbulent plasma simulations.

  7. A Screen Space GPGPU Surface LIC Algorithm for Distributed Memory Data Parallel Sort Last Rendering Infrastructures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Loring, Burlen; Karimabadi, Homa; Rortershteyn, Vadim

    2014-07-01

    The surface line integral convolution(LIC) visualization technique produces dense visualization of vector fields on arbitrary surfaces. We present a screen space surface LIC algorithm for use in distributed memory data parallel sort last rendering infrastructures. The motivations for our work are to support analysis of datasets that are too large to fit in the main memory of a single computer and compatibility with prevalent parallel scientific visualization tools such as ParaView and VisIt. By working in screen space using OpenGL we can leverage the computational power of GPUs when they are available and run without them when they are not.more » We address efficiency and performance issues that arise from the transformation of data from physical to screen space by selecting an alternate screen space domain decomposition. We analyze the algorithm's scaling behavior with and without GPUs on two high performance computing systems using data from turbulent plasma simulations.« less

  8. LEMON - LHC Era Monitoring for Large-Scale Infrastructures

    NASA Astrophysics Data System (ADS)

    Marian, Babik; Ivan, Fedorko; Nicholas, Hook; Hector, Lansdale Thomas; Daniel, Lenkes; Miroslav, Siket; Denis, Waldron

    2011-12-01

    At the present time computer centres are facing a massive rise in virtualization and cloud computing as these solutions bring advantages to service providers and consolidate the computer centre resources. However, as a result the monitoring complexity is increasing. Computer centre management requires not only to monitor servers, network equipment and associated software but also to collect additional environment and facilities data (e.g. temperature, power consumption, cooling efficiency, etc.) to have also a good overview of the infrastructure performance. The LHC Era Monitoring (Lemon) system is addressing these requirements for a very large scale infrastructure. The Lemon agent that collects data on every client and forwards the samples to the central measurement repository provides a flexible interface that allows rapid development of new sensors. The system allows also to report on behalf of remote devices such as switches and power supplies. Online and historical data can be visualized via a web-based interface or retrieved via command-line tools. The Lemon Alarm System component can be used for notifying the operator about error situations. In this article, an overview of the Lemon monitoring is provided together with a description of the CERN LEMON production instance. No direct comparison is made with other monitoring tool.

  9. @neurIST: infrastructure for advanced disease management through integration of heterogeneous data, computing, and complex processing services.

    PubMed

    Benkner, Siegfried; Arbona, Antonio; Berti, Guntram; Chiarini, Alessandro; Dunlop, Robert; Engelbrecht, Gerhard; Frangi, Alejandro F; Friedrich, Christoph M; Hanser, Susanne; Hasselmeyer, Peer; Hose, Rod D; Iavindrasana, Jimison; Köhler, Martin; Iacono, Luigi Lo; Lonsdale, Guy; Meyer, Rodolphe; Moore, Bob; Rajasekaran, Hariharan; Summers, Paul E; Wöhrer, Alexander; Wood, Steven

    2010-11-01

    The increasing volume of data describing human disease processes and the growing complexity of understanding, managing, and sharing such data presents a huge challenge for clinicians and medical researchers. This paper presents the @neurIST system, which provides an infrastructure for biomedical research while aiding clinical care, by bringing together heterogeneous data and complex processing and computing services. Although @neurIST targets the investigation and treatment of cerebral aneurysms, the system's architecture is generic enough that it could be adapted to the treatment of other diseases. Innovations in @neurIST include confining the patient data pertaining to aneurysms inside a single environment that offers clinicians the tools to analyze and interpret patient data and make use of knowledge-based guidance in planning their treatment. Medical researchers gain access to a critical mass of aneurysm related data due to the system's ability to federate distributed information sources. A semantically mediated grid infrastructure ensures that both clinicians and researchers are able to seamlessly access and work on data that is distributed across multiple sites in a secure way in addition to providing computing resources on demand for performing computationally intensive simulations for treatment planning and research.

  10. Design and implementation of a reliable and cost-effective cloud computing infrastructure: the INFN Napoli experience

    NASA Astrophysics Data System (ADS)

    Capone, V.; Esposito, R.; Pardi, S.; Taurino, F.; Tortone, G.

    2012-12-01

    Over the last few years we have seen an increasing number of services and applications needed to manage and maintain cloud computing facilities. This is particularly true for computing in high energy physics, which often requires complex configurations and distributed infrastructures. In this scenario a cost effective rationalization and consolidation strategy is the key to success in terms of scalability and reliability. In this work we describe an IaaS (Infrastructure as a Service) cloud computing system, with high availability and redundancy features, which is currently in production at INFN-Naples and ATLAS Tier-2 data centre. The main goal we intended to achieve was a simplified method to manage our computing resources and deliver reliable user services, reusing existing hardware without incurring heavy costs. A combined usage of virtualization and clustering technologies allowed us to consolidate our services on a small number of physical machines, reducing electric power costs. As a result of our efforts we developed a complete solution for data and computing centres that can be easily replicated using commodity hardware. Our architecture consists of 2 main subsystems: a clustered storage solution, built on top of disk servers running GlusterFS file system, and a virtual machines execution environment. GlusterFS is a network file system able to perform parallel writes on multiple disk servers, providing this way live replication of data. High availability is also achieved via a network configuration using redundant switches and multiple paths between hypervisor hosts and disk servers. We also developed a set of management scripts to easily perform basic system administration tasks such as automatic deployment of new virtual machines, adaptive scheduling of virtual machines on hypervisor hosts, live migration and automated restart in case of hypervisor failures.

  11. The Next Generation of Lab and Classroom Computing - The Silver Lining

    DTIC Science & Technology

    2016-12-01

    desktop infrastructure (VDI) solution, as well as the computing solutions at three universities, was selected as the basis for comparison. The research... infrastructure , VDI, hardware cost, software cost, manpower, availability, cloud computing, private cloud, bring your own device, BYOD, thin client...virtual desktop infrastructure (VDI) solution, as well as the computing solutions at three universities, was selected as the basis for comparison. The

  12. SBSI: an extensible distributed software infrastructure for parameter estimation in systems biology.

    PubMed

    Adams, Richard; Clark, Allan; Yamaguchi, Azusa; Hanlon, Neil; Tsorman, Nikos; Ali, Shakir; Lebedeva, Galina; Goltsov, Alexey; Sorokin, Anatoly; Akman, Ozgur E; Troein, Carl; Millar, Andrew J; Goryanin, Igor; Gilmore, Stephen

    2013-03-01

    Complex computational experiments in Systems Biology, such as fitting model parameters to experimental data, can be challenging to perform. Not only do they frequently require a high level of computational power, but the software needed to run the experiment needs to be usable by scientists with varying levels of computational expertise, and modellers need to be able to obtain up-to-date experimental data resources easily. We have developed a software suite, the Systems Biology Software Infrastructure (SBSI), to facilitate the parameter-fitting process. SBSI is a modular software suite composed of three major components: SBSINumerics, a high-performance library containing parallelized algorithms for performing parameter fitting; SBSIDispatcher, a middleware application to track experiments and submit jobs to back-end servers; and SBSIVisual, an extensible client application used to configure optimization experiments and view results. Furthermore, we have created a plugin infrastructure to enable project-specific modules to be easily installed. Plugin developers can take advantage of the existing user-interface and application framework to customize SBSI for their own uses, facilitated by SBSI's use of standard data formats. All SBSI binaries and source-code are freely available from http://sourceforge.net/projects/sbsi under an Apache 2 open-source license. The server-side SBSINumerics runs on any Unix-based operating system; both SBSIVisual and SBSIDispatcher are written in Java and are platform independent, allowing use on Windows, Linux and Mac OS X. The SBSI project website at http://www.sbsi.ed.ac.uk provides documentation and tutorials.

  13. A Queue Simulation Tool for a High Performance Scientific Computing Center

    NASA Technical Reports Server (NTRS)

    Spear, Carrie; McGalliard, James

    2007-01-01

    The NASA Center for Computational Sciences (NCCS) at the Goddard Space Flight Center provides high performance highly parallel processors, mass storage, and supporting infrastructure to a community of computational Earth and space scientists. Long running (days) and highly parallel (hundreds of CPUs) jobs are common in the workload. NCCS management structures batch queues and allocates resources to optimize system use and prioritize workloads. NCCS technical staff use a locally developed discrete event simulation tool to model the impacts of evolving workloads, potential system upgrades, alternative queue structures and resource allocation policies.

  14. Modeling, Simulation and Analysis of Public Key Infrastructure

    NASA Technical Reports Server (NTRS)

    Liu, Yuan-Kwei; Tuey, Richard; Ma, Paul (Technical Monitor)

    1998-01-01

    Security is an essential part of network communication. The advances in cryptography have provided solutions to many of the network security requirements. Public Key Infrastructure (PKI) is the foundation of the cryptography applications. The main objective of this research is to design a model to simulate a reliable, scalable, manageable, and high-performance public key infrastructure. We build a model to simulate the NASA public key infrastructure by using SimProcess and MatLab Software. The simulation is from top level all the way down to the computation needed for encryption, decryption, digital signature, and secure web server. The application of secure web server could be utilized in wireless communications. The results of the simulation are analyzed and confirmed by using queueing theory.

  15. The International Symposium on Grids and Clouds

    NASA Astrophysics Data System (ADS)

    The International Symposium on Grids and Clouds (ISGC) 2012 will be held at Academia Sinica in Taipei from 26 February to 2 March 2012, with co-located events and workshops. The conference is hosted by the Academia Sinica Grid Computing Centre (ASGC). 2012 is the decennium anniversary of the ISGC which over the last decade has tracked the convergence, collaboration and innovation of individual researchers across the Asia Pacific region to a coherent community. With the continuous support and dedication from the delegates, ISGC has provided the primary international distributed computing platform where distinguished researchers and collaboration partners from around the world share their knowledge and experiences. The last decade has seen the wide-scale emergence of e-Infrastructure as a critical asset for the modern e-Scientist. The emergence of large-scale research infrastructures and instruments that has produced a torrent of electronic data is forcing a generational change in the scientific process and the mechanisms used to analyse the resulting data deluge. No longer can the processing of these vast amounts of data and production of relevant scientific results be undertaken by a single scientist. Virtual Research Communities that span organisations around the world, through an integrated digital infrastructure that connects the trust and administrative domains of multiple resource providers, have become critical in supporting these analyses. Topics covered in ISGC 2012 include: High Energy Physics, Biomedicine & Life Sciences, Earth Science, Environmental Changes and Natural Disaster Mitigation, Humanities & Social Sciences, Operations & Management, Middleware & Interoperability, Security and Networking, Infrastructure Clouds & Virtualisation, Business Models & Sustainability, Data Management, Distributed Volunteer & Desktop Grid Computing, High Throughput Computing, and High Performance, Manycore & GPU Computing.

  16. PRACE - The European HPC Infrastructure

    NASA Astrophysics Data System (ADS)

    Stadelmeyer, Peter

    2014-05-01

    The mission of PRACE (Partnership for Advanced Computing in Europe) is to enable high impact scientific discovery and engineering research and development across all disciplines to enhance European competitiveness for the benefit of society. PRACE seeks to realize this mission by offering world class computing and data management resources and services through a peer review process. This talk gives a general overview about PRACE and the PRACE research infrastructure (RI). PRACE is established as an international not-for-profit association and the PRACE RI is a pan-European supercomputing infrastructure which offers access to computing and data management resources at partner sites distributed throughout Europe. Besides a short summary about the organization, history, and activities of PRACE, it is explained how scientists and researchers from academia and industry from around the world can access PRACE systems and which education and training activities are offered by PRACE. The overview also contains a selection of PRACE contributions to societal challenges and ongoing activities. Examples of the latter are beside others petascaling, application benchmark suite, best practice guides for efficient use of key architectures, application enabling / scaling, new programming models, and industrial applications. The Partnership for Advanced Computing in Europe (PRACE) is an international non-profit association with its seat in Brussels. The PRACE Research Infrastructure provides a persistent world-class high performance computing service for scientists and researchers from academia and industry in Europe. The computer systems and their operations accessible through PRACE are provided by 4 PRACE members (BSC representing Spain, CINECA representing Italy, GCS representing Germany and GENCI representing France). The Implementation Phase of PRACE receives funding from the EU's Seventh Framework Programme (FP7/2007-2013) under grant agreements RI-261557, RI-283493 and RI-312763. For more information, see www.prace-ri.eu

  17. The dependence of educational infrastructure on clinical infrastructure.

    PubMed Central

    Cimino, C.

    1998-01-01

    The Albert Einstein College of Medicine needed to assess the growth of its infrastructure for educational computing as a first step to determining if student needs were being met. Included in computing infrastructure are space, equipment, software, and computing services. The infrastructure was assessed by reviewing purchasing and support logs for a six year period from 1992 to 1998. This included equipment, software, and e-mail accounts provided to students and to faculty for educational purposes. Student space has grown at a constant rate (averaging 14% increase each year respectively). Student equipment on campus has grown by a constant amount each year (average 8.3 computers each year). Student infrastructure off campus and educational support of faculty has not kept pace. It has either declined or remained level over the six year period. The availability of electronic mail clearly demonstrates this with accounts being used by 99% of students, 78% of Basic Science Course Leaders, 38% of Clerkship Directors, 18% of Clerkship Site Directors, and 8% of Clinical Elective Directors. The collection of the initial descriptive infrastructure data has revealed problems that may generalize to other medical schools. The discrepancy between infrastructure available to students and faculty on campus and students and faculty off campus creates a setting where students perceive a paradoxical declining support for computer use as they progress through medical school. While clinical infrastructure may be growing, it is at the expense of educational infrastructure at affiliate hospitals. PMID:9929262

  18. Advanced Computational Methods for Optimization of Non-Periodic Inspection Intervals for Aging Infrastructure

    DTIC Science & Technology

    2017-01-05

    AFRL-AFOSR-JP-TR-2017-0002 Advanced Computational Methods for Optimization of Non-Periodic Inspection Intervals for Aging Infrastructure Manabu...Computational Methods for Optimization of Non-Periodic Inspection Intervals for Aging Infrastructure 5a.  CONTRACT NUMBER 5b.  GRANT NUMBER FA2386...UNLIMITED: PB Public Release 13. SUPPLEMENTARY NOTES 14. ABSTRACT This report for the project titled ’Advanced Computational Methods for Optimization of

  19. Resilient workflows for computational mechanics platforms

    NASA Astrophysics Data System (ADS)

    Nguyên, Toàn; Trifan, Laurentiu; Désidéri, Jean-Antoine

    2010-06-01

    Workflow management systems have recently been the focus of much interest and many research and deployment for scientific applications worldwide [26, 27]. Their ability to abstract the applications by wrapping application codes have also stressed the usefulness of such systems for multidiscipline applications [23, 24]. When complex applications need to provide seamless interfaces hiding the technicalities of the computing infrastructures, their high-level modeling, monitoring and execution functionalities help giving production teams seamless and effective facilities [25, 31, 33]. Software integration infrastructures based on programming paradigms such as Python, Mathlab and Scilab have also provided evidence of the usefulness of such approaches for the tight coupling of multidisciplne application codes [22, 24]. Also high-performance computing based on multi-core multi-cluster infrastructures open new opportunities for more accurate, more extensive and effective robust multi-discipline simulations for the decades to come [28]. This supports the goal of full flight dynamics simulation for 3D aircraft models within the next decade, opening the way to virtual flight-tests and certification of aircraft in the future [23, 24, 29].

  20. A simple grid implementation with Berkeley Open Infrastructure for Network Computing using BLAST as a model

    PubMed Central

    Pinthong, Watthanai; Muangruen, Panya

    2016-01-01

    Development of high-throughput technologies, such as Next-generation sequencing, allows thousands of experiments to be performed simultaneously while reducing resource requirement. Consequently, a massive amount of experiment data is now rapidly generated. Nevertheless, the data are not readily usable or meaningful until they are further analysed and interpreted. Due to the size of the data, a high performance computer (HPC) is required for the analysis and interpretation. However, the HPC is expensive and difficult to access. Other means were developed to allow researchers to acquire the power of HPC without a need to purchase and maintain one such as cloud computing services and grid computing system. In this study, we implemented grid computing in a computer training center environment using Berkeley Open Infrastructure for Network Computing (BOINC) as a job distributor and data manager combining all desktop computers to virtualize the HPC. Fifty desktop computers were used for setting up a grid system during the off-hours. In order to test the performance of the grid system, we adapted the Basic Local Alignment Search Tools (BLAST) to the BOINC system. Sequencing results from Illumina platform were aligned to the human genome database by BLAST on the grid system. The result and processing time were compared to those from a single desktop computer and HPC. The estimated durations of BLAST analysis for 4 million sequence reads on a desktop PC, HPC and the grid system were 568, 24 and 5 days, respectively. Thus, the grid implementation of BLAST by BOINC is an efficient alternative to the HPC for sequence alignment. The grid implementation by BOINC also helped tap unused computing resources during the off-hours and could be easily modified for other available bioinformatics software. PMID:27547555

  1. Research infrastructure support to address ecosystem dynamics

    NASA Astrophysics Data System (ADS)

    Los, Wouter

    2014-05-01

    Predicting the evolution of ecosystems to climate change or human pressures is a challenge. Even understanding past or current processes is complicated as a result of the many interactions and feedbacks that occur within and between components of the system. This talk will present an example of current research on changes in landscape evolution, hydrology, soil biogeochemical processes, zoological food webs, and plant community succession, and how these affect feedbacks to components of the systems, including the climate system. Multiple observations, experiments, and simulations provide a wealth of data, but not necessarily understanding. Model development on the coupled processes on different spatial and temporal scales is sensitive for variations in data and of parameter change. Fast high performance computing may help to visualize the effect of these changes and the potential stability (and reliability) of the models. This may than allow for iteration between data production and models towards stable models reducing uncertainty and improving the prediction of change. The role of research infrastructures becomes crucial is overcoming barriers for such research. Environmental infrastructures are covering physical site facilities, dedicated instrumentation and e-infrastructure. The LifeWatch infrastructure for biodiversity and ecosystem research will provide services for data integration, analysis and modeling. But it has to cooperate intensively with the other kinds of infrastructures in order to support the iteration between data production and model computation. The cooperation in the ENVRI project (Common operations of environmental research infrastructures) is one of the initiatives to foster such multidisciplinary research.

  2. Controlling Infrastructure Costs: Right-Sizing the Mission Control Facility

    NASA Technical Reports Server (NTRS)

    Martin, Keith; Sen-Roy, Michael; Heiman, Jennifer

    2009-01-01

    Johnson Space Center's Mission Control Center is a space vehicle, space program agnostic facility. The current operational design is essentially identical to the original facility architecture that was developed and deployed in the mid-90's. In an effort to streamline the support costs of the mission critical facility, the Mission Operations Division (MOD) of Johnson Space Center (JSC) has sponsored an exploratory project to evaluate and inject current state-of-the-practice Information Technology (IT) tools, processes and technology into legacy operations. The general push in the IT industry has been trending towards a data-centric computer infrastructure for the past several years. Organizations facing challenges with facility operations costs are turning to creative solutions combining hardware consolidation, virtualization and remote access to meet and exceed performance, security, and availability requirements. The Operations Technology Facility (OTF) organization at the Johnson Space Center has been chartered to build and evaluate a parallel Mission Control infrastructure, replacing the existing, thick-client distributed computing model and network architecture with a data center model utilizing virtualization to provide the MCC Infrastructure as a Service. The OTF will design a replacement architecture for the Mission Control Facility, leveraging hardware consolidation through the use of blade servers, increasing utilization rates for compute platforms through virtualization while expanding connectivity options through the deployment of secure remote access. The architecture demonstrates the maturity of the technologies generally available in industry today and the ability to successfully abstract the tightly coupled relationship between thick-client software and legacy hardware into a hardware agnostic "Infrastructure as a Service" capability that can scale to meet future requirements of new space programs and spacecraft. This paper discusses the benefits and difficulties that a migration to cloud-based computing philosophies has uncovered when compared to the legacy Mission Control Center architecture. The team consists of system and software engineers with extensive experience with the MCC infrastructure and software currently used to support the International Space Station (ISS) and Space Shuttle program (SSP).

  3. Global information infrastructure.

    PubMed

    Lindberg, D A

    1994-01-01

    The High Performance Computing and Communications Program (HPCC) is a multiagency federal initiative under the leadership of the White House Office of Science and Technology Policy, established by the High Performance Computing Act of 1991. It has been assigned a critical role in supporting the international collaboration essential to science and to health care. Goals of the HPCC are to extend USA leadership in high performance computing and networking technologies; to improve technology transfer for economic competitiveness, education, and national security; and to provide a key part of the foundation for the National Information Infrastructure. The first component of the National Institutes of Health to participate in the HPCC, the National Library of Medicine (NLM), recently issued a solicitation for proposals to address a range of issues, from privacy to 'testbed' networks, 'virtual reality,' and more. These efforts will build upon the NLM's extensive outreach program and other initiatives, including the Unified Medical Language System (UMLS), MEDLARS, and Grateful Med. New Internet search tools are emerging, such as Gopher and 'Knowbots'. Medicine will succeed in developing future intelligent agents to assist in utilizing computer networks. Our ability to serve patients is so often restricted by lack of information and knowledge at the time and place of medical decision-making. The new technologies, properly employed, will also greatly enhance our ability to serve the patient.

  4. A Development of Lightweight Grid Interface

    NASA Astrophysics Data System (ADS)

    Iwai, G.; Kawai, Y.; Sasaki, T.; Watase, Y.

    2011-12-01

    In order to help a rapid development of Grid/Cloud aware applications, we have developed API to abstract the distributed computing infrastructures based on SAGA (A Simple API for Grid Applications). SAGA, which is standardized in the OGF (Open Grid Forum), defines API specifications to access distributed computing infrastructures, such as Grid, Cloud and local computing resources. The Universal Grid API (UGAPI), which is a set of command line interfaces (CLI) and APIs, aims to offer simpler API to combine several SAGA interfaces with richer functionalities. These CLIs of the UGAPI offer typical functionalities required by end users for job management and file access to the different distributed computing infrastructures as well as local computing resources. We have also built a web interface for the particle therapy simulation and demonstrated the large scale calculation using the different infrastructures at the same time. In this paper, we would like to present how the web interface based on UGAPI and SAGA achieve more efficient utilization of computing resources over the different infrastructures with technical details and practical experiences.

  5. SCALING AN URBAN EMERGENCY EVACUATION FRAMEWORK: CHALLENGES AND PRACTICES

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Karthik, Rajasekar; Lu, Wei

    2014-01-01

    Critical infrastructure disruption, caused by severe weather events, natural disasters, terrorist attacks, etc., has significant impacts on urban transportation systems. We built a computational framework to simulate urban transportation systems under critical infrastructure disruption in order to aid real-time emergency evacuation. This framework will use large scale datasets to provide a scalable tool for emergency planning and management. Our framework, World-Wide Emergency Evacuation (WWEE), integrates population distribution and urban infrastructure networks to model travel demand in emergency situations at global level. Also, a computational model of agent-based traffic simulation is used to provide an optimal evacuation plan for traffic operationmore » purpose [1]. In addition, our framework provides a web-based high resolution visualization tool for emergency evacuation modelers and practitioners. We have successfully tested our framework with scenarios in both United States (Alexandria, VA) and Europe (Berlin, Germany) [2]. However, there are still some major drawbacks for scaling this framework to handle big data workloads in real time. On our back-end, lack of proper infrastructure limits us in ability to process large amounts of data, run the simulation efficiently and quickly, and provide fast retrieval and serving of data. On the front-end, the visualization performance of microscopic evacuation results is still not efficient enough due to high volume data communication between server and client. We are addressing these drawbacks by using cloud computing and next-generation web technologies, namely Node.js, NoSQL, WebGL, Open Layers 3 and HTML5 technologies. We will describe briefly about each one and how we are using and leveraging these technologies to provide an efficient tool for emergency management organizations. Our early experimentation demonstrates that using above technologies is a promising approach to build a scalable and high performance urban emergency evacuation framework that can improve traffic mobility and safety under critical infrastructure disruption in today s socially connected world.« less

  6. Accelerating epistasis analysis in human genetics with consumer graphics hardware.

    PubMed

    Sinnott-Armstrong, Nicholas A; Greene, Casey S; Cancare, Fabio; Moore, Jason H

    2009-07-24

    Human geneticists are now capable of measuring more than one million DNA sequence variations from across the human genome. The new challenge is to develop computationally feasible methods capable of analyzing these data for associations with common human disease, particularly in the context of epistasis. Epistasis describes the situation where multiple genes interact in a complex non-linear manner to determine an individual's disease risk and is thought to be ubiquitous for common diseases. Multifactor Dimensionality Reduction (MDR) is an algorithm capable of detecting epistasis. An exhaustive analysis with MDR is often computationally expensive, particularly for high order interactions. This challenge has previously been met with parallel computation and expensive hardware. The option we examine here exploits commodity hardware designed for computer graphics. In modern computers Graphics Processing Units (GPUs) have more memory bandwidth and computational capability than Central Processing Units (CPUs) and are well suited to this problem. Advances in the video game industry have led to an economy of scale creating a situation where these powerful components are readily available at very low cost. Here we implement and evaluate the performance of the MDR algorithm on GPUs. Of primary interest are the time required for an epistasis analysis and the price to performance ratio of available solutions. We found that using MDR on GPUs consistently increased performance per machine over both a feature rich Java software package and a C++ cluster implementation. The performance of a GPU workstation running a GPU implementation reduces computation time by a factor of 160 compared to an 8-core workstation running the Java implementation on CPUs. This GPU workstation performs similarly to 150 cores running an optimized C++ implementation on a Beowulf cluster. Furthermore this GPU system provides extremely cost effective performance while leaving the CPU available for other tasks. The GPU workstation containing three GPUs costs $2000 while obtaining similar performance on a Beowulf cluster requires 150 CPU cores which, including the added infrastructure and support cost of the cluster system, cost approximately $82,500. Graphics hardware based computing provides a cost effective means to perform genetic analysis of epistasis using MDR on large datasets without the infrastructure of a computing cluster.

  7. Jungle Computing: Distributed Supercomputing Beyond Clusters, Grids, and Clouds

    NASA Astrophysics Data System (ADS)

    Seinstra, Frank J.; Maassen, Jason; van Nieuwpoort, Rob V.; Drost, Niels; van Kessel, Timo; van Werkhoven, Ben; Urbani, Jacopo; Jacobs, Ceriel; Kielmann, Thilo; Bal, Henri E.

    In recent years, the application of high-performance and distributed computing in scientific practice has become increasingly wide spread. Among the most widely available platforms to scientists are clusters, grids, and cloud systems. Such infrastructures currently are undergoing revolutionary change due to the integration of many-core technologies, providing orders-of-magnitude speed improvements for selected compute kernels. With high-performance and distributed computing systems thus becoming more heterogeneous and hierarchical, programming complexity is vastly increased. Further complexities arise because urgent desire for scalability and issues including data distribution, software heterogeneity, and ad hoc hardware availability commonly force scientists into simultaneous use of multiple platforms (e.g., clusters, grids, and clouds used concurrently). A true computing jungle.

  8. e-Infrastructures for Astronomy: An Integrated View

    NASA Astrophysics Data System (ADS)

    Pasian, F.; Longo, G.

    2010-12-01

    As for other disciplines, the capability of performing “Big Science” in astrophysics requires the availability of large facilities. In the field of ICT, computational resources (e.g. HPC) are important, but are far from being enough for the community: as a matter of fact, the whole set of e-infrastructures (network, computing nodes, data repositories, applications) need to work in an interoperable way. This implies the development of common (or at least compatible) user interfaces to computing resources, transparent access to observations and numerical simulations through the Virtual Observatory, integrated data processing pipelines, data mining and semantic web applications. Achieving this interoperability goal is a must to build a real “Knowledge Infrastructure” in the astrophysical domain. Also, the emergence of new professional profiles (e.g. the “astro-informatician”) is necessary to allow defining and implementing properly this conceptual schema.

  9. SWMM LID Module Validation Study

    EPA Science Inventory

    EPA’s Storm Water Management Model (SWMM) is a computational code heavily relied upon by industry for the simulation of wastewater and stormwater infrastructure performance. Many municipalities are relying on SWMM results to design multi-billion-dollar, multi-decade infrastructu...

  10. A bioinformatics knowledge discovery in text application for grid computing

    PubMed Central

    Castellano, Marcello; Mastronardi, Giuseppe; Bellotti, Roberto; Tarricone, Gianfranco

    2009-01-01

    Background A fundamental activity in biomedical research is Knowledge Discovery which has the ability to search through large amounts of biomedical information such as documents and data. High performance computational infrastructures, such as Grid technologies, are emerging as a possible infrastructure to tackle the intensive use of Information and Communication resources in life science. The goal of this work was to develop a software middleware solution in order to exploit the many knowledge discovery applications on scalable and distributed computing systems to achieve intensive use of ICT resources. Methods The development of a grid application for Knowledge Discovery in Text using a middleware solution based methodology is presented. The system must be able to: perform a user application model, process the jobs with the aim of creating many parallel jobs to distribute on the computational nodes. Finally, the system must be aware of the computational resources available, their status and must be able to monitor the execution of parallel jobs. These operative requirements lead to design a middleware to be specialized using user application modules. It included a graphical user interface in order to access to a node search system, a load balancing system and a transfer optimizer to reduce communication costs. Results A middleware solution prototype and the performance evaluation of it in terms of the speed-up factor is shown. It was written in JAVA on Globus Toolkit 4 to build the grid infrastructure based on GNU/Linux computer grid nodes. A test was carried out and the results are shown for the named entity recognition search of symptoms and pathologies. The search was applied to a collection of 5,000 scientific documents taken from PubMed. Conclusion In this paper we discuss the development of a grid application based on a middleware solution. It has been tested on a knowledge discovery in text process to extract new and useful information about symptoms and pathologies from a large collection of unstructured scientific documents. As an example a computation of Knowledge Discovery in Database was applied on the output produced by the KDT user module to extract new knowledge about symptom and pathology bio-entities. PMID:19534749

  11. A bioinformatics knowledge discovery in text application for grid computing.

    PubMed

    Castellano, Marcello; Mastronardi, Giuseppe; Bellotti, Roberto; Tarricone, Gianfranco

    2009-06-16

    A fundamental activity in biomedical research is Knowledge Discovery which has the ability to search through large amounts of biomedical information such as documents and data. High performance computational infrastructures, such as Grid technologies, are emerging as a possible infrastructure to tackle the intensive use of Information and Communication resources in life science. The goal of this work was to develop a software middleware solution in order to exploit the many knowledge discovery applications on scalable and distributed computing systems to achieve intensive use of ICT resources. The development of a grid application for Knowledge Discovery in Text using a middleware solution based methodology is presented. The system must be able to: perform a user application model, process the jobs with the aim of creating many parallel jobs to distribute on the computational nodes. Finally, the system must be aware of the computational resources available, their status and must be able to monitor the execution of parallel jobs. These operative requirements lead to design a middleware to be specialized using user application modules. It included a graphical user interface in order to access to a node search system, a load balancing system and a transfer optimizer to reduce communication costs. A middleware solution prototype and the performance evaluation of it in terms of the speed-up factor is shown. It was written in JAVA on Globus Toolkit 4 to build the grid infrastructure based on GNU/Linux computer grid nodes. A test was carried out and the results are shown for the named entity recognition search of symptoms and pathologies. The search was applied to a collection of 5,000 scientific documents taken from PubMed. In this paper we discuss the development of a grid application based on a middleware solution. It has been tested on a knowledge discovery in text process to extract new and useful information about symptoms and pathologies from a large collection of unstructured scientific documents. As an example a computation of Knowledge Discovery in Database was applied on the output produced by the KDT user module to extract new knowledge about symptom and pathology bio-entities.

  12. Developing an Index to Measure Health System Performance: Measurement for Districts of Nepal.

    PubMed

    Kandel, N; Fric, A; Lamichhane, J

    2014-01-01

    Various frameworks for measuring health system performance have been proposed and discussed. The scope of using performance indicators are broad, ranging from examining national health system to individual patients at various levels of health system. Development of innovative and easy index is essential to measure multidimensionality of health systems. We used indicators, which also serve as proxy to the set of activities, whose primary goal is to maintain and improve health. We used eleven indicators of MDGs, which represent all dimensions of health to develop index. These indicators are computed with similar methodology that of human development index. We used published data of Nepal for computation of the index for districts of Nepal as an illustration. To validate our finding, we compared the indices of these districts with other development indices of Nepal. An index for each district has been computed from eleven indicators. Then indices are compared with that of human development index, socio-economic and infrastructure development indices and findings has shown the similarity on distribution of districts. Categories of low and high performing districts on health system performance are also having low and high human development, socio-economic, and infrastructure indices respectively. This methodology of computing index from various indicators could assist policy makers and program managers to prioritize activities based on their performance. Validation of the findings with that of other development indicators show that this can be one of the tools, which can assist on assessing health system performance for policy makers, program managers and others.

  13. Heads in the Cloud: A Primer on Neuroimaging Applications of High Performance Computing.

    PubMed

    Shatil, Anwar S; Younas, Sohail; Pourreza, Hossein; Figley, Chase R

    2015-01-01

    With larger data sets and more sophisticated analyses, it is becoming increasingly common for neuroimaging researchers to push (or exceed) the limitations of standalone computer workstations. Nonetheless, although high-performance computing platforms such as clusters, grids and clouds are already in routine use by a small handful of neuroimaging researchers to increase their storage and/or computational power, the adoption of such resources by the broader neuroimaging community remains relatively uncommon. Therefore, the goal of the current manuscript is to: 1) inform prospective users about the similarities and differences between computing clusters, grids and clouds; 2) highlight their main advantages; 3) discuss when it may (and may not) be advisable to use them; 4) review some of their potential problems and barriers to access; and finally 5) give a few practical suggestions for how interested new users can start analyzing their neuroimaging data using cloud resources. Although the aim of cloud computing is to hide most of the complexity of the infrastructure management from end-users, we recognize that this can still be an intimidating area for cognitive neuroscientists, psychologists, neurologists, radiologists, and other neuroimaging researchers lacking a strong computational background. Therefore, with this in mind, we have aimed to provide a basic introduction to cloud computing in general (including some of the basic terminology, computer architectures, infrastructure and service models, etc.), a practical overview of the benefits and drawbacks, and a specific focus on how cloud resources can be used for various neuroimaging applications.

  14. Heads in the Cloud: A Primer on Neuroimaging Applications of High Performance Computing

    PubMed Central

    Shatil, Anwar S.; Younas, Sohail; Pourreza, Hossein; Figley, Chase R.

    2015-01-01

    With larger data sets and more sophisticated analyses, it is becoming increasingly common for neuroimaging researchers to push (or exceed) the limitations of standalone computer workstations. Nonetheless, although high-performance computing platforms such as clusters, grids and clouds are already in routine use by a small handful of neuroimaging researchers to increase their storage and/or computational power, the adoption of such resources by the broader neuroimaging community remains relatively uncommon. Therefore, the goal of the current manuscript is to: 1) inform prospective users about the similarities and differences between computing clusters, grids and clouds; 2) highlight their main advantages; 3) discuss when it may (and may not) be advisable to use them; 4) review some of their potential problems and barriers to access; and finally 5) give a few practical suggestions for how interested new users can start analyzing their neuroimaging data using cloud resources. Although the aim of cloud computing is to hide most of the complexity of the infrastructure management from end-users, we recognize that this can still be an intimidating area for cognitive neuroscientists, psychologists, neurologists, radiologists, and other neuroimaging researchers lacking a strong computational background. Therefore, with this in mind, we have aimed to provide a basic introduction to cloud computing in general (including some of the basic terminology, computer architectures, infrastructure and service models, etc.), a practical overview of the benefits and drawbacks, and a specific focus on how cloud resources can be used for various neuroimaging applications. PMID:27279746

  15. The Path to Convergence: Design, Coordination and Social Issues in the Implementation of a Middleware Data Broker.

    NASA Astrophysics Data System (ADS)

    Slota, S.; Khalsa, S. J. S.

    2015-12-01

    Infrastructures are the result of systems, networks, and inter-networks that accrete, overlay and segment one another over time. As a result, working infrastructures represent a broad heterogeneity of elements - data types, computational resources, material substrates (computing hardware, physical infrastructure, labs, physical information resources, etc.) as well as organizational and social functions which result in divergent outputs and goals. Cyber infrastructure's engineering often defaults to a separation of the social from the technical that results in the engineering succeeding in limited ways, or the exposure of unanticipated points of failure within the system. Studying the development of middleware intended to mediate interactions among systems within an earth systems science infrastructure exposes organizational, technical and standards-focused negotiations endemic to a fundamental trait of infrastructure: its characteristic invisibility in use. Intended to perform a core function within the EarthCube cyberinfrastructure, the development, governance and maintenance of an automated brokering system is a microcosm of large-scale infrastructural efforts. Points of potential system failure, regardless of the extent to which they are more social or more technical in nature, can be considered in terms of the reverse salient: a point of social and material configuration that momentarily lags behind the progress of an emerging or maturing infrastructure. The implementation of the BCube data broker has exposed reverse salients in regards to the overall EarthCube infrastructure (and the role of middleware brokering) in the form of organizational factors such as infrastructural alignment, maintenance and resilience; differing and incompatible practices of data discovery and evaluation among users and stakeholders; and a preponderance of local variations in the implementation of standards and authentication in data access. These issues are characterized by their role in increasing tension or friction among components that are on the path to convergence and may help to predict otherwise-occluded endogenous points of failure or non-adoption in the infrastructure.

  16. Cloud Computing in Support of Applied Learning: A Baseline Study of Infrastructure Design at Southern Polytechnic State University

    ERIC Educational Resources Information Center

    Conn, Samuel S.; Reichgelt, Han

    2013-01-01

    Cloud computing represents an architecture and paradigm of computing designed to deliver infrastructure, platforms, and software as constructible computing resources on demand to networked users. As campuses are challenged to better accommodate academic needs for applications and computing environments, cloud computing can provide an accommodating…

  17. Cloud Computing and Its Applications in GIS

    NASA Astrophysics Data System (ADS)

    Kang, Cao

    2011-12-01

    Cloud computing is a novel computing paradigm that offers highly scalable and highly available distributed computing services. The objectives of this research are to: 1. analyze and understand cloud computing and its potential for GIS; 2. discover the feasibilities of migrating truly spatial GIS algorithms to distributed computing infrastructures; 3. explore a solution to host and serve large volumes of raster GIS data efficiently and speedily. These objectives thus form the basis for three professional articles. The first article is entitled "Cloud Computing and Its Applications in GIS". This paper introduces the concept, structure, and features of cloud computing. Features of cloud computing such as scalability, parallelization, and high availability make it a very capable computing paradigm. Unlike High Performance Computing (HPC), cloud computing uses inexpensive commodity computers. The uniform administration systems in cloud computing make it easier to use than GRID computing. Potential advantages of cloud-based GIS systems such as lower barrier to entry are consequently presented. Three cloud-based GIS system architectures are proposed: public cloud- based GIS systems, private cloud-based GIS systems and hybrid cloud-based GIS systems. Public cloud-based GIS systems provide the lowest entry barriers for users among these three architectures, but their advantages are offset by data security and privacy related issues. Private cloud-based GIS systems provide the best data protection, though they have the highest entry barriers. Hybrid cloud-based GIS systems provide a compromise between these extremes. The second article is entitled "A cloud computing algorithm for the calculation of Euclidian distance for raster GIS". Euclidean distance is a truly spatial GIS algorithm. Classical algorithms such as the pushbroom and growth ring techniques require computational propagation through the entire raster image, which makes it incompatible with the distributed nature of cloud computing. This paper presents a parallel Euclidean distance algorithm that works seamlessly with the distributed nature of cloud computing infrastructures. The mechanism of this algorithm is to subdivide a raster image into sub-images and wrap them with a one pixel deep edge layer of individually computed distance information. Each sub-image is then processed by a separate node, after which the resulting sub-images are reassembled into the final output. It is shown that while any rectangular sub-image shape can be used, those approximating squares are computationally optimal. This study also serves as a demonstration of this subdivide and layer-wrap strategy, which would enable the migration of many truly spatial GIS algorithms to cloud computing infrastructures. However, this research also indicates that certain spatial GIS algorithms such as cost distance cannot be migrated by adopting this mechanism, which presents significant challenges for the development of cloud-based GIS systems. The third article is entitled "A Distributed Storage Schema for Cloud Computing based Raster GIS Systems". This paper proposes a NoSQL Database Management System (NDDBMS) based raster GIS data storage schema. NDDBMS has good scalability and is able to use distributed commodity computers, which make it superior to Relational Database Management Systems (RDBMS) in a cloud computing environment. In order to provide optimized data service performance, the proposed storage schema analyzes the nature of commonly used raster GIS data sets. It discriminates two categories of commonly used data sets, and then designs corresponding data storage models for both categories. As a result, the proposed storage schema is capable of hosting and serving enormous volumes of raster GIS data speedily and efficiently on cloud computing infrastructures. In addition, the scheme also takes advantage of the data compression characteristics of Quadtrees, thus promoting efficient data storage. Through this assessment of cloud computing technology, the exploration of the challenges and solutions to the migration of GIS algorithms to cloud computing infrastructures, and the examination of strategies for serving large amounts of GIS data in a cloud computing infrastructure, this dissertation lends support to the feasibility of building a cloud-based GIS system. However, there are still challenges that need to be addressed before a full-scale functional cloud-based GIS system can be successfully implemented. (Abstract shortened by UMI.)

  18. Phenomenology tools on cloud infrastructures using OpenStack

    NASA Astrophysics Data System (ADS)

    Campos, I.; Fernández-del-Castillo, E.; Heinemeyer, S.; Lopez-Garcia, A.; Pahlen, F.; Borges, G.

    2013-04-01

    We present a new environment for computations in particle physics phenomenology employing recent developments in cloud computing. On this environment users can create and manage "virtual" machines on which the phenomenology codes/tools can be deployed easily in an automated way. We analyze the performance of this environment based on "virtual" machines versus the utilization of physical hardware. In this way we provide a qualitative result for the influence of the host operating system on the performance of a representative set of applications for phenomenology calculations.

  19. Detecting Abnormal Machine Characteristics in Cloud Infrastructures

    NASA Technical Reports Server (NTRS)

    Bhaduri, Kanishka; Das, Kamalika; Matthews, Bryan L.

    2011-01-01

    In the cloud computing environment resources are accessed as services rather than as a product. Monitoring this system for performance is crucial because of typical pay-peruse packages bought by the users for their jobs. With the huge number of machines currently in the cloud system, it is often extremely difficult for system administrators to keep track of all machines using distributed monitoring programs such as Ganglia1 which lacks system health assessment and summarization capabilities. To overcome this problem, we propose a technique for automated anomaly detection using machine performance data in the cloud. Our algorithm is entirely distributed and runs locally on each computing machine on the cloud in order to rank the machines in order of their anomalous behavior for given jobs. There is no need to centralize any of the performance data for the analysis and at the end of the analysis, our algorithm generates error reports, thereby allowing the system administrators to take corrective actions. Experiments performed on real data sets collected for different jobs validate the fact that our algorithm has a low overhead for tracking anomalous machines in a cloud infrastructure.

  20. Technology Policy: Information Infrastructure [Information Superhighways and High Performance Computing]. Hearings before the Subcommittee on Technology, Environment and Aviation of the Committee on Science, Space, and Technology. House of Representatives, One Hundred Third Congress, First Session (March 23 and 25, 1993).

    ERIC Educational Resources Information Center

    Congress of the U.S., Washington, DC. House Committee on Science, Space and Technology.

    This document contains transcriptions of testimony and prepared statements on national technology policy with a focus on President Clinton and Vice President Gore's initiatives to support the development of a national information infrastructure. On the first day of the hearing testimony was received from Edward H. Salmon, Chairman, New Jersey…

  1. Explicit Content Caching at Mobile Edge Networks with Cross-Layer Sensing

    PubMed Central

    Chen, Lingyu; Su, Youxing; Luo, Wenbin; Hong, Xuemin; Shi, Jianghong

    2018-01-01

    The deployment density and computational power of small base stations (BSs) are expected to increase significantly in the next generation mobile communication networks. These BSs form the mobile edge network, which is a pervasive and distributed infrastructure that can empower a variety of edge/fog computing applications. This paper proposes a novel edge-computing application called explicit caching, which stores selective contents at BSs and exposes such contents to local users for interactive browsing and download. We formulate the explicit caching problem as a joint content recommendation, caching, and delivery problem, which aims to maximize the expected user quality-of-experience (QoE) with varying degrees of cross-layer sensing capability. Optimal and effective heuristic algorithms are presented to solve the problem. The theoretical performance bounds of the explicit caching system are derived in simplified scenarios. The impacts of cache storage space, BS backhaul capacity, cross-layer information, and user mobility on the system performance are simulated and discussed in realistic scenarios. Results suggest that, compared with conventional implicit caching schemes, explicit caching can better exploit the mobile edge network infrastructure for personalized content dissemination. PMID:29565313

  2. Explicit Content Caching at Mobile Edge Networks with Cross-Layer Sensing.

    PubMed

    Chen, Lingyu; Su, Youxing; Luo, Wenbin; Hong, Xuemin; Shi, Jianghong

    2018-03-22

    The deployment density and computational power of small base stations (BSs) are expected to increase significantly in the next generation mobile communication networks. These BSs form the mobile edge network, which is a pervasive and distributed infrastructure that can empower a variety of edge/fog computing applications. This paper proposes a novel edge-computing application called explicit caching, which stores selective contents at BSs and exposes such contents to local users for interactive browsing and download. We formulate the explicit caching problem as a joint content recommendation, caching, and delivery problem, which aims to maximize the expected user quality-of-experience (QoE) with varying degrees of cross-layer sensing capability. Optimal and effective heuristic algorithms are presented to solve the problem. The theoretical performance bounds of the explicit caching system are derived in simplified scenarios. The impacts of cache storage space, BS backhaul capacity, cross-layer information, and user mobility on the system performance are simulated and discussed in realistic scenarios. Results suggest that, compared with conventional implicit caching schemes, explicit caching can better exploit the mobile edge network infrastructure for personalized content dissemination.

  3. EGI-EUDAT integration activity - Pair data and high-throughput computing resources together

    NASA Astrophysics Data System (ADS)

    Scardaci, Diego; Viljoen, Matthew; Vitlacil, Dejan; Fiameni, Giuseppe; Chen, Yin; sipos, Gergely; Ferrari, Tiziana

    2016-04-01

    EGI (www.egi.eu) is a publicly funded e-infrastructure put together to give scientists access to more than 530,000 logical CPUs, 200 PB of disk capacity and 300 PB of tape storage to drive research and innovation in Europe. The infrastructure provides both high throughput computing and cloud compute/storage capabilities. Resources are provided by about 350 resource centres which are distributed across 56 countries in Europe, the Asia-Pacific region, Canada and Latin America. EUDAT (www.eudat.eu) is a collaborative Pan-European infrastructure providing research data services, training and consultancy for researchers, research communities, research infrastructures and data centres. EUDAT's vision is to enable European researchers and practitioners from any research discipline to preserve, find, access, and process data in a trusted environment, as part of a Collaborative Data Infrastructure (CDI) conceived as a network of collaborating, cooperating centres, combining the richness of numerous community-specific data repositories with the permanence and persistence of some of Europe's largest scientific data centres. EGI and EUDAT, in the context of their flagship projects, EGI-Engage and EUDAT2020, started in March 2015 a collaboration to harmonise the two infrastructures, including technical interoperability, authentication, authorisation and identity management, policy and operations. The main objective of this work is to provide end-users with a seamless access to an integrated infrastructure offering both EGI and EUDAT services and, then, pairing data and high-throughput computing resources together. To define the roadmap of this collaboration, EGI and EUDAT selected a set of relevant user communities, already collaborating with both infrastructures, which could bring requirements and help to assign the right priorities to each of them. In this way, from the beginning, this activity has been really driven by the end users. The identified user communities are relevant European Research infrastructure in the field of Earth Science (EPOS and ICOS), Bioinformatics (BBMRI and ELIXIR) and Space Physics (EISCAT-3D). The first outcome of this activity has been the definition of a generic use case that captures the typical user scenario with respect the integrated use of the EGI and EUDAT infrastructures. This generic use case allows a user to instantiate a set of Virtual Machine images on the EGI Federated Cloud to perform computational jobs that analyse data previously stored on EUDAT long-term storage systems. The results of such analysis can be staged back to EUDAT storages, and if needed, allocated with Permanent identifyers (PIDs) for future use. The implementation of this generic use case requires the following integration activities between EGI and EUDAT: (1) harmonisation of the user authentication and authorisation models, (2) implementing interface connectors between the relevant EGI and EUDAT services, particularly EGI Cloud compute facilities and EUDAT long-term storage and PID systems. In the presentation, the collected user requirements and the implementation status of the universal use case will be showed. Furthermore, how the universal use case is currently applied to satisfy EPOS and ICOS needs will be described.

  4. Partnership in Computational Science

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Huray, Paul G.

    1999-02-24

    This is the final report for the "Partnership in Computational Science" (PICS) award in an amount of $500,000 for the period January 1, 1993 through December 31, 1993. A copy of the proposal with its budget is attached as Appendix A. This report first describes the consequent significance of the DOE award in building infrastructure of high performance computing in the Southeast and then describes the work accomplished under this grant and a list of publications resulting from it.

  5. Sustaining a Community Computing Infrastructure for Online Teacher Professional Development: A Case Study of Designing Tapped In

    NASA Astrophysics Data System (ADS)

    Farooq, Umer; Schank, Patricia; Harris, Alexandra; Fusco, Judith; Schlager, Mark

    Community computing has recently grown to become a major research area in human-computer interaction. One of the objectives of community computing is to support computer-supported cooperative work among distributed collaborators working toward shared professional goals in online communities of practice. A core issue in designing and developing community computing infrastructures — the underlying sociotechnical layer that supports communitarian activities — is sustainability. Many community computing initiatives fail because the underlying infrastructure does not meet end user requirements; the community is unable to maintain a critical mass of users consistently over time; it generates insufficient social capital to support significant contributions by members of the community; or, as typically happens with funded initiatives, financial and human capital resource become unavailable to further maintain the infrastructure. On the basis of more than 9 years of design experience with Tapped In-an online community of practice for education professionals — we present a case study that discusses four design interventions that have sustained the Tapped In infrastructure and its community to date. These interventions represent broader design strategies for developing online environments for professional communities of practice.

  6. Configuration Management and Infrastructure Monitoring Using CFEngine and Icinga for Real-time Heterogeneous Data Taking Environment

    NASA Astrophysics Data System (ADS)

    Poat, M. D.; Lauret, J.; Betts, W.

    2015-12-01

    The STAR online computing environment is an intensive ever-growing system used for real-time data collection and analysis. Composed of heterogeneous and sometimes groups of custom-tuned machines, the computing infrastructure was previously managed by manual configurations and inconsistently monitored by a combination of tools. This situation led to configuration inconsistency and an overload of repetitive tasks along with lackluster communication between personnel and machines. Globally securing this heterogeneous cyberinfrastructure was tedious at best and an agile, policy-driven system ensuring consistency, was pursued. Three configuration management tools, Chef, Puppet, and CFEngine have been compared in reliability, versatility and performance along with a comparison of infrastructure monitoring tools Nagios and Icinga. STAR has selected the CFEngine configuration management tool and the Icinga infrastructure monitoring system leading to a versatile and sustainable solution. By leveraging these two tools STAR can now swiftly upgrade and modify the environment to its needs with ease as well as promptly react to cyber-security requests. By creating a sustainable long term monitoring solution, the detection of failures was reduced from days to minutes, allowing rapid actions before the issues become dire problems, potentially causing loss of precious experimental data or uptime.

  7. A Cloud-based Infrastructure and Architecture for Environmental System Research

    NASA Astrophysics Data System (ADS)

    Wang, D.; Wei, Y.; Shankar, M.; Quigley, J.; Wilson, B. E.

    2016-12-01

    The present availability of high-capacity networks, low-cost computers and storage devices, and the widespread adoption of hardware virtualization and service-oriented architecture provide a great opportunity to enable data and computing infrastructure sharing between closely related research activities. By taking advantage of these approaches, along with the world-class high computing and data infrastructure located at Oak Ridge National Laboratory, a cloud-based infrastructure and architecture has been developed to efficiently deliver essential data and informatics service and utilities to the environmental system research community, and will provide unique capabilities that allows terrestrial ecosystem research projects to share their software utilities (tools), data and even data submission workflow in a straightforward fashion. The infrastructure will minimize large disruptions from current project-based data submission workflows for better acceptances from existing projects, since many ecosystem research projects already have their own requirements or preferences for data submission and collection. The infrastructure will eliminate scalability problems with current project silos by provide unified data services and infrastructure. The Infrastructure consists of two key components (1) a collection of configurable virtual computing environments and user management systems that expedite data submission and collection from environmental system research community, and (2) scalable data management services and system, originated and development by ORNL data centers.

  8. The ATLAS Simulation Infrastructure

    DOE PAGES

    Aad, G.; Abbott, B.; Abdallah, J.; ...

    2010-09-25

    The simulation software for the ATLAS Experiment at the Large Hadron Collider is being used for large-scale production of events on the LHC Computing Grid. This simulation requires many components, from the generators that simulate particle collisions, through packages simulating the response of the various detectors and triggers. All of these components come together under the ATLAS simulation infrastructure. In this paper, that infrastructure is discussed, including that supporting the detector description, interfacing the event generation, and combining the GEANT4 simulation of the response of the individual detectors. Also described are the tools allowing the software validation, performance testing, andmore » the validation of the simulated output against known physics processes.« less

  9. The Human Physiome: how standards, software and innovative service infrastructures are providing the building blocks to make it achievable

    PubMed Central

    2016-01-01

    Reconstructing and understanding the Human Physiome virtually is a complex mathematical problem, and a highly demanding computational challenge. Mathematical models spanning from the molecular level through to whole populations of individuals must be integrated, then personalized. This requires interoperability with multiple disparate and geographically separated data sources, and myriad computational software tools. Extracting and producing knowledge from such sources, even when the databases and software are readily available, is a challenging task. Despite the difficulties, researchers must frequently perform these tasks so that available knowledge can be continually integrated into the common framework required to realize the Human Physiome. Software and infrastructures that support the communities that generate these, together with their underlying standards to format, describe and interlink the corresponding data and computer models, are pivotal to the Human Physiome being realized. They provide the foundations for integrating, exchanging and re-using data and models efficiently, and correctly, while also supporting the dissemination of growing knowledge in these forms. In this paper, we explore the standards, software tooling, repositories and infrastructures that support this work, and detail what makes them vital to realizing the Human Physiome. PMID:27051515

  10. The Human Physiome: how standards, software and innovative service infrastructures are providing the building blocks to make it achievable.

    PubMed

    Nickerson, David; Atalag, Koray; de Bono, Bernard; Geiger, Jörg; Goble, Carole; Hollmann, Susanne; Lonien, Joachim; Müller, Wolfgang; Regierer, Babette; Stanford, Natalie J; Golebiewski, Martin; Hunter, Peter

    2016-04-06

    Reconstructing and understanding the Human Physiome virtually is a complex mathematical problem, and a highly demanding computational challenge. Mathematical models spanning from the molecular level through to whole populations of individuals must be integrated, then personalized. This requires interoperability with multiple disparate and geographically separated data sources, and myriad computational software tools. Extracting and producing knowledge from such sources, even when the databases and software are readily available, is a challenging task. Despite the difficulties, researchers must frequently perform these tasks so that available knowledge can be continually integrated into the common framework required to realize the Human Physiome. Software and infrastructures that support the communities that generate these, together with their underlying standards to format, describe and interlink the corresponding data and computer models, are pivotal to the Human Physiome being realized. They provide the foundations for integrating, exchanging and re-using data and models efficiently, and correctly, while also supporting the dissemination of growing knowledge in these forms. In this paper, we explore the standards, software tooling, repositories and infrastructures that support this work, and detail what makes them vital to realizing the Human Physiome.

  11. Building Nationally-Focussed, Globally Federated, High Performance Earth Science Platforms to Solve Next Generation Social and Economic Issues.

    NASA Astrophysics Data System (ADS)

    Wyborn, Lesley; Evans, Ben; Foster, Clinton; Pugh, Timothy; Uhlherr, Alfred

    2015-04-01

    Digital geoscience data and information are integral to informing decisions on the social, economic and environmental management of natural resources. Traditionally, such decisions were focused on regional or national viewpoints only, but it is increasingly being recognised that global perspectives are required to meet new challenges such as predicting impacts of climate change; sustainably exploiting scarce water, mineral and energy resources; and protecting our communities through better prediction of the behaviour of natural hazards. In recent years, technical advances in scientific instruments have resulted in a surge in data volumes, with data now being collected at unprecedented rates and at ever increasing resolutions. The size of many earth science data sets now exceed the computational capacity of many government and academic organisations to locally store and dynamically access the data sets; to internally process and analyse them to high resolutions; and then to deliver them online to clients, partners and stakeholders. Fortunately, at the same time, computational capacities have commensurately increased (both cloud and HPC): these can now provide the capability to effectively access the ever-growing data assets within realistic time frames. However, to achieve this, data and computing need to be co-located: bandwidth limits the capacity to move the large data sets; the data transfers are too slow; and latencies to access them are too high. These scenarios are driving the move towards more centralised High Performance (HP) Infrastructures. The rapidly increasing scale of data, the growing complexity of software and hardware environments, combined with the energy costs of running such infrastructures is creating a compelling economic argument for just having one or two major national (or continental) HP facilities that can be federated internationally to enable earth and environmental issues to be tackled at global scales. But at the same time, if properly constructed, these infrastructures can also service very small-scale research projects. The National Computational Infrastructure (NCI) at the Australian National University (ANU) has built such an HP infrastructure as part of the Australian Government's National Collaborative Research Infrastructure Strategy. NCI operates as a formal partnership between the ANU and the three major Australian National Government Scientific Agencies: the Commonwealth Scientific and Industrial Research Organisation (CSIRO), the Bureau of Meteorology and Geoscience Australia. The government partners agreed to explore the new opportunities offered within the partnership with NCI, rather than each running their own separate agenda independently. The data from these national agencies, as well as from collaborating overseas organisations (e.g., NASA, NOAA, USGS, CMIP, etc.) are either replicated to, or produced at, NCI. By co-locating and harmonising these vast data collections within the integrated HP computing environments at NCI, new opportunities have arisen for Data-intensive Interdisciplinary Science at scales and resolutions not hitherto possible. The new NCI infrastructure has also enabled the blending of research by the university sector with the more operational business of government science agencies, with the fundamental shift being that researchers from both sectors work and collaborate within a federated data and computational environment that contains both national and international data collections.

  12. The multi-modal Australian ScienceS Imaging and Visualization Environment (MASSIVE) high performance computing infrastructure: applications in neuroscience and neuroinformatics research

    PubMed Central

    Goscinski, Wojtek J.; McIntosh, Paul; Felzmann, Ulrich; Maksimenko, Anton; Hall, Christopher J.; Gureyev, Timur; Thompson, Darren; Janke, Andrew; Galloway, Graham; Killeen, Neil E. B.; Raniga, Parnesh; Kaluza, Owen; Ng, Amanda; Poudel, Govinda; Barnes, David G.; Nguyen, Toan; Bonnington, Paul; Egan, Gary F.

    2014-01-01

    The Multi-modal Australian ScienceS Imaging and Visualization Environment (MASSIVE) is a national imaging and visualization facility established by Monash University, the Australian Synchrotron, the Commonwealth Scientific Industrial Research Organization (CSIRO), and the Victorian Partnership for Advanced Computing (VPAC), with funding from the National Computational Infrastructure and the Victorian Government. The MASSIVE facility provides hardware, software, and expertise to drive research in the biomedical sciences, particularly advanced brain imaging research using synchrotron x-ray and infrared imaging, functional and structural magnetic resonance imaging (MRI), x-ray computer tomography (CT), electron microscopy and optical microscopy. The development of MASSIVE has been based on best practice in system integration methodologies, frameworks, and architectures. The facility has: (i) integrated multiple different neuroimaging analysis software components, (ii) enabled cross-platform and cross-modality integration of neuroinformatics tools, and (iii) brought together neuroimaging databases and analysis workflows. MASSIVE is now operational as a nationally distributed and integrated facility for neuroinfomatics and brain imaging research. PMID:24734019

  13. Methodologies and systems for heterogeneous concurrent computing

    NASA Technical Reports Server (NTRS)

    Sunderam, V. S.

    1994-01-01

    Heterogeneous concurrent computing is gaining increasing acceptance as an alternative or complementary paradigm to multiprocessor-based parallel processing as well as to conventional supercomputing. While algorithmic and programming aspects of heterogeneous concurrent computing are similar to their parallel processing counterparts, system issues, partitioning and scheduling, and performance aspects are significantly different. In this paper, we discuss critical design and implementation issues in heterogeneous concurrent computing, and describe techniques for enhancing its effectiveness. In particular, we highlight the system level infrastructures that are required, aspects of parallel algorithm development that most affect performance, system capabilities and limitations, and tools and methodologies for effective computing in heterogeneous networked environments. We also present recent developments and experiences in the context of the PVM system and comment on ongoing and future work.

  14. Heterogeneous concurrent computing with exportable services

    NASA Technical Reports Server (NTRS)

    Sunderam, Vaidy

    1995-01-01

    Heterogeneous concurrent computing, based on the traditional process-oriented model, is approaching its functionality and performance limits. An alternative paradigm, based on the concept of services, supporting data driven computation, and built on a lightweight process infrastructure, is proposed to enhance the functional capabilities and the operational efficiency of heterogeneous network-based concurrent computing. TPVM is an experimental prototype system supporting exportable services, thread-based computation, and remote memory operations that is built as an extension of and an enhancement to the PVM concurrent computing system. TPVM offers a significantly different computing paradigm for network-based computing, while maintaining a close resemblance to the conventional PVM model in the interest of compatibility and ease of transition Preliminary experiences have demonstrated that the TPVM framework presents a natural yet powerful concurrent programming interface, while being capable of delivering performance improvements of upto thirty percent.

  15. Managing a tier-2 computer centre with a private cloud infrastructure

    NASA Astrophysics Data System (ADS)

    Bagnasco, Stefano; Berzano, Dario; Brunetti, Riccardo; Lusso, Stefano; Vallero, Sara

    2014-06-01

    In a typical scientific computing centre, several applications coexist and share a single physical infrastructure. An underlying Private Cloud infrastructure eases the management and maintenance of such heterogeneous applications (such as multipurpose or application-specific batch farms, Grid sites, interactive data analysis facilities and others), allowing dynamic allocation resources to any application. Furthermore, the maintenance of large deployments of complex and rapidly evolving middleware and application software is eased by the use of virtual images and contextualization techniques. Such infrastructures are being deployed in some large centres (see e.g. the CERN Agile Infrastructure project), but with several open-source tools reaching maturity this is becoming viable also for smaller sites. In this contribution we describe the Private Cloud infrastructure at the INFN-Torino Computer Centre, that hosts a full-fledged WLCG Tier-2 centre, an Interactive Analysis Facility for the ALICE experiment at the CERN LHC and several smaller scientific computing applications. The private cloud building blocks include the OpenNebula software stack, the GlusterFS filesystem and the OpenWRT Linux distribution (used for network virtualization); a future integration into a federated higher-level infrastructure is made possible by exposing commonly used APIs like EC2 and OCCI.

  16. GISpark: A Geospatial Distributed Computing Platform for Spatiotemporal Big Data

    NASA Astrophysics Data System (ADS)

    Wang, S.; Zhong, E.; Wang, E.; Zhong, Y.; Cai, W.; Li, S.; Gao, S.

    2016-12-01

    Geospatial data are growing exponentially because of the proliferation of cost effective and ubiquitous positioning technologies such as global remote-sensing satellites and location-based devices. Analyzing large amounts of geospatial data can provide great value for both industrial and scientific applications. Data- and compute- intensive characteristics inherent in geospatial big data increasingly pose great challenges to technologies of data storing, computing and analyzing. Such challenges require a scalable and efficient architecture that can store, query, analyze, and visualize large-scale spatiotemporal data. Therefore, we developed GISpark - a geospatial distributed computing platform for processing large-scale vector, raster and stream data. GISpark is constructed based on the latest virtualized computing infrastructures and distributed computing architecture. OpenStack and Docker are used to build multi-user hosting cloud computing infrastructure for GISpark. The virtual storage systems such as HDFS, Ceph, MongoDB are combined and adopted for spatiotemporal data storage management. Spark-based algorithm framework is developed for efficient parallel computing. Within this framework, SuperMap GIScript and various open-source GIS libraries can be integrated into GISpark. GISpark can also integrated with scientific computing environment (e.g., Anaconda), interactive computing web applications (e.g., Jupyter notebook), and machine learning tools (e.g., TensorFlow/Orange). The associated geospatial facilities of GISpark in conjunction with the scientific computing environment, exploratory spatial data analysis tools, temporal data management and analysis systems make up a powerful geospatial computing tool. GISpark not only provides spatiotemporal big data processing capacity in the geospatial field, but also provides spatiotemporal computational model and advanced geospatial visualization tools that deals with other domains related with spatial property. We tested the performance of the platform based on taxi trajectory analysis. Results suggested that GISpark achieves excellent run time performance in spatiotemporal big data applications.

  17. Commissioning the CERN IT Agile Infrastructure with experiment workloads

    NASA Astrophysics Data System (ADS)

    Medrano Llamas, Ramón; Harald Barreiro Megino, Fernando; Kucharczyk, Katarzyna; Kamil Denis, Marek; Cinquilli, Mattia

    2014-06-01

    In order to ease the management of their infrastructure, most of the WLCG sites are adopting cloud based strategies. In the case of CERN, the Tier 0 of the WLCG, is completely restructuring the resource and configuration management of their computing center under the codename Agile Infrastructure. Its goal is to manage 15,000 Virtual Machines by means of an OpenStack middleware in order to unify all the resources in CERN's two datacenters: the one placed in Meyrin and the new on in Wigner, Hungary. During the commissioning of this infrastructure, CERN IT is offering an attractive amount of computing resources to the experiments (800 cores for ATLAS and CMS) through a private cloud interface. ATLAS and CMS have joined forces to exploit them by running stress tests and simulation workloads since November 2012. This work will describe the experience of the first deployments of the current experiment workloads on the CERN private cloud testbed. The paper is organized as follows: the first section will explain the integration of the experiment workload management systems (WMS) with the cloud resources. The second section will revisit the performance and stress testing performed with HammerCloud in order to evaluate and compare the suitability for the experiment workloads. The third section will go deeper into the dynamic provisioning techniques, such as the use of the cloud APIs directly by the WMS. The paper finishes with a review of the conclusions and the challenges ahead.

  18. Evaluating Commercial and Private Cloud Services for Facility-Scale Geodetic Data Access, Analysis, and Services

    NASA Astrophysics Data System (ADS)

    Meertens, C. M.; Boler, F. M.; Ertz, D. J.; Mencin, D.; Phillips, D.; Baker, S.

    2017-12-01

    UNAVCO, in its role as a NSF facility for geodetic infrastructure and data, has succeeded for over two decades using on-premises infrastructure, and while the promise of cloud-based infrastructure is well-established, significant questions about suitability of such infrastructure for facility-scale services remain. Primarily through the GeoSciCloud award from NSF EarthCube, UNAVCO is investigating the costs, advantages, and disadvantages of providing its geodetic data and services in the cloud versus using UNAVCO's on-premises infrastructure. (IRIS is a collaborator on the project and is performing its own suite of investigations). In contrast to the 2-3 year time scale for the research cycle, the time scale of operation and planning for NSF facilities is for a minimum of five years and for some services extends to a decade or more. Planning for on-premises infrastructure is deliberate, and migrations typically take months to years to fully implement. Migrations to a cloud environment can only go forward with similar deliberate planning and understanding of all costs and benefits. The EarthCube GeoSciCloud project is intended to address the uncertainties of facility-level operations in the cloud. Investigations are being performed in a commercial cloud environment (Amazon AWS) during the first year of the project and in a private cloud environment (NSF XSEDE resource at the Texas Advanced Computing Center) during the second year. These investigations are expected to illuminate the potential as well as the limitations of running facility scale production services in the cloud. The work includes running parallel equivalent cloud-based services to on premises services and includes: data serving via ftp from a large data store, operation of a metadata database, production scale processing of multiple months of geodetic data, web services delivery of quality checked data and products, large-scale compute services for event post-processing, and serving real time data from a network of 700-plus GPS stations. The evaluation is based on a suite of metrics that we have developed to elucidate the effectiveness of cloud-based services in price, performance, and management. Services are currently running in AWS and evaluation is underway.

  19. Marginal adaptation and CAD-CAM technology: A systematic review of restorative material and fabrication techniques.

    PubMed

    Papadiochou, Sofia; Pissiotis, Argirios L

    2018-04-01

    The comparative assessment of computer-aided design and computer-aided manufacturing (CAD-CAM) technology and other fabrication techniques pertaining to marginal adaptation should be documented. Limited evidence exists on the effect of restorative material on the performance of a CAD-CAM system relative to marginal adaptation. The purpose of this systematic review was to investigate whether the marginal adaptation of CAD-CAM single crowns, fixed dental prostheses, and implant-retained fixed dental prostheses or their infrastructures differs from that obtained by other fabrication techniques using a similar restorative material and whether it depends on the type of restorative material. An electronic search of English-language literature published between January 1, 2000, and June 30, 2016, was conducted of the Medline/PubMed database. Of the 55 included comparative studies, 28 compared CAD-CAM technology with conventional fabrication techniques, 12 contrasted CAD-CAM technology and copy milling, 4 compared CAD-CAM milling with direct metal laser sintering (DMLS), and 22 investigated the performance of a CAD-CAM system regarding marginal adaptation in restorations/infrastructures produced with different restorative materials. Most of the CAD-CAM restorations/infrastructures were within the clinically acceptable marginal discrepancy (MD) range. The performance of a CAD-CAM system relative to marginal adaptation is influenced by the restorative material. Compared with CAD-CAM, most of the heat-pressed lithium disilicate crowns displayed equal or smaller MD values. Slip-casting crowns exhibited similar or better marginal accuracy than those fabricated with CAD-CAM. Cobalt-chromium and titanium implant infrastructures produced using a CAD-CAM system elicited smaller MD values than zirconia. The majority of cobalt-chromium restorations/infrastructures produced by DMLS displayed better marginal accuracy than those fabricated with the casting technique. Compared with copy milling, the majority of zirconia restorations/infrastructures produced by CAD-CAM milling exhibited better marginal adaptation. No clear conclusions can be drawn about the superiority of CAD-CAM milling over the casting technique and DMLS regarding marginal adaptation. Copyright © 2017 Editorial Council for the Journal of Prosthetic Dentistry. Published by Elsevier Inc. All rights reserved.

  20. WPS mediation: An approach to process geospatial data on different computing backends

    NASA Astrophysics Data System (ADS)

    Giuliani, Gregory; Nativi, Stefano; Lehmann, Anthony; Ray, Nicolas

    2012-10-01

    The OGC Web Processing Service (WPS) specification allows generating information by processing distributed geospatial data made available through Spatial Data Infrastructures (SDIs). However, current SDIs have limited analytical capacities and various problems emerge when trying to use them in data and computing-intensive domains such as environmental sciences. These problems are usually not or only partially solvable using single computing resources. Therefore, the Geographic Information (GI) community is trying to benefit from the superior storage and computing capabilities offered by distributed computing (e.g., Grids, Clouds) related methods and technologies. Currently, there is no commonly agreed approach to grid-enable WPS. No implementation allows one to seamlessly execute a geoprocessing calculation following user requirements on different computing backends, ranging from a stand-alone GIS server up to computer clusters and large Grid infrastructures. Considering this issue, this paper presents a proof of concept by mediating different geospatial and Grid software packages, and by proposing an extension of WPS specification through two optional parameters. The applicability of this approach will be demonstrated using a Normalized Difference Vegetation Index (NDVI) mediated WPS process, highlighting benefits, and issues that need to be further investigated to improve performances.

  1. Real-time performance monitoring and management system

    DOEpatents

    Budhraja, Vikram S [Los Angeles, CA; Dyer, James D [La Mirada, CA; Martinez Morales, Carlos A [Upland, CA

    2007-06-19

    A real-time performance monitoring system for monitoring an electric power grid. The electric power grid has a plurality of grid portions, each grid portion corresponding to one of a plurality of control areas. The real-time performance monitoring system includes a monitor computer for monitoring at least one of reliability metrics, generation metrics, transmission metrics, suppliers metrics, grid infrastructure security metrics, and markets metrics for the electric power grid. The data for metrics being monitored by the monitor computer are stored in a data base, and a visualization of the metrics is displayed on at least one display computer having a monitor. The at least one display computer in one said control area enables an operator to monitor the grid portion corresponding to a different said control area.

  2. Genomic cloud computing: legal and ethical points to consider

    PubMed Central

    Dove, Edward S; Joly, Yann; Tassé, Anne-Marie; Burton, Paul; Chisholm, Rex; Fortier, Isabel; Goodwin, Pat; Harris, Jennifer; Hveem, Kristian; Kaye, Jane; Kent, Alistair; Knoppers, Bartha Maria; Lindpaintner, Klaus; Little, Julian; Riegman, Peter; Ripatti, Samuli; Stolk, Ronald; Bobrow, Martin; Cambon-Thomsen, Anne; Dressler, Lynn; Joly, Yann; Kato, Kazuto; Knoppers, Bartha Maria; Rodriguez, Laura Lyman; McPherson, Treasa; Nicolás, Pilar; Ouellette, Francis; Romeo-Casabona, Carlos; Sarin, Rajiv; Wallace, Susan; Wiesner, Georgia; Wilson, Julia; Zeps, Nikolajs; Simkevitz, Howard; De Rienzo, Assunta; Knoppers, Bartha M

    2015-01-01

    The biggest challenge in twenty-first century data-intensive genomic science, is developing vast computer infrastructure and advanced software tools to perform comprehensive analyses of genomic data sets for biomedical research and clinical practice. Researchers are increasingly turning to cloud computing both as a solution to integrate data from genomics, systems biology and biomedical data mining and as an approach to analyze data to solve biomedical problems. Although cloud computing provides several benefits such as lower costs and greater efficiency, it also raises legal and ethical issues. In this article, we discuss three key ‘points to consider' (data control; data security, confidentiality and transfer; and accountability) based on a preliminary review of several publicly available cloud service providers' Terms of Service. These ‘points to consider' should be borne in mind by genomic research organizations when negotiating legal arrangements to store genomic data on a large commercial cloud service provider's servers. Diligent genomic cloud computing means leveraging security standards and evaluation processes as a means to protect data and entails many of the same good practices that researchers should always consider in securing their local infrastructure. PMID:25248396

  3. Genomic cloud computing: legal and ethical points to consider.

    PubMed

    Dove, Edward S; Joly, Yann; Tassé, Anne-Marie; Knoppers, Bartha M

    2015-10-01

    The biggest challenge in twenty-first century data-intensive genomic science, is developing vast computer infrastructure and advanced software tools to perform comprehensive analyses of genomic data sets for biomedical research and clinical practice. Researchers are increasingly turning to cloud computing both as a solution to integrate data from genomics, systems biology and biomedical data mining and as an approach to analyze data to solve biomedical problems. Although cloud computing provides several benefits such as lower costs and greater efficiency, it also raises legal and ethical issues. In this article, we discuss three key 'points to consider' (data control; data security, confidentiality and transfer; and accountability) based on a preliminary review of several publicly available cloud service providers' Terms of Service. These 'points to consider' should be borne in mind by genomic research organizations when negotiating legal arrangements to store genomic data on a large commercial cloud service provider's servers. Diligent genomic cloud computing means leveraging security standards and evaluation processes as a means to protect data and entails many of the same good practices that researchers should always consider in securing their local infrastructure.

  4. Analysis of CERN computing infrastructure and monitoring data

    NASA Astrophysics Data System (ADS)

    Nieke, C.; Lassnig, M.; Menichetti, L.; Motesnitsalis, E.; Duellmann, D.

    2015-12-01

    Optimizing a computing infrastructure on the scale of LHC requires a quantitative understanding of a complex network of many different resources and services. For this purpose the CERN IT department and the LHC experiments are collecting a large multitude of logs and performance probes, which are already successfully used for short-term analysis (e.g. operational dashboards) within each group. The IT analytics working group has been created with the goal to bring data sources from different services and on different abstraction levels together and to implement a suitable infrastructure for mid- to long-term statistical analysis. It further provides a forum for joint optimization across single service boundaries and the exchange of analysis methods and tools. To simplify access to the collected data, we implemented an automated repository for cleaned and aggregated data sources based on the Hadoop ecosystem. This contribution describes some of the challenges encountered, such as dealing with heterogeneous data formats, selecting an efficient storage format for map reduce and external access, and will describe the repository user interface. Using this infrastructure we were able to quantitatively analyze the relationship between CPU/wall fraction, latency/throughput constraints of network and disk and the effective job throughput. In this contribution we will first describe the design of the shared analysis infrastructure and then present a summary of first analysis results from the combined data sources.

  5. Control System Applicable Use Assessment of the Secure Computing Corporation - Secure Firewall (Sidewinder)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hadley, Mark D.; Clements, Samuel L.

    2009-01-01

    Battelle’s National Security & Defense objective is, “applying unmatched expertise and unique facilities to deliver homeland security solutions. From detection and protection against weapons of mass destruction to emergency preparedness/response and protection of critical infrastructure, we are working with industry and government to integrate policy, operational, technological, and logistical parameters that will secure a safe future”. In an ongoing effort to meet this mission, engagements with industry that are intended to improve operational and technical attributes of commercial solutions that are related to national security initiatives are necessary. This necessity will ensure that capabilities for protecting critical infrastructure assets aremore » considered by commercial entities in their development, design, and deployment lifecycles thus addressing the alignment of identified deficiencies and improvements needed to support national cyber security initiatives. The Secure Firewall (Sidewinder) appliance by Secure Computing was assessed for applicable use in critical infrastructure control system environments, such as electric power, nuclear and other facilities containing critical systems that require augmented protection from cyber threat. The testing was performed in the Pacific Northwest National Laboratory’s (PNNL) Electric Infrastructure Operations Center (EIOC). The Secure Firewall was tested in a network configuration that emulates a typical control center network and then evaluated. A number of observations and recommendations are included in this report relating to features currently included in the Secure Firewall that support critical infrastructure security needs.« less

  6. SBSI: an extensible distributed software infrastructure for parameter estimation in systems biology

    PubMed Central

    Adams, Richard; Clark, Allan; Yamaguchi, Azusa; Hanlon, Neil; Tsorman, Nikos; Ali, Shakir; Lebedeva, Galina; Goltsov, Alexey; Sorokin, Anatoly; Akman, Ozgur E.; Troein, Carl; Millar, Andrew J.; Goryanin, Igor; Gilmore, Stephen

    2013-01-01

    Summary: Complex computational experiments in Systems Biology, such as fitting model parameters to experimental data, can be challenging to perform. Not only do they frequently require a high level of computational power, but the software needed to run the experiment needs to be usable by scientists with varying levels of computational expertise, and modellers need to be able to obtain up-to-date experimental data resources easily. We have developed a software suite, the Systems Biology Software Infrastructure (SBSI), to facilitate the parameter-fitting process. SBSI is a modular software suite composed of three major components: SBSINumerics, a high-performance library containing parallelized algorithms for performing parameter fitting; SBSIDispatcher, a middleware application to track experiments and submit jobs to back-end servers; and SBSIVisual, an extensible client application used to configure optimization experiments and view results. Furthermore, we have created a plugin infrastructure to enable project-specific modules to be easily installed. Plugin developers can take advantage of the existing user-interface and application framework to customize SBSI for their own uses, facilitated by SBSI’s use of standard data formats. Availability and implementation: All SBSI binaries and source-code are freely available from http://sourceforge.net/projects/sbsi under an Apache 2 open-source license. The server-side SBSINumerics runs on any Unix-based operating system; both SBSIVisual and SBSIDispatcher are written in Java and are platform independent, allowing use on Windows, Linux and Mac OS X. The SBSI project website at http://www.sbsi.ed.ac.uk provides documentation and tutorials. Contact: stg@inf.ed.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23329415

  7. Flexible services for the support of research.

    PubMed

    Turilli, Matteo; Wallom, David; Williams, Chris; Gough, Steve; Curran, Neal; Tarrant, Richard; Bretherton, Dan; Powell, Andy; Johnson, Matt; Harmer, Terry; Wright, Peter; Gordon, John

    2013-01-28

    Cloud computing has been increasingly adopted by users and providers to promote a flexible, scalable and tailored access to computing resources. Nonetheless, the consolidation of this paradigm has uncovered some of its limitations. Initially devised by corporations with direct control over large amounts of computational resources, cloud computing is now being endorsed by organizations with limited resources or with a more articulated, less direct control over these resources. The challenge for these organizations is to leverage the benefits of cloud computing while dealing with limited and often widely distributed computing resources. This study focuses on the adoption of cloud computing by higher education institutions and addresses two main issues: flexible and on-demand access to a large amount of storage resources, and scalability across a heterogeneous set of cloud infrastructures. The proposed solutions leverage a federated approach to cloud resources in which users access multiple and largely independent cloud infrastructures through a highly customizable broker layer. This approach allows for a uniform authentication and authorization infrastructure, a fine-grained policy specification and the aggregation of accounting and monitoring. Within a loosely coupled federation of cloud infrastructures, users can access vast amount of data without copying them across cloud infrastructures and can scale their resource provisions when the local cloud resources become insufficient.

  8. Climate simulations and services on HPC, Cloud and Grid infrastructures

    NASA Astrophysics Data System (ADS)

    Cofino, Antonio S.; Blanco, Carlos; Minondo Tshuma, Antonio

    2017-04-01

    Cloud, Grid and High Performance Computing have changed the accessibility and availability of computing resources for Earth Science research communities, specially for Climate community. These paradigms are modifying the way how climate applications are being executed. By using these technologies the number, variety and complexity of experiments and resources are increasing substantially. But, although computational capacity is increasing, traditional applications and tools used by the community are not good enough to manage this large volume and variety of experiments and computing resources. In this contribution, we evaluate the challenges to run climate simulations and services on Grid, Cloud and HPC infrestructures and how to tackle them. The Grid and Cloud infrastructures provided by EGI's VOs ( esr , earth.vo.ibergrid and fedcloud.egi.eu) will be evaluated, as well as HPC resources from PRACE infrastructure and institutional clusters. To solve those challenges, solutions using DRM4G framework will be shown. DRM4G provides a good framework to manage big volume and variety of computing resources for climate experiments. This work has been supported by the Spanish National R&D Plan under projects WRF4G (CGL2011-28864), INSIGNIA (CGL2016-79210-R) and MULTI-SDM (CGL2015-66583-R) ; the IS-ENES2 project from the 7FP of the European Commission (grant agreement no. 312979); the European Regional Development Fund—ERDF and the Programa de Personal Investigador en Formación Predoctoral from Universidad de Cantabria and Government of Cantabria.

  9. Evolving Storage and Cyber Infrastructure at the NASA Center for Climate Simulation

    NASA Technical Reports Server (NTRS)

    Salmon, Ellen; Duffy, Daniel; Spear, Carrie; Sinno, Scott; Vaughan, Garrison; Bowen, Michael

    2018-01-01

    This talk will describe recent developments at the NASA Center for Climate Simulation, which is funded by NASAs Science Mission Directorate, and supports the specialized data storage and computational needs of weather, ocean, and climate researchers, as well as astrophysicists, heliophysicists, and planetary scientists. To meet requirements for higher-resolution, higher-fidelity simulations, the NCCS augments its High Performance Computing (HPC) and storage retrieval environment. As the petabytes of model and observational data grow, the NCCS is broadening data services offerings and deploying and expanding virtualization resources for high performance analytics.

  10. An adaptive process-based cloud infrastructure for space situational awareness applications

    NASA Astrophysics Data System (ADS)

    Liu, Bingwei; Chen, Yu; Shen, Dan; Chen, Genshe; Pham, Khanh; Blasch, Erik; Rubin, Bruce

    2014-06-01

    Space situational awareness (SSA) and defense space control capabilities are top priorities for groups that own or operate man-made spacecraft. Also, with the growing amount of space debris, there is an increase in demand for contextual understanding that necessitates the capability of collecting and processing a vast amount sensor data. Cloud computing, which features scalable and flexible storage and computing services, has been recognized as an ideal candidate that can meet the large data contextual challenges as needed by SSA. Cloud computing consists of physical service providers and middleware virtual machines together with infrastructure, platform, and software as service (IaaS, PaaS, SaaS) models. However, the typical Virtual Machine (VM) abstraction is on a per operating systems basis, which is at too low-level and limits the flexibility of a mission application architecture. In responding to this technical challenge, a novel adaptive process based cloud infrastructure for SSA applications is proposed in this paper. In addition, the details for the design rationale and a prototype is further examined. The SSA Cloud (SSAC) conceptual capability will potentially support space situation monitoring and tracking, object identification, and threat assessment. Lastly, the benefits of a more granular and flexible cloud computing resources allocation are illustrated for data processing and implementation considerations within a representative SSA system environment. We show that the container-based virtualization performs better than hypervisor-based virtualization technology in an SSA scenario.

  11. Implementing Computer-Aided Instruction in Distance Education: An Infrastructure. RR/89-06.

    ERIC Educational Resources Information Center

    Kotze, Paula

    The infrastructure required for the implementation of computer aided instruction is described with particular reference to the distance education environment at the University of South Africa. A review of the state of the art of online distance education in the United States and Europe is followed by an outline of the proposed infrastructure for…

  12. Data Center Consolidation: A Step towards Infrastructure Clouds

    NASA Astrophysics Data System (ADS)

    Winter, Markus

    Application service providers face enormous challenges and rising costs in managing and operating a growing number of heterogeneous system and computing landscapes. Limitations of traditional computing environments force IT decision-makers to reorganize computing resources within the data center, as continuous growth leads to an inefficient utilization of the underlying hardware infrastructure. This paper discusses a way for infrastructure providers to improve data center operations based on the findings of a case study on resource utilization of very large business applications and presents an outlook beyond server consolidation endeavors, transforming corporate data centers into compute clouds.

  13. A proto-Data Processing Center for LISA

    NASA Astrophysics Data System (ADS)

    Cavet, Cécile; Petiteau, Antoine; Le Jeune, Maude; Plagnol, Eric; Marin-Martholaz, Etienne; Bayle, Jean-Baptiste

    2017-05-01

    The LISA project preparation requires to study and define a new data analysis framework, capable of dealing with highly heterogeneous CPU needs and of exploiting the emergent information technologies. In this context, a prototype of the mission’s Data Processing Center (DPC) has been initiated. The DPC is designed to efficiently manage computing constraints and to offer a common infrastructure where the whole collaboration can contribute to development work. Several tools such as continuous integration (CI) have already been delivered to the collaboration and are presently used for simulations and performance studies. This article presents the progress made regarding this collaborative environment and discusses also the possible next steps towards an on-demand computing infrastructure. This activity is supported by CNES as part of the French contribution to LISA.

  14. The iPlant Collaborative: Cyberinfrastructure for Enabling Data to Discovery for the Life Sciences.

    PubMed

    Merchant, Nirav; Lyons, Eric; Goff, Stephen; Vaughn, Matthew; Ware, Doreen; Micklos, David; Antin, Parker

    2016-01-01

    The iPlant Collaborative provides life science research communities access to comprehensive, scalable, and cohesive computational infrastructure for data management; identity management; collaboration tools; and cloud, high-performance, high-throughput computing. iPlant provides training, learning material, and best practice resources to help all researchers make the best use of their data, expand their computational skill set, and effectively manage their data and computation when working as distributed teams. iPlant's platform permits researchers to easily deposit and share their data and deploy new computational tools and analysis workflows, allowing the broader community to easily use and reuse those data and computational analyses.

  15. Network and computing infrastructure for scientific applications in Georgia

    NASA Astrophysics Data System (ADS)

    Kvatadze, R.; Modebadze, Z.

    2016-09-01

    Status of network and computing infrastructure and available services for research and education community of Georgia are presented. Research and Educational Networking Association - GRENA provides the following network services: Internet connectivity, network services, cyber security, technical support, etc. Computing resources used by the research teams are located at GRENA and at major state universities. GE-01-GRENA site is included in European Grid infrastructure. Paper also contains information about programs of Learning Center and research and development projects in which GRENA is participating.

  16. Performance comparison of heuristic algorithms for task scheduling in IaaS cloud computing environment.

    PubMed

    Madni, Syed Hamid Hussain; Abd Latiff, Muhammad Shafie; Abdullahi, Mohammed; Abdulhamid, Shafi'i Muhammad; Usman, Mohammed Joda

    2017-01-01

    Cloud computing infrastructure is suitable for meeting computational needs of large task sizes. Optimal scheduling of tasks in cloud computing environment has been proved to be an NP-complete problem, hence the need for the application of heuristic methods. Several heuristic algorithms have been developed and used in addressing this problem, but choosing the appropriate algorithm for solving task assignment problem of a particular nature is difficult since the methods are developed under different assumptions. Therefore, six rule based heuristic algorithms are implemented and used to schedule autonomous tasks in homogeneous and heterogeneous environments with the aim of comparing their performance in terms of cost, degree of imbalance, makespan and throughput. First Come First Serve (FCFS), Minimum Completion Time (MCT), Minimum Execution Time (MET), Max-min, Min-min and Sufferage are the heuristic algorithms considered for the performance comparison and analysis of task scheduling in cloud computing.

  17. Performance comparison of heuristic algorithms for task scheduling in IaaS cloud computing environment

    PubMed Central

    Madni, Syed Hamid Hussain; Abd Latiff, Muhammad Shafie; Abdullahi, Mohammed; Usman, Mohammed Joda

    2017-01-01

    Cloud computing infrastructure is suitable for meeting computational needs of large task sizes. Optimal scheduling of tasks in cloud computing environment has been proved to be an NP-complete problem, hence the need for the application of heuristic methods. Several heuristic algorithms have been developed and used in addressing this problem, but choosing the appropriate algorithm for solving task assignment problem of a particular nature is difficult since the methods are developed under different assumptions. Therefore, six rule based heuristic algorithms are implemented and used to schedule autonomous tasks in homogeneous and heterogeneous environments with the aim of comparing their performance in terms of cost, degree of imbalance, makespan and throughput. First Come First Serve (FCFS), Minimum Completion Time (MCT), Minimum Execution Time (MET), Max-min, Min-min and Sufferage are the heuristic algorithms considered for the performance comparison and analysis of task scheduling in cloud computing. PMID:28467505

  18. National research and education network

    NASA Technical Reports Server (NTRS)

    Villasenor, Tony

    1991-01-01

    Some goals of this network are as follows: Extend U.S. technological leadership in high performance computing and computer communications; Provide wide dissemination and application of the technologies both to the speed and the pace of innovation and to serve the national economy, national security, education, and the global environment; and Spur gains in the U.S. productivity and industrial competitiveness by making high performance computing and networking technologies an integral part of the design and production process. Strategies for achieving these goals are as follows: Support solutions to important scientific and technical challenges through a vigorous R and D effort; Reduce the uncertainties to industry for R and D and use of this technology through increased cooperation between government, industry, and universities and by the continued use of government and government funded facilities as a prototype user for early commercial HPCC products; and Support underlying research, network, and computational infrastructures on which U.S. high performance computing technology is based.

  19. Real-time contaminant sensing and control in civil infrastructure systems

    NASA Astrophysics Data System (ADS)

    Rimer, Sara; Katopodes, Nikolaos

    2014-11-01

    A laboratory-scale prototype has been designed and implemented to test the feasibility of real-time contaminant sensing and control in civil infrastructure systems. A blower wind tunnel is the basis of the prototype design, with propylene glycol smoke as the ``contaminant.'' A camera sensor and compressed-air vacuum nozzle system is set up at the test section portion of the prototype to visually sense and then control the contaminant; a real-time controller is programmed to read in data from the camera sensor and administer pressure to regulators controlling the compressed air operating the vacuum nozzles. A computational fluid dynamics model is being integrated in with this prototype to inform the correct pressure to supply to the regulators in order to optimally control the contaminant's removal from the prototype. The performance of the prototype has been evaluated against the computational fluid dynamics model and is discussed in this presentation. Furthermore, the initial performance of the sensor-control system implemented in the test section of the prototype is discussed. NSF-CMMI 0856438.

  20. A new algorithm for grid-based hydrologic analysis by incorporating stormwater infrastructure

    NASA Astrophysics Data System (ADS)

    Choi, Yosoon; Yi, Huiuk; Park, Hyeong-Dong

    2011-08-01

    We developed a new algorithm, the Adaptive Stormwater Infrastructure (ASI) algorithm, to incorporate ancillary data sets related to stormwater infrastructure into the grid-based hydrologic analysis. The algorithm simultaneously considers the effects of the surface stormwater collector network (e.g., diversions, roadside ditches, and canals) and underground stormwater conveyance systems (e.g., waterway tunnels, collector pipes, and culverts). The surface drainage flows controlled by the surface runoff collector network are superimposed onto the flow directions derived from a DEM. After examining the connections between inlets and outfalls in the underground stormwater conveyance system, the flow accumulation and delineation of watersheds are calculated based on recursive computations. Application of the algorithm to the Sangdong tailings dam in Korea revealed superior performance to that of a conventional D8 single-flow algorithm in terms of providing reasonable hydrologic information on watersheds with stormwater infrastructure.

  1. Public-Private Partnership: Joint recommendations to improve downloads of large Earth observation data

    NASA Astrophysics Data System (ADS)

    Ramachandran, R.; Murphy, K. J.; Baynes, K.; Lynnes, C.

    2016-12-01

    With the volume of Earth observation data expanding rapidly, cloud computing is quickly changing the way Earth observation data is processed, analyzed, and visualized. The cloud infrastructure provides the flexibility to scale up to large volumes of data and handle high velocity data streams efficiently. Having freely available Earth observation data collocated on a cloud infrastructure creates opportunities for innovation and value-added data re-use in ways unforeseen by the original data provider. These innovations spur new industries and applications and spawn new scientific pathways that were previously limited due to data volume and computational infrastructure issues. NASA, in collaboration with Amazon, Google, and Microsoft, have jointly developed a set of recommendations to enable efficient transfer of Earth observation data from existing data systems to a cloud computing infrastructure. The purpose of these recommendations is to provide guidelines against which all data providers can evaluate existing data systems and be used to improve any issues uncovered to enable efficient search, access, and use of large volumes of data. Additionally, these guidelines ensure that all cloud providers utilize a common methodology for bulk-downloading data from data providers thus preventing the data providers from building custom capabilities to meet the needs of individual cloud providers. The intent is to share these recommendations with other Federal agencies and organizations that serve Earth observation to enable efficient search, access, and use of large volumes of data. Additionally, the adoption of these recommendations will benefit data users interested in moving large volumes of data from data systems to any other location. These data users include the cloud providers, cloud users such as scientists, and other users working in a high performance computing environment who need to move large volumes of data.

  2. Robust, Optimal Water Infrastructure Planning Under Deep Uncertainty Using Metamodels

    NASA Astrophysics Data System (ADS)

    Maier, H. R.; Beh, E. H. Y.; Zheng, F.; Dandy, G. C.; Kapelan, Z.

    2015-12-01

    Optimal long-term planning plays an important role in many water infrastructure problems. However, this task is complicated by deep uncertainty about future conditions, such as the impact of population dynamics and climate change. One way to deal with this uncertainty is by means of robustness, which aims to ensure that water infrastructure performs adequately under a range of plausible future conditions. However, as robustness calculations require computationally expensive system models to be run for a large number of scenarios, it is generally computationally intractable to include robustness as an objective in the development of optimal long-term infrastructure plans. In order to overcome this shortcoming, an approach is developed that uses metamodels instead of computationally expensive simulation models in robustness calculations. The approach is demonstrated for the optimal sequencing of water supply augmentation options for the southern portion of the water supply for Adelaide, South Australia. A 100-year planning horizon is subdivided into ten equal decision stages for the purpose of sequencing various water supply augmentation options, including desalination, stormwater harvesting and household rainwater tanks. The objectives include the minimization of average present value of supply augmentation costs, the minimization of average present value of greenhouse gas emissions and the maximization of supply robustness. The uncertain variables are rainfall, per capita water consumption and population. Decision variables are the implementation stages of the different water supply augmentation options. Artificial neural networks are used as metamodels to enable all objectives to be calculated in a computationally efficient manner at each of the decision stages. The results illustrate the importance of identifying optimal staged solutions to ensure robustness and sustainability of water supply into an uncertain long-term future.

  3. Cloudbursting - Solving the 3-body problem

    NASA Astrophysics Data System (ADS)

    Chang, G.; Heistand, S.; Vakhnin, A.; Huang, T.; Zimdars, P.; Hua, H.; Hood, R.; Koenig, J.; Mehrotra, P.; Little, M. M.; Law, E.

    2014-12-01

    Many science projects in the future will be accomplished through collaboration among 2 or more NASA centers along with, potentially, external scientists. Science teams will be composed of more geographically dispersed individuals and groups. However, the current computing environment does not make this easy and seamless. By being able to share computing resources among members of a multi-center team working on a science/ engineering project, limited pre-competition funds could be more efficiently applied and technical work could be conducted more effectively with less time spent moving data or waiting for computing resources to free up. Based on the work from an NASA CIO IT Labs task, this presentation will highlight our prototype work in identifying the feasibility and identify the obstacles, both technical and management, to perform "Cloudbursting" among private clouds located at three different centers. We will demonstrate the use of private cloud computing infrastructure at the Jet Propulsion Laboratory, Langley Research Center, and Ames Research Center to provide elastic computation to each other to perform parallel Earth Science data imaging. We leverage elastic load balancing and auto-scaling features at each data center so that each location can independently define how many resources to allocate to a particular job that was "bursted" from another data center and demonstrate that compute capacity scales up and down with the job. We will also discuss future work in the area, which could include the use of cloud infrastructure from different cloud framework providers as well as other cloud service providers.

  4. Towards a Multi-Mission, Airborne Science Data System Environment

    NASA Astrophysics Data System (ADS)

    Crichton, D. J.; Hardman, S.; Law, E.; Freeborn, D.; Kay-Im, E.; Lau, G.; Oswald, J.

    2011-12-01

    NASA earth science instruments are increasingly relying on airborne missions. However, traditionally, there has been limited common infrastructure support available to principal investigators in the area of science data systems. As a result, each investigator has been required to develop their own computing infrastructures for the science data system. Typically there is little software reuse and many projects lack sufficient resources to provide a robust infrastructure to capture, process, distribute and archive the observations acquired from airborne flights. At NASA's Jet Propulsion Laboratory (JPL), we have been developing a multi-mission data system infrastructure for airborne instruments called the Airborne Cloud Computing Environment (ACCE). ACCE encompasses the end-to-end lifecycle covering planning, provisioning of data system capabilities, and support for scientific analysis in order to improve the quality, cost effectiveness, and capabilities to enable new scientific discovery and research in earth observation. This includes improving data system interoperability across each instrument. A principal characteristic is being able to provide an agile infrastructure that is architected to allow for a variety of configurations of the infrastructure from locally installed compute and storage services to provisioning those services via the "cloud" from cloud computer vendors such as Amazon.com. Investigators often have different needs that require a flexible configuration. The data system infrastructure is built on the Apache's Object Oriented Data Technology (OODT) suite of components which has been used for a number of spaceborne missions and provides a rich set of open source software components and services for constructing science processing and data management systems. In 2010, a partnership was formed between the ACCE team and the Carbon in Arctic Reservoirs Vulnerability Experiment (CARVE) mission to support the data processing and data management needs. A principal goal is to provide support for the Fourier Transform Spectrometer (FTS) instrument which will produce over 700,000 soundings over the life of their three-year mission. The cost to purchase and operate a cluster-based system in order to generate Level 2 Full Physics products from this data was prohibitive. Through an evaluation of cloud computing solutions, Amazon's Elastic Compute Cloud (EC2) was selected for the CARVE deployment. As the ACCE infrastructure is developed and extended to form an infrastructure for airborne missions, the experience of working with CARVE has provided a number of lessons learned and has proven to be important in reinforcing the unique aspects of airborne missions and the importance of the ACCE infrastructure in developing a cost effective, flexible multi-mission capability that leverages emerging capabilities in cloud computing, workflow management, and distributed computing.

  5. High-Performance Compute Infrastructure in Astronomy: 2020 Is Only Months Away

    NASA Astrophysics Data System (ADS)

    Berriman, B.; Deelman, E.; Juve, G.; Rynge, M.; Vöckler, J. S.

    2012-09-01

    By 2020, astronomy will be awash with as much as 60 PB of public data. Full scientific exploitation of such massive volumes of data will require high-performance computing on server farms co-located with the data. Development of this computing model will be a community-wide enterprise that has profound cultural and technical implications. Astronomers must be prepared to develop environment-agnostic applications that support parallel processing. The community must investigate the applicability and cost-benefit of emerging technologies such as cloud computing to astronomy, and must engage the Computer Science community to develop science-driven cyberinfrastructure such as workflow schedulers and optimizers. We report here the results of collaborations between a science center, IPAC, and a Computer Science research institute, ISI. These collaborations may be considered pathfinders in developing a high-performance compute infrastructure in astronomy. These collaborations investigated two exemplar large-scale science-driver workflow applications: 1) Calculation of an infrared atlas of the Galactic Plane at 18 different wavelengths by placing data from multiple surveys on a common plate scale and co-registering all the pixels; 2) Calculation of an atlas of periodicities present in the public Kepler data sets, which currently contain 380,000 light curves. These products have been generated with two workflow applications, written in C for performance and designed to support parallel processing on multiple environments and platforms, but with different compute resource needs: the Montage image mosaic engine is I/O-bound, and the NASA Star and Exoplanet Database periodogram code is CPU-bound. Our presentation will report cost and performance metrics and lessons-learned for continuing development. Applicability of Cloud Computing: Commercial Cloud providers generally charge for all operations, including processing, transfer of input and output data, and for storage of data, and so the costs of running applications vary widely according to how they use resources. The cloud is well suited to processing CPU-bound (and memory bound) workflows such as the periodogram code, given the relatively low cost of processing in comparison with I/O operations. I/O-bound applications such as Montage perform best on high-performance clusters with fast networks and parallel file-systems. Science-driven Cyberinfrastructure: Montage has been widely used as a driver application to develop workflow management services, such as task scheduling in distributed environments, designing fault tolerance techniques for job schedulers, and developing workflow orchestration techniques. Running Parallel Applications Across Distributed Cloud Environments: Data processing will eventually take place in parallel distributed across cyber infrastructure environments having different architectures. We have used the Pegasus Work Management System (WMS) to successfully run applications across three very different environments: TeraGrid, OSG (Open Science Grid), and FutureGrid. Provisioning resources across different grids and clouds (also referred to as Sky Computing), involves establishing a distributed environment, where issues of, e.g, remote job submission, data management, and security need to be addressed. This environment also requires building virtual machine images that can run in different environments. Usually, each cloud provides basic images that can be customized with additional software and services. In most of our work, we provisioned compute resources using a custom application, called Wrangler. Pegasus WMS abstracts the architectures of the compute environments away from the end-user, and can be considered a first-generation tool suitable for scientists to run their applications on disparate environments.

  6. A parallel-processing approach to computing for the geographic sciences

    USGS Publications Warehouse

    Crane, Michael; Steinwand, Dan; Beckmann, Tim; Krpan, Greg; Haga, Jim; Maddox, Brian; Feller, Mark

    2001-01-01

    The overarching goal of this project is to build a spatially distributed infrastructure for information science research by forming a team of information science researchers and providing them with similar hardware and software tools to perform collaborative research. Four geographically distributed Centers of the U.S. Geological Survey (USGS) are developing their own clusters of low-cost personal computers into parallel computing environments that provide a costeffective way for the USGS to increase participation in the high-performance computing community. Referred to as Beowulf clusters, these hybrid systems provide the robust computing power required for conducting research into various areas, such as advanced computer architecture, algorithms to meet the processing needs for real-time image and data processing, the creation of custom datasets from seamless source data, rapid turn-around of products for emergency response, and support for computationally intense spatial and temporal modeling.

  7. A hybrid computational strategy to address WGS variant analysis in >5000 samples.

    PubMed

    Huang, Zhuoyi; Rustagi, Navin; Veeraraghavan, Narayanan; Carroll, Andrew; Gibbs, Richard; Boerwinkle, Eric; Venkata, Manjunath Gorentla; Yu, Fuli

    2016-09-10

    The decreasing costs of sequencing are driving the need for cost effective and real time variant calling of whole genome sequencing data. The scale of these projects are far beyond the capacity of typical computing resources available with most research labs. Other infrastructures like the cloud AWS environment and supercomputers also have limitations due to which large scale joint variant calling becomes infeasible, and infrastructure specific variant calling strategies either fail to scale up to large datasets or abandon joint calling strategies. We present a high throughput framework including multiple variant callers for single nucleotide variant (SNV) calling, which leverages hybrid computing infrastructure consisting of cloud AWS, supercomputers and local high performance computing infrastructures. We present a novel binning approach for large scale joint variant calling and imputation which can scale up to over 10,000 samples while producing SNV callsets with high sensitivity and specificity. As a proof of principle, we present results of analysis on Cohorts for Heart And Aging Research in Genomic Epidemiology (CHARGE) WGS freeze 3 dataset in which joint calling, imputation and phasing of over 5300 whole genome samples was produced in under 6 weeks using four state-of-the-art callers. The callers used were SNPTools, GATK-HaplotypeCaller, GATK-UnifiedGenotyper and GotCloud. We used Amazon AWS, a 4000-core in-house cluster at Baylor College of Medicine, IBM power PC Blue BioU at Rice and Rhea at Oak Ridge National Laboratory (ORNL) for the computation. AWS was used for joint calling of 180 TB of BAM files, and ORNL and Rice supercomputers were used for the imputation and phasing step. All other steps were carried out on the local compute cluster. The entire operation used 5.2 million core hours and only transferred a total of 6 TB of data across the platforms. Even with increasing sizes of whole genome datasets, ensemble joint calling of SNVs for low coverage data can be accomplished in a scalable, cost effective and fast manner by using heterogeneous computing platforms without compromising on the quality of variants.

  8. Final Report Scalable Analysis Methods and In Situ Infrastructure for Extreme Scale Knowledge Discovery

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    O'Leary, Patrick

    The primary challenge motivating this project is the widening gap between the ability to compute information and to store it for subsequent analysis. This gap adversely impacts science code teams, who can perform analysis only on a small fraction of the data they calculate, resulting in the substantial likelihood of lost or missed science, when results are computed but not analyzed. Our approach is to perform as much analysis or visualization processing on data while it is still resident in memory, which is known as in situ processing. The idea in situ processing was not new at the time ofmore » the start of this effort in 2014, but efforts in that space were largely ad hoc, and there was no concerted effort within the research community that aimed to foster production-quality software tools suitable for use by Department of Energy (DOE) science projects. Our objective was to produce and enable the use of production-quality in situ methods and infrastructure, at scale, on DOE high-performance computing (HPC) facilities, though we expected to have an impact beyond DOE due to the widespread nature of the challenges, which affect virtually all large-scale computational science efforts. To achieve this objective, we engaged in software technology research and development (R&D), in close partnerships with DOE science code teams, to produce software technologies that were shown to run efficiently at scale on DOE HPC platforms.« less

  9. Infrastructure Systems for Advanced Computing in E-science applications

    NASA Astrophysics Data System (ADS)

    Terzo, Olivier

    2013-04-01

    In the e-science field are growing needs for having computing infrastructure more dynamic and customizable with a model of use "on demand" that follow the exact request in term of resources and storage capacities. The integration of grid and cloud infrastructure solutions allows us to offer services that can adapt the availability in terms of up scaling and downscaling resources. The main challenges for e-sciences domains will on implement infrastructure solutions for scientific computing that allow to adapt dynamically the demands of computing resources with a strong emphasis on optimizing the use of computing resources for reducing costs of investments. Instrumentation, data volumes, algorithms, analysis contribute to increase the complexity for applications who require high processing power and storage for a limited time and often exceeds the computational resources that equip the majority of laboratories, research Unit in an organization. Very often it is necessary to adapt or even tweak rethink tools, algorithms, and consolidate existing applications through a phase of reverse engineering in order to adapt them to a deployment on Cloud infrastructure. For example, in areas such as rainfall monitoring, meteorological analysis, Hydrometeorology, Climatology Bioinformatics Next Generation Sequencing, Computational Electromagnetic, Radio occultation, the complexity of the analysis raises several issues such as the processing time, the scheduling of tasks of processing, storage of results, a multi users environment. For these reasons, it is necessary to rethink the writing model of E-Science applications in order to be already adapted to exploit the potentiality of cloud computing services through the uses of IaaS, PaaS and SaaS layer. An other important focus is on create/use hybrid infrastructure typically a federation between Private and public cloud, in fact in this way when all resources owned by the organization are all used it will be easy with a federate cloud infrastructure to add some additional resources form the Public cloud for following the needs in term of computational and storage resources and release them where process are finished. Following the hybrid model, the scheduling approach is important for managing both cloud models. Thanks to this model infrastructure every time resources are available for additional request in term of IT capacities that can used "on demand" for a limited time without having to proceed to purchase additional servers.

  10. Multi-objective optimization of GENIE Earth system models.

    PubMed

    Price, Andrew R; Myerscough, Richard J; Voutchkov, Ivan I; Marsh, Robert; Cox, Simon J

    2009-07-13

    The tuning of parameters in climate models is essential to provide reliable long-term forecasts of Earth system behaviour. We apply a multi-objective optimization algorithm to the problem of parameter estimation in climate models. This optimization process involves the iterative evaluation of response surface models (RSMs), followed by the execution of multiple Earth system simulations. These computations require an infrastructure that provides high-performance computing for building and searching the RSMs and high-throughput computing for the concurrent evaluation of a large number of models. Grid computing technology is therefore essential to make this algorithm practical for members of the GENIE project.

  11. Cloud Infrastructure & Applications - CloudIA

    NASA Astrophysics Data System (ADS)

    Sulistio, Anthony; Reich, Christoph; Doelitzscher, Frank

    The idea behind Cloud Computing is to deliver Infrastructure-as-a-Services and Software-as-a-Service over the Internet on an easy pay-per-use business model. To harness the potentials of Cloud Computing for e-Learning and research purposes, and to small- and medium-sized enterprises, the Hochschule Furtwangen University establishes a new project, called Cloud Infrastructure & Applications (CloudIA). The CloudIA project is a market-oriented cloud infrastructure that leverages different virtualization technologies, by supporting Service-Level Agreements for various service offerings. This paper describes the CloudIA project in details and mentions our early experiences in building a private cloud using an existing infrastructure.

  12. Quantum Accelerators for High-performance Computing Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Humble, Travis S.; Britt, Keith A.; Mohiyaddin, Fahd A.

    We define some of the programming and system-level challenges facing the application of quantum processing to high-performance computing. Alongside barriers to physical integration, prominent differences in the execution of quantum and conventional programs challenges the intersection of these computational models. Following a brief overview of the state of the art, we discuss recent advances in programming and execution models for hybrid quantum-classical computing. We discuss a novel quantum-accelerator framework that uses specialized kernels to offload select workloads while integrating with existing computing infrastructure. We elaborate on the role of the host operating system to manage these unique accelerator resources, themore » prospects for deploying quantum modules, and the requirements placed on the language hierarchy connecting these different system components. We draw on recent advances in the modeling and simulation of quantum computing systems with the development of architectures for hybrid high-performance computing systems and the realization of software stacks for controlling quantum devices. Finally, we present simulation results that describe the expected system-level behavior of high-performance computing systems composed from compute nodes with quantum processing units. We describe performance for these hybrid systems in terms of time-to-solution, accuracy, and energy consumption, and we use simple application examples to estimate the performance advantage of quantum acceleration.« less

  13. Lowering the barriers to computational modeling of Earth's surface: coupling Jupyter Notebooks with Landlab, HydroShare, and CyberGIS for research and education.

    NASA Astrophysics Data System (ADS)

    Bandaragoda, C.; Castronova, A. M.; Phuong, J.; Istanbulluoglu, E.; Strauch, R. L.; Nudurupati, S. S.; Tarboton, D. G.; Wang, S. W.; Yin, D.; Barnhart, K. R.; Tucker, G. E.; Hutton, E.; Hobley, D. E. J.; Gasparini, N. M.; Adams, J. M.

    2017-12-01

    The ability to test hypotheses about hydrology, geomorphology and atmospheric processes is invaluable to research in the era of big data. Although community resources are available, there remain significant educational, logistical and time investment barriers to their use. Knowledge infrastructure is an emerging intellectual framework to understand how people are creating, sharing and distributing knowledge - which has been dramatically transformed by Internet technologies. In addition to the technical and social components in a cyberinfrastructure system, knowledge infrastructure considers educational, institutional, and open source governance components required to advance knowledge. We are designing an infrastructure environment that lowers common barriers to reproducing modeling experiments for earth surface investigation. Landlab is an open-source modeling toolkit for building, coupling, and exploring two-dimensional numerical models. HydroShare is an online collaborative environment for sharing hydrologic data and models. CyberGIS-Jupyter is an innovative cyberGIS framework for achieving data-intensive, reproducible, and scalable geospatial analytics using the Jupyter Notebook based on ROGER - the first cyberGIS supercomputer, so that models that can be elastically reproduced through cloud computing approaches. Our team of geomorphologists, hydrologists, and computer geoscientists has created a new infrastructure environment that combines these three pieces of software to enable knowledge discovery. Through this novel integration, any user can interactively execute and explore their shared data and model resources. Landlab on HydroShare with CyberGIS-Jupyter supports the modeling continuum from fully developed modelling applications, prototyping new science tools, hands on research demonstrations for training workshops, and classroom applications. Computational geospatial models based on big data and high performance computing can now be more efficiently developed, improved, scaled, and seamlessly reproduced among multidisciplinary users, thereby expanding the active learning curriculum and research opportunities for students in earth surface modeling and informatics.

  14. The performance of low-cost commercial cloud computing as an alternative in computational chemistry.

    PubMed

    Thackston, Russell; Fortenberry, Ryan C

    2015-05-05

    The growth of commercial cloud computing (CCC) as a viable means of computational infrastructure is largely unexplored for the purposes of quantum chemistry. In this work, the PSI4 suite of computational chemistry programs is installed on five different types of Amazon World Services CCC platforms. The performance for a set of electronically excited state single-point energies is compared between these CCC platforms and typical, "in-house" physical machines. Further considerations are made for the number of cores or virtual CPUs (vCPUs, for the CCC platforms), but no considerations are made for full parallelization of the program (even though parallelization of the BLAS library is implemented), complete high-performance computing cluster utilization, or steal time. Even with this most pessimistic view of the computations, CCC resources are shown to be more cost effective for significant numbers of typical quantum chemistry computations. Large numbers of large computations are still best utilized by more traditional means, but smaller-scale research may be more effectively undertaken through CCC services. © 2015 Wiley Periodicals, Inc.

  15. A Case Study on Neural Inspired Dynamic Memory Management Strategies for High Performance Computing.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Vineyard, Craig Michael; Verzi, Stephen Joseph

    As high performance computing architectures pursue more computational power there is a need for increased memory capacity and bandwidth as well. A multi-level memory (MLM) architecture addresses this need by combining multiple memory types with different characteristics as varying levels of the same architecture. How to efficiently utilize this memory infrastructure is an unknown challenge, and in this research we sought to investigate whether neural inspired approaches can meaningfully help with memory management. In particular we explored neurogenesis inspired re- source allocation, and were able to show a neural inspired mixed controller policy can beneficially impact how MLM architectures utilizemore » memory.« less

  16. Cloud access to interoperable IVOA-compliant VOSpace storage

    NASA Astrophysics Data System (ADS)

    Bertocco, S.; Dowler, P.; Gaudet, S.; Major, B.; Pasian, F.; Taffoni, G.

    2018-07-01

    Handling, processing and archiving the huge amount of data produced by the new generation of experiments and instruments in Astronomy and Astrophysics are among the more exciting challenges to address in designing the future data management infrastructures and computing services. We investigated the feasibility of a data management and computation infrastructure, available world-wide, with the aim of merging the FAIR data management provided by IVOA standards with the efficiency and reliability of a cloud approach. Our work involved the Canadian Advanced Network for Astronomy Research (CANFAR) infrastructure and the European EGI federated cloud (EFC). We designed and deployed a pilot data management and computation infrastructure that provides IVOA-compliant VOSpace storage resources and wide access to interoperable federated clouds. In this paper, we detail the main user requirements covered, the technical choices and the implemented solutions and we describe the resulting Hybrid cloud Worldwide infrastructure, its benefits and limitations.

  17. The EPOS Vision for the Open Science Cloud

    NASA Astrophysics Data System (ADS)

    Jeffery, Keith; Harrison, Matt; Cocco, Massimo

    2016-04-01

    Cloud computing offers dynamic elastic scalability for data processing on demand. For much research activity, demand for computing is uneven over time and so CLOUD computing offers both cost-effectiveness and capacity advantages. However, as reported repeatedly by the EC Cloud Expert Group, there are barriers to the uptake of Cloud Computing: (1) security and privacy; (2) interoperability (avoidance of lock-in); (3) lack of appropriate systems development environments for application programmers to characterise their applications to allow CLOUD middleware to optimize their deployment and execution. From CERN, the Helix-Nebula group has proposed the architecture for the European Open Science Cloud. They are discussing with other e-Infrastructure groups such as EGI (GRIDs), EUDAT (data curation), AARC (network authentication and authorisation) and also with the EIROFORUM group of 'international treaty' RIs (Research Infrastructures) and the ESFRI (European Strategic Forum for Research Infrastructures) RIs including EPOS. Many of these RIs are either e-RIs (electronic-RIs) or have an e-RI interface for access and use. The EPOS architecture is centred on a portal: ICS (Integrated Core Services). The architectural design already allows for access to e-RIs (which may include any or all of data, software, users and resources such as computers or instruments). Those within any one domain (subject area) of EPOS are considered within the TCS (Thematic Core Services). Those outside, or available across multiple domains of EPOS, are ICS-d (Integrated Core Services-Distributed) since the intention is that they will be used by any or all of the TCS via the ICS. Another such service type is CES (Computational Earth Science); effectively an ICS-d specializing in high performance computation, analytics, simulation or visualization offered by a TCS for others to use. Already discussions are underway between EPOS and EGI, EUDAT, AARC and Helix-Nebula for those offerings to be considered as ICS-ds by EPOS.. Provision of access to ICS-Ds from ICS-C concerns several aspects: (a) Technical : it may be more or less difficult to connect and pass from ICS-C to the ICS-d/ CES the 'package' (probably a virtual machine) of data and software; (b) Security/privacy : including passing personal information e.g. related to AAAI (Authentication, authorization, accounting Infrastructure); (c) financial and legal : such as payment, licence conditions; Appropriate interfaces from ICS-C to ICS-d are being designed to accommodate these aspects. The Open Science Cloud is timely because it provides a framework to discuss governance and sustainability for computational resource provision as well as an effective interpretation of federated approach to HPC(High Performance Computing) -HTC (High Throughput Computing). It will be a unique opportunity to share and adopt procurement policies to provide access to computational resources for RIs. The current state of discussions and expected roadmap for the EPOS-Open Science Cloud relationship are presented.

  18. The Petascale Data Storage Institute

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gibson, Garth; Long, Darrell; Honeyman, Peter

    2013-07-01

    Petascale computing infrastructures for scientific discovery make petascale demands on information storage capacity, performance, concurrency, reliability, availability, and manageability.The Petascale Data Storage Institute focuses on the data storage problems found in petascale scientific computing environments, with special attention to community issues such as interoperability, community buy-in, and shared tools.The Petascale Data Storage Institute is a collaboration between researchers at Carnegie Mellon University, National Energy Research Scientific Computing Center, Pacific Northwest National Laboratory, Oak Ridge National Laboratory, Sandia National Laboratory, Los Alamos National Laboratory, University of Michigan, and the University of California at Santa Cruz.

  19. Processing LHC data in the UK

    PubMed Central

    Colling, D.; Britton, D.; Gordon, J.; Lloyd, S.; Doyle, A.; Gronbech, P.; Coles, J.; Sansum, A.; Patrick, G.; Jones, R.; Middleton, R.; Kelsey, D.; Cass, A.; Geddes, N.; Clark, P.; Barnby, L.

    2013-01-01

    The Large Hadron Collider (LHC) is one of the greatest scientific endeavours to date. The construction of the collider itself and the experiments that collect data from it represent a huge investment, both financially and in terms of human effort, in our hope to understand the way the Universe works at a deeper level. Yet the volumes of data produced are so large that they cannot be analysed at any single computing centre. Instead, the experiments have all adopted distributed computing models based on the LHC Computing Grid. Without the correct functioning of this grid infrastructure the experiments would not be able to understand the data that they have collected. Within the UK, the Grid infrastructure needed by the experiments is provided by the GridPP project. We report on the operations, performance and contributions made to the experiments by the GridPP project during the years of 2010 and 2011—the first two significant years of the running of the LHC. PMID:23230163

  20. Cloud Infrastructures for In Silico Drug Discovery: Economic and Practical Aspects

    PubMed Central

    Clematis, Andrea; Quarati, Alfonso; Cesini, Daniele; Milanesi, Luciano; Merelli, Ivan

    2013-01-01

    Cloud computing opens new perspectives for small-medium biotechnology laboratories that need to perform bioinformatics analysis in a flexible and effective way. This seems particularly true for hybrid clouds that couple the scalability offered by general-purpose public clouds with the greater control and ad hoc customizations supplied by the private ones. A hybrid cloud broker, acting as an intermediary between users and public providers, can support customers in the selection of the most suitable offers, optionally adding the provisioning of dedicated services with higher levels of quality. This paper analyses some economic and practical aspects of exploiting cloud computing in a real research scenario for the in silico drug discovery in terms of requirements, costs, and computational load based on the number of expected users. In particular, our work is aimed at supporting both the researchers and the cloud broker delivering an IaaS cloud infrastructure for biotechnology laboratories exposing different levels of nonfunctional requirements. PMID:24106693

  1. Validation of Storm Water Management Model Storm Control Measures Modules

    NASA Astrophysics Data System (ADS)

    Simon, M. A.; Platz, M. C.

    2017-12-01

    EPA's Storm Water Management Model (SWMM) is a computational code heavily relied upon by industry for the simulation of wastewater and stormwater infrastructure performance. Many municipalities are relying on SWMM results to design multi-billion-dollar, multi-decade infrastructure upgrades. Since the 1970's, EPA and others have developed five major releases, the most recent ones containing storm control measures modules for green infrastructure. The main objective of this study was to quantify the accuracy with which SWMM v5.1.10 simulates the hydrologic activity of previously monitored low impact developments. Model performance was evaluated with a mathematical comparison of outflow hydrographs and total outflow volumes, using empirical data and a multi-event, multi-objective calibration method. The calibration methodology utilized PEST++ Version 3, a parameter estimation tool, which aided in the selection of unmeasured hydrologic parameters. From the validation study and sensitivity analysis, several model improvements were identified to advance SWMM LID Module performance for permeable pavements, infiltration units and green roofs, and these were performed and reported herein. Overall, it was determined that SWMM can successfully simulate low impact development controls given accurate model confirmation, parameter measurement, and model calibration.

  2. Future Naval Use of COTS Networking Infrastructure

    DTIC Science & Technology

    2009-07-01

    user to benefit from Google’s vast databases and computational resources. Obviously, the ability to harness the full power of the Cloud could be... Computing Impact Findings Action Items Take-Aways Appendices: Pages 54-68 A. Terms of Reference Document B. Sample Definitions of Cloud ...and definition of Cloud Computing . While Cloud Computing is developing in many variations – including Infrastructure as a Service (IaaS), Platform as

  3. Model development, testing and experimentation in a CyberWorkstation for Brain-Machine Interface research.

    PubMed

    Rattanatamrong, Prapaporn; Matsunaga, Andrea; Raiturkar, Pooja; Mesa, Diego; Zhao, Ming; Mahmoudi, Babak; Digiovanna, Jack; Principe, Jose; Figueiredo, Renato; Sanchez, Justin; Fortes, Jose

    2010-01-01

    The CyberWorkstation (CW) is an advanced cyber-infrastructure for Brain-Machine Interface (BMI) research. It allows the development, configuration and execution of BMI computational models using high-performance computing resources. The CW's concept is implemented using a software structure in which an "experiment engine" is used to coordinate all software modules needed to capture, communicate and process brain signals and motor-control commands. A generic BMI-model template, which specifies a common interface to the CW's experiment engine, and a common communication protocol enable easy addition, removal or replacement of models without disrupting system operation. This paper reviews the essential components of the CW and shows how templates can facilitate the processes of BMI model development, testing and incorporation into the CW. It also discusses the ongoing work towards making this process infrastructure independent.

  4. SPARX, a new environment for Cryo-EM image processing.

    PubMed

    Hohn, Michael; Tang, Grant; Goodyear, Grant; Baldwin, P R; Huang, Zhong; Penczek, Pawel A; Yang, Chao; Glaeser, Robert M; Adams, Paul D; Ludtke, Steven J

    2007-01-01

    SPARX (single particle analysis for resolution extension) is a new image processing environment with a particular emphasis on transmission electron microscopy (TEM) structure determination. It includes a graphical user interface that provides a complete graphical programming environment with a novel data/process-flow infrastructure, an extensive library of Python scripts that perform specific TEM-related computational tasks, and a core library of fundamental C++ image processing functions. In addition, SPARX relies on the EMAN2 library and cctbx, the open-source computational crystallography library from PHENIX. The design of the system is such that future inclusion of other image processing libraries is a straightforward task. The SPARX infrastructure intelligently handles retention of intermediate values, even those inside programming structures such as loops and function calls. SPARX and all dependencies are free for academic use and available with complete source.

  5. The open science grid

    NASA Astrophysics Data System (ADS)

    Pordes, Ruth; OSG Consortium; Petravick, Don; Kramer, Bill; Olson, Doug; Livny, Miron; Roy, Alain; Avery, Paul; Blackburn, Kent; Wenaus, Torre; Würthwein, Frank; Foster, Ian; Gardner, Rob; Wilde, Mike; Blatecky, Alan; McGee, John; Quick, Rob

    2007-07-01

    The Open Science Grid (OSG) provides a distributed facility where the Consortium members provide guaranteed and opportunistic access to shared computing and storage resources. OSG provides support for and evolution of the infrastructure through activities that cover operations, security, software, troubleshooting, addition of new capabilities, and support for existing and engagement with new communities. The OSG SciDAC-2 project provides specific activities to manage and evolve the distributed infrastructure and support it's use. The innovative aspects of the project are the maintenance and performance of a collaborative (shared & common) petascale national facility over tens of autonomous computing sites, for many hundreds of users, transferring terabytes of data a day, executing tens of thousands of jobs a day, and providing robust and usable resources for scientific groups of all types and sizes. More information can be found at the OSG web site: www.opensciencegrid.org.

  6. The iPlant Collaborative: Cyberinfrastructure for Enabling Data to Discovery for the Life Sciences

    PubMed Central

    Merchant, Nirav; Lyons, Eric; Goff, Stephen; Vaughn, Matthew; Ware, Doreen; Micklos, David; Antin, Parker

    2016-01-01

    The iPlant Collaborative provides life science research communities access to comprehensive, scalable, and cohesive computational infrastructure for data management; identity management; collaboration tools; and cloud, high-performance, high-throughput computing. iPlant provides training, learning material, and best practice resources to help all researchers make the best use of their data, expand their computational skill set, and effectively manage their data and computation when working as distributed teams. iPlant’s platform permits researchers to easily deposit and share their data and deploy new computational tools and analysis workflows, allowing the broader community to easily use and reuse those data and computational analyses. PMID:26752627

  7. Collaborative Working Architecture for IoT-Based Applications.

    PubMed

    Mora, Higinio; Signes-Pont, María Teresa; Gil, David; Johnsson, Magnus

    2018-05-23

    The new sensing applications need enhanced computing capabilities to handle the requirements of complex and huge data processing. The Internet of Things (IoT) concept brings processing and communication features to devices. In addition, the Cloud Computing paradigm provides resources and infrastructures for performing the computations and outsourcing the work from the IoT devices. This scenario opens new opportunities for designing advanced IoT-based applications, however, there is still much research to be done to properly gear all the systems for working together. This work proposes a collaborative model and an architecture to take advantage of the available computing resources. The resulting architecture involves a novel network design with different levels which combines sensing and processing capabilities based on the Mobile Cloud Computing (MCC) paradigm. An experiment is included to demonstrate that this approach can be used in diverse real applications. The results show the flexibility of the architecture to perform complex computational tasks of advanced applications.

  8. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lingerfelt, Eric J; Endeve, Eirik; Hui, Yawei

    Improvements in scientific instrumentation allow imaging at mesoscopic to atomic length scales, many spectroscopic modes, and now--with the rise of multimodal acquisition systems and the associated processing capability--the era of multidimensional, informationally dense data sets has arrived. Technical issues in these combinatorial scientific fields are exacerbated by computational challenges best summarized as a necessity for drastic improvement in the capability to transfer, store, and analyze large volumes of data. The Bellerophon Environment for Analysis of Materials (BEAM) platform provides material scientists the capability to directly leverage the integrated computational and analytical power of High Performance Computing (HPC) to perform scalablemore » data analysis and simulation and manage uploaded data files via an intuitive, cross-platform client user interface. This framework delivers authenticated, "push-button" execution of complex user workflows that deploy data analysis algorithms and computational simulations utilizing compute-and-data cloud infrastructures and HPC environments like Titan at the Oak Ridge Leadershp Computing Facility (OLCF).« less

  9. Advances in Spatial Data Infrastructure, Acquisition, Analysis, Archiving and Dissemination

    NASA Technical Reports Server (NTRS)

    Ramapriyan, Hampapuran K.; Rochon, Gilbert L.; Duerr, Ruth; Rank, Robert; Nativi, Stefano; Stocker, Erich Franz

    2010-01-01

    The authors review recent contributions to the state-of-thescience and benign proliferation of satellite remote sensing, spatial data infrastructure, near-real-time data acquisition, analysis on high performance computing platforms, sapient archiving, multi-modal dissemination and utilization for a wide array of scientific applications. The authors also address advances in Geoinformatics and its growing ubiquity, as evidenced by its inclusion as a focus area within the American Geophysical Union (AGU), European Geosciences Union (EGU), as well as by the evolution of the IEEE Geoscience and Remote Sensing Society's (GRSS) Data Archiving and Distribution Technical Committee (DAD TC).

  10. Wide-area, real-time monitoring and visualization system

    DOEpatents

    Budhraja, Vikram S.; Dyer, James D.; Martinez Morales, Carlos A.

    2013-03-19

    A real-time performance monitoring system for monitoring an electric power grid. The electric power grid has a plurality of grid portions, each grid portion corresponding to one of a plurality of control areas. The real-time performance monitoring system includes a monitor computer for monitoring at least one of reliability metrics, generation metrics, transmission metrics, suppliers metrics, grid infrastructure security metrics, and markets metrics for the electric power grid. The data for metrics being monitored by the monitor computer are stored in a data base, and a visualization of the metrics is displayed on at least one display computer having a monitor. The at least one display computer in one said control area enables an operator to monitor the grid portion corresponding to a different said control area.

  11. Wide-area, real-time monitoring and visualization system

    DOEpatents

    Budhraja, Vikram S [Los Angeles, CA; Dyer, James D [La Mirada, CA; Martinez Morales, Carlos A [Upland, CA

    2011-11-15

    A real-time performance monitoring system for monitoring an electric power grid. The electric power grid has a plurality of grid portions, each grid portion corresponding to one of a plurality of control areas. The real-time performance monitoring system includes a monitor computer for monitoring at least one of reliability metrics, generation metrics, transmission metrics, suppliers metrics, grid infrastructure security metrics, and markets metrics for the electric power grid. The data for metrics being monitored by the monitor computer are stored in a data base, and a visualization of the metrics is displayed on at least one display computer having a monitor. The at least one display computer in one said control area enables an operator to monitor the grid portion corresponding to a different said control area.

  12. National Laboratory for Advanced Scientific Visualization at UNAM - Mexico

    NASA Astrophysics Data System (ADS)

    Manea, Marina; Constantin Manea, Vlad; Varela, Alfredo

    2016-04-01

    In 2015, the National Autonomous University of Mexico (UNAM) joined the family of Universities and Research Centers where advanced visualization and computing plays a key role to promote and advance missions in research, education, community outreach, as well as business-oriented consulting. This initiative provides access to a great variety of advanced hardware and software resources and offers a range of consulting services that spans a variety of areas related to scientific visualization, among which are: neuroanatomy, embryonic development, genome related studies, geosciences, geography, physics and mathematics related disciplines. The National Laboratory for Advanced Scientific Visualization delivers services through three main infrastructure environments: the 3D fully immersive display system Cave, the high resolution parallel visualization system Powerwall, the high resolution spherical displays Earth Simulator. The entire visualization infrastructure is interconnected to a high-performance-computing-cluster (HPCC) called ADA in honor to Ada Lovelace, considered to be the first computer programmer. The Cave is an extra large 3.6m wide room with projected images on the front, left and right, as well as floor walls. Specialized crystal eyes LCD-shutter glasses provide a strong stereo depth perception, and a variety of tracking devices allow software to track the position of a user's hand, head and wand. The Powerwall is designed to bring large amounts of complex data together through parallel computing for team interaction and collaboration. This system is composed by 24 (6x4) high-resolution ultra-thin (2 mm) bezel monitors connected to a high-performance GPU cluster. The Earth Simulator is a large (60") high-resolution spherical display used for global-scale data visualization like geophysical, meteorological, climate and ecology data. The HPCC-ADA, is a 1000+ computing core system, which offers parallel computing resources to applications that requires large quantity of memory as well as large and fast parallel storage systems. The entire system temperature is controlled by an energy and space efficient cooling solution, based on large rear door liquid cooled heat exchangers. This state-of-the-art infrastructure will boost research activities in the region, offer a powerful scientific tool for teaching at undergraduate and graduate levels, and enhance association and cooperation with business-oriented organizations.

  13. WISDOM-II: screening against multiple targets implicated in malaria using computational grid infrastructures.

    PubMed

    Kasam, Vinod; Salzemann, Jean; Botha, Marli; Dacosta, Ana; Degliesposti, Gianluca; Isea, Raul; Kim, Doman; Maass, Astrid; Kenyon, Colin; Rastelli, Giulio; Hofmann-Apitius, Martin; Breton, Vincent

    2009-05-01

    Despite continuous efforts of the international community to reduce the impact of malaria on developing countries, no significant progress has been made in the recent years and the discovery of new drugs is more than ever needed. Out of the many proteins involved in the metabolic activities of the Plasmodium parasite, some are promising targets to carry out rational drug discovery. Recent years have witnessed the emergence of grids, which are highly distributed computing infrastructures particularly well fitted for embarrassingly parallel computations like docking. In 2005, a first attempt at using grids for large-scale virtual screening focused on plasmepsins and ended up in the identification of previously unknown scaffolds, which were confirmed in vitro to be active plasmepsin inhibitors. Following this success, a second deployment took place in the fall of 2006 focussing on one well known target, dihydrofolate reductase (DHFR), and on a new promising one, glutathione-S-transferase. In silico drug design, especially vHTS is a widely and well-accepted technology in lead identification and lead optimization. This approach, therefore builds, upon the progress made in computational chemistry to achieve more accurate in silico docking and in information technology to design and operate large scale grid infrastructures. On the computational side, a sustained infrastructure has been developed: docking at large scale, using different strategies in result analysis, storing of the results on the fly into MySQL databases and application of molecular dynamics refinement are MM-PBSA and MM-GBSA rescoring. The modeling results obtained are very promising. Based on the modeling results, In vitro results are underway for all the targets against which screening is performed. The current paper describes the rational drug discovery activity at large scale, especially molecular docking using FlexX software on computational grids in finding hits against three different targets (PfGST, PfDHFR, PvDHFR (wild type and mutant forms) implicated in malaria. Grid-enabled virtual screening approach is proposed to produce focus compound libraries for other biological targets relevant to fight the infectious diseases of the developing world.

  14. Consolidation and development roadmap of the EMI middleware

    NASA Astrophysics Data System (ADS)

    Kónya, B.; Aiftimiei, C.; Cecchi, M.; Field, L.; Fuhrmann, P.; Nilsen, J. K.; White, J.

    2012-12-01

    Scientific research communities have benefited recently from the increasing availability of computing and data infrastructures with unprecedented capabilities for large scale distributed initiatives. These infrastructures are largely defined and enabled by the middleware they deploy. One of the major issues in the current usage of research infrastructures is the need to use similar but often incompatible middleware solutions. The European Middleware Initiative (EMI) is a collaboration of the major European middleware providers ARC, dCache, gLite and UNICORE. EMI aims to: deliver a consolidated set of middleware components for deployment in EGI, PRACE and other Distributed Computing Infrastructures; extend the interoperability between grids and other computing infrastructures; strengthen the reliability of the services; establish a sustainable model to maintain and evolve the middleware; fulfil the requirements of the user communities. This paper presents the consolidation and development objectives of the EMI software stack covering the last two years. The EMI development roadmap is introduced along the four technical areas of compute, data, security and infrastructure. The compute area plan focuses on consolidation of standards and agreements through a unified interface for job submission and management, a common format for accounting, the wide adoption of GLUE schema version 2.0 and the provision of a common framework for the execution of parallel jobs. The security area is working towards a unified security model and lowering the barriers to Grid usage by allowing users to gain access with their own credentials. The data area is focusing on implementing standards to ensure interoperability with other grids and industry components and to reuse already existing clients in operating systems and open source distributions. One of the highlights of the infrastructure area is the consolidation of the information system services via the creation of a common information backbone.

  15. CLIMB (the Cloud Infrastructure for Microbial Bioinformatics): an online resource for the medical microbiology community

    PubMed Central

    Smith, Andy; Southgate, Joel; Poplawski, Radoslaw; Bull, Matthew J.; Richardson, Emily; Ismail, Matthew; Thompson, Simon Elwood-; Kitchen, Christine; Guest, Martyn; Bakke, Marius

    2016-01-01

    The increasing availability and decreasing cost of high-throughput sequencing has transformed academic medical microbiology, delivering an explosion in available genomes while also driving advances in bioinformatics. However, many microbiologists are unable to exploit the resulting large genomics datasets because they do not have access to relevant computational resources and to an appropriate bioinformatics infrastructure. Here, we present the Cloud Infrastructure for Microbial Bioinformatics (CLIMB) facility, a shared computing infrastructure that has been designed from the ground up to provide an environment where microbiologists can share and reuse methods and data. PMID:28785418

  16. CLIMB (the Cloud Infrastructure for Microbial Bioinformatics): an online resource for the medical microbiology community.

    PubMed

    Connor, Thomas R; Loman, Nicholas J; Thompson, Simon; Smith, Andy; Southgate, Joel; Poplawski, Radoslaw; Bull, Matthew J; Richardson, Emily; Ismail, Matthew; Thompson, Simon Elwood-; Kitchen, Christine; Guest, Martyn; Bakke, Marius; Sheppard, Samuel K; Pallen, Mark J

    2016-09-01

    The increasing availability and decreasing cost of high-throughput sequencing has transformed academic medical microbiology, delivering an explosion in available genomes while also driving advances in bioinformatics. However, many microbiologists are unable to exploit the resulting large genomics datasets because they do not have access to relevant computational resources and to an appropriate bioinformatics infrastructure. Here, we present the Cloud Infrastructure for Microbial Bioinformatics (CLIMB) facility, a shared computing infrastructure that has been designed from the ground up to provide an environment where microbiologists can share and reuse methods and data.

  17. Minimizing Overhead for Secure Computation and Fully Homomorphic Encryption: Overhead

    DTIC Science & Technology

    2015-11-01

    many inputs. We also improved our compiler infrastructure to handle very large circuits in a more scalable way. In Jan’13, we employed the AESNI and...Amazon’s elastic compute infrastructure , and is running under a Xen hypervisor. Since we do not have direct access to the bare metal, we cannot...creating novel opportunities for compressing au- thentication overhead. It is especially compelling that existing public key infrastructures can be used

  18. Irregular Applications: Architectures & Algorithms

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Feo, John T.; Villa, Oreste; Tumeo, Antonino

    Irregular applications are characterized by irregular data structures, control and communication patterns. Novel irregular high performance applications which deal with large data sets and require have recently appeared. Unfortunately, current high performance systems and software infrastructures executes irregular algorithms poorly. Only coordinated efforts by end user, area specialists and computer scientists that consider both the architecture and the software stack may be able to provide solutions to the challenges of modern irregular applications.

  19. Pilot-in-the-Loop CFD Method Development

    DTIC Science & Technology

    2017-04-20

    the methods on the NAVAIR Manned Flight Simulator. Activities this period During this report period, we implemented the CRAFT CFD code on the...Penn State VLRCROE Flight simulator and performed the first Pilot-in-the-Loop PILCFD tests at Penn State using the COCOA5 clusters. The initial tests...integration of the flight simulator and Penn State computing infrastructure. Initial tests showed slower performance than real-time (3x slower than real

  20. Extremely Scalable Spiking Neuronal Network Simulation Code: From Laptops to Exascale Computers.

    PubMed

    Jordan, Jakob; Ippen, Tammo; Helias, Moritz; Kitayama, Itaru; Sato, Mitsuhisa; Igarashi, Jun; Diesmann, Markus; Kunkel, Susanne

    2018-01-01

    State-of-the-art software tools for neuronal network simulations scale to the largest computing systems available today and enable investigations of large-scale networks of up to 10 % of the human cortex at a resolution of individual neurons and synapses. Due to an upper limit on the number of incoming connections of a single neuron, network connectivity becomes extremely sparse at this scale. To manage computational costs, simulation software ultimately targeting the brain scale needs to fully exploit this sparsity. Here we present a two-tier connection infrastructure and a framework for directed communication among compute nodes accounting for the sparsity of brain-scale networks. We demonstrate the feasibility of this approach by implementing the technology in the NEST simulation code and we investigate its performance in different scaling scenarios of typical network simulations. Our results show that the new data structures and communication scheme prepare the simulation kernel for post-petascale high-performance computing facilities without sacrificing performance in smaller systems.

  1. Extremely Scalable Spiking Neuronal Network Simulation Code: From Laptops to Exascale Computers

    PubMed Central

    Jordan, Jakob; Ippen, Tammo; Helias, Moritz; Kitayama, Itaru; Sato, Mitsuhisa; Igarashi, Jun; Diesmann, Markus; Kunkel, Susanne

    2018-01-01

    State-of-the-art software tools for neuronal network simulations scale to the largest computing systems available today and enable investigations of large-scale networks of up to 10 % of the human cortex at a resolution of individual neurons and synapses. Due to an upper limit on the number of incoming connections of a single neuron, network connectivity becomes extremely sparse at this scale. To manage computational costs, simulation software ultimately targeting the brain scale needs to fully exploit this sparsity. Here we present a two-tier connection infrastructure and a framework for directed communication among compute nodes accounting for the sparsity of brain-scale networks. We demonstrate the feasibility of this approach by implementing the technology in the NEST simulation code and we investigate its performance in different scaling scenarios of typical network simulations. Our results show that the new data structures and communication scheme prepare the simulation kernel for post-petascale high-performance computing facilities without sacrificing performance in smaller systems. PMID:29503613

  2. 75 FR 70899 - Submission for OMB Review; Comment Request

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-11-19

    ... submit to the Office of Management and Budget (OMB) for clearance the following proposal for collection... Annual Burden Hours: 2,952. Public Computer Center Reports (Quarterly and Annually) Number of Respondents... specific to Infrastructure and Comprehensive Community Infrastructure, Public Computer Center, and...

  3. Early experiences in developing and managing the neuroscience gateway.

    PubMed

    Sivagnanam, Subhashini; Majumdar, Amit; Yoshimoto, Kenneth; Astakhov, Vadim; Bandrowski, Anita; Martone, MaryAnn; Carnevale, Nicholas T

    2015-02-01

    The last few decades have seen the emergence of computational neuroscience as a mature field where researchers are interested in modeling complex and large neuronal systems and require access to high performance computing machines and associated cyber infrastructure to manage computational workflow and data. The neuronal simulation tools, used in this research field, are also implemented for parallel computers and suitable for high performance computing machines. But using these tools on complex high performance computing machines remains a challenge because of issues with acquiring computer time on these machines located at national supercomputer centers, dealing with complex user interface of these machines, dealing with data management and retrieval. The Neuroscience Gateway is being developed to alleviate and/or hide these barriers to entry for computational neuroscientists. It hides or eliminates, from the point of view of the users, all the administrative and technical barriers and makes parallel neuronal simulation tools easily available and accessible on complex high performance computing machines. It handles the running of jobs and data management and retrieval. This paper shares the early experiences in bringing up this gateway and describes the software architecture it is based on, how it is implemented, and how users can use this for computational neuroscience research using high performance computing at the back end. We also look at parallel scaling of some publicly available neuronal models and analyze the recent usage data of the neuroscience gateway.

  4. Early experiences in developing and managing the neuroscience gateway

    PubMed Central

    Sivagnanam, Subhashini; Majumdar, Amit; Yoshimoto, Kenneth; Astakhov, Vadim; Bandrowski, Anita; Martone, MaryAnn; Carnevale, Nicholas. T.

    2015-01-01

    SUMMARY The last few decades have seen the emergence of computational neuroscience as a mature field where researchers are interested in modeling complex and large neuronal systems and require access to high performance computing machines and associated cyber infrastructure to manage computational workflow and data. The neuronal simulation tools, used in this research field, are also implemented for parallel computers and suitable for high performance computing machines. But using these tools on complex high performance computing machines remains a challenge because of issues with acquiring computer time on these machines located at national supercomputer centers, dealing with complex user interface of these machines, dealing with data management and retrieval. The Neuroscience Gateway is being developed to alleviate and/or hide these barriers to entry for computational neuroscientists. It hides or eliminates, from the point of view of the users, all the administrative and technical barriers and makes parallel neuronal simulation tools easily available and accessible on complex high performance computing machines. It handles the running of jobs and data management and retrieval. This paper shares the early experiences in bringing up this gateway and describes the software architecture it is based on, how it is implemented, and how users can use this for computational neuroscience research using high performance computing at the back end. We also look at parallel scaling of some publicly available neuronal models and analyze the recent usage data of the neuroscience gateway. PMID:26523124

  5. Integrating Reconfigurable Hardware-Based Grid for High Performance Computing

    PubMed Central

    Dondo Gazzano, Julio; Sanchez Molina, Francisco; Rincon, Fernando; López, Juan Carlos

    2015-01-01

    FPGAs have shown several characteristics that make them very attractive for high performance computing (HPC). The impressive speed-up factors that they are able to achieve, the reduced power consumption, and the easiness and flexibility of the design process with fast iterations between consecutive versions are examples of benefits obtained with their use. However, there are still some difficulties when using reconfigurable platforms as accelerator that need to be addressed: the need of an in-depth application study to identify potential acceleration, the lack of tools for the deployment of computational problems in distributed hardware platforms, and the low portability of components, among others. This work proposes a complete grid infrastructure for distributed high performance computing based on dynamically reconfigurable FPGAs. Besides, a set of services designed to facilitate the application deployment is described. An example application and a comparison with other hardware and software implementations are shown. Experimental results show that the proposed architecture offers encouraging advantages for deployment of high performance distributed applications simplifying development process. PMID:25874241

  6. Grids and clouds in the Czech NGI

    NASA Astrophysics Data System (ADS)

    Kundrát, Jan; Adam, Martin; Adamová, Dagmar; Chudoba, Jiří; Kouba, Tomáš; Lokajíček, Miloš; Mikula, Alexandr; Říkal, Václav; Švec, Jan; Vohnout, Rudolf

    2016-09-01

    There are several infrastructure operators within the Czech Republic NGI (National Grid Initiative) which provide users with access to high-performance computing facilities over a grid and cloud interface. This article focuses on those where the primary author has personal first-hand experience. We cover some operational issues as well as the history of these facilities.

  7. High-Performance Integrated Virtual Environment (HIVE) Tools and Applications for Big Data Analysis.

    PubMed

    Simonyan, Vahan; Mazumder, Raja

    2014-09-30

    The High-performance Integrated Virtual Environment (HIVE) is a high-throughput cloud-based infrastructure developed for the storage and analysis of genomic and associated biological data. HIVE consists of a web-accessible interface for authorized users to deposit, retrieve, share, annotate, compute and visualize Next-generation Sequencing (NGS) data in a scalable and highly efficient fashion. The platform contains a distributed storage library and a distributed computational powerhouse linked seamlessly. Resources available through the interface include algorithms, tools and applications developed exclusively for the HIVE platform, as well as commonly used external tools adapted to operate within the parallel architecture of the system. HIVE is composed of a flexible infrastructure, which allows for simple implementation of new algorithms and tools. Currently, available HIVE tools include sequence alignment and nucleotide variation profiling tools, metagenomic analyzers, phylogenetic tree-building tools using NGS data, clone discovery algorithms, and recombination analysis algorithms. In addition to tools, HIVE also provides knowledgebases that can be used in conjunction with the tools for NGS sequence and metadata analysis.

  8. High-Performance Integrated Virtual Environment (HIVE) Tools and Applications for Big Data Analysis

    PubMed Central

    Simonyan, Vahan; Mazumder, Raja

    2014-01-01

    The High-performance Integrated Virtual Environment (HIVE) is a high-throughput cloud-based infrastructure developed for the storage and analysis of genomic and associated biological data. HIVE consists of a web-accessible interface for authorized users to deposit, retrieve, share, annotate, compute and visualize Next-generation Sequencing (NGS) data in a scalable and highly efficient fashion. The platform contains a distributed storage library and a distributed computational powerhouse linked seamlessly. Resources available through the interface include algorithms, tools and applications developed exclusively for the HIVE platform, as well as commonly used external tools adapted to operate within the parallel architecture of the system. HIVE is composed of a flexible infrastructure, which allows for simple implementation of new algorithms and tools. Currently, available HIVE tools include sequence alignment and nucleotide variation profiling tools, metagenomic analyzers, phylogenetic tree-building tools using NGS data, clone discovery algorithms, and recombination analysis algorithms. In addition to tools, HIVE also provides knowledgebases that can be used in conjunction with the tools for NGS sequence and metadata analysis. PMID:25271953

  9. Cloud Computing Boosts Business Intelligence of Telecommunication Industry

    NASA Astrophysics Data System (ADS)

    Xu, Meng; Gao, Dan; Deng, Chao; Luo, Zhiguo; Sun, Shaoling

    Business Intelligence becomes an attracting topic in today's data intensive applications, especially in telecommunication industry. Meanwhile, Cloud Computing providing IT supporting Infrastructure with excellent scalability, large scale storage, and high performance becomes an effective way to implement parallel data processing and data mining algorithms. BC-PDM (Big Cloud based Parallel Data Miner) is a new MapReduce based parallel data mining platform developed by CMRI (China Mobile Research Institute) to fit the urgent requirements of business intelligence in telecommunication industry. In this paper, the architecture, functionality and performance of BC-PDM are presented, together with the experimental evaluation and case studies of its applications. The evaluation result demonstrates both the usability and the cost-effectiveness of Cloud Computing based Business Intelligence system in applications of telecommunication industry.

  10. Self-service for software development projects and HPC activities

    NASA Astrophysics Data System (ADS)

    Husejko, M.; Høimyr, N.; Gonzalez, A.; Koloventzos, G.; Asbury, D.; Trzcinska, A.; Agtzidis, I.; Botrel, G.; Otto, J.

    2014-05-01

    This contribution describes how CERN has implemented several essential tools for agile software development processes, ranging from version control (Git) to issue tracking (Jira) and documentation (Wikis). Running such services in a large organisation like CERN requires many administrative actions both by users and service providers, such as creating software projects, managing access rights, users and groups, and performing tool-specific customisation. Dealing with these requests manually would be a time-consuming task. Another area of our CERN computing services that has required dedicated manual support has been clusters for specific user communities with special needs. Our aim is to move all our services to a layered approach, with server infrastructure running on the internal cloud computing infrastructure at CERN. This contribution illustrates how we plan to optimise the management of our of services by means of an end-user facing platform acting as a portal into all the related services for software projects, inspired by popular portals for open-source developments such as Sourceforge, GitHub and others. Furthermore, the contribution will discuss recent activities with tests and evaluations of High Performance Computing (HPC) applications on different hardware and software stacks, and plans to offer a dynamically scalable HPC service at CERN, based on affordable hardware.

  11. Lessons learned from implementing a national infrastructure in Sweden for storage and analysis of next-generation sequencing data

    PubMed Central

    2013-01-01

    Analyzing and storing data and results from next-generation sequencing (NGS) experiments is a challenging task, hampered by ever-increasing data volumes and frequent updates of analysis methods and tools. Storage and computation have grown beyond the capacity of personal computers and there is a need for suitable e-infrastructures for processing. Here we describe UPPNEX, an implementation of such an infrastructure, tailored to the needs of data storage and analysis of NGS data in Sweden serving various labs and multiple instruments from the major sequencing technology platforms. UPPNEX comprises resources for high-performance computing, large-scale and high-availability storage, an extensive bioinformatics software suite, up-to-date reference genomes and annotations, a support function with system and application experts as well as a web portal and support ticket system. UPPNEX applications are numerous and diverse, and include whole genome-, de novo- and exome sequencing, targeted resequencing, SNP discovery, RNASeq, and methylation analysis. There are over 300 projects that utilize UPPNEX and include large undertakings such as the sequencing of the flycatcher and Norwegian spruce. We describe the strategic decisions made when investing in hardware, setting up maintenance and support, allocating resources, and illustrate major challenges such as managing data growth. We conclude with summarizing our experiences and observations with UPPNEX to date, providing insights into the successful and less successful decisions made. PMID:23800020

  12. New security infrastructure model for distributed computing systems

    NASA Astrophysics Data System (ADS)

    Dubenskaya, J.; Kryukov, A.; Demichev, A.; Prikhodko, N.

    2016-02-01

    At the paper we propose a new approach to setting up a user-friendly and yet secure authentication and authorization procedure in a distributed computing system. The security concept of the most heterogeneous distributed computing systems is based on the public key infrastructure along with proxy certificates which are used for rights delegation. In practice a contradiction between the limited lifetime of the proxy certificates and the unpredictable time of the request processing is a big issue for the end users of the system. We propose to use unlimited in time hashes which are individual for each request instead of proxy certificate. Our approach allows to avoid using of the proxy certificates. Thus the security infrastructure of distributed computing system becomes easier for development, support and use.

  13. Development of a telemedicine model for emerging countries: a case study on pediatric oncology in Brazil.

    PubMed

    Hira, A Y; Nebel de Mello, A; Faria, R A; Odone Filho, V; Lopes, R D; Zuffo, M K

    2006-01-01

    This article discusses a telemedicine model for emerging countries, through the description of ONCONET, a telemedicine initiative applied to pediatric oncology in Brazil. The ONCONET core technology is a Web-based system that offers health information and other services specialized in childhood cancer such as electronic medical records and cooperative protocols for complex treatments. All Web-based services are supported by the use of high performance computing infrastructure based on clusters of commodity computers. The system was fully implemented on an open-source and free-software approach. Aspects of modeling, implementation and integration are covered. A model, both technologically and economically viable, was created through the research and development of in-house solutions adapted to the emerging countries reality and with focus on scalability both in the total number of patients and in the national infrastructure.

  14. Critical infrastructure protection : significant challenges in developing national capabilities

    DOT National Transportation Integrated Search

    2001-04-01

    To address the concerns about protecting the nation's critical computer-dependent infrastructure, this General Accounting Office (GOA) report describes the progress of the National Infrastructure Protection Center (NIPC) in (1) developing national ca...

  15. Workflow4Metabolomics: a collaborative research infrastructure for computational metabolomics

    PubMed Central

    Giacomoni, Franck; Le Corguillé, Gildas; Monsoor, Misharl; Landi, Marion; Pericard, Pierre; Pétéra, Mélanie; Duperier, Christophe; Tremblay-Franco, Marie; Martin, Jean-François; Jacob, Daniel; Goulitquer, Sophie; Thévenot, Etienne A.; Caron, Christophe

    2015-01-01

    Summary: The complex, rapidly evolving field of computational metabolomics calls for collaborative infrastructures where the large volume of new algorithms for data pre-processing, statistical analysis and annotation can be readily integrated whatever the language, evaluated on reference datasets and chained to build ad hoc workflows for users. We have developed Workflow4Metabolomics (W4M), the first fully open-source and collaborative online platform for computational metabolomics. W4M is a virtual research environment built upon the Galaxy web-based platform technology. It enables ergonomic integration, exchange and running of individual modules and workflows. Alternatively, the whole W4M framework and computational tools can be downloaded as a virtual machine for local installation. Availability and implementation: http://workflow4metabolomics.org homepage enables users to open a private account and access the infrastructure. W4M is developed and maintained by the French Bioinformatics Institute (IFB) and the French Metabolomics and Fluxomics Infrastructure (MetaboHUB). Contact: contact@workflow4metabolomics.org PMID:25527831

  16. Workflow4Metabolomics: a collaborative research infrastructure for computational metabolomics.

    PubMed

    Giacomoni, Franck; Le Corguillé, Gildas; Monsoor, Misharl; Landi, Marion; Pericard, Pierre; Pétéra, Mélanie; Duperier, Christophe; Tremblay-Franco, Marie; Martin, Jean-François; Jacob, Daniel; Goulitquer, Sophie; Thévenot, Etienne A; Caron, Christophe

    2015-05-01

    The complex, rapidly evolving field of computational metabolomics calls for collaborative infrastructures where the large volume of new algorithms for data pre-processing, statistical analysis and annotation can be readily integrated whatever the language, evaluated on reference datasets and chained to build ad hoc workflows for users. We have developed Workflow4Metabolomics (W4M), the first fully open-source and collaborative online platform for computational metabolomics. W4M is a virtual research environment built upon the Galaxy web-based platform technology. It enables ergonomic integration, exchange and running of individual modules and workflows. Alternatively, the whole W4M framework and computational tools can be downloaded as a virtual machine for local installation. http://workflow4metabolomics.org homepage enables users to open a private account and access the infrastructure. W4M is developed and maintained by the French Bioinformatics Institute (IFB) and the French Metabolomics and Fluxomics Infrastructure (MetaboHUB). contact@workflow4metabolomics.org. © The Author 2014. Published by Oxford University Press.

  17. ERDC MSRC Resource. High Performance Computing for the Warfighter. Spring 2006

    DTIC Science & Technology

    2006-01-01

    named Ruby, and the HP/Compaq SC45, named Emerald , continue to add their unique sparkle to the ERDC MSRC computer infrastructure. ERDC invited the...configuration on B-52H purchased additional memory for the login nodes so that this part of the solution process could be done as a preprocessing step. On...application and system services. Of the service nodes, 10 are login nodes and 23 are input/output (I/O) server nodes for the Lustre file system (i.e., the

  18. Sandia QIS Capabilities.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Muller, Richard P.

    2017-07-01

    Sandia National Laboratories has developed a broad set of capabilities in quantum information science (QIS), including elements of quantum computing, quantum communications, and quantum sensing. The Sandia QIS program is built atop unique DOE investments at the laboratories, including the MESA microelectronics fabrication facility, the Center for Integrated Nanotechnologies (CINT) facilities (joint with LANL), the Ion Beam Laboratory, and ASC High Performance Computing (HPC) facilities. Sandia has invested $75 M of LDRD funding over 12 years to develop unique, differentiating capabilities that leverage these DOE infrastructure investments.

  19. XVis: Visualization for the Extreme-Scale Scientific-Computation Ecosystem: Mid-year report FY17 Q2

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Moreland, Kenneth D.; Pugmire, David; Rogers, David

    The XVis project brings together the key elements of research to enable scientific discovery at extreme scale. Scientific computing will no longer be purely about how fast computations can be performed. Energy constraints, processor changes, and I/O limitations necessitate significant changes in both the software applications used in scientific computation and the ways in which scientists use them. Components for modeling, simulation, analysis, and visualization must work together in a computational ecosystem, rather than working independently as they have in the past. This project provides the necessary research and infrastructure for scientific discovery in this new computational ecosystem by addressingmore » four interlocking challenges: emerging processor technology, in situ integration, usability, and proxy analysis.« less

  20. XVis: Visualization for the Extreme-Scale Scientific-Computation Ecosystem: Year-end report FY17.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Moreland, Kenneth D.; Pugmire, David; Rogers, David

    The XVis project brings together the key elements of research to enable scientific discovery at extreme scale. Scientific computing will no longer be purely about how fast computations can be performed. Energy constraints, processor changes, and I/O limitations necessitate significant changes in both the software applications used in scientific computation and the ways in which scientists use them. Components for modeling, simulation, analysis, and visualization must work together in a computational ecosystem, rather than working independently as they have in the past. This project provides the necessary research and infrastructure for scientific discovery in this new computational ecosystem by addressingmore » four interlocking challenges: emerging processor technology, in situ integration, usability, and proxy analysis.« less

  1. XVis: Visualization for the Extreme-Scale Scientific-Computation Ecosystem. Mid-year report FY16 Q2

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Moreland, Kenneth D.; Sewell, Christopher; Childs, Hank

    The XVis project brings together the key elements of research to enable scientific discovery at extreme scale. Scientific computing will no longer be purely about how fast computations can be performed. Energy constraints, processor changes, and I/O limitations necessitate significant changes in both the software applications used in scientific computation and the ways in which scientists use them. Components for modeling, simulation, analysis, and visualization must work together in a computational ecosystem, rather than working independently as they have in the past. This project provides the necessary research and infrastructure for scientific discovery in this new computational ecosystem by addressingmore » four interlocking challenges: emerging processor technology, in situ integration, usability, and proxy analysis.« less

  2. XVis: Visualization for the Extreme-Scale Scientific-Computation Ecosystem: Year-end report FY15 Q4.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Moreland, Kenneth D.; Sewell, Christopher; Childs, Hank

    The XVis project brings together the key elements of research to enable scientific discovery at extreme scale. Scientific computing will no longer be purely about how fast computations can be performed. Energy constraints, processor changes, and I/O limitations necessitate significant changes in both the software applications used in scientific computation and the ways in which scientists use them. Components for modeling, simulation, analysis, and visualization must work together in a computational ecosystem, rather than working independently as they have in the past. This project provides the necessary research and infrastructure for scientific discovery in this new computational ecosystem by addressingmore » four interlocking challenges: emerging processor technology, in situ integration, usability, and proxy analysis.« less

  3. Effective Team Support: From Modeling to Software Agents

    NASA Technical Reports Server (NTRS)

    Remington, Roger W. (Technical Monitor); John, Bonnie; Sycara, Katia

    2003-01-01

    The purpose of this research contract was to perform multidisciplinary research between CMU psychologists, computer scientists and engineers and NASA researchers to design a next generation collaborative system to support a team of human experts and intelligent agents. To achieve robust performance enhancement of such a system, we had proposed to perform task and cognitive modeling to thoroughly understand the impact technology makes on the organization and on key individual personnel. Guided by cognitively-inspired requirements, we would then develop software agents that support the human team in decision making, information filtering, information distribution and integration to enhance team situational awareness. During the period covered by this final report, we made substantial progress in modeling infrastructure and task infrastructure. Work is continuing under a different contract to complete empirical data collection, cognitive modeling, and the building of software agents to support the teams task.

  4. Waggle: A Framework for Intelligent Attentive Sensing and Actuation

    NASA Astrophysics Data System (ADS)

    Sankaran, R.; Jacob, R. L.; Beckman, P. H.; Catlett, C. E.; Keahey, K.

    2014-12-01

    Advances in sensor-driven computation and computationally steered sensing will greatly enable future research in fields including environmental and atmospheric sciences. We will present "Waggle," an open-source hardware and software infrastructure developed with two goals: (1) reducing the separation and latency between sensing and computing and (2) improving the reliability and longevity of sensing-actuation platforms in challenging and costly deployments. Inspired by "deep-space probe" systems, the Waggle platform design includes features that can support longitudinal studies, deployments with varying communication links, and remote management capabilities. Waggle lowers the barrier for scientists to incorporate real-time data from their sensors into their computations and to manipulate the sensors or provide feedback through actuators. A standardized software and hardware design allows quick addition of new sensors/actuators and associated software in the nodes and enables them to be coupled with computational codes both insitu and on external compute infrastructure. The Waggle framework currently drives the deployment of two observational systems - a portable and self-sufficient weather platform for study of small-scale effects in Chicago's urban core and an open-ended distributed instrument in Chicago that aims to support several research pursuits across a broad range of disciplines including urban planning, microbiology and computer science. Built around open-source software, hardware, and Linux OS, the Waggle system comprises two components - the Waggle field-node and Waggle cloud-computing infrastructure. Waggle field-node affords a modular, scalable, fault-tolerant, secure, and extensible platform for hosting sensors and actuators in the field. It supports insitu computation and data storage, and integration with cloud-computing infrastructure. The Waggle cloud infrastructure is designed with the goal of scaling to several hundreds of thousands of Waggle nodes. It supports aggregating data from sensors hosted by the nodes, staging computation, relaying feedback to the nodes and serving data to end-users. We will discuss the Waggle design principles and their applicability to various observational research pursuits, and demonstrate its capabilities.

  5. The Reduced Basis Method in Geosciences: Practical examples for numerical forward simulations

    NASA Astrophysics Data System (ADS)

    Degen, D.; Veroy, K.; Wellmann, F.

    2017-12-01

    Due to the highly heterogeneous character of the earth's subsurface, the complex coupling of thermal, hydrological, mechanical, and chemical processes, and the limited accessibility we have to face high-dimensional problems associated with high uncertainties in geosciences. Performing the obviously necessary uncertainty quantifications with a reasonable number of parameters is often not possible due to the high-dimensional character of the problem. Therefore, we are presenting the reduced basis (RB) method, being a model order reduction (MOR) technique, that constructs low-order approximations to, for instance, the finite element (FE) space. We use the RB method to address this computationally challenging simulations because this method significantly reduces the degrees of freedom. The RB method is decomposed into an offline and online stage, allowing to make the expensive pre-computations beforehand to get real-time results during field campaigns. Generally, the RB approach is most beneficial in the many-query and real-time context.We will illustrate the advantages of the RB method for the field of geosciences through two examples of numerical forward simulations.The first example is a geothermal conduction problem demonstrating the implementation of the RB method for a steady-state case. The second examples, a Darcy flow problem, shows the benefits for transient scenarios. In both cases, a quality evaluation of the approximations is given. Additionally, the runtimes for both the FE and the RB simulations are compared. We will emphasize the advantages of this method for repetitive simulations by showing the speed-up for the RB solution in contrast to the FE solution. Finally, we will demonstrate how the used implementation is usable in high-performance computing (HPC) infrastructures and evaluate its performance for such infrastructures. Hence, we will especially point out its scalability, yielding in an optimal usage on HPC infrastructures and normal working stations.

  6. Qualifying for the Green500: Experience with the newest generation of supercomputers at LANL

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yilk, Todd

    The High Performance Computing Division of Los Alamos National Laboratory recently brought four new supercomputing platforms on line: Trinity with separate partitions built around the Haswell and Knights Landing CPU architectures for capability computing and Grizzly, Fire, and Ice for capacity computing applications. The power monitoring infrastructure of these machines is significantly enhanced over previous supercomputing generations at LANL and all were qualified at the highest level of the Green500 benchmark. Here, this paper discusses supercomputing at LANL, the Green500 benchmark, and notes on our experience meeting the Green500's reporting requirements.

  7. Qualifying for the Green500: Experience with the newest generation of supercomputers at LANL

    DOE PAGES

    Yilk, Todd

    2018-02-17

    The High Performance Computing Division of Los Alamos National Laboratory recently brought four new supercomputing platforms on line: Trinity with separate partitions built around the Haswell and Knights Landing CPU architectures for capability computing and Grizzly, Fire, and Ice for capacity computing applications. The power monitoring infrastructure of these machines is significantly enhanced over previous supercomputing generations at LANL and all were qualified at the highest level of the Green500 benchmark. Here, this paper discusses supercomputing at LANL, the Green500 benchmark, and notes on our experience meeting the Green500's reporting requirements.

  8. Pilots 2.0: DIRAC pilots for all the skies

    NASA Astrophysics Data System (ADS)

    Stagni, F.; Tsaregorodtsev, A.; McNab, A.; Luzzi, C.

    2015-12-01

    In the last few years, new types of computing infrastructures, such as IAAS (Infrastructure as a Service) and IAAC (Infrastructure as a Client), gained popularity. New resources may come as part of pledged resources, while others are opportunistic. Most of these new infrastructures are based on virtualization techniques. Meanwhile, some concepts, such as distributed queues, lost appeal, while still supporting a vast amount of resources. Virtual Organizations are therefore facing heterogeneity of the available resources and the use of an Interware software like DIRAC to hide the diversity of underlying resources has become essential. The DIRAC WMS is based on the concept of pilot jobs that was introduced back in 2004. A pilot is what creates the possibility to run jobs on a worker node. Within DIRAC, we developed a new generation of pilot jobs, that we dubbed Pilots 2.0. Pilots 2.0 are not tied to a specific infrastructure; rather they are generic, fully configurable and extendible pilots. A Pilot 2.0 can be sent, as a script to be run, or it can be fetched from a remote location. A pilot 2.0 can run on every computing resource, e.g.: on CREAM Computing elements, on DIRAC Computing elements, on Virtual Machines as part of the contextualization script, or IAAC resources, provided that these machines are properly configured, hiding all the details of the Worker Nodes (WNs) infrastructure. Pilots 2.0 can be generated server and client side. Pilots 2.0 are the “pilots to fly in all the skies”, aiming at easy use of computing power, in whatever form it is presented. Another aim is the unification and simplification of the monitoring infrastructure for all kinds of computing resources, by using pilots as a network of distributed sensors coordinated by a central resource monitoring system. Pilots 2.0 have been developed using the command pattern. VOs using DIRAC can tune pilots 2.0 as they need, and extend or replace each and every pilot command in an easy way. In this paper we describe how Pilots 2.0 work with distributed and heterogeneous resources providing the necessary abstraction to deal with different kind of computing resources.

  9. Evolution of the Virtualized HPC Infrastructure of Novosibirsk Scientific Center

    NASA Astrophysics Data System (ADS)

    Adakin, A.; Anisenkov, A.; Belov, S.; Chubarov, D.; Kalyuzhny, V.; Kaplin, V.; Korol, A.; Kuchin, N.; Lomakin, S.; Nikultsev, V.; Skovpen, K.; Sukharev, A.; Zaytsev, A.

    2012-12-01

    Novosibirsk Scientific Center (NSC), also known worldwide as Akademgorodok, is one of the largest Russian scientific centers hosting Novosibirsk State University (NSU) and more than 35 research organizations of the Siberian Branch of Russian Academy of Sciences including Budker Institute of Nuclear Physics (BINP), Institute of Computational Technologies, and Institute of Computational Mathematics and Mathematical Geophysics (ICM&MG). Since each institute has specific requirements on the architecture of computing farms involved in its research field, currently we've got several computing facilities hosted by NSC institutes, each optimized for a particular set of tasks, of which the largest are the NSU Supercomputer Center, Siberian Supercomputer Center (ICM&MG), and a Grid Computing Facility of BINP. A dedicated optical network with the initial bandwidth of 10 Gb/s connecting these three facilities was built in order to make it possible to share the computing resources among the research communities, thus increasing the efficiency of operating the existing computing facilities and offering a common platform for building the computing infrastructure for future scientific projects. Unification of the computing infrastructure is achieved by extensive use of virtualization technology based on XEN and KVM platforms. This contribution gives a thorough review of the present status and future development prospects for the NSC virtualized computing infrastructure and the experience gained while using it for running production data analysis jobs related to HEP experiments being carried out at BINP, especially the KEDR detector experiment at the VEPP-4M electron-positron collider.

  10. Low cost, high performance processing of single particle cryo-electron microscopy data in the cloud.

    PubMed

    Cianfrocco, Michael A; Leschziner, Andres E

    2015-05-08

    The advent of a new generation of electron microscopes and direct electron detectors has realized the potential of single particle cryo-electron microscopy (cryo-EM) as a technique to generate high-resolution structures. Calculating these structures requires high performance computing clusters, a resource that may be limiting to many likely cryo-EM users. To address this limitation and facilitate the spread of cryo-EM, we developed a publicly available 'off-the-shelf' computing environment on Amazon's elastic cloud computing infrastructure. This environment provides users with single particle cryo-EM software packages and the ability to create computing clusters with 16-480+ CPUs. We tested our computing environment using a publicly available 80S yeast ribosome dataset and estimate that laboratories could determine high-resolution cryo-EM structures for $50 to $1500 per structure within a timeframe comparable to local clusters. Our analysis shows that Amazon's cloud computing environment may offer a viable computing environment for cryo-EM.

  11. Editorial [Special issue on software defined networks and infrastructures, network function virtualisation, autonomous systems and network management

    DOE PAGES

    Biswas, Amitava; Liu, Chen; Monga, Inder; ...

    2016-01-01

    For last few years, there has been a tremendous growth in data traffic due to high adoption rate of mobile devices and cloud computing. Internet of things (IoT) will stimulate even further growth. This is increasing scale and complexity of telecom/internet service provider (SP) and enterprise data centre (DC) compute and network infrastructures. As a result, managing these large network-compute converged infrastructures is becoming complex and cumbersome. To cope up, network and DC operators are trying to automate network and system operations, administrations and management (OAM) functions. OAM includes all non-functional mechanisms which keep the network running.

  12. Cloud computing can simplify HIT infrastructure management.

    PubMed

    Glaser, John

    2011-08-01

    Software as a Service (SaaS), built on cloud computing technology, is emerging as the forerunner in IT infrastructure because it helps healthcare providers reduce capital investments. Cloud computing leads to predictable, monthly, fixed operating expenses for hospital IT staff. Outsourced cloud computing facilities are state-of-the-art data centers boasting some of the most sophisticated networking equipment on the market. The SaaS model helps hospitals safeguard against technology obsolescence, minimizes maintenance requirements, and simplifies management.

  13. Bringing the CMS distributed computing system into scalable operations

    NASA Astrophysics Data System (ADS)

    Belforte, S.; Fanfani, A.; Fisk, I.; Flix, J.; Hernández, J. M.; Kress, T.; Letts, J.; Magini, N.; Miccio, V.; Sciabà, A.

    2010-04-01

    Establishing efficient and scalable operations of the CMS distributed computing system critically relies on the proper integration, commissioning and scale testing of the data and workload management tools, the various computing workflows and the underlying computing infrastructure, located at more than 50 computing centres worldwide and interconnected by the Worldwide LHC Computing Grid. Computing challenges periodically undertaken by CMS in the past years with increasing scale and complexity have revealed the need for a sustained effort on computing integration and commissioning activities. The Processing and Data Access (PADA) Task Force was established at the beginning of 2008 within the CMS Computing Program with the mandate of validating the infrastructure for organized processing and user analysis including the sites and the workload and data management tools, validating the distributed production system by performing functionality, reliability and scale tests, helping sites to commission, configure and optimize the networking and storage through scale testing data transfers and data processing, and improving the efficiency of accessing data across the CMS computing system from global transfers to local access. This contribution reports on the tools and procedures developed by CMS for computing commissioning and scale testing as well as the improvements accomplished towards efficient, reliable and scalable computing operations. The activities include the development and operation of load generators for job submission and data transfers with the aim of stressing the experiment and Grid data management and workload management systems, site commissioning procedures and tools to monitor and improve site availability and reliability, as well as activities targeted to the commissioning of the distributed production, user analysis and monitoring systems.

  14. Progress in Machine Learning Studies for the CMS Computing Infrastructure

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bonacorsi, Daniele; Kuznetsov, Valentin; Magini, Nicolo

    Here, computing systems for LHC experiments developed together with Grids worldwide. While a complete description of the original Grid-based infrastructure and services for LHC experiments and its recent evolutions can be found elsewhere, it is worth to mention here the scale of the computing resources needed to fulfill the needs of LHC experiments in Run-1 and Run-2 so far.

  15. Progress in Machine Learning Studies for the CMS Computing Infrastructure

    DOE PAGES

    Bonacorsi, Daniele; Kuznetsov, Valentin; Magini, Nicolo; ...

    2017-12-06

    Here, computing systems for LHC experiments developed together with Grids worldwide. While a complete description of the original Grid-based infrastructure and services for LHC experiments and its recent evolutions can be found elsewhere, it is worth to mention here the scale of the computing resources needed to fulfill the needs of LHC experiments in Run-1 and Run-2 so far.

  16. Transformation of OODT CAS to Perform Larger Tasks

    NASA Technical Reports Server (NTRS)

    Mattmann, Chris; Freeborn, Dana; Crichton, Daniel; Hughes, John; Ramirez, Paul; Hardman, Sean; Woollard, David; Kelly, Sean

    2008-01-01

    A computer program denoted OODT CAS has been transformed to enable performance of larger tasks that involve greatly increased data volumes and increasingly intensive processing of data on heterogeneous, geographically dispersed computers. Prior to the transformation, OODT CAS (also alternatively denoted, simply, 'CAS') [wherein 'OODT' signifies 'Object-Oriented Data Technology' and 'CAS' signifies 'Catalog and Archive Service'] was a proven software component used to manage scientific data from spaceflight missions. In the transformation, CAS was split into two separate components representing its canonical capabilities: file management and workflow management. In addition, CAS was augmented by addition of a resource-management component. This third component enables CAS to manage heterogeneous computing by use of diverse resources, including high-performance clusters of computers, commodity computing hardware, and grid computing infrastructures. CAS is now more easily maintainable, evolvable, and reusable. These components can be used separately or, taking advantage of synergies, can be used together. Other elements of the transformation included addition of a separate Web presentation layer that supports distribution of data products via Really Simple Syndication (RSS) feeds, and provision for full Resource Description Framework (RDF) exports of metadata.

  17. e-Infrastructures for e-Sciences 2013 A CHAIN-REDS Workshop organised under the aegis of the European Commission

    NASA Astrophysics Data System (ADS)

    The CHAIN-REDS Project is organising a workshop on "e-Infrastructures for e-Sciences" focusing on Cloud Computing and Data Repositories under the aegis of the European Commission and in co-location with the International Conference on e-Science 2013 (IEEE2013) that will be held in Beijing, P.R. of China on October 17-22, 2013. The core objective of the CHAIN-REDS project is to promote, coordinate and support the effort of a critical mass of non-European e-Infrastructures for Research and Education to collaborate with Europe addressing interoperability and interoperation of Grids and other Distributed Computing Infrastructures (DCI). From this perspective, CHAIN-REDS will optimise the interoperation of European infrastructures with those present in 6 other regions of the world, both from a development and use point of view, and catering to different communities. Overall, CHAIN-REDS will provide input for future strategies and decision-making regarding collaboration with other regions on e-Infrastructure deployment and availability of related data; it will raise the visibility of e-Infrastructures towards intercontinental audiences, covering most of the world and will provide support to establish globally connected and interoperable infrastructures, in particular between the EU and the developing regions. Organised by IHEP, INFN and Sigma Orionis with the support of all project partners, this workshop will aim at: - Presenting the state of the art of Cloud computing in Europe and in China and discussing the opportunities offered by having interoperable and federated e-Infrastructures; - Exploring the existing initiatives of Data Infrastructures in Europe and China, and highlighting the Data Repositories of interest for the Virtual Research Communities in several domains such as Health, Agriculture, Climate, etc.

  18. Cultural and Technological Issues and Solutions for Geodynamics Software Citation

    NASA Astrophysics Data System (ADS)

    Heien, E. M.; Hwang, L.; Fish, A. E.; Smith, M.; Dumit, J.; Kellogg, L. H.

    2014-12-01

    Computational software and custom-written codes play a key role in scientific research and teaching, providing tools to perform data analysis and forward modeling through numerical computation. However, development of these codes is often hampered by the fact that there is no well-defined way for the authors to receive credit or professional recognition for their work through the standard methods of scientific publication and subsequent citation of the work. This in turn may discourage researchers from publishing their codes or making them easier for other scientists to use. We investigate the issues involved in citing software in a scientific context, and introduce features that should be components of a citation infrastructure, particularly oriented towards the codes and scientific culture in the area of geodynamics research. The codes used in geodynamics are primarily specialized numerical modeling codes for continuum mechanics problems; they may be developed by individual researchers, teams of researchers, geophysicists in collaboration with computational scientists and applied mathematicians, or by coordinated community efforts such as the Computational Infrastructure for Geodynamics. Some but not all geodynamics codes are open-source. These characteristics are common to many areas of geophysical software development and use. We provide background on the problem of software citation and discuss some of the barriers preventing adoption of such citations, including social/cultural barriers, insufficient technological support infrastructure, and an overall lack of agreement about what a software citation should consist of. We suggest solutions in an initial effort to create a system to support citation of software and promotion of scientific software development.

  19. Defense Strategies for Asymmetric Networked Systems with Discrete Components.

    PubMed

    Rao, Nageswara S V; Ma, Chris Y T; Hausken, Kjell; He, Fei; Yau, David K Y; Zhuang, Jun

    2018-05-03

    We consider infrastructures consisting of a network of systems, each composed of discrete components. The network provides the vital connectivity between the systems and hence plays a critical, asymmetric role in the infrastructure operations. The individual components of the systems can be attacked by cyber and physical means and can be appropriately reinforced to withstand these attacks. We formulate the problem of ensuring the infrastructure performance as a game between an attacker and a provider, who choose the numbers of the components of the systems and network to attack and reinforce, respectively. The costs and benefits of attacks and reinforcements are characterized using the sum-form, product-form and composite utility functions, each composed of a survival probability term and a component cost term. We present a two-level characterization of the correlations within the infrastructure: (i) the aggregate failure correlation function specifies the infrastructure failure probability given the failure of an individual system or network, and (ii) the survival probabilities of the systems and network satisfy first-order differential conditions that capture the component-level correlations using multiplier functions. We derive Nash equilibrium conditions that provide expressions for individual system survival probabilities and also the expected infrastructure capacity specified by the total number of operational components. We apply these results to derive and analyze defense strategies for distributed cloud computing infrastructures using cyber-physical models.

  20. Defense Strategies for Asymmetric Networked Systems with Discrete Components

    PubMed Central

    Rao, Nageswara S. V.; Ma, Chris Y. T.; Hausken, Kjell; He, Fei; Yau, David K. Y.

    2018-01-01

    We consider infrastructures consisting of a network of systems, each composed of discrete components. The network provides the vital connectivity between the systems and hence plays a critical, asymmetric role in the infrastructure operations. The individual components of the systems can be attacked by cyber and physical means and can be appropriately reinforced to withstand these attacks. We formulate the problem of ensuring the infrastructure performance as a game between an attacker and a provider, who choose the numbers of the components of the systems and network to attack and reinforce, respectively. The costs and benefits of attacks and reinforcements are characterized using the sum-form, product-form and composite utility functions, each composed of a survival probability term and a component cost term. We present a two-level characterization of the correlations within the infrastructure: (i) the aggregate failure correlation function specifies the infrastructure failure probability given the failure of an individual system or network, and (ii) the survival probabilities of the systems and network satisfy first-order differential conditions that capture the component-level correlations using multiplier functions. We derive Nash equilibrium conditions that provide expressions for individual system survival probabilities and also the expected infrastructure capacity specified by the total number of operational components. We apply these results to derive and analyze defense strategies for distributed cloud computing infrastructures using cyber-physical models. PMID:29751588

  1. Critical Infrastructure Protection II, The International Federation for Information Processing, Volume 290.

    NASA Astrophysics Data System (ADS)

    Papa, Mauricio; Shenoi, Sujeet

    The information infrastructure -- comprising computers, embedded devices, networks and software systems -- is vital to day-to-day operations in every sector: information and telecommunications, banking and finance, energy, chemicals and hazardous materials, agriculture, food, water, public health, emergency services, transportation, postal and shipping, government and defense. Global business and industry, governments, indeed society itself, cannot function effectively if major components of the critical information infrastructure are degraded, disabled or destroyed. Critical Infrastructure Protection II describes original research results and innovative applications in the interdisciplinary field of critical infrastructure protection. Also, it highlights the importance of weaving science, technology and policy in crafting sophisticated, yet practical, solutions that will help secure information, computer and network assets in the various critical infrastructure sectors. Areas of coverage include: - Themes and Issues - Infrastructure Security - Control Systems Security - Security Strategies - Infrastructure Interdependencies - Infrastructure Modeling and Simulation This book is the second volume in the annual series produced by the International Federation for Information Processing (IFIP) Working Group 11.10 on Critical Infrastructure Protection, an international community of scientists, engineers, practitioners and policy makers dedicated to advancing research, development and implementation efforts focused on infrastructure protection. The book contains a selection of twenty edited papers from the Second Annual IFIP WG 11.10 International Conference on Critical Infrastructure Protection held at George Mason University, Arlington, Virginia, USA in the spring of 2008.

  2. DOE Office of Scientific and Technical Information (OSTI.GOV)

    McCaskey, Alexander J.

    Hybrid programming models for beyond-CMOS technologies will prove critical for integrating new computing technologies alongside our existing infrastructure. Unfortunately the software infrastructure required to enable this is lacking or not available. XACC is a programming framework for extreme-scale, post-exascale accelerator architectures that integrates alongside existing conventional applications. It is a pluggable framework for programming languages developed for next-gen computing hardware architectures like quantum and neuromorphic computing. It lets computational scientists efficiently off-load classically intractable work to attached accelerators through user-friendly Kernel definitions. XACC makes post-exascale hybrid programming approachable for domain computational scientists.

  3. Software Reuse Methods to Improve Technological Infrastructure for e-Science

    NASA Technical Reports Server (NTRS)

    Marshall, James J.; Downs, Robert R.; Mattmann, Chris A.

    2011-01-01

    Social computing has the potential to contribute to scientific research. Ongoing developments in information and communications technology improve capabilities for enabling scientific research, including research fostered by social computing capabilities. The recent emergence of e-Science practices has demonstrated the benefits from improvements in the technological infrastructure, or cyber-infrastructure, that has been developed to support science. Cloud computing is one example of this e-Science trend. Our own work in the area of software reuse offers methods that can be used to improve new technological development, including cloud computing capabilities, to support scientific research practices. In this paper, we focus on software reuse and its potential to contribute to the development and evaluation of information systems and related services designed to support new capabilities for conducting scientific research.

  4. Parallel Processing of Images in Mobile Devices using BOINC

    NASA Astrophysics Data System (ADS)

    Curiel, Mariela; Calle, David F.; Santamaría, Alfredo S.; Suarez, David F.; Flórez, Leonardo

    2018-04-01

    Medical image processing helps health professionals make decisions for the diagnosis and treatment of patients. Since some algorithms for processing images require substantial amounts of resources, one could take advantage of distributed or parallel computing. A mobile grid can be an adequate computing infrastructure for this problem. A mobile grid is a grid that includes mobile devices as resource providers. In a previous step of this research, we selected BOINC as the infrastructure to build our mobile grid. However, parallel processing of images in mobile devices poses at least two important challenges: the execution of standard libraries for processing images and obtaining adequate performance when compared to desktop computers grids. By the time we started our research, the use of BOINC in mobile devices also involved two issues: a) the execution of programs in mobile devices required to modify the code to insert calls to the BOINC API, and b) the division of the image among the mobile devices as well as its merging required additional code in some BOINC components. This article presents answers to these four challenges.

  5. Collaboratively Architecting a Scalable and Adaptable Petascale Infrastructure to Support Transdisciplinary Scientific Research for the Australian Earth and Environmental Sciences

    NASA Astrophysics Data System (ADS)

    Wyborn, L. A.; Evans, B. J. K.; Pugh, T.; Lescinsky, D. T.; Foster, C.; Uhlherr, A.

    2014-12-01

    The National Computational Infrastructure (NCI) at the Australian National University (ANU) is a partnership between CSIRO, ANU, Bureau of Meteorology (BoM) and Geoscience Australia. Recent investments in a 1.2 PFlop Supercomputer (Raijin), ~ 20 PB data storage using Lustre filesystems and a 3000 core high performance cloud have created a hybrid platform for higher performance computing and data-intensive science to enable large scale earth and climate systems modelling and analysis. There are > 3000 users actively logging in and > 600 projects on the NCI system. Efficiently scaling and adapting data and software systems to petascale infrastructures requires the collaborative development of an architecture that is designed, programmed and operated to enable users to interactively invoke different forms of in-situ computation over complex and large scale data collections. NCI makes available major and long tail data collections from both the government and research sectors based on six themes: 1) weather, climate and earth system science model simulations, 2) marine and earth observations, 3) geosciences, 4) terrestrial ecosystems, 5) water and hydrology and 6) astronomy, bio and social. Collectively they span the lithosphere, crust, biosphere, hydrosphere, troposphere, and stratosphere. Collections are the operational form for data management and access. Similar data types from individual custodians are managed cohesively. Use of international standards for discovery and interoperability allow complex interactions within and between the collections. This design facilitates a transdisciplinary approach to research and enables a shift from small scale, 'stove-piped' science efforts to large scale, collaborative systems science. This new and complex infrastructure requires a move to shared, globally trusted software frameworks that can be maintained and updated. Workflow engines become essential and need to integrate provenance, versioning, traceability, repeatability and publication. There are also human resource challenges as highly skilled HPC/HPD specialists, specialist programmers, and data scientists are required whose skills can support scaling to the new paradigm of effective and efficient data-intensive earth science analytics on petascale, and soon to be exascale systems.

  6. Cloud Computing with iPlant Atmosphere.

    PubMed

    McKay, Sheldon J; Skidmore, Edwin J; LaRose, Christopher J; Mercer, Andre W; Noutsos, Christos

    2013-10-15

    Cloud Computing refers to distributed computing platforms that use virtualization software to provide easy access to physical computing infrastructure and data storage, typically administered through a Web interface. Cloud-based computing provides access to powerful servers, with specific software and virtual hardware configurations, while eliminating the initial capital cost of expensive computers and reducing the ongoing operating costs of system administration, maintenance contracts, power consumption, and cooling. This eliminates a significant barrier to entry into bioinformatics and high-performance computing for many researchers. This is especially true of free or modestly priced cloud computing services. The iPlant Collaborative offers a free cloud computing service, Atmosphere, which allows users to easily create and use instances on virtual servers preconfigured for their analytical needs. Atmosphere is a self-service, on-demand platform for scientific computing. This unit demonstrates how to set up, access and use cloud computing in Atmosphere. Copyright © 2013 John Wiley & Sons, Inc.

  7. Elastic Cloud Computing Architecture and System for Heterogeneous Spatiotemporal Computing

    NASA Astrophysics Data System (ADS)

    Shi, X.

    2017-10-01

    Spatiotemporal computation implements a variety of different algorithms. When big data are involved, desktop computer or standalone application may not be able to complete the computation task due to limited memory and computing power. Now that a variety of hardware accelerators and computing platforms are available to improve the performance of geocomputation, different algorithms may have different behavior on different computing infrastructure and platforms. Some are perfect for implementation on a cluster of graphics processing units (GPUs), while GPUs may not be useful on certain kind of spatiotemporal computation. This is the same situation in utilizing a cluster of Intel's many-integrated-core (MIC) or Xeon Phi, as well as Hadoop or Spark platforms, to handle big spatiotemporal data. Furthermore, considering the energy efficiency requirement in general computation, Field Programmable Gate Array (FPGA) may be a better solution for better energy efficiency when the performance of computation could be similar or better than GPUs and MICs. It is expected that an elastic cloud computing architecture and system that integrates all of GPUs, MICs, and FPGAs could be developed and deployed to support spatiotemporal computing over heterogeneous data types and computational problems.

  8. S3DB core: a framework for RDF generation and management in bioinformatics infrastructures

    PubMed Central

    2010-01-01

    Background Biomedical research is set to greatly benefit from the use of semantic web technologies in the design of computational infrastructure. However, beyond well defined research initiatives, substantial issues of data heterogeneity, source distribution, and privacy currently stand in the way towards the personalization of Medicine. Results A computational framework for bioinformatic infrastructure was designed to deal with the heterogeneous data sources and the sensitive mixture of public and private data that characterizes the biomedical domain. This framework consists of a logical model build with semantic web tools, coupled with a Markov process that propagates user operator states. An accompanying open source prototype was developed to meet a series of applications that range from collaborative multi-institution data acquisition efforts to data analysis applications that need to quickly traverse complex data structures. This report describes the two abstractions underlying the S3DB-based infrastructure, logical and numerical, and discusses its generality beyond the immediate confines of existing implementations. Conclusions The emergence of the "web as a computer" requires a formal model for the different functionalities involved in reading and writing to it. The S3DB core model proposed was found to address the design criteria of biomedical computational infrastructure, such as those supporting large scale multi-investigator research, clinical trials, and molecular epidemiology. PMID:20646315

  9. The GÉANT network: addressing current and future needs of the HEP community

    NASA Astrophysics Data System (ADS)

    Capone, Vincenzo; Usman, Mian

    2015-12-01

    The GÉANT infrastructure is the backbone that serves the scientific communities in Europe for their data movement needs and their access to international research and education networks. Using the extensive fibre footprint and infrastructure in Europe the GÉANT network delivers a portfolio of services aimed to best fit the specific needs of the users, including Authentication and Authorization Infrastructure, end-to-end performance monitoring, advanced network services (dynamic circuits, L2-L3VPN, MD-VPN). This talk will outline the factors that help the GÉANT network to respond to the needs of the High Energy Physics community, both in Europe and worldwide. The Pan-European network provides the connectivity between 40 European national research and education networks. In addition, GÉANT also connects the European NRENs to the R&E networks in other world region and has reach to over 110 NREN worldwide, making GÉANT the best connected Research and Education network, with its multiple intercontinental links to different continents e.g. North and South America, Africa and Asia-Pacific. The High Energy Physics computational needs have always had (and will keep having) a leading role among the scientific user groups of the GÉANT network: the LHCONE overlay network has been built, in collaboration with the other big world REN, specifically to address the peculiar needs of the LHC data movement. Recently, as a result of a series of coordinated efforts, the LHCONE network has been expanded to the Asia-Pacific area, and is going to include some of the main regional R&E network in the area. The LHC community is not the only one that is actively using a distributed computing model (hence the need for a high-performance network); new communities are arising, as BELLE II. GÉANT is deeply involved also with the BELLE II Experiment, to provide full support to their distributed computing model, along with a perfSONAR-based network monitoring system. GÉANT has also coordinated the setup of the network infrastructure to perform the BELLE II Trans-Atlantic Data Challenge, and has been active on helping the BELLE II community to sort out their end-to-end performance issues. In this talk we will provide information about the current GÉANT network architecture and of the international connectivity, along with the upcoming upgrades and the planned and foreseeable improvements. We will also describe the implementation of the solutions provided to support the LHC and BELLE II experiments.

  10. The Czech National Grid Infrastructure

    NASA Astrophysics Data System (ADS)

    Chudoba, J.; Křenková, I.; Mulač, M.; Ruda, M.; Sitera, J.

    2017-10-01

    The Czech National Grid Infrastructure is operated by MetaCentrum, a CESNET department responsible for coordinating and managing activities related to distributed computing. CESNET as the Czech National Research and Education Network (NREN) provides many e-infrastructure services, which are used by 94% of the scientific and research community in the Czech Republic. Computing and storage resources owned by different organizations are connected by fast enough network to provide transparent access to all resources. We describe in more detail the computing infrastructure, which is based on several different technologies and covers grid, cloud and map-reduce environment. While the largest part of CPUs is still accessible via distributed torque servers, providing environment for long batch jobs, part of infrastructure is available via standard EGI tools in EGI, subset of NGI resources is provided into EGI FedCloud environment with cloud interface and there is also Hadoop cluster provided by the same e-infrastructure.A broad spectrum of computing servers is offered; users can choose from standard 2 CPU servers to large SMP machines with up to 6 TB of RAM or servers with GPU cards. Different groups have different priorities on various resources, resource owners can even have an exclusive access. The software is distributed via AFS. Storage servers offering up to tens of terabytes of disk space to individual users are connected via NFS4 on top of GPFS and access to long term HSM storage with peta-byte capacity is also provided. Overview of available resources and recent statistics of usage will be given.

  11. An infrastructure with a unified control plane to integrate IP into optical metro networks to provide flexible and intelligent bandwidth on demand for cloud computing

    NASA Astrophysics Data System (ADS)

    Yang, Wei; Hall, Trevor

    2012-12-01

    The Internet is entering an era of cloud computing to provide more cost effective, eco-friendly and reliable services to consumer and business users and the nature of the Internet traffic will undertake a fundamental transformation. Consequently, the current Internet will no longer suffice for serving cloud traffic in metro areas. This work proposes an infrastructure with a unified control plane that integrates simple packet aggregation technology with optical express through the interoperation between IP routers and electrical traffic controllers in optical metro networks. The proposed infrastructure provides flexible, intelligent, and eco-friendly bandwidth on demand for cloud computing in metro areas.

  12. INDIGO-DataCloud solutions for Earth Sciences

    NASA Astrophysics Data System (ADS)

    Aguilar Gómez, Fernando; de Lucas, Jesús Marco; Fiore, Sandro; Monna, Stephen; Chen, Yin

    2017-04-01

    INDIGO-DataCloud (https://www.indigo-datacloud.eu/) is a European Commission funded project aiming to develop a data and computing platform targeting scientific communities, deployable on multiple hardware and provisioned over hybrid (private or public) e-infrastructures. The development of INDIGO solutions covers the different layers in cloud computing (IaaS, PaaS, SaaS), and provides tools to exploit resources like HPC or GPGPUs. INDIGO is oriented to support European Scientific research communities, that are well represented in the project. Twelve different Case Studies have been analyzed in detail from different fields: Biological & Medical sciences, Social sciences & Humanities, Environmental and Earth sciences and Physics & Astrophysics. INDIGO-DataCloud provides solutions to emerging challenges in Earth Science like: -Enabling an easy deployment of community services at different cloud sites. Many Earth Science research infrastructures often involve distributed observation stations across countries, and also have distributed data centers to support the corresponding data acquisition and curation. There is a need to easily deploy new data center services while the research infrastructure continuous spans. As an example: LifeWatch (ESFRI, Ecosystems and Biodiversity) uses INDIGO solutions to manage the deployment of services to perform complex hydrodynamics and water quality modelling over a Cloud Computing environment, predicting algae blooms, using the Docker technology: TOSCA requirement description, Docker repository, Orchestrator for deployment, AAI (AuthN, AuthZ) and OneData (Distributed Storage System). -Supporting Big Data Analysis. Nowadays, many Earth Science research communities produce large amounts of data and and are challenged by the difficulties of processing and analysing it. A climate models intercomparison data analysis case study for the European Network for Earth System Modelling (ENES) community has been setup, based on the Ophidia big data analysis framework and the Kepler workflow management system. Such services normally involve a large and distributed set of data and computing resources. In this regard, this case study exploits the INDIGO PaaS for a flexible and dynamic allocation of the resources at the infrastructural level. -Providing Distributed Data Storage Solutions. In order to allow scientific communities to perform heavy computation on huge datasets, INDIGO provides global data access solutions allowing researchers to access data in a distributed environment like fashion regardless of its location, and also to publish and share their research results with public or close communities. INDIGO solutions that support the access to distributed data storage (OneData) are being tested on EMSO infrastructure (Ocean Sciences and Geohazards) data. Another aspect of interest for the EMSO community is in efficient data processing by exploiting INDIGO services like PaaS Orchestrator. Further, for HPC exploitation, a new solution named Udocker has been implemented, enabling users to execute docker containers in supercomputers, without requiring administration privileges. This presentation will overview INDIGO solutions that are interesting and useful for Earth science communities and will show how they can be applied to other Case Studies.

  13. Conditions for Ubiquitous Computing: What Can Be Learned from a Longitudinal Study

    ERIC Educational Resources Information Center

    Lei, Jing

    2010-01-01

    Based on survey data and interview data collected over four academic years, this longitudinal study examined how a ubiquitous computing project evolved along with the changes in teachers, students, the human infrastructure, and technology infrastructure in the school. This study also investigated what conditions were necessary for successful…

  14. OOI CyberInfrastructure - Next Generation Oceanographic Research

    NASA Astrophysics Data System (ADS)

    Farcas, C.; Fox, P.; Arrott, M.; Farcas, E.; Klacansky, I.; Krueger, I.; Meisinger, M.; Orcutt, J.

    2008-12-01

    Software has become a key enabling technology for scientific discovery, observation, modeling, and exploitation of natural phenomena. New value emerges from the integration of individual subsystems into networked federations of capabilities exposed to the scientific community. Such data-intensive interoperability networks are crucial for future scientific collaborative research, as they open up new ways of fusing data from different sources and across various domains, and analysis on wide geographic areas. The recently established NSF OOI program, through its CyberInfrastructure component addresses this challenge by providing broad access from sensor networks for data acquisition up to computational grids for massive computations and binding infrastructure facilitating policy management and governance of the emerging system-of-scientific-systems. We provide insight into the integration core of this effort, namely, a hierarchic service-oriented architecture for a robust, performant, and maintainable implementation. We first discuss the relationship between data management and CI crosscutting concerns such as identity management, policy and governance, which define the organizational contexts for data access and usage. Next, we detail critical services including data ingestion, transformation, preservation, inventory, and presentation. To address interoperability issues between data represented in various formats we employ a semantic framework derived from the Earth System Grid technology, a canonical representation for scientific data based on DAP/OPeNDAP, and related data publishers such as ERDDAP. Finally, we briefly present the underlying transport based on a messaging infrastructure over the AMQP protocol, and the preservation based on a distributed file system through SDSC iRODS.

  15. How to keep the Grid full and working with ATLAS production and physics jobs

    NASA Astrophysics Data System (ADS)

    Pacheco Pagés, A.; Barreiro Megino, F. H.; Cameron, D.; Fassi, F.; Filipcic, A.; Di Girolamo, A.; González de la Hoz, S.; Glushkov, I.; Maeno, T.; Walker, R.; Yang, W.; ATLAS Collaboration

    2017-10-01

    The ATLAS production system provides the infrastructure to process millions of events collected during the LHC Run 1 and the first two years of Run 2 using grid, clouds and high performance computing. We address in this contribution the strategies and improvements that have been implemented to the production system for optimal performance and to achieve the highest efficiency of available resources from operational perspective. We focus on the recent developments.

  16. Integration of High-Performance Computing into Cloud Computing Services

    NASA Astrophysics Data System (ADS)

    Vouk, Mladen A.; Sills, Eric; Dreher, Patrick

    High-Performance Computing (HPC) projects span a spectrum of computer hardware implementations ranging from peta-flop supercomputers, high-end tera-flop facilities running a variety of operating systems and applications, to mid-range and smaller computational clusters used for HPC application development, pilot runs and prototype staging clusters. What they all have in common is that they operate as a stand-alone system rather than a scalable and shared user re-configurable resource. The advent of cloud computing has changed the traditional HPC implementation. In this article, we will discuss a very successful production-level architecture and policy framework for supporting HPC services within a more general cloud computing infrastructure. This integrated environment, called Virtual Computing Lab (VCL), has been operating at NC State since fall 2004. Nearly 8,500,000 HPC CPU-Hrs were delivered by this environment to NC State faculty and students during 2009. In addition, we present and discuss operational data that show that integration of HPC and non-HPC (or general VCL) services in a cloud can substantially reduce the cost of delivering cloud services (down to cents per CPU hour).

  17. International Symposium on Grids and Clouds (ISGC) 2014

    NASA Astrophysics Data System (ADS)

    The International Symposium on Grids and Clouds (ISGC) 2014 will be held at Academia Sinica in Taipei, Taiwan from 23-28 March 2014, with co-located events and workshops. The conference is hosted by the Academia Sinica Grid Computing Centre (ASGC).“Bringing the data scientist to global e-Infrastructures” is the theme of ISGC 2014. The last decade has seen the phenomenal growth in the production of data in all forms by all research communities to produce a deluge of data from which information and knowledge need to be extracted. Key to this success will be the data scientist - educated to use advanced algorithms, applications and infrastructures - collaborating internationally to tackle society’s challenges. ISGC 2014 will bring together researchers working in all aspects of data science from different disciplines around the world to collaborate and educate themselves in the latest achievements and techniques being used to tackle the data deluge. In addition to the regular workshops, technical presentations and plenary keynotes, ISGC this year will focus on how to grow the data science community by considering the educational foundation needed for tomorrow’s data scientist. Topics of discussion include Physics (including HEP) and Engineering Applications, Biomedicine & Life Sciences Applications, Earth & Environmental Sciences & Biodiversity Applications, Humanities & Social Sciences Application, Virtual Research Environment (including Middleware, tools, services, workflow, ... etc.), Data Management, Big Data, Infrastructure & Operations Management, Infrastructure Clouds and Virtualisation, Interoperability, Business Models & Sustainability, Highly Distributed Computing Systems, and High Performance & Technical Computing (HPTC).

  18. Proceedings Second Annual Cyber Security and Information Infrastructure Research Workshop

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sheldon, Frederick T; Krings, Axel; Yoo, Seong-Moo

    2006-01-01

    The workshop theme is Cyber Security: Beyond the Maginot Line Recently the FBI reported that computer crime has skyrocketed costing over $67 billion in 2005 alone and affecting 2.8M+ businesses and organizations. Attack sophistication is unprecedented along with availability of open source concomitant tools. Private, academic, and public sectors invest significant resources in cyber security. Industry primarily performs cyber security research as an investment in future products and services. While the public sector also funds cyber security R&D, the majority of this activity focuses on the specific mission(s) of the funding agency. Thus, broad areas of cyber security remain neglectedmore » or underdeveloped. Consequently, this workshop endeavors to explore issues involving cyber security and related technologies toward strengthening such areas and enabling the development of new tools and methods for securing our information infrastructure critical assets. We aim to assemble new ideas and proposals about robust models on which we can build the architecture of a secure cyberspace including but not limited to: * Knowledge discovery and management * Critical infrastructure protection * De-obfuscating tools for the validation and verification of tamper-proofed software * Computer network defense technologies * Scalable information assurance strategies * Assessment-driven design for trust * Security metrics and testing methodologies * Validation of security and survivability properties * Threat assessment and risk analysis * Early accurate detection of the insider threat * Security hardened sensor networks and ubiquitous computing environments * Mobile software authentication protocols * A new "model" of the threat to replace the "Maginot Line" model and more . . .« less

  19. High-Performance Java Codes for Computational Fluid Dynamics

    NASA Technical Reports Server (NTRS)

    Riley, Christopher; Chatterjee, Siddhartha; Biswas, Rupak; Biegel, Bryan (Technical Monitor)

    2001-01-01

    The computational science community is reluctant to write large-scale computationally -intensive applications in Java due to concerns over Java's poor performance, despite the claimed software engineering advantages of its object-oriented features. Naive Java implementations of numerical algorithms can perform poorly compared to corresponding Fortran or C implementations. To achieve high performance, Java applications must be designed with good performance as a primary goal. This paper presents the object-oriented design and implementation of two real-world applications from the field of Computational Fluid Dynamics (CFD): a finite-volume fluid flow solver (LAURA, from NASA Langley Research Center), and an unstructured mesh adaptation algorithm (2D_TAG, from NASA Ames Research Center). This work builds on our previous experience with the design of high-performance numerical libraries in Java. We examine the performance of the applications using the currently available Java infrastructure and show that the Java version of the flow solver LAURA performs almost within a factor of 2 of the original procedural version. Our Java version of the mesh adaptation algorithm 2D_TAG performs within a factor of 1.5 of its original procedural version on certain platforms. Our results demonstrate that object-oriented software design principles are not necessarily inimical to high performance.

  20. Spatial data analytics on heterogeneous multi- and many-core parallel architectures using python

    USGS Publications Warehouse

    Laura, Jason R.; Rey, Sergio J.

    2017-01-01

    Parallel vector spatial analysis concerns the application of parallel computational methods to facilitate vector-based spatial analysis. The history of parallel computation in spatial analysis is reviewed, and this work is placed into the broader context of high-performance computing (HPC) and parallelization research. The rise of cyber infrastructure and its manifestation in spatial analysis as CyberGIScience is seen as a main driver of renewed interest in parallel computation in the spatial sciences. Key problems in spatial analysis that have been the focus of parallel computing are covered. Chief among these are spatial optimization problems, computational geometric problems including polygonization and spatial contiguity detection, the use of Monte Carlo Markov chain simulation in spatial statistics, and parallel implementations of spatial econometric methods. Future directions for research on parallelization in computational spatial analysis are outlined.

  1. POSIX and Object Distributed Storage Systems Performance Comparison Studies With Real-Life Scenarios in an Experimental Data Taking Context Leveraging OpenStack Swift & Ceph

    NASA Astrophysics Data System (ADS)

    Poat, M. D.; Lauret, J.; Betts, W.

    2015-12-01

    The STAR online computing infrastructure has become an intensive dynamic system used for first-hand data collection and analysis resulting in a dense collection of data output. As we have transitioned to our current state, inefficient, limited storage systems have become an impediment to fast feedback to online shift crews. Motivation for a centrally accessible, scalable and redundant distributed storage system had become a necessity in this environment. OpenStack Swift Object Storage and Ceph Object Storage are two eye-opening technologies as community use and development have led to success elsewhere. In this contribution, OpenStack Swift and Ceph have been put to the test with single and parallel I/O tests, emulating real world scenarios for data processing and workflows. The Ceph file system storage, offering a POSIX compliant file system mounted similarly to an NFS share was of particular interest as it aligned with our requirements and was retained as our solution. I/O performance tests were run against the Ceph POSIX file system and have presented surprising results indicating true potential for fast I/O and reliability. STAR'S online compute farm historical use has been for job submission and first hand data analysis. The goal of reusing the online compute farm to maintain a storage cluster and job submission will be an efficient use of the current infrastructure.

  2. Sustainable Cooperative Robotic Technologies for Human and Robotic Outpost Infrastructure Construction and Maintenance

    NASA Technical Reports Server (NTRS)

    Stroupe, Ashley W.; Okon, Avi; Robinson, Matthew; Huntsberger, Terry; Aghazarian, Hrand; Baumgartner, Eric

    2004-01-01

    Robotic Construction Crew (RCC) is a heterogeneous multi-robot system for autonomous acquisition, transport, and precision mating of components in construction tasks. RCC minimizes resources constrained in a space environment such as computation, power, communication and, sensing. A behavior-based architecture provides adaptability and robustness despite low computational requirements. RCC successfully performs several construction related tasks in an emulated outdoor environment despite high levels of uncertainty in motions and sensing. Quantitative results are provided for formation keeping in component transport, precision instrument placement, and construction tasks.

  3. GreenView and GreenLand Applications Development on SEE-GRID Infrastructure

    NASA Astrophysics Data System (ADS)

    Mihon, Danut; Bacu, Victor; Gorgan, Dorian; Mészáros, Róbert; Gelybó, Györgyi; Stefanut, Teodor

    2010-05-01

    The GreenView and GreenLand applications [1] have been developed through the SEE-GRID-SCI (SEE-GRID eInfrastructure for regional eScience) FP7 project co-funded by the European Commission [2]. The development of environment applications is a challenge for Grid technologies and software development methodologies. This presentation exemplifies the development of the GreenView and GreenLand applications over the SEE-GRID infrastructure by the Grid Application Development Methodology [3]. Today's environmental applications are used in vary domains of Earth Science such as meteorology, ground and atmospheric pollution, ground metal detection or weather prediction. These applications run on satellite images (e.g. Landsat, MERIS, MODIS, etc.) and the accuracy of output results depends mostly of the quality of these images. The main drawback of such environmental applications regards the need of computation power and storage power (some images are almost 1GB in size), in order to process such a large data volume. Actually, almost applications requiring high computation resources have approached the migration onto the Grid infrastructure. This infrastructure offers the computing power by running the atomic application components on different Grid nodes in sequential or parallel mode. The middleware used between the Grid infrastructure and client applications is ESIP (Environment Oriented Satellite Image Processing Platform), which is based on gProcess platform [4]. In its current format, gProcess is used for launching new processes on the Grid nodes, but also for monitoring the execution status of these processes. This presentation highlights two case studies of Grid based environmental applications, GreenView and GreenLand [5]. GreenView is used in correlation with MODIS (Moderate Resolution Imaging Spectroradiometer) satellite images and meteorological datasets, in order to produce pseudo colored temperature and vegetation maps for different geographical CEE (Central Eastern Europe) regions. On the other hand, GreenLand is used for generating maps for different vegetation indexes (e.g. NDVI, EVI, SAVI, GEMI) based on Landsat satellite images. Both applications are using interpolation and random value generation algorithms, but also specific formulas for computing vegetation index values. The GreenView and GreenLand applications have been experimented over the SEE-GRID infrastructure and the performance evaluation is reported in [6]. The improvement of the execution time (obtained through a better parallelization of jobs), the extension of geographical areas to other parts of the Earth, and new user interaction techniques on spatial data and large set of satellite images are the goals of the future work. References [1] GreenView application on Wiki, http://wiki.egee-see.org/index.php/GreenView [2] SEE-GRID-SCI Project, http://www.see-grid-sci.eu/ [3] Gorgan D., Stefanut T., Bâcu V., Mihon D., Grid based Environment Application Development Methodology, SCICOM, 7th International Conference on "Large-Scale Scientific Computations", 4-8 June, 2009, Sozopol, Bulgaria, (To be published by Springer), (2009). [4] Gorgan D., Bacu V., Stefanut T., Rodila D., Mihon D., Grid based Satellite Image Processing Platform for Earth Observation Applications Development. IDAACS'2009 - IEEE Fifth International Workshop on "Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications", 21-23 September, Cosenza, Italy, IEEE Published in Computer Press, 247-252 (2009). [5] Mihon D., Bacu V., Stefanut T., Gorgan D., "Grid Based Environment Application Development - GreenView Application". ICCP2009 - IEEE 5th International Conference on Intelligent Computer Communication and Processing, 27 Aug, 2009 Cluj-Napoca. Published by IEEE Computer Press, pp. 275-282 (2009). [6] Danut Mihon, Victor Bacu, Dorian Gorgan, Róbert Mészáros, Györgyi Gelybó, Teodor Stefanut, Practical Considerations on the GreenView Application Development and Execution over SEE-GRID. SEE-GRID-SCI User Forum, 9-10 Dec 2009, Bogazici University, Istanbul, Turkey, ISBN: 978-975-403-510-0, pp. 167-175 (2009).

  4. Dynamic VM Provisioning for TORQUE in a Cloud Environment

    NASA Astrophysics Data System (ADS)

    Zhang, S.; Boland, L.; Coddington, P.; Sevior, M.

    2014-06-01

    Cloud computing, also known as an Infrastructure-as-a-Service (IaaS), is attracting more interest from the commercial and educational sectors as a way to provide cost-effective computational infrastructure. It is an ideal platform for researchers who must share common resources but need to be able to scale up to massive computational requirements for specific periods of time. This paper presents the tools and techniques developed to allow the open source TORQUE distributed resource manager and Maui cluster scheduler to dynamically integrate OpenStack cloud resources into existing high throughput computing clusters.

  5. Automatically generated code for relativistic inhomogeneous cosmologies

    NASA Astrophysics Data System (ADS)

    Bentivegna, Eloisa

    2017-02-01

    The applications of numerical relativity to cosmology are on the rise, contributing insight into such cosmological problems as structure formation, primordial phase transitions, gravitational-wave generation, and inflation. In this paper, I present the infrastructure for the computation of inhomogeneous dust cosmologies which was used recently to measure the effect of nonlinear inhomogeneity on the cosmic expansion rate. I illustrate the code's architecture, provide evidence for its correctness in a number of familiar cosmological settings, and evaluate its parallel performance for grids of up to several billion points. The code, which is available as free software, is based on the Einstein Toolkit infrastructure, and in particular leverages the automated code generation capabilities provided by its component Kranc.

  6. A Unified Data-Driven Approach for Programming In Situ Analysis and Visualization

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Aiken, Alex

    The placement and movement of data is becoming the key limiting factor on both performance and energy efficiency of high performance computations. As systems generate more data, it is becoming increasingly difficult to actually move that data elsewhere for post-processing, as the rate of improvements in supporting I/O infrastructure is not keeping pace. Together, these trends are creating a shift in how we think about exascale computations, from a viewpoint that focuses on FLOPS to one that focuses on data and data-centric operations as fundamental to the reasoning about, and optimization of, scientific workflows on extreme-scale architectures. The overarching goalmore » of our effort was the study of a unified data-driven approach for programming applications and in situ analysis and visualization. Our work was to understand the interplay between data-centric programming model requirements at extreme-scale and the overall impact of those requirements on the design, capabilities, flexibility, and implementation details for both applications and the supporting in situ infrastructure. In this context, we made many improvements to the Legion programming system (one of the leading data-centric models today) and demonstrated in situ analyses on real application codes using these improvements.« less

  7. A Security Monitoring Framework For Virtualization Based HEP Infrastructures

    NASA Astrophysics Data System (ADS)

    Gomez Ramirez, A.; Martinez Pedreira, M.; Grigoras, C.; Betev, L.; Lara, C.; Kebschull, U.; ALICE Collaboration

    2017-10-01

    High Energy Physics (HEP) distributed computing infrastructures require automatic tools to monitor, analyze and react to potential security incidents. These tools should collect and inspect data such as resource consumption, logs and sequence of system calls for detecting anomalies that indicate the presence of a malicious agent. They should also be able to perform automated reactions to attacks without administrator intervention. We describe a novel framework that accomplishes these requirements, with a proof of concept implementation for the ALICE experiment at CERN. We show how we achieve a fully virtualized environment that improves the security by isolating services and Jobs without a significant performance impact. We also describe a collected dataset for Machine Learning based Intrusion Prevention and Detection Systems on Grid computing. This dataset is composed of resource consumption measurements (such as CPU, RAM and network traffic), logfiles from operating system services, and system call data collected from production Jobs running in an ALICE Grid test site and a big set of malware samples. This malware set was collected from security research sites. Based on this dataset, we will proceed to develop Machine Learning algorithms able to detect malicious Jobs.

  8. Long-range interactions and parallel scalability in molecular simulations

    NASA Astrophysics Data System (ADS)

    Patra, Michael; Hyvönen, Marja T.; Falck, Emma; Sabouri-Ghomi, Mohsen; Vattulainen, Ilpo; Karttunen, Mikko

    2007-01-01

    Typical biomolecular systems such as cellular membranes, DNA, and protein complexes are highly charged. Thus, efficient and accurate treatment of electrostatic interactions is of great importance in computational modeling of such systems. We have employed the GROMACS simulation package to perform extensive benchmarking of different commonly used electrostatic schemes on a range of computer architectures (Pentium-4, IBM Power 4, and Apple/IBM G5) for single processor and parallel performance up to 8 nodes—we have also tested the scalability on four different networks, namely Infiniband, GigaBit Ethernet, Fast Ethernet, and nearly uniform memory architecture, i.e. communication between CPUs is possible by directly reading from or writing to other CPUs' local memory. It turns out that the particle-mesh Ewald method (PME) performs surprisingly well and offers competitive performance unless parallel runs on PC hardware with older network infrastructure are needed. Lipid bilayers of sizes 128, 512 and 2048 lipid molecules were used as the test systems representing typical cases encountered in biomolecular simulations. Our results enable an accurate prediction of computational speed on most current computing systems, both for serial and parallel runs. These results should be helpful in, for example, choosing the most suitable configuration for a small departmental computer cluster.

  9. Technography and Design-Actuality Gap-Analysis of Internet Computer Technologies-Assisted Education: Western Expectations and Global Education

    ERIC Educational Resources Information Center

    Greenhalgh-Spencer, Heather; Jerbi, Moja

    2017-01-01

    In this paper, we provide a design-actuality gap-analysis of the internet infrastructure that exists in developing nations and nations in the global South with the deployed internet computer technologies (ICT)-assisted programs that are designed to use internet infrastructure to provide educational opportunities. Programs that specifically…

  10. Closing the Gap: Cybersecurity for U.S. Forces and Commands

    DTIC Science & Technology

    2017-03-30

    Dickson, Ph.D. Professor of Military Studies , JAWS Thesis Advisor Kevin Therrien, Col, USAF Committee Member Stephen Rogers, Colonel, USA Director...infrastructures, and includes the Internet, telecommunications networks, computer systems, and embedded processors and controllers in critical industries.”5...of information technology infrastructures, including the Internet, telecommunications networks, computer systems, and embedded processors and

  11. Computation Directorate Annual Report 2003

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Crawford, D L; McGraw, J R; Ashby, S F

    Big computers are icons: symbols of the culture, and of the larger computing infrastructure that exists at Lawrence Livermore. Through the collective effort of Laboratory personnel, they enable scientific discovery and engineering development on an unprecedented scale. For more than three decades, the Computation Directorate has supplied the big computers that enable the science necessary for Laboratory missions and programs. Livermore supercomputing is uniquely mission driven. The high-fidelity weapon simulation capabilities essential to the Stockpile Stewardship Program compel major advances in weapons codes and science, compute power, and computational infrastructure. Computation's activities align with this vital mission of the Departmentmore » of Energy. Increasingly, non-weapons Laboratory programs also rely on computer simulation. World-class achievements have been accomplished by LLNL specialists working in multi-disciplinary research and development teams. In these teams, Computation personnel employ a wide array of skills, from desktop support expertise, to complex applications development, to advanced research. Computation's skilled professionals make the Directorate the success that it has become. These individuals know the importance of the work they do and the many ways it contributes to Laboratory missions. They make appropriate and timely decisions that move the entire organization forward. They make Computation a leader in helping LLNL achieve its programmatic milestones. I dedicate this inaugural Annual Report to the people of Computation in recognition of their continuing contributions. I am proud that we perform our work securely and safely. Despite increased cyber attacks on our computing infrastructure from the Internet, advanced cyber security practices ensure that our computing environment remains secure. Through Integrated Safety Management (ISM) and diligent oversight, we address safety issues promptly and aggressively. The safety of our employees, whether at work or at home, is a paramount concern. Even as the Directorate meets today's supercomputing requirements, we are preparing for the future. We are investigating open-source cluster technology, the basis of our highly successful Mulitprogrammatic Capability Resource (MCR). Several breakthrough discoveries have resulted from MCR calculations coupled with theory and experiment, prompting Laboratory scientists to demand ever-greater capacity and capability. This demand is being met by a new 23-TF system, Thunder, with architecture modeled on MCR. In preparation for the ''after-next'' computer, we are researching technology even farther out on the horizon--cell-based computers. Assuming that the funding and the technology hold, we will acquire the cell-based machine BlueGene/L within the next 12 months.« less

  12. The National Information Infrastructure: Agenda for Action.

    ERIC Educational Resources Information Center

    Department of Commerce, Washington, DC. Information Infrastructure Task Force.

    The National Information Infrastructure (NII) is planned as a web of communications networks, computers, databases, and consumer electronics that will put vast amounts of information at the users' fingertips. Private sector firms are beginning to develop this infrastructure, but essential roles remain for the Federal Government. The National…

  13. The virtual machine (VM) scaler: an infrastructure manager supporting environmental modeling on IaaS clouds

    USDA-ARS?s Scientific Manuscript database

    Infrastructure-as-a-service (IaaS) clouds provide a new medium for deployment of environmental modeling applications. Harnessing advancements in virtualization, IaaS clouds can provide dynamic scalable infrastructure to better support scientific modeling computational demands. Providing scientific m...

  14. Helix Nebula: Enabling federation of existing data infrastructures and data services to an overarching cross-domain e-infrastructure

    NASA Astrophysics Data System (ADS)

    Lengert, Wolfgang; Farres, Jordi; Lanari, Riccardo; Casu, Francesco; Manunta, Michele; Lassalle-Balier, Gerard

    2014-05-01

    Helix Nebula has established a growing public private partnership of more than 30 commercial cloud providers, SMEs, and publicly funded research organisations and e-infrastructures. The Helix Nebula strategy is to establish a federated cloud service across Europe. Three high-profile flagships, sponsored by CERN (high energy physics), EMBL (life sciences) and ESA/DLR/CNES/CNR (earth science), have been deployed and extensively tested within this federated environment. The commitments behind these initial flagships have created a critical mass that attracts suppliers and users to the initiative, to work together towards an "Information as a Service" market place. Significant progress in implementing the following 4 programmatic goals (as outlined in the strategic Plan Ref.1) has been achieved: × Goal #1 Establish a Cloud Computing Infrastructure for the European Research Area (ERA) serving as a platform for innovation and evolution of the overall infrastructure. × Goal #2 Identify and adopt suitable policies for trust, security and privacy on a European-level can be provided by the European Cloud Computing framework and infrastructure. × Goal #3 Create a light-weight governance structure for the future European Cloud Computing Infrastructure that involves all the stakeholders and can evolve over time as the infrastructure, services and user-base grows. × Goal #4 Define a funding scheme involving the three stake-holder groups (service suppliers, users, EC and national funding agencies) into a Public-Private-Partnership model to implement a Cloud Computing Infrastructure that delivers a sustainable business environment adhering to European level policies. Now in 2014 a first version of this generic cross-domain e-infrastructure is ready to go into operations building on federation of European industry and contributors (data, tools, knowledge, ...). This presentation describes how Helix Nebula is being used in the domain of earth science focusing on geohazards. The so called "Supersite Exploitation Platform" (SSEP) provides scientists an overarching federated e-infrastructure with a very fast access to (i) large volume of data (EO/non-space data), (ii) computing resources (e.g. hybrid cloud/grid), (iii) processing software (e.g. toolboxes, RTMs, retrieval baselines, visualization routines), and (iv) general platform capabilities (e.g. user management and access control, accounting, information portal, collaborative tools, social networks etc.). In this federation each data provider remains in full control of the implementation of its data policy. This presentation outlines the Architecture (technical and services) supporting very heterogeneous science domains as well as the procedures for new-comers to join the Helix Nebula Market Place. Ref.1 http://cds.cern.ch/record/1374172/files/CERN-OPEN-2011-036.pdf

  15. Role of the ATLAS Grid Information System (AGIS) in Distributed Data Analysis and Simulation

    NASA Astrophysics Data System (ADS)

    Anisenkov, A. V.

    2018-03-01

    In modern high-energy physics experiments, particular attention is paid to the global integration of information and computing resources into a unified system for efficient storage and processing of experimental data. Annually, the ATLAS experiment performed at the Large Hadron Collider at the European Organization for Nuclear Research (CERN) produces tens of petabytes raw data from the recording electronics and several petabytes of data from the simulation system. For processing and storage of such super-large volumes of data, the computing model of the ATLAS experiment is based on heterogeneous geographically distributed computing environment, which includes the worldwide LHC computing grid (WLCG) infrastructure and is able to meet the requirements of the experiment for processing huge data sets and provide a high degree of their accessibility (hundreds of petabytes). The paper considers the ATLAS grid information system (AGIS) used by the ATLAS collaboration to describe the topology and resources of the computing infrastructure, to configure and connect the high-level software systems of computer centers, to describe and store all possible parameters, control, configuration, and other auxiliary information required for the effective operation of the ATLAS distributed computing applications and services. The role of the AGIS system in the development of a unified description of the computing resources provided by grid sites, supercomputer centers, and cloud computing into a consistent information model for the ATLAS experiment is outlined. This approach has allowed the collaboration to extend the computing capabilities of the WLCG project and integrate the supercomputers and cloud computing platforms into the software components of the production and distributed analysis workload management system (PanDA, ATLAS).

  16. Low cost, high performance processing of single particle cryo-electron microscopy data in the cloud

    PubMed Central

    Cianfrocco, Michael A; Leschziner, Andres E

    2015-01-01

    The advent of a new generation of electron microscopes and direct electron detectors has realized the potential of single particle cryo-electron microscopy (cryo-EM) as a technique to generate high-resolution structures. Calculating these structures requires high performance computing clusters, a resource that may be limiting to many likely cryo-EM users. To address this limitation and facilitate the spread of cryo-EM, we developed a publicly available ‘off-the-shelf’ computing environment on Amazon's elastic cloud computing infrastructure. This environment provides users with single particle cryo-EM software packages and the ability to create computing clusters with 16–480+ CPUs. We tested our computing environment using a publicly available 80S yeast ribosome dataset and estimate that laboratories could determine high-resolution cryo-EM structures for $50 to $1500 per structure within a timeframe comparable to local clusters. Our analysis shows that Amazon's cloud computing environment may offer a viable computing environment for cryo-EM. DOI: http://dx.doi.org/10.7554/eLife.06664.001 PMID:25955969

  17. 1001 Ways to run AutoDock Vina for virtual screening

    NASA Astrophysics Data System (ADS)

    Jaghoori, Mohammad Mahdi; Bleijlevens, Boris; Olabarriaga, Silvia D.

    2016-03-01

    Large-scale computing technologies have enabled high-throughput virtual screening involving thousands to millions of drug candidates. It is not trivial, however, for biochemical scientists to evaluate the technical alternatives and their implications for running such large experiments. Besides experience with the molecular docking tool itself, the scientist needs to learn how to run it on high-performance computing (HPC) infrastructures, and understand the impact of the choices made. Here, we review such considerations for a specific tool, AutoDock Vina, and use experimental data to illustrate the following points: (1) an additional level of parallelization increases virtual screening throughput on a multi-core machine; (2) capturing of the random seed is not enough (though necessary) for reproducibility on heterogeneous distributed computing systems; (3) the overall time spent on the screening of a ligand library can be improved by analysis of factors affecting execution time per ligand, including number of active torsions, heavy atoms and exhaustiveness. We also illustrate differences among four common HPC infrastructures: grid, Hadoop, small cluster and multi-core (virtual machine on the cloud). Our analysis shows that these platforms are suitable for screening experiments of different sizes. These considerations can guide scientists when choosing the best computing platform and set-up for their future large virtual screening experiments.

  18. 1001 Ways to run AutoDock Vina for virtual screening.

    PubMed

    Jaghoori, Mohammad Mahdi; Bleijlevens, Boris; Olabarriaga, Silvia D

    2016-03-01

    Large-scale computing technologies have enabled high-throughput virtual screening involving thousands to millions of drug candidates. It is not trivial, however, for biochemical scientists to evaluate the technical alternatives and their implications for running such large experiments. Besides experience with the molecular docking tool itself, the scientist needs to learn how to run it on high-performance computing (HPC) infrastructures, and understand the impact of the choices made. Here, we review such considerations for a specific tool, AutoDock Vina, and use experimental data to illustrate the following points: (1) an additional level of parallelization increases virtual screening throughput on a multi-core machine; (2) capturing of the random seed is not enough (though necessary) for reproducibility on heterogeneous distributed computing systems; (3) the overall time spent on the screening of a ligand library can be improved by analysis of factors affecting execution time per ligand, including number of active torsions, heavy atoms and exhaustiveness. We also illustrate differences among four common HPC infrastructures: grid, Hadoop, small cluster and multi-core (virtual machine on the cloud). Our analysis shows that these platforms are suitable for screening experiments of different sizes. These considerations can guide scientists when choosing the best computing platform and set-up for their future large virtual screening experiments.

  19. Integration of Cloud resources in the LHCb Distributed Computing

    NASA Astrophysics Data System (ADS)

    Úbeda García, Mario; Méndez Muñoz, Víctor; Stagni, Federico; Cabarrou, Baptiste; Rauschmayr, Nathalie; Charpentier, Philippe; Closier, Joel

    2014-06-01

    This contribution describes how Cloud resources have been integrated in the LHCb Distributed Computing. LHCb is using its specific Dirac extension (LHCbDirac) as an interware for its Distributed Computing. So far, it was seamlessly integrating Grid resources and Computer clusters. The cloud extension of DIRAC (VMDIRAC) allows the integration of Cloud computing infrastructures. It is able to interact with multiple types of infrastructures in commercial and institutional clouds, supported by multiple interfaces (Amazon EC2, OpenNebula, OpenStack and CloudStack) - instantiates, monitors and manages Virtual Machines running on this aggregation of Cloud resources. Moreover, specifications for institutional Cloud resources proposed by Worldwide LHC Computing Grid (WLCG), mainly by the High Energy Physics Unix Information Exchange (HEPiX) group, have been taken into account. Several initiatives and computing resource providers in the eScience environment have already deployed IaaS in production during 2013. Keeping this on mind, pros and cons of a cloud based infrasctructure have been studied in contrast with the current setup. As a result, this work addresses four different use cases which represent a major improvement on several levels of our infrastructure. We describe the solution implemented by LHCb for the contextualisation of the VMs based on the idea of Cloud Site. We report on operational experience of using in production several institutional Cloud resources that are thus becoming integral part of the LHCb Distributed Computing resources. Furthermore, we describe as well the gradual migration of our Service Infrastructure towards a fully distributed architecture following the Service as a Service (SaaS) model.

  20. Leveraging e-Science infrastructure for electrochemical research.

    PubMed

    Peachey, Tom; Mashkina, Elena; Lee, Chong-Yong; Enticott, Colin; Abramson, David; Bond, Alan M; Elton, Darrell; Gavaghan, David J; Stevenson, Gareth P; Kennedy, Gareth F

    2011-08-28

    As in many scientific disciplines, modern chemistry involves a mix of experimentation and computer-supported theory. Historically, these skills have been provided by different groups, and range from traditional 'wet' laboratory science to advanced numerical simulation. Increasingly, progress is made by global collaborations, in which new theory may be developed in one part of the world and applied and tested in the laboratory elsewhere. e-Science, or cyber-infrastructure, underpins such collaborations by providing a unified platform for accessing scientific instruments, computers and data archives, and collaboration tools. In this paper we discuss the application of advanced e-Science software tools to electrochemistry research performed in three different laboratories--two at Monash University in Australia and one at the University of Oxford in the UK. We show that software tools that were originally developed for a range of application domains can be applied to electrochemical problems, in particular Fourier voltammetry. Moreover, we show that, by replacing ad-hoc manual processes with e-Science tools, we obtain more accurate solutions automatically.

  1. The optimal design of service level agreement in IAAS based on BDIM

    NASA Astrophysics Data System (ADS)

    Liu, Xiaochen; Zhan, Zhiqiang

    2013-03-01

    Cloud Computing has become more and more prevalent over the past few years, and we have seen the importance of Infrastructure-as-a-service (IaaS). This kind of service enables scaling of bandwidth, memory, computing power and storage. But the SLA in IaaS also faces complexity and variety. Users also consider the business of the service. To meet the most users requirements, a methodology for designing optimal SLA in IaaS from the business perspectives is proposed. This method is different from the conventional SLA design method, It not only focuses on service provider perspective, also from the customer to carry on the design. This methodology better captures the linkage between service provider and service client by considering minimizing the business loss originated from performance degradation and IT infrastructure failures and maximizing profits for service provider and clients. An optimal design in an IaaS model is provided and an example are analyzed to show this approach obtain higher profit.

  2. Software Defined Networking challenges and future direction: A case study of implementing SDN features on OpenStack private cloud

    NASA Astrophysics Data System (ADS)

    Shamugam, Veeramani; Murray, I.; Leong, J. A.; Sidhu, Amandeep S.

    2016-03-01

    Cloud computing provides services on demand instantly, such as access to network infrastructure consisting of computing hardware, operating systems, network storage, database and applications. Network usage and demands are growing at a very fast rate and to meet the current requirements, there is a need for automatic infrastructure scaling. Traditional networks are difficult to automate because of the distributed nature of their decision making process for switching or routing which are collocated on the same device. Managing complex environments using traditional networks is time-consuming and expensive, especially in the case of generating virtual machines, migration and network configuration. To mitigate the challenges, network operations require efficient, flexible, agile and scalable software defined networks (SDN). This paper discuss various issues in SDN and suggests how to mitigate the network management related issues. A private cloud prototype test bed was setup to implement the SDN on the OpenStack platform to test and evaluate the various network performances provided by the various configurations.

  3. A model to forecast data centre infrastructure costs.

    NASA Astrophysics Data System (ADS)

    Vernet, R.

    2015-12-01

    The computing needs in the HEP community are increasing steadily, but the current funding situation in many countries is tight. As a consequence experiments, data centres, and funding agencies have to rationalize resource usage and expenditures. CC-IN2P3 (Lyon, France) provides computing resources to many experiments including LHC, and is a major partner for astroparticle projects like LSST, CTA or Euclid. The financial cost to accommodate all these experiments is substantial and has to be planned well in advance for funding and strategic reasons. In that perspective, leveraging infrastructure expenses, electric power cost and hardware performance observed in our site over the last years, we have built a model that integrates these data and provides estimates of the investments that would be required to cater to the experiments for the mid-term future. We present how our model is built and the expenditure forecast it produces, taking into account the experiment roadmaps. We also examine the resource growth predicted by our model over the next years assuming a flat-budget scenario.

  4. An Innovative Infrastructure with a Universal Geo-Spatiotemporal Data Representation Supporting Cost-Effective Integration of Diverse Earth Science Data

    NASA Technical Reports Server (NTRS)

    Rilee, Michael Lee; Kuo, Kwo-Sen

    2017-01-01

    The SpatioTemporal Adaptive Resolution Encoding (STARE) is a unifying scheme encoding geospatial and temporal information for organizing data on scalable computing/storage resources, minimizing expensive data transfers. STARE provides a compact representation that turns set-logic functions into integer operations, e.g. conditional sub-setting, taking into account representative spatiotemporal resolutions of the data in the datasets. STARE geo-spatiotemporally aligns data placements of diverse data on massive parallel resources to maximize performance. Automating important scientific functions (e.g. regridding) and computational functions (e.g. data placement) allows scientists to focus on domain-specific questions instead of expending their efforts and expertise on data processing. With STARE-enabled automation, SciDB (Scientific Database) plus STARE provides a database interface, reducing costly data preparation, increasing the volume and variety of interoperable data, and easing result sharing. Using SciDB plus STARE as part of an integrated analysis infrastructure dramatically eases combining diametrically different datasets.

  5. ENES the European Network for Earth System modelling and its infrastructure projects IS-ENES

    NASA Astrophysics Data System (ADS)

    Guglielmo, Francesca; Joussaume, Sylvie; Parinet, Marie

    2016-04-01

    The scientific community working on climate modelling is organized within the European Network for Earth System modelling (ENES). In the past decade, several European university departments, research centres, meteorological services, computer centres, and industrial partners engaged in the creation of ENES with the purpose of working together and cooperating towards the further development of the network, by signing a Memorandum of Understanding. As of 2015, the consortium counts 47 partners. The climate modelling community, and thus ENES, faces challenges which are both science-driven, i.e. analysing of the full complexity of the Earth System to improve our understanding and prediction of climate changes, and have multi-faceted societal implications, as a better representation of climate change on regional scales leads to improved understanding and prediction of impacts and to the development and provision of climate services. ENES, promoting and endorsing projects and initiatives, helps in developing and evaluating of state-of-the-art climate and Earth system models, facilitates model inter-comparison studies, encourages exchanges of software and model results, and fosters the use of high performance computing facilities dedicated to high-resolution multi-model experiments. ENES brings together public and private partners, integrates countries underrepresented in climate modelling studies, and reaches out to different user communities, thus enhancing European expertise and competitiveness. In this need of sophisticated models, world-class, high-performance computers, and state-of-the-art software solutions to make efficient use of models, data and hardware, a key role is played by the constitution and maintenance of a solid infrastructure, developing and providing services to the different user communities. ENES has investigated the infrastructural needs and has received funding from the EU FP7 program for the IS-ENES (InfraStructure for ENES) phase I and II projects. We present here the case study of an existing network of institutions brought together toward common goals by a non-binding agreement, ENES, and of its two IS-ENES projects. These latter will be discussed in their double role as a means to provide and/or maintain the actual infrastructure (hardware, software, skilled human resources, services) to achieve ENES scientific goals -fulfilling the aims set in a strategy document-, but also to inform and provide to the network a structured way of working and of interacting with the extended community. The genesis and evolution of the network and the interaction network/projects will also be analysed in terms of long-term sustainability.

  6. Secure dissemination of electronic healthcare records in distributed wireless environments.

    PubMed

    Belsis, Petros; Vassis, Dimitris; Skourlas, Christos; Pantziou, Grammati

    2008-01-01

    A new networking paradigm has emerged with the appearance of wireless computing. Among else ad-hoc networks, mobile and ubiquitous environments can boost the performance of systems in which they get applied. Among else, medical environments are a convenient example of their applicability. With the utilisation of wireless infrastructures, medical data may be accessible to healthcare practitioners, enabling continuous access to medical data. Due to the critical nature of medical information, the design and implementation of these infrastructures demands special treatment in order to meet specific requirements; among else, special care should be taken in order to manage interoperability, security, and in order to deal with bandwidth and hardware resource constraints that characterize the wireless topology. In this paper we present an architecture that attempts to deal with these issues; moreover, in order to prove the validity of our approach we have also evaluated the performance of our platform through simulation in different operating scenarios.

  7. Integrating operation design into infrastructure planning to foster robustness of planned water systems

    NASA Astrophysics Data System (ADS)

    Bertoni, Federica; Giuliani, Matteo; Castelletti, Andrea

    2017-04-01

    Over the past years, many studies have looked at the planning and management of water infrastructure systems as two separate problems, where the dynamic component (i.e., operations) is considered only after the static problem (i.e., planning) has been resolved. Most recent works have started to investigate planning and management as two strictly interconnected faces of the same problem, where the former is solved jointly with the latter in an integrated framework. This brings advantages to multi-purpose water reservoir systems, where several optimal operating strategies exist and similar system designs might perform differently on the long term depending on the considered short-term operating tradeoff. An operationally robust design will be therefore one performing well across multiple feasible tradeoff operating policies. This work aims at studying the interaction between short-term operating strategies and their impacts on long-term structural decisions, when long-lived infrastructures with complex ecological impacts and multi-sectoral demands to satisfy (i.e., reservoirs) are considered. A parametric reinforcement learning approach is adopted for nesting optimization and control yielding to both optimal reservoir design and optimal operational policies for water reservoir systems. The method is demonstrated on a synthetic reservoir that must be designed and operated for ensuring reliable water supply to downstream users. At first, the optimal design capacity derived is compared with the 'no-fail storage' computed through Rippl, a capacity design function that returns the minimum storage needed to satisfy specified water demands without allowing supply shortfall. Then, the optimal reservoir volume is used to simulate the simplified case study under other operating objectives than water supply, in order to assess whether and how the system performance changes. The more robust the infrastructural design, the smaller the difference between the performances of different operating strategies.

  8. Scalable Analysis Methods and In Situ Infrastructure for Extreme Scale Knowledge Discovery

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bethel, Wes

    2016-07-24

    The primary challenge motivating this team’s work is the widening gap between the ability to compute information and to store it for subsequent analysis. This gap adversely impacts science code teams, who are able to perform analysis only on a small fraction of the data they compute, resulting in the very real likelihood of lost or missed science, when results are computed but not analyzed. Our approach is to perform as much analysis or visualization processing on data while it is still resident in memory, an approach that is known as in situ processing. The idea in situ processing wasmore » not new at the time of the start of this effort in 2014, but efforts in that space were largely ad hoc, and there was no concerted effort within the research community that aimed to foster production-quality software tools suitable for use by DOE science projects. In large, our objective was produce and enable use of production-quality in situ methods and infrastructure, at scale, on DOE HPC facilities, though we expected to have impact beyond DOE due to the widespread nature of the challenges, which affect virtually all large-scale computational science efforts. To achieve that objective, we assembled a unique team of researchers consisting of representatives from DOE national laboratories, academia, and industry, and engaged in software technology R&D, as well as engaged in close partnerships with DOE science code teams, to produce software technologies that were shown to run effectively at scale on DOE HPC platforms.« less

  9. HPCC and the National Information Infrastructure: an overview.

    PubMed Central

    Lindberg, D A

    1995-01-01

    The National Information Infrastructure (NII) or "information superhighway" is a high-priority federal initiative to combine communications networks, computers, databases, and consumer electronics to deliver information services to all U.S. citizens. The NII will be used to improve government and social services while cutting administrative costs. Operated by the private sector, the NII will rely on advanced technologies developed under the direction of the federal High Performance Computing and Communications (HPCC) Program. These include computing systems capable of performing trillions of operations (teraops) per second and networks capable of transmitting billions of bits (gigabits) per second. Among other activities, the HPCC Program supports the national supercomputer research centers, the federal portion of the Internet, and the development of interface software, such as Mosaic, that facilitates access to network information services. Health care has been identified as a critical demonstration area for HPCC technology and an important application area for the NII. As an HPCC participant, the National Library of Medicine (NLM) assists hospitals and medical centers to connect to the Internet through projects directed by the Regional Medical Libraries and through an Internet Connections Program cosponsored by the National Science Foundation. In addition to using the Internet to provide enhanced access to its own information services, NLM sponsors health-related applications of HPCC technology. Examples include the "Visible Human" project and recently awarded contracts for test-bed networks to share patient data and medical images, telemedicine projects to provide consultation and medical care to patients in rural areas, and advanced computer simulations of human anatomy for training in "virtual surgery." PMID:7703935

  10. Network Computing Infrastructure to Share Tools and Data in Global Nuclear Energy Partnership

    NASA Astrophysics Data System (ADS)

    Kim, Guehee; Suzuki, Yoshio; Teshima, Naoya

    CCSE/JAEA (Center for Computational Science and e-Systems/Japan Atomic Energy Agency) integrated a prototype system of a network computing infrastructure for sharing tools and data to support the U.S. and Japan collaboration in GNEP (Global Nuclear Energy Partnership). We focused on three technical issues to apply our information process infrastructure, which are accessibility, security, and usability. In designing the prototype system, we integrated and improved both network and Web technologies. For the accessibility issue, we adopted SSL-VPN (Security Socket Layer-Virtual Private Network) technology for the access beyond firewalls. For the security issue, we developed an authentication gateway based on the PKI (Public Key Infrastructure) authentication mechanism to strengthen the security. Also, we set fine access control policy to shared tools and data and used shared key based encryption method to protect tools and data against leakage to third parties. For the usability issue, we chose Web browsers as user interface and developed Web application to provide functions to support sharing tools and data. By using WebDAV (Web-based Distributed Authoring and Versioning) function, users can manipulate shared tools and data through the Windows-like folder environment. We implemented the prototype system in Grid infrastructure for atomic energy research: AEGIS (Atomic Energy Grid Infrastructure) developed by CCSE/JAEA. The prototype system was applied for the trial use in the first period of GNEP.

  11. Geospatial-enabled Data Exploration and Computation through Data Infrastructure Building Blocks

    NASA Astrophysics Data System (ADS)

    Song, C. X.; Biehl, L. L.; Merwade, V.; Villoria, N.

    2015-12-01

    Geospatial data are present everywhere today with the proliferation of location-aware computing devices and sensors. This is especially true in the scientific community where large amounts of data are driving research and education activities in many domains. Collaboration over geospatial data, for example, in modeling, data analysis and visualization, must still overcome the barriers of specialized software and expertise among other challenges. The GABBs project aims at enabling broader access to geospatial data exploration and computation by developing spatial data infrastructure building blocks that leverage capabilities of end-to-end application service and virtualized computing framework in HUBzero. Funded by NSF Data Infrastructure Building Blocks (DIBBS) initiative, GABBs provides a geospatial data architecture that integrates spatial data management, mapping and visualization and will make it available as open source. The outcome of the project will enable users to rapidly create tools and share geospatial data and tools on the web for interactive exploration of data without requiring significant software development skills, GIS expertise or IT administrative privileges. This presentation will describe the development of geospatial data infrastructure building blocks and the scientific use cases that help drive the software development, as well as seek feedback from the user communities.

  12. Cloud Bursting with GlideinWMS: Means to satisfy ever increasing computing needs for Scientific Workflows

    NASA Astrophysics Data System (ADS)

    Mhashilkar, Parag; Tiradani, Anthony; Holzman, Burt; Larson, Krista; Sfiligoi, Igor; Rynge, Mats

    2014-06-01

    Scientific communities have been in the forefront of adopting new technologies and methodologies in the computing. Scientific computing has influenced how science is done today, achieving breakthroughs that were impossible to achieve several decades ago. For the past decade several such communities in the Open Science Grid (OSG) and the European Grid Infrastructure (EGI) have been using GlideinWMS to run complex application workflows to effectively share computational resources over the grid. GlideinWMS is a pilot-based workload management system (WMS) that creates on demand, a dynamically sized overlay HTCondor batch system on grid resources. At present, the computational resources shared over the grid are just adequate to sustain the computing needs. We envision that the complexity of the science driven by "Big Data" will further push the need for computational resources. To fulfill their increasing demands and/or to run specialized workflows, some of the big communities like CMS are investigating the use of cloud computing as Infrastructure-As-A-Service (IAAS) with GlideinWMS as a potential alternative to fill the void. Similarly, communities with no previous access to computing resources can use GlideinWMS to setup up a batch system on the cloud infrastructure. To enable this, the architecture of GlideinWMS has been extended to enable support for interfacing GlideinWMS with different Scientific and commercial cloud providers like HLT, FutureGrid, FermiCloud and Amazon EC2. In this paper, we describe a solution for cloud bursting with GlideinWMS. The paper describes the approach, architectural changes and lessons learned while enabling support for cloud infrastructures in GlideinWMS.

  13. Cloud Bursting with GlideinWMS: Means to satisfy ever increasing computing needs for Scientific Workflows

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mhashilkar, Parag; Tiradani, Anthony; Holzman, Burt

    Scientific communities have been in the forefront of adopting new technologies and methodologies in the computing. Scientific computing has influenced how science is done today, achieving breakthroughs that were impossible to achieve several decades ago. For the past decade several such communities in the Open Science Grid (OSG) and the European Grid Infrastructure (EGI) have been using GlideinWMS to run complex application workflows to effectively share computational resources over the grid. GlideinWMS is a pilot-based workload management system (WMS) that creates on demand, a dynamically sized overlay HTCondor batch system on grid resources. At present, the computational resources shared overmore » the grid are just adequate to sustain the computing needs. We envision that the complexity of the science driven by 'Big Data' will further push the need for computational resources. To fulfill their increasing demands and/or to run specialized workflows, some of the big communities like CMS are investigating the use of cloud computing as Infrastructure-As-A-Service (IAAS) with GlideinWMS as a potential alternative to fill the void. Similarly, communities with no previous access to computing resources can use GlideinWMS to setup up a batch system on the cloud infrastructure. To enable this, the architecture of GlideinWMS has been extended to enable support for interfacing GlideinWMS with different Scientific and commercial cloud providers like HLT, FutureGrid, FermiCloud and Amazon EC2. In this paper, we describe a solution for cloud bursting with GlideinWMS. The paper describes the approach, architectural changes and lessons learned while enabling support for cloud infrastructures in GlideinWMS.« less

  14. Computational simulation of concurrent engineering for aerospace propulsion systems

    NASA Technical Reports Server (NTRS)

    Chamis, C. C.; Singhal, S. N.

    1992-01-01

    Results are summarized of an investigation to assess the infrastructure available and the technology readiness in order to develop computational simulation methods/software for concurrent engineering. These results demonstrate that development of computational simulations methods for concurrent engineering is timely. Extensive infrastructure, in terms of multi-discipline simulation, component-specific simulation, system simulators, fabrication process simulation, and simulation of uncertainties - fundamental in developing such methods, is available. An approach is recommended which can be used to develop computational simulation methods for concurrent engineering for propulsion systems and systems in general. Benefits and facets needing early attention in the development are outlined.

  15. Computational simulation for concurrent engineering of aerospace propulsion systems

    NASA Technical Reports Server (NTRS)

    Chamis, C. C.; Singhal, S. N.

    1993-01-01

    Results are summarized for an investigation to assess the infrastructure available and the technology readiness in order to develop computational simulation methods/software for concurrent engineering. These results demonstrate that development of computational simulation methods for concurrent engineering is timely. Extensive infrastructure, in terms of multi-discipline simulation, component-specific simulation, system simulators, fabrication process simulation, and simulation of uncertainties--fundamental to develop such methods, is available. An approach is recommended which can be used to develop computational simulation methods for concurrent engineering of propulsion systems and systems in general. Benefits and issues needing early attention in the development are outlined.

  16. Computational simulation for concurrent engineering of aerospace propulsion systems

    NASA Astrophysics Data System (ADS)

    Chamis, C. C.; Singhal, S. N.

    1993-02-01

    Results are summarized for an investigation to assess the infrastructure available and the technology readiness in order to develop computational simulation methods/software for concurrent engineering. These results demonstrate that development of computational simulation methods for concurrent engineering is timely. Extensive infrastructure, in terms of multi-discipline simulation, component-specific simulation, system simulators, fabrication process simulation, and simulation of uncertainties--fundamental to develop such methods, is available. An approach is recommended which can be used to develop computational simulation methods for concurrent engineering of propulsion systems and systems in general. Benefits and issues needing early attention in the development are outlined.

  17. A prototype Infrastructure for Cloud-based distributed services in High Availability over WAN

    NASA Astrophysics Data System (ADS)

    Bulfon, C.; Carlino, G.; De Salvo, A.; Doria, A.; Graziosi, C.; Pardi, S.; Sanchez, A.; Carboni, M.; Bolletta, P.; Puccio, L.; Capone, V.; Merola, L.

    2015-12-01

    In this work we present the architectural and performance studies concerning a prototype of a distributed Tier2 infrastructure for HEP, instantiated between the two Italian sites of INFN-Romal and INFN-Napoli. The network infrastructure is based on a Layer-2 geographical link, provided by the Italian NREN (GARR), directly connecting the two remote LANs of the named sites. By exploiting the possibilities offered by the new distributed file systems, a shared storage area with synchronous copy has been set up. The computing infrastructure, based on an OpenStack facility, is using a set of distributed Hypervisors installed in both sites. The main parameter to be taken into account when managing two remote sites with a single framework is the effect of the latency, due to the distance and the end-to-end service overhead. In order to understand the capabilities and limits of our setup, the impact of latency has been investigated by means of a set of stress tests, including data I/O throughput, metadata access performance evaluation and network occupancy, during the life cycle of a Virtual Machine. A set of resilience tests has also been performed, in order to verify the stability of the system on the event of hardware or software faults. The results of this work show that the reliability and robustness of the chosen architecture are effective enough to build a production system and to provide common services. This prototype can also be extended to multiple sites with small changes of the network topology, thus creating a National Network of Cloud-based distributed services, in HA over WAN.

  18. Heterogeneous access and processing of EO-Data on a Cloud based Infrastructure delivering operational Products

    NASA Astrophysics Data System (ADS)

    Niggemann, F.; Appel, F.; Bach, H.; de la Mar, J.; Schirpke, B.; Dutting, K.; Rucker, G.; Leimbach, D.

    2015-04-01

    To address the challenges of effective data handling faced by Small and Medium Sized Enterprises (SMEs) a cloud-based infrastructure for accessing and processing of Earth Observation(EO)-data has been developed within the project APPS4GMES(www.apps4gmes.de). To gain homogenous multi mission data access an Input Data Portal (IDP) been implemented on this infrastructure. The IDP consists of an Open Geospatial Consortium (OGC) conformant catalogue, a consolidation module for format conversion and an OGC-conformant ordering framework. Metadata of various EO-sources and with different standards is harvested and transferred to an OGC conformant Earth Observation Product standard and inserted into the catalogue by a Metadata Harvester. The IDP can be accessed for search and ordering of the harvested datasets by the services implemented on the cloud infrastructure. Different land-surface services have been realised by the project partners, using the implemented IDP and cloud infrastructure. Results of these are customer ready products, as well as pre-products (e.g. atmospheric corrected EO data), serving as a basis for other services. Within the IDP an automated access to ESA's Sentinel-1 Scientific Data Hub has been implemented. Searching and downloading of the SAR data can be performed in an automated way. With the implementation of the Sentinel-1 Toolbox and own software, for processing of the datasets for further use, for example for Vista's snow monitoring, delivering input for the flood forecast services, can also be performed in an automated way. For performance tests of the cloud environment a sophisticated model based atmospheric correction and pre-classification service has been implemented. Tests conducted an automated synchronised processing of one entire Landsat 8 (LS-8) coverage for Germany and performance comparisons to standard desktop systems. Results of these tests, showing a performance improvement by the factor of six, proved the high flexibility and computing power of the cloud environment. To make full use of the cloud capabilities a possibility for automated upscaling of the hardware resources has been implemented. Together with the IDP infrastructure fast and automated processing of various satellite sources to deliver market ready products can be realised, thus increasing customer needs and numbers can be satisfied without loss of accuracy and quality.

  19. A service-based BLAST command tool supported by cloud infrastructures.

    PubMed

    Carrión, Abel; Blanquer, Ignacio; Hernández, Vicente

    2012-01-01

    Notwithstanding the benefits of distributed-computing infrastructures for empowering bioinformatics analysis tools with the needed computing and storage capability, the actual use of these infrastructures is still low. Learning curves and deployment difficulties have reduced the impact on the wide research community. This article presents a porting strategy of BLAST based on a multiplatform client and a service that provides the same interface as sequential BLAST, thus reducing learning curve and with minimal impact on their integration on existing workflows. The porting has been done using the execution and data access components from the EC project Venus-C and the Windows Azure infrastructure provided in this project. The results obtained demonstrate a low overhead on the global execution framework and reasonable speed-up and cost-efficiency with respect to a sequential version.

  20. The Cloud Area Padovana: from pilot to production

    NASA Astrophysics Data System (ADS)

    Andreetto, P.; Costa, F.; Crescente, A.; Dorigo, A.; Fantinel, S.; Fanzago, F.; Sgaravatto, M.; Traldi, S.; Verlato, M.; Zangrando, L.

    2017-10-01

    The Cloud Area Padovana has been running for almost two years. This is an OpenStack-based scientific cloud, spread across two different sites: the INFN Padova Unit and the INFN Legnaro National Labs. The hardware resources have been scaled horizontally and vertically, by upgrading some hypervisors and by adding new ones: currently it provides about 1100 cores. Some in-house developments were also integrated in the OpenStack dashboard, such as a tool for user and project registrations with direct support for the INFN-AAI Identity Provider as a new option for the user authentication. In collaboration with the EU-funded Indigo DataCloud project, the integration with Docker-based containers has been experimented with and will be available in production soon. This computing facility now satisfies the computational and storage demands of more than 70 users affiliated with about 20 research projects. We present here the architecture of this Cloud infrastructure, the tools and procedures used to operate it. We also focus on the lessons learnt in these two years, describing the problems that were found and the corrective actions that had to be applied. We also discuss about the chosen strategy for upgrades, which combines the need to promptly integrate the OpenStack new developments, the demand to reduce the downtimes of the infrastructure, and the need to limit the effort requested for such updates. We also discuss how this Cloud infrastructure is being used. In particular we focus on two big physics experiments which are intensively exploiting this computing facility: CMS and SPES. CMS deployed on the cloud a complex computational infrastructure, composed of several user interfaces for job submission in the Grid environment/local batch queues or for interactive processes; this is fully integrated with the local Tier-2 facility. To avoid a static allocation of the resources, an elastic cluster, based on cernVM, has been configured: it allows to automatically create and delete virtual machines according to the user needs. SPES, using a client-server system called TraceWin, exploits INFN’s virtual resources performing a very large number of simulations on about a thousand nodes elastically managed.

  1. iTools: a framework for classification, categorization and integration of computational biology resources.

    PubMed

    Dinov, Ivo D; Rubin, Daniel; Lorensen, William; Dugan, Jonathan; Ma, Jeff; Murphy, Shawn; Kirschner, Beth; Bug, William; Sherman, Michael; Floratos, Aris; Kennedy, David; Jagadish, H V; Schmidt, Jeanette; Athey, Brian; Califano, Andrea; Musen, Mark; Altman, Russ; Kikinis, Ron; Kohane, Isaac; Delp, Scott; Parker, D Stott; Toga, Arthur W

    2008-05-28

    The advancement of the computational biology field hinges on progress in three fundamental directions--the development of new computational algorithms, the availability of informatics resource management infrastructures and the capability of tools to interoperate and synergize. There is an explosion in algorithms and tools for computational biology, which makes it difficult for biologists to find, compare and integrate such resources. We describe a new infrastructure, iTools, for managing the query, traversal and comparison of diverse computational biology resources. Specifically, iTools stores information about three types of resources--data, software tools and web-services. The iTools design, implementation and resource meta-data content reflect the broad research, computational, applied and scientific expertise available at the seven National Centers for Biomedical Computing. iTools provides a system for classification, categorization and integration of different computational biology resources across space-and-time scales, biomedical problems, computational infrastructures and mathematical foundations. A large number of resources are already iTools-accessible to the community and this infrastructure is rapidly growing. iTools includes human and machine interfaces to its resource meta-data repository. Investigators or computer programs may utilize these interfaces to search, compare, expand, revise and mine meta-data descriptions of existent computational biology resources. We propose two ways to browse and display the iTools dynamic collection of resources. The first one is based on an ontology of computational biology resources, and the second one is derived from hyperbolic projections of manifolds or complex structures onto planar discs. iTools is an open source project both in terms of the source code development as well as its meta-data content. iTools employs a decentralized, portable, scalable and lightweight framework for long-term resource management. We demonstrate several applications of iTools as a framework for integrated bioinformatics. iTools and the complete details about its specifications, usage and interfaces are available at the iTools web page http://iTools.ccb.ucla.edu.

  2. iTools: A Framework for Classification, Categorization and Integration of Computational Biology Resources

    PubMed Central

    Dinov, Ivo D.; Rubin, Daniel; Lorensen, William; Dugan, Jonathan; Ma, Jeff; Murphy, Shawn; Kirschner, Beth; Bug, William; Sherman, Michael; Floratos, Aris; Kennedy, David; Jagadish, H. V.; Schmidt, Jeanette; Athey, Brian; Califano, Andrea; Musen, Mark; Altman, Russ; Kikinis, Ron; Kohane, Isaac; Delp, Scott; Parker, D. Stott; Toga, Arthur W.

    2008-01-01

    The advancement of the computational biology field hinges on progress in three fundamental directions – the development of new computational algorithms, the availability of informatics resource management infrastructures and the capability of tools to interoperate and synergize. There is an explosion in algorithms and tools for computational biology, which makes it difficult for biologists to find, compare and integrate such resources. We describe a new infrastructure, iTools, for managing the query, traversal and comparison of diverse computational biology resources. Specifically, iTools stores information about three types of resources–data, software tools and web-services. The iTools design, implementation and resource meta - data content reflect the broad research, computational, applied and scientific expertise available at the seven National Centers for Biomedical Computing. iTools provides a system for classification, categorization and integration of different computational biology resources across space-and-time scales, biomedical problems, computational infrastructures and mathematical foundations. A large number of resources are already iTools-accessible to the community and this infrastructure is rapidly growing. iTools includes human and machine interfaces to its resource meta-data repository. Investigators or computer programs may utilize these interfaces to search, compare, expand, revise and mine meta-data descriptions of existent computational biology resources. We propose two ways to browse and display the iTools dynamic collection of resources. The first one is based on an ontology of computational biology resources, and the second one is derived from hyperbolic projections of manifolds or complex structures onto planar discs. iTools is an open source project both in terms of the source code development as well as its meta-data content. iTools employs a decentralized, portable, scalable and lightweight framework for long-term resource management. We demonstrate several applications of iTools as a framework for integrated bioinformatics. iTools and the complete details about its specifications, usage and interfaces are available at the iTools web page http://iTools.ccb.ucla.edu. PMID:18509477

  3. Modelling noise propagation using Grid Resources. Progress within GDI-Grid

    NASA Astrophysics Data System (ADS)

    Kiehle, Christian; Mayer, Christian; Padberg, Alexander; Stapelfeld, Hartmut

    2010-05-01

    Modelling noise propagation using Grid Resources. Progress within GDI-Grid. GDI-Grid (english: SDI-Grid) is a research project funded by the German Ministry for Science and Education (BMBF). It aims at bridging the gaps between OGC Web Services (OWS) and Grid infrastructures and identifying the potential of utilizing the superior storage capacities and computational power of grid infrastructures for geospatial applications while keeping the well-known service interfaces specified by the OGC. The project considers all major OGC webservice interfaces for Web Mapping (WMS), Feature access (Web Feature Service), Coverage access (Web Coverage Service) and processing (Web Processing Service). The major challenge within GDI-Grid is the harmonization of diverging standards as defined by standardization bodies for Grid computing and spatial information exchange. The project started in 2007 and will continue until June 2010. The concept for the gridification of OWS developed by lat/lon GmbH and the Department of Geography of the University of Bonn is applied to three real-world scenarios in order to check its practicability: a flood simulation, a scenario for emergency routing and a noise propagation simulation. The latter scenario is addressed by the Stapelfeldt Ingenieurgesellschaft mbH located in Dortmund adapting their LimA software to utilize grid resources. Noise mapping of e.g. traffic noise in urban agglomerates and along major trunk roads is a reoccurring demand of the EU Noise Directive. Input data requires road net and traffic, terrain, buildings and noise protection screens as well as population distribution. Noise impact levels are generally calculated in 10 m grid and along relevant building facades. For each receiver position sources within a typical range of 2000 m are split down into small segments, depending on local geometry. For each of the segments propagation analysis includes diffraction effects caused by all obstacles on the path of sound propagation. This immense intensive calculation needs to be performed for a major part of European landscape. A LINUX version of the commercial LimA software for noise mapping analysis has been implemented on a test cluster within the German D-GRID computer network. Results and performance indicators will be presented. The presentation is an extension to last-years presentation "Spatial Data Infrastructures and Grid Computing: the GDI-Grid project" that described the gridification concept developed in the GDI-Grid project and provided an overview of the conceptual gaps between Grid Computing and Spatial Data Infrastructures. Results from the GDI-Grid project are incorporated in the OGC-OGF (Open Grid Forum) collaboration efforts as well as the OGC WPS 2.0 standards working group developing the next major version of the WPS specification.

  4. Investigation into Cloud Computing for More Robust Automated Bulk Image Geoprocessing

    NASA Technical Reports Server (NTRS)

    Brown, Richard B.; Smoot, James C.; Underwood, Lauren; Armstrong, C. Duane

    2012-01-01

    Geospatial resource assessments frequently require timely geospatial data processing that involves large multivariate remote sensing data sets. In particular, for disasters, response requires rapid access to large data volumes, substantial storage space and high performance processing capability. The processing and distribution of this data into usable information products requires a processing pipeline that can efficiently manage the required storage, computing utilities, and data handling requirements. In recent years, with the availability of cloud computing technology, cloud processing platforms have made available a powerful new computing infrastructure resource that can meet this need. To assess the utility of this resource, this project investigates cloud computing platforms for bulk, automated geoprocessing capabilities with respect to data handling and application development requirements. This presentation is of work being conducted by Applied Sciences Program Office at NASA-Stennis Space Center. A prototypical set of image manipulation and transformation processes that incorporate sample Unmanned Airborne System data were developed to create value-added products and tested for implementation on the "cloud". This project outlines the steps involved in creating and testing of open source software developed process code on a local prototype platform, and then transitioning this code with associated environment requirements into an analogous, but memory and processor enhanced cloud platform. A data processing cloud was used to store both standard digital camera panchromatic and multi-band image data, which were subsequently subjected to standard image processing functions such as NDVI (Normalized Difference Vegetation Index), NDMI (Normalized Difference Moisture Index), band stacking, reprojection, and other similar type data processes. Cloud infrastructure service providers were evaluated by taking these locally tested processing functions, and then applying them to a given cloud-enabled infrastructure to assesses and compare environment setup options and enabled technologies. This project reviews findings that were observed when cloud platforms were evaluated for bulk geoprocessing capabilities based on data handling and application development requirements.

  5. High-reliability computing for the smarter planet

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Quinn, Heather M; Graham, Paul; Manuzzato, Andrea

    2010-01-01

    The geometric rate of improvement of transistor size and integrated circuit performance, known as Moore's Law, has been an engine of growth for our economy, enabling new products and services, creating new value and wealth, increasing safety, and removing menial tasks from our daily lives. Affordable, highly integrated components have enabled both life-saving technologies and rich entertainment applications. Anti-lock brakes, insulin monitors, and GPS-enabled emergency response systems save lives. Cell phones, internet appliances, virtual worlds, realistic video games, and mp3 players enrich our lives and connect us together. Over the past 40 years of silicon scaling, the increasing capabilities ofmore » inexpensive computation have transformed our society through automation and ubiquitous communications. In this paper, we will present the concept of the smarter planet, how reliability failures affect current systems, and methods that can be used to increase the reliable adoption of new automation in the future. We will illustrate these issues using a number of different electronic devices in a couple of different scenarios. Recently IBM has been presenting the idea of a 'smarter planet.' In smarter planet documents, IBM discusses increased computer automation of roadways, banking, healthcare, and infrastructure, as automation could create more efficient systems. A necessary component of the smarter planet concept is to ensure that these new systems have very high reliability. Even extremely rare reliability problems can easily escalate to problematic scenarios when implemented at very large scales. For life-critical systems, such as automobiles, infrastructure, medical implantables, and avionic systems, unmitigated failures could be dangerous. As more automation moves into these types of critical systems, reliability failures will need to be managed. As computer automation continues to increase in our society, the need for greater radiation reliability is necessary. Already critical infrastructure is failing too frequently. In this paper, we will introduce the Cross-Layer Reliability concept for designing more reliable computer systems.« less

  6. Automating ATLAS Computing Operations using the Site Status Board

    NASA Astrophysics Data System (ADS)

    J, Andreeva; Iglesias C, Borrego; S, Campana; Girolamo A, Di; I, Dzhunov; Curull X, Espinal; S, Gayazov; E, Magradze; M, Nowotka M.; L, Rinaldi; P, Saiz; J, Schovancova; A, Stewart G.; M, Wright

    2012-12-01

    The automation of operations is essential to reduce manpower costs and improve the reliability of the system. The Site Status Board (SSB) is a framework which allows Virtual Organizations to monitor their computing activities at distributed sites and to evaluate site performance. The ATLAS experiment intensively uses the SSB for the distributed computing shifts, for estimating data processing and data transfer efficiencies at a particular site, and for implementing automatic exclusion of sites from computing activities, in case of potential problems. The ATLAS SSB provides a real-time aggregated monitoring view and keeps the history of the monitoring metrics. Based on this history, usability of a site from the perspective of ATLAS is calculated. The paper will describe how the SSB is integrated in the ATLAS operations and computing infrastructure and will cover implementation details of the ATLAS SSB sensors and alarm system, based on the information in the SSB. It will demonstrate the positive impact of the use of the SSB on the overall performance of ATLAS computing activities and will overview future plans.

  7. The JASMIN Cloud: specialised and hybrid to meet the needs of the Environmental Sciences Community

    NASA Astrophysics Data System (ADS)

    Kershaw, Philip; Lawrence, Bryan; Churchill, Jonathan; Pritchard, Matt

    2014-05-01

    Cloud computing provides enormous opportunities for the research community. The large public cloud providers provide near-limitless scaling capability. However, adapting Cloud to scientific workloads is not without its problems. The commodity nature of the public cloud infrastructure can be at odds with the specialist requirements of the research community. Issues such as trust, ownership of data, WAN bandwidth and costing models make additional barriers to more widespread adoption. Alongside the application of public cloud for scientific applications, a number of private cloud initiatives are underway in the research community of which the JASMIN Cloud is one example. Here, cloud service models are being effectively super-imposed over more established services such as data centres, compute cluster facilities and Grids. These have the potential to deliver the specialist infrastructure needed for the science community coupled with the benefits of a Cloud service model. The JASMIN facility based at the Rutherford Appleton Laboratory was established in 2012 to support the data analysis requirements of the climate and Earth Observation community. In its first year of operation, the 5PB of available storage capacity was filled and the hosted compute capability used extensively. JASMIN has modelled the concept of a centralised large-volume data analysis facility. Key characteristics have enabled success: peta-scale fast disk connected via low latency networks to compute resources and the use of virtualisation for effective management of the resources for a range of users. A second phase is now underway funded through NERC's (Natural Environment Research Council) Big Data initiative. This will see significant expansion to the resources available with a doubling of disk-based storage to 12PB and an increase of compute capacity by a factor of ten to over 3000 processing cores. This expansion is accompanied by a broadening in the scope for JASMIN, as a service available to the entire UK environmental science community. Experience with the first phase demonstrated the range of user needs. A trade-off is needed between access privileges to resources, flexibility of use and security. This has influenced the form and types of service under development for the new phase. JASMIN will deploy a specialised private cloud organised into "Managed" and "Unmanaged" components. In the Managed Cloud, users have direct access to the storage and compute resources for optimal performance but for reasons of security, via a more restrictive PaaS (Platform-as-a-Service) interface. The Unmanaged Cloud is deployed in an isolated part of the network but co-located with the rest of the infrastructure. This enables greater liberty to tenants - full IaaS (Infrastructure-as-a-Service) capability to provision customised infrastructure - whilst at the same time protecting more sensitive parts of the system from direct access using these elevated privileges. The private cloud will be augmented with cloud-bursting capability so that it can exploit the resources available from public clouds, making it effectively a hybrid solution. A single interface will overlay the functionality of both the private cloud and external interfaces to public cloud providers giving users the flexibility to migrate resources between infrastructures as requirements dictate.

  8. Sampling Approaches for Multi-Domain Internet Performance Measurement Infrastructures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Calyam, Prasad

    2014-09-15

    The next-generation of high-performance networks being developed in DOE communities are critical for supporting current and emerging data-intensive science applications. The goal of this project is to investigate multi-domain network status sampling techniques and tools to measure/analyze performance, and thereby provide “network awareness” to end-users and network operators in DOE communities. We leverage the infrastructure and datasets available through perfSONAR, which is a multi-domain measurement framework that has been widely deployed in high-performance computing and networking communities; the DOE community is a core developer and the largest adopter of perfSONAR. Our investigations include development of semantic scheduling algorithms, measurement federationmore » policies, and tools to sample multi-domain and multi-layer network status within perfSONAR deployments. We validate our algorithms and policies with end-to-end measurement analysis tools for various monitoring objectives such as network weather forecasting, anomaly detection, and fault-diagnosis. In addition, we develop a multi-domain architecture for an enterprise-specific perfSONAR deployment that can implement monitoring-objective based sampling and that adheres to any domain-specific measurement policies.« less

  9. Enabling Data Intensive Science through Service Oriented Science: Virtual Laboratories and Science Gateways

    NASA Astrophysics Data System (ADS)

    Lescinsky, D. T.; Wyborn, L. A.; Evans, B. J. K.; Allen, C.; Fraser, R.; Rankine, T.

    2014-12-01

    We present collaborative work on a generic, modular infrastructure for virtual laboratories (VLs, similar to science gateways) that combine online access to data, scientific code, and computing resources as services that support multiple data intensive scientific computing needs across a wide range of science disciplines. We are leveraging access to 10+ PB of earth science data on Lustre filesystems at Australia's National Computational Infrastructure (NCI) Research Data Storage Infrastructure (RDSI) node, co-located with NCI's 1.2 PFlop Raijin supercomputer and a 3000 CPU core research cloud. The development, maintenance and sustainability of VLs is best accomplished through modularisation and standardisation of interfaces between components. Our approach has been to break up tightly-coupled, specialised application packages into modules, with identified best techniques and algorithms repackaged either as data services or scientific tools that are accessible across domains. The data services can be used to manipulate, visualise and transform multiple data types whilst the scientific tools can be used in concert with multiple scientific codes. We are currently designing a scalable generic infrastructure that will handle scientific code as modularised services and thereby enable the rapid/easy deployment of new codes or versions of codes. The goal is to build open source libraries/collections of scientific tools, scripts and modelling codes that can be combined in specially designed deployments. Additional services in development include: provenance, publication of results, monitoring, workflow tools, etc. The generic VL infrastructure will be hosted at NCI, but can access alternative computing infrastructures (i.e., public/private cloud, HPC).The Virtual Geophysics Laboratory (VGL) was developed as a pilot project to demonstrate the underlying technology. This base is now being redesigned and generalised to develop a Virtual Hazards Impact and Risk Laboratory (VHIRL); any enhancements and new capabilities will be incorporated into a generic VL infrastructure. At same time, we are scoping seven new VLs and in the process, identifying other common components to prioritise and focus development.

  10. Digital Rocks Portal: Preservation, Sharing, Remote Visualization and Automated Analysis of Imaged Datasets

    NASA Astrophysics Data System (ADS)

    Prodanovic, M.; Esteva, M.; Ketcham, R. A.; Hanlon, M.; Pettengill, M.; Ranganath, A.; Venkatesh, A.

    2016-12-01

    Due to advances in imaging modalities such as X-ray microtomography and scattered electron microscopy, 2D and 3D imaged datasets of rock microstructure on nanometer to centimeter length scale allow investigation of nonlinear flow and mechanical phenomena using numerical approaches. This in turn produces various upscaled parameters required by subsurface flow and deformation simulators. However, a single research group typically specializes in an imaging modality and/or related modeling on a single length scale, and lack of data-sharing infrastructure makes it difficult to integrate different length scales. We developed a sustainable, open and easy-to-use repository called the Digital Rocks Portal (http://www.digitalrocksportal.org), that (1) organizes images and related experimental measurements of different porous materials, (2) improves access to them for a wider community of geosciences or engineering researchers not necessarily trained in computer science or data analysis. Our objective is to enable scientific inquiry and engineering decisions founded on a data-driven basis. We show how the data loaded in the portal can be documented, referenced in publications via digital object identifiers, visualize and linked to other repositories. We then show preliminary results on integrating remote parallel visualization and flow simulation workflow with the pore structures currently stored in the repository. We finally discuss the issues of collecting correct metadata, data discoverability and repository sustainability. This is the first repository for this particular data, but is part of the wider ecosystem of geoscience data and model cyber-infrastructure called "Earthcube" (http://earthcube.org/) sponsored by National Science Foundation. For data sustainability and continuous access, the portal is implemented within the reliable, 24/7 maintained High Performance Computing Infrastructure supported by the Texas Advanced Computing Center (TACC) at the University of Texas at Austin. Long-term storage is provided through the University of Texas System Research Cyber-infrastructure initiative.

  11. Data Provenance Hybridization Supporting Extreme-Scale Scientific WorkflowApplications

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Elsethagen, Todd O.; Stephan, Eric G.; Raju, Bibi

    As high performance computing (HPC) infrastructures continue to grow in capability and complexity, so do the applications that they serve. HPC and distributed-area computing (DAC) (e.g. grid and cloud) users are looking increasingly toward workflow solutions to orchestrate their complex application coupling, pre- and post-processing needs To gain insight and a more quantitative understanding of a workflow’s performance our method includes not only the capture of traditional provenance information, but also the capture and integration of system environment metrics helping to give context and explanation for a workflow’s execution. In this paper, we describe IPPD’s provenance management solution (ProvEn) andmore » its hybrid data store combining both of these data provenance perspectives.« less

  12. National Computational Infrastructure for Lattice Gauge Theory SciDAC-2 Closeout Report Indiana University Component

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gottlieb, Steven Arthur; DeTar, Carleton; Tousaint, Doug

    This is the closeout report for the Indiana University portion of the National Computational Infrastructure for Lattice Gauge Theory project supported by the United States Department of Energy under the SciDAC program. It includes information about activities at Indian University, the University of Arizona, and the University of Utah, as those three universities coordinated their activities.

  13. Deploying Crowd-Sourced Formal Verification Systems in a DoD Network

    DTIC Science & Technology

    2013-09-01

    INTENTIONALLY LEFT BLANK 1 I. INTRODUCTION A. INTRODUCTION In 2014 cyber attacks on critical infrastructure are expected to increase...CSFV systems on the Internet‒‒possibly using cloud infrastructure (Dean, 2013). By using Amazon Compute Cloud (EC2) systems, DARPA will use ordinary...through standard access methods. Those clients could be mobile phones, laptops, netbooks, tablet computers or personal digital assistants (PDAs) (Smoot

  14. Computational biomedicine: a challenge for the twenty-first century.

    PubMed

    Coveney, Peter V; Shublaq, Nour W

    2012-01-01

    With the relentless increase of computer power and the widespread availability of digital patient-specific medical data, we are now entering an era when it is becoming possible to develop predictive models of human disease and pathology, which can be used to support and enhance clinical decision-making. The approach amounts to a grand challenge to computational science insofar as we need to be able to provide seamless yet secure access to large scale heterogeneous personal healthcare data in a facile way, typically integrated into complex workflows-some parts of which may need to be run on high performance computers-in a facile way that is integrated into clinical decision support software. In this paper, we review the state of the art in terms of case studies drawn from neurovascular pathologies and HIV/AIDS. These studies are representative of a large number of projects currently being performed within the Virtual Physiological Human initiative. They make demands of information technology at many scales, from the desktop to national and international infrastructures for data storage and processing, linked by high performance networks.

  15. Cyberdyn supercomputer - a tool for imaging geodinamic processes

    NASA Astrophysics Data System (ADS)

    Pomeran, Mihai; Manea, Vlad; Besutiu, Lucian; Zlagnean, Luminita

    2014-05-01

    More and more physical processes developed within the deep interior of our planet, but with significant impact on the Earth's shape and structure, become subject to numerical modelling by using high performance computing facilities. Nowadays, worldwide an increasing number of research centers decide to make use of such powerful and fast computers for simulating complex phenomena involving fluid dynamics and get deeper insight to intricate problems of Earth's evolution. With the CYBERDYN cybernetic infrastructure (CCI), the Solid Earth Dynamics Department in the Institute of Geodynamics of the Romanian Academy boldly steps into the 21st century by entering the research area of computational geodynamics. The project that made possible this advancement, has been jointly supported by EU and Romanian Government through the Structural and Cohesion Funds. It lasted for about three years, ending October 2013. CCI is basically a modern high performance Beowulf-type supercomputer (HPCC), combined with a high performance visualization cluster (HPVC) and a GeoWall. The infrastructure is mainly structured around 1344 cores and 3 TB of RAM. The high speed interconnect is provided by a Qlogic InfiniBand switch, able to transfer up to 40 Gbps. The CCI storage component is a 40 TB Panasas NAS. The operating system is Linux (CentOS). For control and maintenance, the Bright Cluster Manager package is used. The SGE job scheduler manages the job queues. CCI has been designed for a theoretical peak performance up to 11.2 TFlops. Speed tests showed that a high resolution numerical model (256 × 256 × 128 FEM elements) could be resolved with a mean computational speed of 1 time step at 30 seconds, by employing only a fraction of the computing power (20%). After passing the mandatory tests, the CCI has been involved in numerical modelling of various scenarios related to the East Carpathians tectonic and geodynamic evolution, including the Neogene magmatic activity, and the intriguing intermediate-depth seismicity within the so-called Vrancea zone. The CFD code for numerical modelling is CitcomS, a widely employed open source package specifically developed for earth sciences. Several preliminary 3D geodynamic models for simulating an assumed subduction or the effect of a mantle plume will be presented and discussed.

  16. Optimization and parallelization of the thermal–hydraulic subchannel code CTF for high-fidelity multi-physics applications

    DOE PAGES

    Salko, Robert K.; Schmidt, Rodney C.; Avramova, Maria N.

    2014-11-23

    This study describes major improvements to the computational infrastructure of the CTF subchannel code so that full-core, pincell-resolved (i.e., one computational subchannel per real bundle flow channel) simulations can now be performed in much shorter run-times, either in stand-alone mode or as part of coupled-code multi-physics calculations. These improvements support the goals of the Department Of Energy Consortium for Advanced Simulation of Light Water Reactors (CASL) Energy Innovation Hub to develop high fidelity multi-physics simulation tools for nuclear energy design and analysis.

  17. High Performance Computing and Visualization Infrastructure for Simultaneous Parallel Computing and Parallel Visualization Research

    DTIC Science & Technology

    2016-11-09

    Total Number: Sub Contractors (DD882) Names of Personnel receiving masters degrees Names of personnel receiving PHDs Names of other research staff...Broadcom 5720 QP 1Gb Network Daughter Card (2) Intel Xeon E5-2680 v3 2.5GHz, 30M Cache, 9.60GT/s QPI, Turbo, HT , 12C/24T (120W...Broadcom 5720 QP 1Gb Network Daughter Card (2) Intel Xeon E5-2680 v3 2.5GHz, 30M Cache, 9.60GT/s QPI, Turbo, HT , 12C/24T (120W

  18. High-Performance I/O: HDF5 for Lattice QCD

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kurth, Thorsten; Pochinsky, Andrew; Sarje, Abhinav

    2015-01-01

    Practitioners of lattice QCD/QFT have been some of the primary pioneer users of the state-of-the-art high-performance-computing systems, and contribute towards the stress tests of such new machines as soon as they become available. As with all aspects of high-performance-computing, I/O is becoming an increasingly specialized component of these systems. In order to take advantage of the latest available high-performance I/O infrastructure, to ensure reliability and backwards compatibility of data files, and to help unify the data structures used in lattice codes, we have incorporated parallel HDF5 I/O into the SciDAC supported USQCD software stack. Here we present the design andmore » implementation of this I/O framework. Our HDF5 implementation outperforms optimized QIO at the 10-20% level and leaves room for further improvement by utilizing appropriate dataset chunking.« less

  19. High-Performance I/O: HDF5 for Lattice QCD

    DOE PAGES

    Kurth, Thorsten; Pochinsky, Andrew; Sarje, Abhinav; ...

    2017-05-09

    Practitioners of lattice QCD/QFT have been some of the primary pioneer users of the state-of-the-art high-performance-computing systems, and contribute towards the stress tests of such new machines as soon as they become available. As with all aspects of high-performance-computing, I/O is becoming an increasingly specialized component of these systems. In order to take advantage of the latest available high-performance I/O infrastructure, to ensure reliability and backwards compatibility of data files, and to help unify the data structures used in lattice codes, we have incorporated parallel HDF5 I/O into the SciDAC supported USQCD software stack. Here we present the design andmore » implementation of this I/O framework. Our HDF5 implementation outperforms optimized QIO at the 10-20% level and leaves room for further improvement by utilizing appropriate dataset chunking.« less

  20. Exploring Infiniband Hardware Virtualization in OpenNebula towards Efficient High-Performance Computing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pais Pitta de Lacerda Ruivo, Tiago; Bernabeu Altayo, Gerard; Garzoglio, Gabriele

    2014-11-11

    has been widely accepted that software virtualization has a big negative impact on high-performance computing (HPC) application performance. This work explores the potential use of Infiniband hardware virtualization in an OpenNebula cloud towards the efficient support of MPI-based workloads. We have implemented, deployed, and tested an Infiniband network on the FermiCloud private Infrastructure-as-a-Service (IaaS) cloud. To avoid software virtualization towards minimizing the virtualization overhead, we employed a technique called Single Root Input/Output Virtualization (SRIOV). Our solution spanned modifications to the Linux’s Hypervisor as well as the OpenNebula manager. We evaluated the performance of the hardware virtualization on up to 56more » virtual machines connected by up to 8 DDR Infiniband network links, with micro-benchmarks (latency and bandwidth) as well as w a MPI-intensive application (the HPL Linpack benchmark).« less

  1. People at risk - nexus critical infrastructure and society

    NASA Astrophysics Data System (ADS)

    Heiser, Micha; Thaler, Thomas; Fuchs, Sven

    2016-04-01

    Strategic infrastructure networks include the highly complex and interconnected systems that are so vital to a city or state that any sudden disruption can result in debilitating impacts on human life, the economy and the society as a whole. Recently, various studies have applied complex network-based models to study the performance and vulnerability of infrastructure systems under various types of attacks and hazards - a major part of them is, particularly after the 9/11 incident, related to terrorism attacks. Here, vulnerability is generally defined as the performance drop of an infrastructure system under a given disruptive event. The performance can be measured by different metrics, which correspond to various levels of resilience. In this paper, we will address vulnerability and exposure of critical infrastructure in the Eastern Alps. The Federal State Tyrol is an international transport route and an essential component of the north-south transport connectivity in Europe. Any interruption of the transport flow leads to incommensurable consequences in terms of indirect losses, since the system does not feature redundant elements at comparable economic efficiency. Natural hazard processes such as floods, debris flows, rock falls and avalanches, endanger this infrastructure line, such as large flood events in 2005 or 2012, rock falls 2014, which had strong impacts to the critical infrastructure, such as disruption of the railway lines (in 2005 and 2012), highways and motorways (in 2014). The aim of this paper is to present how critical infrastructures as well as communities and societies are vulnerable and can be resilient against natural hazard risks and the relative cascading effects to different compartments (industrial, infrastructural, societal, institutional, cultural, etc.), which is the dominant by the type of hazard (avalanches, torrential flooding, debris flow, rock falls). Specific themes will be addressed in various case studies to allow cross-learning and cross-comparison of, for example rural and urban areas, and different scales. Correspondingly, scale-specific resilience indicators and metrics will be developed to tailor methods to specific needs according to the scale of assessment (micro/local and macro/regional) and to the type of infrastructure. The traditional indicators normally used in structural analysis are not sufficient to understand how events happening on the networks can have cascading consequences. Moreover, effects have multidimensional (technical, economic, organizational and human), multiscale (micro and macro) and temporal characteristics (short- to long-term incidence). These considerations will guide to different activities: 1) computation of classic structural analysis indicators on the case studies in order to obtain an identity of the transport infrastructure and; 2) development of a set of new measures of resilience. To mitigate natural hazard risk a large amount of protection measures of different typology have been constructed following inhomogeneous reliability standards. The focus of this case study will be on resilience issues and decision making in the context of a large scale sectorial approach focused on transport infrastructure network.

  2. From the CMS Computing Experience in the WLCG STEP'09 Challenge to the First Data Taking of the LHC Era

    NASA Astrophysics Data System (ADS)

    Bonacorsi, D.; Gutsche, O.

    The Worldwide LHC Computing Grid (WLCG) project decided in March 2009 to perform scale tests of parts of its overall Grid infrastructure before the start of the LHC data taking. The "Scale Test for the Experiment Program" (STEP'09) was performed mainly in June 2009 -with more selected tests in September- October 2009 -and emphasized the simultaneous test of the computing systems of all 4 LHC experiments. CMS tested its Tier-0 tape writing and processing capabilities. The Tier-1 tape systems were stress tested using the complete range of Tier-1 work-flows: transfer from Tier-0 and custody of data on tape, processing and subsequent archival, redistribution of datasets amongst all Tier-1 sites as well as burst transfers of datasets to Tier-2 sites. The Tier-2 analysis capacity was tested using bulk analysis job submissions to backfill normal user activity. In this talk, we will report on the different performed tests and present their post-mortem analysis.

  3. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kurth, Thorsten; Pochinsky, Andrew; Sarje, Abhinav

    Practitioners of lattice QCD/QFT have been some of the primary pioneer users of the state-of-the-art high-performance-computing systems, and contribute towards the stress tests of such new machines as soon as they become available. As with all aspects of high-performance-computing, I/O is becoming an increasingly specialized component of these systems. In order to take advantage of the latest available high-performance I/O infrastructure, to ensure reliability and backwards compatibility of data files, and to help unify the data structures used in lattice codes, we have incorporated parallel HDF5 I/O into the SciDAC supported USQCD software stack. Here we present the design andmore » implementation of this I/O framework. Our HDF5 implementation outperforms optimized QIO at the 10-20% level and leaves room for further improvement by utilizing appropriate dataset chunking.« less

  4. LXtoo: an integrated live Linux distribution for the bioinformatics community

    PubMed Central

    2012-01-01

    Background Recent advances in high-throughput technologies dramatically increase biological data generation. However, many research groups lack computing facilities and specialists. This is an obstacle that remains to be addressed. Here, we present a Linux distribution, LXtoo, to provide a flexible computing platform for bioinformatics analysis. Findings Unlike most of the existing live Linux distributions for bioinformatics limiting their usage to sequence analysis and protein structure prediction, LXtoo incorporates a comprehensive collection of bioinformatics software, including data mining tools for microarray and proteomics, protein-protein interaction analysis, and computationally complex tasks like molecular dynamics. Moreover, most of the programs have been configured and optimized for high performance computing. Conclusions LXtoo aims to provide well-supported computing environment tailored for bioinformatics research, reducing duplication of efforts in building computing infrastructure. LXtoo is distributed as a Live DVD and freely available at http://bioinformatics.jnu.edu.cn/LXtoo. PMID:22813356

  5. LXtoo: an integrated live Linux distribution for the bioinformatics community.

    PubMed

    Yu, Guangchuang; Wang, Li-Gen; Meng, Xiao-Hua; He, Qing-Yu

    2012-07-19

    Recent advances in high-throughput technologies dramatically increase biological data generation. However, many research groups lack computing facilities and specialists. This is an obstacle that remains to be addressed. Here, we present a Linux distribution, LXtoo, to provide a flexible computing platform for bioinformatics analysis. Unlike most of the existing live Linux distributions for bioinformatics limiting their usage to sequence analysis and protein structure prediction, LXtoo incorporates a comprehensive collection of bioinformatics software, including data mining tools for microarray and proteomics, protein-protein interaction analysis, and computationally complex tasks like molecular dynamics. Moreover, most of the programs have been configured and optimized for high performance computing. LXtoo aims to provide well-supported computing environment tailored for bioinformatics research, reducing duplication of efforts in building computing infrastructure. LXtoo is distributed as a Live DVD and freely available at http://bioinformatics.jnu.edu.cn/LXtoo.

  6. Scaling the CERN OpenStack cloud

    NASA Astrophysics Data System (ADS)

    Bell, T.; Bompastor, B.; Bukowiec, S.; Castro Leon, J.; Denis, M. K.; van Eldik, J.; Fermin Lobo, M.; Fernandez Alvarez, L.; Fernandez Rodriguez, D.; Marino, A.; Moreira, B.; Noel, B.; Oulevey, T.; Takase, W.; Wiebalck, A.; Zilli, S.

    2015-12-01

    CERN has been running a production OpenStack cloud since July 2013 to support physics computing and infrastructure services for the site. In the past year, CERN Cloud Infrastructure has seen a constant increase in nodes, virtual machines, users and projects. This paper will present what has been done in order to make the CERN cloud infrastructure scale out.

  7. Science of Security Lablet - Scalability and Usability

    DTIC Science & Technology

    2014-12-16

    mobile computing [19]. However, the high-level infrastructure design and our own implementation (both described throughout this paper) can easily...critical and infrastructural systems demands high levels of sophistication in the technical aspects of cybersecurity, software and hardware design...Forget, S. Komanduri, Alessandro Acquisti, Nicolas Christin, Lorrie Cranor, Rahul Telang. "Security Behavior Observatory: Infrastructure for Long-term

  8. Role of Computational Fluid Dynamics and Wind Tunnels in Aeronautics R and D

    NASA Technical Reports Server (NTRS)

    Malik, Murjeeb R.; Bushnell, Dennis M.

    2012-01-01

    The purpose of this report is to investigate the status and future projections for the question of supplantation of wind tunnels by computation in design and to intuit the potential impact of computation approaches on wind-tunnel utilization all with an eye toward reducing the infrastructure cost at aeronautics R&D centers. Wind tunnels have been closing for myriad reasons, and such closings have reduced infrastructure costs. Further cost reductions are desired, and the work herein attempts to project which wind-tunnel capabilities can be replaced in the future and, if possible, the timing of such. If the possibility exists to project when a facility could be closed, then maintenance and other associated costs could be rescheduled accordingly (i.e., before the fact) to obtain an even greater infrastructure cost reduction.

  9. A Computational framework for telemedicine.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Foster, I.; von Laszewski, G.; Thiruvathukal, G. K.

    1998-07-01

    Emerging telemedicine applications require the ability to exploit diverse and geographically distributed resources. Highspeed networks are used to integrate advanced visualization devices, sophisticated instruments, large databases, archival storage devices, PCs, workstations, and supercomputers. This form of telemedical environment is similar to networked virtual supercomputers, also known as metacomputers. Metacomputers are already being used in many scientific application areas. In this article, we analyze requirements necessary for a telemedical computing infrastructure and compare them with requirements found in a typical metacomputing environment. We will show that metacomputing environments can be used to enable a more powerful and unified computational infrastructure formore » telemedicine. The Globus metacomputing toolkit can provide the necessary low level mechanisms to enable a large scale telemedical infrastructure. The Globus toolkit components are designed in a modular fashion and can be extended to support the specific requirements for telemedicine.« less

  10. Evolution of a Materials Data Infrastructure

    NASA Astrophysics Data System (ADS)

    Warren, James A.; Ward, Charles H.

    2018-06-01

    The field of materials science and engineering is writing a new chapter in its evolution, one of digitally empowered materials discovery, development, and deployment. The 2008 Integrated Computational Materials Engineering (ICME) study report helped usher in this paradigm shift, making a compelling case and strong recommendations for an infrastructure supporting ICME that would enable access to precompetitive materials data for both scientific and engineering applications. With the launch of the Materials Genome Initiative in 2011, which drew substantial inspiration from the ICME study, digital data was highlighted as a core component of a Materials Innovation Infrastructure, along with experimental and computational tools. Over the past 10 years, our understanding of what it takes to provide accessible materials data has matured and rapid progress has been made in establishing a Materials Data Infrastructure (MDI). We are learning that the MDI is essential to eliminating the seams between experiment and computation by providing a means for them to connect effortlessly. Additionally, the MDI is becoming an enabler, allowing materials engineering to tie into a much broader model-based engineering enterprise for product design.

  11. Advancing Cyberinfrastructure to support high resolution water resources modeling

    NASA Astrophysics Data System (ADS)

    Tarboton, D. G.; Ogden, F. L.; Jones, N.; Horsburgh, J. S.

    2012-12-01

    Addressing the problem of how the availability and quality of water resources at large scales are sensitive to climate variability, watershed alterations and management activities requires computational resources that combine data from multiple sources and support integrated modeling. Related cyberinfrastructure challenges include: 1) how can we best structure data and computer models to address this scientific problem through the use of high-performance and data-intensive computing, and 2) how can we do this in a way that discipline scientists without extensive computational and algorithmic knowledge and experience can take advantage of advances in cyberinfrastructure? This presentation will describe a new system called CI-WATER that is being developed to address these challenges and advance high resolution water resources modeling in the Western U.S. We are building on existing tools that enable collaboration to develop model and data interfaces that link integrated system models running within an HPC environment to multiple data sources. Our goal is to enhance the use of computational simulation and data-intensive modeling to better understand water resources. Addressing water resource problems in the Western U.S. requires simulation of natural and engineered systems, as well as representation of legal (water rights) and institutional constraints alongside the representation of physical processes. We are establishing data services to represent the engineered infrastructure and legal and institutional systems in a way that they can be used with high resolution multi-physics watershed modeling at high spatial resolution. These services will enable incorporation of location-specific information on water management infrastructure and systems into the assessment of regional water availability in the face of growing demands, uncertain future meteorological forcings, and existing prior-appropriations water rights. This presentation will discuss the informatics challenges involved with data management and easy-to-use access to high performance computing being tackled in this project.

  12. Information technology developments within the national biological information infrastructure

    USGS Publications Warehouse

    Cotter, G.; Frame, M.T.

    2000-01-01

    Looking out an office window or exploring a community park, one can easily see the tremendous challenges that biological information presents the computer science community. Biological information varies in format and content depending whether or not it is information pertaining to a particular species (i.e. Brown Tree Snake), or a specific ecosystem, which often includes multiple species, land use characteristics, and geospatially referenced information. The complexity and uniqueness of each individual species or ecosystem do not easily lend themselves to today's computer science tools and applications. To address the challenges that the biological enterprise presents the National Biological Information Infrastructure (NBII) (http://www.nbii.gov) was established in 1993. The NBII is designed to address these issues on a National scale within the United States, and through international partnerships abroad. This paper discusses current computer science efforts within the National Biological Information Infrastructure Program and future computer science research endeavors that are needed to address the ever-growing issues related to our Nation's biological concerns.

  13. Use of agents to implement an integrated computing environment

    NASA Technical Reports Server (NTRS)

    Hale, Mark A.; Craig, James I.

    1995-01-01

    Integrated Product and Process Development (IPPD) embodies the simultaneous application to both system and quality engineering methods throughout an iterative design process. The use of IPPD results in the time-conscious, cost-saving development of engineering systems. To implement IPPD, a Decision-Based Design perspective is encapsulated in an approach that focuses on the role of the human designer in product development. The approach has two parts and is outlined in this paper. First, an architecture, called DREAMS, is being developed that facilitates design from a decision-based perspective. Second, a supporting computing infrastructure, called IMAGE, is being designed. Agents are used to implement the overall infrastructure on the computer. Successful agent utilization requires that they be made of three components: the resource, the model, and the wrap. Current work is focused on the development of generalized agent schemes and associated demonstration projects. When in place, the technology independent computing infrastructure will aid the designer in systematically generating knowledge used to facilitate decision-making.

  14. Development of Measures to Assess Product Modularity and Reconfigurability

    DTIC Science & Technology

    2010-03-01

    mission needs. For example, a thermal blanket is the only “module” currently being used to control spacecraft temperature (i.e. no active cooling). If...infrastructure, and thermal control. The spacecraft components include the autonomous flight software; the quantity of high- performance computing; power... thermal requirements are satisfied using this thermal blanket , then there may not be a need for active cooling to improve the thermal range of the

  15. Computational Materials Science and Chemistry: Accelerating Discovery and Innovation through Simulation-Based Engineering and Science

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Crabtree, George; Glotzer, Sharon; McCurdy, Bill

    This report is based on a SC Workshop on Computational Materials Science and Chemistry for Innovation on July 26-27, 2010, to assess the potential of state-of-the-art computer simulations to accelerate understanding and discovery in materials science and chemistry, with a focus on potential impacts in energy technologies and innovation. The urgent demand for new energy technologies has greatly exceeded the capabilities of today's materials and chemical processes. To convert sunlight to fuel, efficiently store energy, or enable a new generation of energy production and utilization technologies requires the development of new materials and processes of unprecedented functionality and performance. Newmore » materials and processes are critical pacing elements for progress in advanced energy systems and virtually all industrial technologies. Over the past two decades, the United States has developed and deployed the world's most powerful collection of tools for the synthesis, processing, characterization, and simulation and modeling of materials and chemical systems at the nanoscale, dimensions of a few atoms to a few hundred atoms across. These tools, which include world-leading x-ray and neutron sources, nanoscale science facilities, and high-performance computers, provide an unprecedented view of the atomic-scale structure and dynamics of materials and the molecular-scale basis of chemical processes. For the first time in history, we are able to synthesize, characterize, and model materials and chemical behavior at the length scale where this behavior is controlled. This ability is transformational for the discovery process and, as a result, confers a significant competitive advantage. Perhaps the most spectacular increase in capability has been demonstrated in high performance computing. Over the past decade, computational power has increased by a factor of a million due to advances in hardware and software. This rate of improvement, which shows no sign of abating, has enabled the development of computer simulations and models of unprecedented fidelity. We are at the threshold of a new era where the integrated synthesis, characterization, and modeling of complex materials and chemical processes will transform our ability to understand and design new materials and chemistries with predictive power. In turn, this predictive capability will transform technological innovation by accelerating the development and deployment of new materials and processes in products and manufacturing. Harnessing the potential of computational science and engineering for the discovery and development of materials and chemical processes is essential to maintaining leadership in these foundational fields that underpin energy technologies and industrial competitiveness. Capitalizing on the opportunities presented by simulation-based engineering and science in materials and chemistry will require an integration of experimental capabilities with theoretical and computational modeling; the development of a robust and sustainable infrastructure to support the development and deployment of advanced computational models; and the assembly of a community of scientists and engineers to implement this integration and infrastructure. This community must extend to industry, where incorporating predictive materials science and chemistry into design tools can accelerate the product development cycle and drive economic competitiveness. The confluence of new theories, new materials synthesis capabilities, and new computer platforms has created an unprecedented opportunity to implement a "materials-by-design" paradigm with wide-ranging benefits in technological innovation and scientific discovery. The Workshop on Computational Materials Science and Chemistry for Innovation was convened in Bethesda, Maryland, on July 26-27, 2010. Sponsored by the Department of Energy (DOE) Offices of Advanced Scientific Computing Research and Basic Energy Sciences, the workshop brought together 160 experts in materials science, chemistry, and computational science representing more than 65 universities, laboratories, and industries, and four agencies. The workshop examined seven foundational challenge areas in materials science and chemistry: materials for extreme conditions, self-assembly, light harvesting, chemical reactions, designer fluids, thin films and interfaces, and electronic structure. Each of these challenge areas is critical to the development of advanced energy systems, and each can be accelerated by the integrated application of predictive capability with theory and experiment. The workshop concluded that emerging capabilities in predictive modeling and simulation have the potential to revolutionize the development of new materials and chemical processes. Coupled with world-leading materials characterization and nanoscale science facilities, this predictive capability provides the foundation for an innovation ecosystem that can accelerate the discovery, development, and deployment of new technologies, including advanced energy systems. Delivering on the promise of this innovation ecosystem requires the following: Integration of synthesis, processing, characterization, theory, and simulation and modeling. Many of the newly established Energy Frontier Research Centers and Energy Hubs are exploiting this integration. Achieving/strengthening predictive capability in foundational challenge areas. Predictive capability in the seven foundational challenge areas described in this report is critical to the development of advanced energy technologies. Developing validated computational approaches that span vast differences in time and length scales. This fundamental computational challenge crosscuts all of the foundational challenge areas. Similarly challenging is coupling of analytical data from multiple instruments and techniques that are required to link these length and time scales. Experimental validation and quantification of uncertainty in simulation and modeling. Uncertainty quantification becomes increasingly challenging as simulations become more complex. Robust and sustainable computational infrastructure, including software and applications. For modeling and simulation, software equals infrastructure. To validate the computational tools, software is critical infrastructure that effectively translates huge arrays of experimental data into useful scientific understanding. An integrated approach for managing this infrastructure is essential. Efficient transfer and incorporation of simulation-based engineering and science in industry. Strategies for bridging the gap between research and industrial applications and for widespread industry adoption of integrated computational materials engineering are needed.« less

  16. Intelligent systems technology infrastructure for integrated systems

    NASA Technical Reports Server (NTRS)

    Lum, Henry

    1991-01-01

    A system infrastructure must be properly designed and integrated from the conceptual development phase to accommodate evolutionary intelligent technologies. Several technology development activities were identified that may have application to rendezvous and capture systems. Optical correlators in conjunction with fuzzy logic control might be used for the identification, tracking, and capture of either cooperative or non-cooperative targets without the intensive computational requirements associated with vision processing. A hybrid digital/analog system was developed and tested with a robotic arm. An aircraft refueling application demonstration is planned within two years. Initially this demonstration will be ground based with a follow-on air based demonstration. System dependability measurement and modeling techniques are being developed for fault management applications. This involves usage of incremental solution/evaluation techniques and modularized systems to facilitate reuse and to take advantage of natural partitions in system models. Though not yet commercially available and currently subject to accuracy limitations, technology is being developed to perform optical matrix operations to enhance computational speed. Optical terrain recognition using camera image sequencing processed with optical correlators is being developed to determine position and velocity in support of lander guidance. The system is planned for testing in conjunction with Dryden Flight Research Facility. Advanced architecture technology is defining open architecture design constraints, test bed concepts (processors, multiple hardware/software and multi-dimensional user support, knowledge/tool sharing infrastructure), and software engineering interface issues.

  17. High-throughput neuroimaging-genetics computational infrastructure

    PubMed Central

    Dinov, Ivo D.; Petrosyan, Petros; Liu, Zhizhong; Eggert, Paul; Hobel, Sam; Vespa, Paul; Woo Moon, Seok; Van Horn, John D.; Franco, Joseph; Toga, Arthur W.

    2014-01-01

    Many contemporary neuroscientific investigations face significant challenges in terms of data management, computational processing, data mining, and results interpretation. These four pillars define the core infrastructure necessary to plan, organize, orchestrate, validate, and disseminate novel scientific methods, computational resources, and translational healthcare findings. Data management includes protocols for data acquisition, archival, query, transfer, retrieval, and aggregation. Computational processing involves the necessary software, hardware, and networking infrastructure required to handle large amounts of heterogeneous neuroimaging, genetics, clinical, and phenotypic data and meta-data. Data mining refers to the process of automatically extracting data features, characteristics and associations, which are not readily visible by human exploration of the raw dataset. Result interpretation includes scientific visualization, community validation of findings and reproducible findings. In this manuscript we describe the novel high-throughput neuroimaging-genetics computational infrastructure available at the Institute for Neuroimaging and Informatics (INI) and the Laboratory of Neuro Imaging (LONI) at University of Southern California (USC). INI and LONI include ultra-high-field and standard-field MRI brain scanners along with an imaging-genetics database for storing the complete provenance of the raw and derived data and meta-data. In addition, the institute provides a large number of software tools for image and shape analysis, mathematical modeling, genomic sequence processing, and scientific visualization. A unique feature of this architecture is the Pipeline environment, which integrates the data management, processing, transfer, and visualization. Through its client-server architecture, the Pipeline environment provides a graphical user interface for designing, executing, monitoring validating, and disseminating of complex protocols that utilize diverse suites of software tools and web-services. These pipeline workflows are represented as portable XML objects which transfer the execution instructions and user specifications from the client user machine to remote pipeline servers for distributed computing. Using Alzheimer's and Parkinson's data, we provide several examples of translational applications using this infrastructure1. PMID:24795619

  18. A Cyber-ITS Framework for Massive Traffic Data Analysis Using Cyber Infrastructure

    PubMed Central

    Fontaine, Michael D.

    2013-01-01

    Traffic data is commonly collected from widely deployed sensors in urban areas. This brings up a new research topic, data-driven intelligent transportation systems (ITSs), which means to integrate heterogeneous traffic data from different kinds of sensors and apply it for ITS applications. This research, taking into consideration the significant increase in the amount of traffic data and the complexity of data analysis, focuses mainly on the challenge of solving data-intensive and computation-intensive problems. As a solution to the problems, this paper proposes a Cyber-ITS framework to perform data analysis on Cyber Infrastructure (CI), by nature parallel-computing hardware and software systems, in the context of ITS. The techniques of the framework include data representation, domain decomposition, resource allocation, and parallel processing. All these techniques are based on data-driven and application-oriented models and are organized as a component-and-workflow-based model in order to achieve technical interoperability and data reusability. A case study of the Cyber-ITS framework is presented later based on a traffic state estimation application that uses the fusion of massive Sydney Coordinated Adaptive Traffic System (SCATS) data and GPS data. The results prove that the Cyber-ITS-based implementation can achieve a high accuracy rate of traffic state estimation and provide a significant computational speedup for the data fusion by parallel computing. PMID:23766690

  19. A Cyber-ITS framework for massive traffic data analysis using cyber infrastructure.

    PubMed

    Xia, Yingjie; Hu, Jia; Fontaine, Michael D

    2013-01-01

    Traffic data is commonly collected from widely deployed sensors in urban areas. This brings up a new research topic, data-driven intelligent transportation systems (ITSs), which means to integrate heterogeneous traffic data from different kinds of sensors and apply it for ITS applications. This research, taking into consideration the significant increase in the amount of traffic data and the complexity of data analysis, focuses mainly on the challenge of solving data-intensive and computation-intensive problems. As a solution to the problems, this paper proposes a Cyber-ITS framework to perform data analysis on Cyber Infrastructure (CI), by nature parallel-computing hardware and software systems, in the context of ITS. The techniques of the framework include data representation, domain decomposition, resource allocation, and parallel processing. All these techniques are based on data-driven and application-oriented models and are organized as a component-and-workflow-based model in order to achieve technical interoperability and data reusability. A case study of the Cyber-ITS framework is presented later based on a traffic state estimation application that uses the fusion of massive Sydney Coordinated Adaptive Traffic System (SCATS) data and GPS data. The results prove that the Cyber-ITS-based implementation can achieve a high accuracy rate of traffic state estimation and provide a significant computational speedup for the data fusion by parallel computing.

  20. 32 CFR 240.4 - Policy.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... enterprise information infrastructure requirements. (c) The academic disciplines, with concentrations in IA..., computer systems analysis, cyber operations, cybersecurity, database administration, data management... infrastructure development and academic research to support the DoD IA/IT critical areas of interest. ...

  1. 32 CFR 240.4 - Policy.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... enterprise information infrastructure requirements. (c) The academic disciplines, with concentrations in IA..., computer systems analysis, cyber operations, cybersecurity, database administration, data management... infrastructure development and academic research to support the DoD IA/IT critical areas of interest. ...

  2. 32 CFR 240.4 - Policy.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... enterprise information infrastructure requirements. (c) The academic disciplines, with concentrations in IA..., computer systems analysis, cyber operations, cybersecurity, database administration, data management... infrastructure development and academic research to support the DoD IA/IT critical areas of interest. ...

  3. Role of information systems in controlling costs: the electronic medical record (EMR) and the high-performance computing and communications (HPCC) efforts

    NASA Astrophysics Data System (ADS)

    Kun, Luis G.

    1994-12-01

    On October 18, 1991, the IEEE-USA produced an entity statement which endorsed the vital importance of the High Performance Computer and Communications Act of 1991 (HPCC) and called for the rapid implementation of all its elements. Efforts are now underway to develop a Computer Based Patient Record (CBPR), the National Information Infrastructure (NII) as part of the HPCC, and the so-called `Patient Card'. Multiple legislative initiatives which address these and related information technology issues are pending in Congress. Clearly, a national information system will greatly affect the way health care delivery is provided to the United States public. Timely and reliable information represents a critical element in any initiative to reform the health care system as well as to protect and improve the health of every person. Appropriately used, information technologies offer a vital means of improving the quality of patient care, increasing access to universal care and lowering overall costs within a national health care program. Health care reform legislation should reflect increased budgetary support and a legal mandate for the creation of a national health care information system by: (1) constructing a National Information Infrastructure; (2) building a Computer Based Patient Record System; (3) bringing the collective resources of our National Laboratories to bear in developing and implementing the NII and CBPR, as well as a security system with which to safeguard the privacy rights of patients and the physician-patient privilege; and (4) utilizing Government (e.g. DOD, DOE) capabilities (technology and human resources) to maximize resource utilization, create new jobs and accelerate technology transfer to address health care issues.

  4. Probability Distributome: A Web Computational Infrastructure for Exploring the Properties, Interrelations, and Applications of Probability Distributions.

    PubMed

    Dinov, Ivo D; Siegrist, Kyle; Pearl, Dennis K; Kalinin, Alexandr; Christou, Nicolas

    2016-06-01

    Probability distributions are useful for modeling, simulation, analysis, and inference on varieties of natural processes and physical phenomena. There are uncountably many probability distributions. However, a few dozen families of distributions are commonly defined and are frequently used in practice for problem solving, experimental applications, and theoretical studies. In this paper, we present a new computational and graphical infrastructure, the Distributome , which facilitates the discovery, exploration and application of diverse spectra of probability distributions. The extensible Distributome infrastructure provides interfaces for (human and machine) traversal, search, and navigation of all common probability distributions. It also enables distribution modeling, applications, investigation of inter-distribution relations, as well as their analytical representations and computational utilization. The entire Distributome framework is designed and implemented as an open-source, community-built, and Internet-accessible infrastructure. It is portable, extensible and compatible with HTML5 and Web2.0 standards (http://Distributome.org). We demonstrate two types of applications of the probability Distributome resources: computational research and science education. The Distributome tools may be employed to address five complementary computational modeling applications (simulation, data-analysis and inference, model-fitting, examination of the analytical, mathematical and computational properties of specific probability distributions, and exploration of the inter-distributional relations). Many high school and college science, technology, engineering and mathematics (STEM) courses may be enriched by the use of modern pedagogical approaches and technology-enhanced methods. The Distributome resources provide enhancements for blended STEM education by improving student motivation, augmenting the classical curriculum with interactive webapps, and overhauling the learning assessment protocols.

  5. Probability Distributome: A Web Computational Infrastructure for Exploring the Properties, Interrelations, and Applications of Probability Distributions

    PubMed Central

    Dinov, Ivo D.; Siegrist, Kyle; Pearl, Dennis K.; Kalinin, Alexandr; Christou, Nicolas

    2015-01-01

    Probability distributions are useful for modeling, simulation, analysis, and inference on varieties of natural processes and physical phenomena. There are uncountably many probability distributions. However, a few dozen families of distributions are commonly defined and are frequently used in practice for problem solving, experimental applications, and theoretical studies. In this paper, we present a new computational and graphical infrastructure, the Distributome, which facilitates the discovery, exploration and application of diverse spectra of probability distributions. The extensible Distributome infrastructure provides interfaces for (human and machine) traversal, search, and navigation of all common probability distributions. It also enables distribution modeling, applications, investigation of inter-distribution relations, as well as their analytical representations and computational utilization. The entire Distributome framework is designed and implemented as an open-source, community-built, and Internet-accessible infrastructure. It is portable, extensible and compatible with HTML5 and Web2.0 standards (http://Distributome.org). We demonstrate two types of applications of the probability Distributome resources: computational research and science education. The Distributome tools may be employed to address five complementary computational modeling applications (simulation, data-analysis and inference, model-fitting, examination of the analytical, mathematical and computational properties of specific probability distributions, and exploration of the inter-distributional relations). Many high school and college science, technology, engineering and mathematics (STEM) courses may be enriched by the use of modern pedagogical approaches and technology-enhanced methods. The Distributome resources provide enhancements for blended STEM education by improving student motivation, augmenting the classical curriculum with interactive webapps, and overhauling the learning assessment protocols. PMID:27158191

  6. A Primer on High-Throughput Computing for Genomic Selection

    PubMed Central

    Wu, Xiao-Lin; Beissinger, Timothy M.; Bauck, Stewart; Woodward, Brent; Rosa, Guilherme J. M.; Weigel, Kent A.; Gatti, Natalia de Leon; Gianola, Daniel

    2011-01-01

    High-throughput computing (HTC) uses computer clusters to solve advanced computational problems, with the goal of accomplishing high-throughput over relatively long periods of time. In genomic selection, for example, a set of markers covering the entire genome is used to train a model based on known data, and the resulting model is used to predict the genetic merit of selection candidates. Sophisticated models are very computationally demanding and, with several traits to be evaluated sequentially, computing time is long, and output is low. In this paper, we present scenarios and basic principles of how HTC can be used in genomic selection, implemented using various techniques from simple batch processing to pipelining in distributed computer clusters. Various scripting languages, such as shell scripting, Perl, and R, are also very useful to devise pipelines. By pipelining, we can reduce total computing time and consequently increase throughput. In comparison to the traditional data processing pipeline residing on the central processors, performing general-purpose computation on a graphics processing unit provide a new-generation approach to massive parallel computing in genomic selection. While the concept of HTC may still be new to many researchers in animal breeding, plant breeding, and genetics, HTC infrastructures have already been built in many institutions, such as the University of Wisconsin–Madison, which can be leveraged for genomic selection, in terms of central processing unit capacity, network connectivity, storage availability, and middleware connectivity. Exploring existing HTC infrastructures as well as general-purpose computing environments will further expand our capability to meet increasing computing demands posed by unprecedented genomic data that we have today. We anticipate that HTC will impact genomic selection via better statistical models, faster solutions, and more competitive products (e.g., from design of marker panels to realized genetic gain). Eventually, HTC may change our view of data analysis as well as decision-making in the post-genomic era of selection programs in animals and plants, or in the study of complex diseases in humans. PMID:22303303

  7. INFN-Pisa scientific computation environment (GRID, HPC and Interactive Analysis)

    NASA Astrophysics Data System (ADS)

    Arezzini, S.; Carboni, A.; Caruso, G.; Ciampa, A.; Coscetti, S.; Mazzoni, E.; Piras, S.

    2014-06-01

    The INFN-Pisa Tier2 infrastructure is described, optimized not only for GRID CPU and Storage access, but also for a more interactive use of the resources in order to provide good solutions for the final data analysis step. The Data Center, equipped with about 6700 production cores, permits the use of modern analysis techniques realized via advanced statistical tools (like RooFit and RooStat) implemented in multicore systems. In particular a POSIX file storage access integrated with standard SRM access is provided. Therefore the unified storage infrastructure is described, based on GPFS and Xrootd, used both for SRM data repository and interactive POSIX access. Such a common infrastructure allows a transparent access to the Tier2 data to the users for their interactive analysis. The organization of a specialized many cores CPU facility devoted to interactive analysis is also described along with the login mechanism integrated with the INFN-AAI (National INFN Infrastructure) to extend the site access and use to a geographical distributed community. Such infrastructure is used also for a national computing facility in use to the INFN theoretical community, it enables a synergic use of computing and storage resources. Our Center initially developed for the HEP community is now growing and includes also HPC resources fully integrated. In recent years has been installed and managed a cluster facility (1000 cores, parallel use via InfiniBand connection) and we are now updating this facility that will provide resources for all the intermediate level HPC computing needs of the INFN theoretical national community.

  8. Design for Run-Time Monitor on Cloud Computing

    NASA Astrophysics Data System (ADS)

    Kang, Mikyung; Kang, Dong-In; Yun, Mira; Park, Gyung-Leen; Lee, Junghoon

    Cloud computing is a new information technology trend that moves computing and data away from desktops and portable PCs into large data centers. The basic principle of cloud computing is to deliver applications as services over the Internet as well as infrastructure. A cloud is the type of a parallel and distributed system consisting of a collection of inter-connected and virtualized computers that are dynamically provisioned and presented as one or more unified computing resources. The large-scale distributed applications on a cloud require adaptive service-based software, which has the capability of monitoring the system status change, analyzing the monitored information, and adapting its service configuration while considering tradeoffs among multiple QoS features simultaneously. In this paper, we design Run-Time Monitor (RTM) which is a system software to monitor the application behavior at run-time, analyze the collected information, and optimize resources on cloud computing. RTM monitors application software through library instrumentation as well as underlying hardware through performance counter optimizing its computing configuration based on the analyzed data.

  9. A new implementation of full resolution SBAS-DInSAR processing chain for the effective monitoring of structures and infrastructures

    NASA Astrophysics Data System (ADS)

    Bonano, Manuela; Buonanno, Sabatino; Ojha, Chandrakanta; Berardino, Paolo; Lanari, Riccardo; Zeni, Giovanni; Manunta, Michele

    2017-04-01

    The advanced DInSAR technique referred to as Small BAseline Subset (SBAS) algorithm has already largely demonstrated its effectiveness to carry out multi-scale and multi-platform surface deformation analyses relevant to both natural and man-made hazards. Thanks to its capability to generate displacement maps and long-term deformation time series at both regional (low resolution analysis) and local (full resolution analysis) spatial scales, it allows to get more insights on the spatial and temporal patterns of localized displacements relevant to single buildings and infrastructures over extended urban areas, with a key role in supporting risk mitigation and preservation activities. The extensive application of the multi-scale SBAS-DInSAR approach in many scientific contexts has gone hand in hand with new SAR satellite mission development, characterized by different frequency bands, spatial resolution, revisit times and ground coverage. This brought to the generation of huge DInSAR data stacks to be efficiently handled, processed and archived, with a strong impact on both the data storage and the computational requirements needed for generating the full resolution SBAS-DInSAR results. Accordingly, innovative and effective solutions for the automatic processing of massive SAR data archives and for the operational management of the derived SBAS-DInSAR products need to be designed and implemented, by exploiting the high efficiency (in terms of portability, scalability and computing performances) of the new ICT methodologies. In this work, we present a novel parallel implementation of the full resolution SBAS-DInSAR processing chain, aimed at investigating localized displacements affecting single buildings and infrastructures relevant to very large urban areas, relying on different granularity level parallelization strategies. The image granularity level is applied in most steps of the SBAS-DInSAR processing chain and exploits the multiprocessor systems with distributed memory. Moreover, in some processing steps very heavy from the computational point of view, the Graphical Processing Units (GPU) are exploited for the processing of blocks working on a pixel-by-pixel basis, requiring strong modifications on some key parts of the sequential full resolution SBAS-DInSAR processing chain. GPU processing is implemented by efficiently exploiting parallel processing architectures (as CUDA) for increasing the computing performances, in terms of optimization of the available GPU memory, as well as reduction of the Input/Output operations on the GPU and of the whole processing time for specific blocks w.r.t. the corresponding sequential implementation, particularly critical in presence of huge DInSAR datasets. Moreover, to efficiently handle the massive amount of DInSAR measurements provided by the new generation SAR constellations (CSK and Sentinel-1), we perform a proper re-design strategy aimed at the robust assimilation of the full resolution SBAS-DInSAR results into the web-based Geonode platform of the Spatial Data Infrastructure, thus allowing the efficient management, analysis and integration of the interferometric results with different data sources.

  10. The Secret Life of Quarks, Final Report for the University of North Carolina at Chapel Hill

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fowler, Robert J.

    This final report summarizes activities and results at the University of North Carolina as part of the the SciDAC-2 Project The Secret Life of Quarks: National Computational Infrastructure for Lattice Quantum Chromodynamics. The overall objective of the project is to construct the software needed to study quantum chromo- dynamics (QCD), the theory of the strong interactions of subatomic physics, and similar strongly coupled gauge theories anticipated to be of importance in the LHC era. It built upon the successful efforts of the SciDAC-1 project National Computational Infrastructure for Lattice Gauge Theory, in which a QCD Applications Programming Interface (QCD API)more » was developed that enables lat- tice gauge theorists to make effective use of a wide variety of massively parallel computers. In the SciDAC-2 project, optimized versions of the QCD API were being created for the IBM Blue- Gene/L (BG/L) and BlueGene/P (BG/P), the Cray XT3/XT4 and its successors, and clusters based on multi-core processors and Infiniband communications networks. The QCD API is being used to enhance the performance of the major QCD community codes and to create new applications. Software libraries of physics tools have been expanded to contain sharable building blocks for inclusion in application codes, performance analysis and visualization tools, and software for au- tomation of physics work flow. New software tools were designed for managing the large data sets generated in lattice QCD simulations, and for sharing them through the International Lattice Data Grid consortium. As part of the overall project, researchers at UNC were funded through ASCR to work in three general areas. The main thrust has been performance instrumentation and analysis in support of the SciDAC QCD code base as it evolved and as it moved to new computation platforms. In support of the performance activities, performance data was to be collected in a database for the purpose of broader analysis. Third, the UNC work was done at RENCI (Renaissance Computing Institute), which has extensive expertise and facilities for scientific data visualization, so we acted in an ongoing consulting and support role in that area.« less

  11. Web-Based Integrated Research Environment for Aerodynamic Analyses and Design

    NASA Astrophysics Data System (ADS)

    Ahn, Jae Wan; Kim, Jin-Ho; Kim, Chongam; Cho, Jung-Hyun; Hur, Cinyoung; Kim, Yoonhee; Kang, Sang-Hyun; Kim, Byungsoo; Moon, Jong Bae; Cho, Kum Won

    e-AIRS[1,2], an abbreviation of ‘e-Science Aerospace Integrated Research System,' is a virtual organization designed to support aerodynamic flow analyses in aerospace engineering using the e-Science environment. As the first step toward a virtual aerospace engineering organization, e-AIRS intends to give a full support of aerodynamic research process. Currently, e-AIRS can handle both the computational and experimental aerodynamic research on the e-Science infrastructure. In detail, users can conduct a full CFD (Computational Fluid Dynamics) research process, request wind tunnel experiment, perform comparative analysis between computational prediction and experimental measurement, and finally, collaborate with other researchers using the web portal. The present paper describes those services and the internal architecture of the e-AIRS system.

  12. The Red Atrapa Sismos (Quake Catcher Network in Mexico): assessing performance during large and damaging earthquakes.

    USGS Publications Warehouse

    Dominguez, Luis A.; Yildirim, Battalgazi; Husker, Allen L.; Cochran, Elizabeth S.; Christensen, Carl; Cruz-Atienza, Victor M.

    2015-01-01

    Each volunteer computer monitors ground motion and communicates using the Berkeley Open Infrastructure for Network Computing (BOINC, Anderson, 2004). Using a standard short‐term average, long‐term average (STLA) algorithm (Earle and Shearer, 1994; Cochran, Lawrence, Christensen, Chung, 2009; Cochran, Lawrence, Christensen, and Jakka, 2009), volunteer computer and sensor systems detect abrupt changes in the acceleration recordings. Each time a possible trigger signal is declared, a small package of information containing sensor and ground‐motion information is streamed to one of the QCN servers (Chung et al., 2011). Trigger signals, correlated in space and time, are then processed by the QCN server to look for potential earthquakes.

  13. Advanced Artificial Science. The development of an artificial science and engineering research infrastructure to facilitate innovative computational modeling, analysis, and application to interdisciplinary areas of scientific investigation.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Saffer, Shelley

    2014-12-01

    This is a final report of the DOE award DE-SC0001132, Advanced Artificial Science. The development of an artificial science and engineering research infrastructure to facilitate innovative computational modeling, analysis, and application to interdisciplinary areas of scientific investigation. This document describes the achievements of the goals, and resulting research made possible by this award.

  14. NAS infrastructure management system build 1.5 computer-human interface

    DOT National Transportation Integrated Search

    2001-01-01

    Human factors engineers from the National Airspace System (NAS) Human Factors Branch (ACT-530) of the Federal Aviation Administration William J. Hughes Technical Center conducted an evaluation of the NAS Infrastructure Management System (NIMS) Build ...

  15. An EMSO data case study within the INDIGO-DC project

    NASA Astrophysics Data System (ADS)

    Monna, Stephen; Marcucci, Nicola M.; Marinaro, Giuditta; Fiore, Sandro; D'Anca, Alessandro; Antonacci, Marica; Beranzoli, Laura; Favali, Paolo

    2017-04-01

    We present our experience based on a case study within the INDIGO-DataCloud (INtegrating Distributed data Infrastructures for Global ExplOitation) project (www.indigo-datacloud.eu). The aim of INDIGO-DC is to develop a data and computing platform targeting scientific communities. Our case study is an example of activities performed by INGV using data from seafloor observatories that are nodes of the infrastructure EMSO (European Multidisciplinary Seafloor and water column Observatory)-ERIC (www.emso-eu.org). EMSO is composed of several deep-seafloor and water column observatories, deployed at key sites in the European waters, thus forming a widely distributed pan-European infrastructure. In our case study we consider data collected by the NEMO-SN1 observatory, one of the EMSO nodes used for geohazard monitoring, located in the Western Ionian Sea in proximity of Etna volcano. Starting from the case study, through an agile approach, we defined some requirements for INDIGO developers, and tested some of the proposed INDIGO solutions that are of interest for our research community. Given that EMSO is a distributed infrastructure, we are interested in INDIGO solutions that allow access to distributed data storage. Access should be both user-oriented and machine-oriented, and with the use of a common identity and access system. For this purpose, we have been testing: - ONEDATA (https://onedata.org), as global data management system. - INDIGO-IAM as Identity and Access Management system. Another aspect we are interested in is the efficient data processing, and we have focused on two types of INDIGO products: - Ophidia (http://ophidia.cmcc.it), a big data analytics framework for eScience for the analysis of multidimensional data. - A collection of INDIGO Services to run processes for scientific computing through the INDIGO Orchestrator.

  16. A probabilistic and adaptive approach to modeling performance of pavement infrastructure

    DOT National Transportation Integrated Search

    2007-08-01

    Accurate prediction of pavement performance is critical to pavement management agencies. Reliable and accurate predictions of pavement infrastructure performance can save significant amounts of money for pavement infrastructure management agencies th...

  17. A Reliable TTP-Based Infrastructure with Low Sensor Resource Consumption for the Smart Home Multi-Platform

    PubMed Central

    Kang, Jungho; Kim, Mansik; Park, Jong Hyuk

    2016-01-01

    With the ICT technology making great progress in the smart home environment, the ubiquitous environment is rapidly emerging all over the world, but problems are also increasing proportionally to the rapid growth of the smart home market such as multiplatform heterogeneity and new security threats. In addition, the smart home sensors have so low computing resources that they cannot process complicated computation tasks, which is required to create a proper security environment. A service provider also faces overhead in processing data from a rapidly increasing number of sensors. This paper aimed to propose a scheme to build infrastructure in which communication entities can securely authenticate and design security channel with physically unclonable PUFs and the TTP that smart home communication entities can rely on. In addition, we analyze and evaluate the proposed scheme for security and performance and prove that it can build secure channels with low resources. Finally, we expect that the proposed scheme can be helpful for secure communication with low resources in future smart home multiplatforms. PMID:27399699

  18. A Reliable TTP-Based Infrastructure with Low Sensor Resource Consumption for the Smart Home Multi-Platform.

    PubMed

    Kang, Jungho; Kim, Mansik; Park, Jong Hyuk

    2016-07-05

    With the ICT technology making great progress in the smart home environment, the ubiquitous environment is rapidly emerging all over the world, but problems are also increasing proportionally to the rapid growth of the smart home market such as multiplatform heterogeneity and new security threats. In addition, the smart home sensors have so low computing resources that they cannot process complicated computation tasks, which is required to create a proper security environment. A service provider also faces overhead in processing data from a rapidly increasing number of sensors. This paper aimed to propose a scheme to build infrastructure in which communication entities can securely authenticate and design security channel with physically unclonable PUFs and the TTP that smart home communication entities can rely on. In addition, we analyze and evaluate the proposed scheme for security and performance and prove that it can build secure channels with low resources. Finally, we expect that the proposed scheme can be helpful for secure communication with low resources in future smart home multiplatforms.

  19. Reconciliation of the cloud computing model with US federal electronic health record regulations

    PubMed Central

    2011-01-01

    Cloud computing refers to subscription-based, fee-for-service utilization of computer hardware and software over the Internet. The model is gaining acceptance for business information technology (IT) applications because it allows capacity and functionality to increase on the fly without major investment in infrastructure, personnel or licensing fees. Large IT investments can be converted to a series of smaller operating expenses. Cloud architectures could potentially be superior to traditional electronic health record (EHR) designs in terms of economy, efficiency and utility. A central issue for EHR developers in the US is that these systems are constrained by federal regulatory legislation and oversight. These laws focus on security and privacy, which are well-recognized challenges for cloud computing systems in general. EHRs built with the cloud computing model can achieve acceptable privacy and security through business associate contracts with cloud providers that specify compliance requirements, performance metrics and liability sharing. PMID:21727204

  20. Processing of the WLCG monitoring data using NoSQL

    NASA Astrophysics Data System (ADS)

    Andreeva, J.; Beche, A.; Belov, S.; Dzhunov, I.; Kadochnikov, I.; Karavakis, E.; Saiz, P.; Schovancova, J.; Tuckett, D.

    2014-06-01

    The Worldwide LHC Computing Grid (WLCG) today includes more than 150 computing centres where more than 2 million jobs are being executed daily and petabytes of data are transferred between sites. Monitoring the computing activities of the LHC experiments, over such a huge heterogeneous infrastructure, is extremely demanding in terms of computation, performance and reliability. Furthermore, the generated monitoring flow is constantly increasing, which represents another challenge for the monitoring systems. While existing solutions are traditionally based on Oracle for data storage and processing, recent developments evaluate NoSQL for processing large-scale monitoring datasets. NoSQL databases are getting increasingly popular for processing datasets at the terabyte and petabyte scale using commodity hardware. In this contribution, the integration of NoSQL data processing in the Experiment Dashboard framework is described along with first experiences of using this technology for monitoring the LHC computing activities.

  1. Reconciliation of the cloud computing model with US federal electronic health record regulations.

    PubMed

    Schweitzer, Eugene J

    2012-01-01

    Cloud computing refers to subscription-based, fee-for-service utilization of computer hardware and software over the Internet. The model is gaining acceptance for business information technology (IT) applications because it allows capacity and functionality to increase on the fly without major investment in infrastructure, personnel or licensing fees. Large IT investments can be converted to a series of smaller operating expenses. Cloud architectures could potentially be superior to traditional electronic health record (EHR) designs in terms of economy, efficiency and utility. A central issue for EHR developers in the US is that these systems are constrained by federal regulatory legislation and oversight. These laws focus on security and privacy, which are well-recognized challenges for cloud computing systems in general. EHRs built with the cloud computing model can achieve acceptable privacy and security through business associate contracts with cloud providers that specify compliance requirements, performance metrics and liability sharing.

  2. Intelligent systems technology infrastructure for integrated systems

    NASA Technical Reports Server (NTRS)

    Lum, Henry, Jr.

    1991-01-01

    Significant advances have occurred during the last decade in intelligent systems technologies (a.k.a. knowledge-based systems, KBS) including research, feasibility demonstrations, and technology implementations in operational environments. Evaluation and simulation data obtained to date in real-time operational environments suggest that cost-effective utilization of intelligent systems technologies can be realized for Automated Rendezvous and Capture applications. The successful implementation of these technologies involve a complex system infrastructure integrating the requirements of transportation, vehicle checkout and health management, and communication systems without compromise to systems reliability and performance. The resources that must be invoked to accomplish these tasks include remote ground operations and control, built-in system fault management and control, and intelligent robotics. To ensure long-term evolution and integration of new validated technologies over the lifetime of the vehicle, system interfaces must also be addressed and integrated into the overall system interface requirements. An approach for defining and evaluating the system infrastructures including the testbed currently being used to support the on-going evaluations for the evolutionary Space Station Freedom Data Management System is presented and discussed. Intelligent system technologies discussed include artificial intelligence (real-time replanning and scheduling), high performance computational elements (parallel processors, photonic processors, and neural networks), real-time fault management and control, and system software development tools for rapid prototyping capabilities.

  3. Digital Rocks Portal: a sustainable platform for imaged dataset sharing, translation and automated analysis

    NASA Astrophysics Data System (ADS)

    Prodanovic, M.; Esteva, M.; Hanlon, M.; Nanda, G.; Agarwal, P.

    2015-12-01

    Recent advances in imaging have provided a wealth of 3D datasets that reveal pore space microstructure (nm to cm length scale) and allow investigation of nonlinear flow and mechanical phenomena from first principles using numerical approaches. This framework has popularly been called "digital rock physics". Researchers, however, have trouble storing and sharing the datasets both due to their size and the lack of standardized image types and associated metadata for volumetric datasets. This impedes scientific cross-validation of the numerical approaches that characterize large scale porous media properties, as well as development of multiscale approaches required for correct upscaling. A single research group typically specializes in an imaging modality and/or related modeling on a single length scale, and lack of data-sharing infrastructure makes it difficult to integrate different length scales. We developed a sustainable, open and easy-to-use repository called the Digital Rocks Portal, that (1) organizes images and related experimental measurements of different porous materials, (2) improves access to them for a wider community of geosciences or engineering researchers not necessarily trained in computer science or data analysis. Once widely accepter, the repository will jumpstart productivity and enable scientific inquiry and engineering decisions founded on a data-driven basis. This is the first repository of its kind. We show initial results on incorporating essential software tools and pipelines that make it easier for researchers to store and reuse data, and for educators to quickly visualize and illustrate concepts to a wide audience. For data sustainability and continuous access, the portal is implemented within the reliable, 24/7 maintained High Performance Computing Infrastructure supported by the Texas Advanced Computing Center (TACC) at the University of Texas at Austin. Long-term storage is provided through the University of Texas System Research Cyber-infrastructure initiative.

  4. Elastic Extension of a CMS Computing Centre Resources on External Clouds

    NASA Astrophysics Data System (ADS)

    Codispoti, G.; Di Maria, R.; Aiftimiei, C.; Bonacorsi, D.; Calligola, P.; Ciaschini, V.; Costantini, A.; Dal Pra, S.; DeGirolamo, D.; Grandi, C.; Michelotto, D.; Panella, M.; Peco, G.; Sapunenko, V.; Sgaravatto, M.; Taneja, S.; Zizzi, G.

    2016-10-01

    After the successful LHC data taking in Run-I and in view of the future runs, the LHC experiments are facing new challenges in the design and operation of the computing facilities. The computing infrastructure for Run-II is dimensioned to cope at most with the average amount of data recorded. The usage peaks, as already observed in Run-I, may however originate large backlogs, thus delaying the completion of the data reconstruction and ultimately the data availability for physics analysis. In order to cope with the production peaks, CMS - along the lines followed by other LHC experiments - is exploring the opportunity to access Cloud resources provided by external partners or commercial providers. Specific use cases have already been explored and successfully exploited during Long Shutdown 1 (LS1) and the first part of Run 2. In this work we present the proof of concept of the elastic extension of a CMS site, specifically the Bologna Tier-3, on an external OpenStack infrastructure. We focus on the “Cloud Bursting” of a CMS Grid site using a newly designed LSF configuration that allows the dynamic registration of new worker nodes to LSF. In this approach, the dynamically added worker nodes instantiated on the OpenStack infrastructure are transparently accessed by the LHC Grid tools and at the same time they serve as an extension of the farm for the local usage. The amount of resources allocated thus can be elastically modeled to cope up with the needs of CMS experiment and local users. Moreover, a direct access/integration of OpenStack resources to the CMS workload management system is explored. In this paper we present this approach, we report on the performances of the on-demand allocated resources, and we discuss the lessons learned and the next steps.

  5. Digital Rocks Portal: a Sustainable Platform for Data Management, Analysis and Remote Visualization of Volumetric Images of Porous Media

    NASA Astrophysics Data System (ADS)

    Prodanovic, M.; Esteva, M.; Ketcham, R. A.

    2017-12-01

    Nanometer to centimeter-scale imaging such as (focused ion beam) scattered electron microscopy, magnetic resonance imaging and X-ray (micro)tomography has since 1990s introduced 2D and 3D datasets of rock microstructure that allow investigation of nonlinear flow and mechanical phenomena on the length scales that are otherwise impervious to laboratory measurements. The numerical approaches that use such images produce various upscaled parameters required by subsurface flow and deformation simulators. All of this has revolutionized our knowledge about grain scale phenomena. However, a lack of data-sharing infrastructure among research groups makes it difficult to integrate different length scales. We have developed a sustainable, open and easy-to-use repository called the Digital Rocks Portal (https://www.digitalrocksportal.org), that (1) organizes images and related experimental measurements of different porous materials, (2) improves access to them for a wider community of engineering or geosciences researchers not necessarily trained in computer science or data analysis. Digital Rocks Portal (NSF EarthCube Grant 1541008) is the first repository for imaged porous microstructure data. It is implemented within the reliable, 24/7 maintained High Performance Computing Infrastructure supported by the Texas Advanced Computing Center (University of Texas at Austin). Long-term storage is provided through the University of Texas System Research Cyber-infrastructure initiative. We show how the data can be documented, referenced in publications via digital object identifiers (see Figure below for examples), visualized, searched for and linked to other repositories. We show recently implemented integration of the remote parallel visualization, bulk upload for large datasets as well as preliminary flow simulation workflow with the pore structures currently stored in the repository. We discuss the issues of collecting correct metadata, data discoverability and repository sustainability.

  6. Computing and Communications Infrastructure for Network-Centric Warfare: Exploiting COTS, Assuring Performance

    DTIC Science & Technology

    2004-06-01

    remote databases, has seen little vendor acceptance. Each database ( Oracle , DB2, MySQL , etc.) has its own client- server protocol. Therefore each...existing standards – SQL , X.500/LDAP, FTP, etc. • View information dissemination as selective replication – State-oriented vs . message-oriented...allowing the 8 application to start. The resource management system would serve as a broker to the resources, making sure that resources are not

  7. Extraction of urban vegetation with Pleiades multiangular images

    NASA Astrophysics Data System (ADS)

    Lefebvre, Antoine; Nabucet, Jean; Corpetti, Thomas; Courty, Nicolas; Hubert-Moy, Laurence

    2016-10-01

    Vegetation is essential in urban environments since it provides significant services in terms of health, heat, property value, ecology ... As part of the European Union Biodiversity Strategy Plan for 2020, the protection and development of green-infrastructures is strengthened in urban areas. In order to evaluate and monitor the quality of the green infra-structures, this article investigates contributions of Pléiades multi-angular images to extract and characterize low and high urban vegetation. From such images one can extract both spectral and elevation information from optical images. Our method is composed of 3 main steps : (1) the computation of a normalized Digital Surface Model from the multi-angular images ; (2) Extraction of spectral and contextual features ; (3) a classification of vegetation classes (tree and grass) performed with a random forest classifier. Results performed in the city of Rennes in France show the ability of multi-angular images to extract DEM in urban area despite building height. It also highlights its importance and its complementarity with contextual information to extract urban vegetation.

  8. Multiscale Methods for Accurate, Efficient, and Scale-Aware Models of the Earth System

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Goldhaber, Steve; Holland, Marika

    The major goal of this project was to contribute improvements to the infrastructure of an Earth System Model in order to support research in the Multiscale Methods for Accurate, Efficient, and Scale-Aware models of the Earth System project. In support of this, the NCAR team accomplished two main tasks: improving input/output performance of the model and improving atmospheric model simulation quality. Improvement of the performance and scalability of data input and diagnostic output within the model required a new infrastructure which can efficiently handle the unstructured grids common in multiscale simulations. This allows for a more computationally efficient model, enablingmore » more years of Earth System simulation. The quality of the model simulations was improved by reducing grid-point noise in the spectral element version of the Community Atmosphere Model (CAM-SE). This was achieved by running the physics of the model using grid-cell data on a finite-volume grid.« less

  9. Transportation Infrastructure Design and Construction \\0x16 Virtual Training Tools

    DOT National Transportation Integrated Search

    2003-09-01

    This project will develop 3D interactive computer-training environments for a major element of transportation infrastructure : hot mix asphalt paving. These tools will include elements of hot mix design (including laboratory equipment) and constructi...

  10. Putting the Information Infrastructure to Work. Report of the Information Infrastructure Task Force Committee on Applications and Technology. NIST Special Publication 857.

    ERIC Educational Resources Information Center

    National Inst. of Standards and Technology, Gaithersburg, MD.

    An interconnection of computer networks, telecommunications services, and applications, the National Information Infrastructure (NII) can open up new vistas and profoundly change much of American life. This report explores some of the opportunities and obstacles to the use of the NII by people and organizations. The goal is to express how…

  11. Integrating Network Management for Cloud Computing Services

    DTIC Science & Technology

    2015-06-01

    abstraction and system design. In this dissertation, we make three major contributions. We rst propose to consolidate the tra c and infrastructure management...abstraction and system design. In this dissertation, we make three major contributions. We first propose to consolidate the traffic and infrastructure ...1.3.1 Safe Datacenter Traffic/ Infrastructure Management . . . . . . 9 1.3.2 End-host/Network Cooperative Traffic Management . . . . . . 10 1.3.3 Direct

  12. Cloud BioLinux: pre-configured and on-demand bioinformatics computing for the genomics community.

    PubMed

    Krampis, Konstantinos; Booth, Tim; Chapman, Brad; Tiwari, Bela; Bicak, Mesude; Field, Dawn; Nelson, Karen E

    2012-03-19

    A steep drop in the cost of next-generation sequencing during recent years has made the technology affordable to the majority of researchers, but downstream bioinformatic analysis still poses a resource bottleneck for smaller laboratories and institutes that do not have access to substantial computational resources. Sequencing instruments are typically bundled with only the minimal processing and storage capacity required for data capture during sequencing runs. Given the scale of sequence datasets, scientific value cannot be obtained from acquiring a sequencer unless it is accompanied by an equal investment in informatics infrastructure. Cloud BioLinux is a publicly accessible Virtual Machine (VM) that enables scientists to quickly provision on-demand infrastructures for high-performance bioinformatics computing using cloud platforms. Users have instant access to a range of pre-configured command line and graphical software applications, including a full-featured desktop interface, documentation and over 135 bioinformatics packages for applications including sequence alignment, clustering, assembly, display, editing, and phylogeny. Each tool's functionality is fully described in the documentation directly accessible from the graphical interface of the VM. Besides the Amazon EC2 cloud, we have started instances of Cloud BioLinux on a private Eucalyptus cloud installed at the J. Craig Venter Institute, and demonstrated access to the bioinformatic tools interface through a remote connection to EC2 instances from a local desktop computer. Documentation for using Cloud BioLinux on EC2 is available from our project website, while a Eucalyptus cloud image and VirtualBox Appliance is also publicly available for download and use by researchers with access to private clouds. Cloud BioLinux provides a platform for developing bioinformatics infrastructures on the cloud. An automated and configurable process builds Virtual Machines, allowing the development of highly customized versions from a shared code base. This shared community toolkit enables application specific analysis platforms on the cloud by minimizing the effort required to prepare and maintain them.

  13. Cloud BioLinux: pre-configured and on-demand bioinformatics computing for the genomics community

    PubMed Central

    2012-01-01

    Background A steep drop in the cost of next-generation sequencing during recent years has made the technology affordable to the majority of researchers, but downstream bioinformatic analysis still poses a resource bottleneck for smaller laboratories and institutes that do not have access to substantial computational resources. Sequencing instruments are typically bundled with only the minimal processing and storage capacity required for data capture during sequencing runs. Given the scale of sequence datasets, scientific value cannot be obtained from acquiring a sequencer unless it is accompanied by an equal investment in informatics infrastructure. Results Cloud BioLinux is a publicly accessible Virtual Machine (VM) that enables scientists to quickly provision on-demand infrastructures for high-performance bioinformatics computing using cloud platforms. Users have instant access to a range of pre-configured command line and graphical software applications, including a full-featured desktop interface, documentation and over 135 bioinformatics packages for applications including sequence alignment, clustering, assembly, display, editing, and phylogeny. Each tool's functionality is fully described in the documentation directly accessible from the graphical interface of the VM. Besides the Amazon EC2 cloud, we have started instances of Cloud BioLinux on a private Eucalyptus cloud installed at the J. Craig Venter Institute, and demonstrated access to the bioinformatic tools interface through a remote connection to EC2 instances from a local desktop computer. Documentation for using Cloud BioLinux on EC2 is available from our project website, while a Eucalyptus cloud image and VirtualBox Appliance is also publicly available for download and use by researchers with access to private clouds. Conclusions Cloud BioLinux provides a platform for developing bioinformatics infrastructures on the cloud. An automated and configurable process builds Virtual Machines, allowing the development of highly customized versions from a shared code base. This shared community toolkit enables application specific analysis platforms on the cloud by minimizing the effort required to prepare and maintain them. PMID:22429538

  14. RFA Guardian: Comprehensive Simulation of Radiofrequency Ablation Treatment of Liver Tumors.

    PubMed

    Voglreiter, Philip; Mariappan, Panchatcharam; Pollari, Mika; Flanagan, Ronan; Blanco Sequeiros, Roberto; Portugaller, Rupert Horst; Fütterer, Jurgen; Schmalstieg, Dieter; Kolesnik, Marina; Moche, Michael

    2018-01-15

    The RFA Guardian is a comprehensive application for high-performance patient-specific simulation of radiofrequency ablation of liver tumors. We address a wide range of usage scenarios. These include pre-interventional planning, sampling of the parameter space for uncertainty estimation, treatment evaluation and, in the worst case, failure analysis. The RFA Guardian is the first of its kind that exhibits sufficient performance for simulating treatment outcomes during the intervention. We achieve this by combining a large number of high-performance image processing, biomechanical simulation and visualization techniques into a generalized technical workflow. Further, we wrap the feature set into a single, integrated application, which exploits all available resources of standard consumer hardware, including massively parallel computing on graphics processing units. This allows us to predict or reproduce treatment outcomes on a single personal computer with high computational performance and high accuracy. The resulting low demand for infrastructure enables easy and cost-efficient integration into the clinical routine. We present a number of evaluation cases from the clinical practice where users performed the whole technical workflow from patient-specific modeling to final validation and highlight the opportunities arising from our fast, accurate prediction techniques.

  15. Toward real-time Monte Carlo simulation using a commercial cloud computing infrastructure.

    PubMed

    Wang, Henry; Ma, Yunzhi; Pratx, Guillem; Xing, Lei

    2011-09-07

    Monte Carlo (MC) methods are the gold standard for modeling photon and electron transport in a heterogeneous medium; however, their computational cost prohibits their routine use in the clinic. Cloud computing, wherein computing resources are allocated on-demand from a third party, is a new approach for high performance computing and is implemented to perform ultra-fast MC calculation in radiation therapy. We deployed the EGS5 MC package in a commercial cloud environment. Launched from a single local computer with Internet access, a Python script allocates a remote virtual cluster. A handshaking protocol designates master and worker nodes. The EGS5 binaries and the simulation data are initially loaded onto the master node. The simulation is then distributed among independent worker nodes via the message passing interface, and the results aggregated on the local computer for display and data analysis. The described approach is evaluated for pencil beams and broad beams of high-energy electrons and photons. The output of cloud-based MC simulation is identical to that produced by single-threaded implementation. For 1 million electrons, a simulation that takes 2.58 h on a local computer can be executed in 3.3 min on the cloud with 100 nodes, a 47× speed-up. Simulation time scales inversely with the number of parallel nodes. The parallelization overhead is also negligible for large simulations. Cloud computing represents one of the most important recent advances in supercomputing technology and provides a promising platform for substantially improved MC simulation. In addition to the significant speed up, cloud computing builds a layer of abstraction for high performance parallel computing, which may change the way dose calculations are performed and radiation treatment plans are completed.

  16. Future opportunities and trends for e-infrastructures and life sciences: going beyond the grid to enable life science data analysis

    PubMed Central

    Duarte, Afonso M. S.; Psomopoulos, Fotis E.; Blanchet, Christophe; Bonvin, Alexandre M. J. J.; Corpas, Manuel; Franc, Alain; Jimenez, Rafael C.; de Lucas, Jesus M.; Nyrönen, Tommi; Sipos, Gergely; Suhr, Stephanie B.

    2015-01-01

    With the increasingly rapid growth of data in life sciences we are witnessing a major transition in the way research is conducted, from hypothesis-driven studies to data-driven simulations of whole systems. Such approaches necessitate the use of large-scale computational resources and e-infrastructures, such as the European Grid Infrastructure (EGI). EGI, one of key the enablers of the digital European Research Area, is a federation of resource providers set up to deliver sustainable, integrated and secure computing services to European researchers and their international partners. Here we aim to provide the state of the art of Grid/Cloud computing in EU research as viewed from within the field of life sciences, focusing on key infrastructures and projects within the life sciences community. Rather than focusing purely on the technical aspects underlying the currently provided solutions, we outline the design aspects and key characteristics that can be identified across major research approaches. Overall, we aim to provide significant insights into the road ahead by establishing ever-strengthening connections between EGI as a whole and the life sciences community. PMID:26157454

  17. Future opportunities and trends for e-infrastructures and life sciences: going beyond the grid to enable life science data analysis.

    PubMed

    Duarte, Afonso M S; Psomopoulos, Fotis E; Blanchet, Christophe; Bonvin, Alexandre M J J; Corpas, Manuel; Franc, Alain; Jimenez, Rafael C; de Lucas, Jesus M; Nyrönen, Tommi; Sipos, Gergely; Suhr, Stephanie B

    2015-01-01

    With the increasingly rapid growth of data in life sciences we are witnessing a major transition in the way research is conducted, from hypothesis-driven studies to data-driven simulations of whole systems. Such approaches necessitate the use of large-scale computational resources and e-infrastructures, such as the European Grid Infrastructure (EGI). EGI, one of key the enablers of the digital European Research Area, is a federation of resource providers set up to deliver sustainable, integrated and secure computing services to European researchers and their international partners. Here we aim to provide the state of the art of Grid/Cloud computing in EU research as viewed from within the field of life sciences, focusing on key infrastructures and projects within the life sciences community. Rather than focusing purely on the technical aspects underlying the currently provided solutions, we outline the design aspects and key characteristics that can be identified across major research approaches. Overall, we aim to provide significant insights into the road ahead by establishing ever-strengthening connections between EGI as a whole and the life sciences community.

  18. GLIDE: a grid-based light-weight infrastructure for data-intensive environments

    NASA Technical Reports Server (NTRS)

    Mattmann, Chris A.; Malek, Sam; Beckman, Nels; Mikic-Rakic, Marija; Medvidovic, Nenad; Chrichton, Daniel J.

    2005-01-01

    The promise of the grid is that it will enable public access and sharing of immense amounts of computational and data resources among dynamic coalitions of individuals and institutions. However, the current grid solutions make several limiting assumptions that curtail their widespread adoption. To address these limitations, we present GLIDE, a prototype light-weight, data-intensive middleware infrastructure that enables access to the robust data and computational power of the grid on DREAM platforms.

  19. Privacy and the National Information Infrastructure.

    ERIC Educational Resources Information Center

    Rotenberg, Marc

    1994-01-01

    Explains the work of Computer Professionals for Social Responsibility regarding privacy issues in the use of electronic networks; recommends principles that should be adopted for a National Information Infrastructure privacy code; discusses the need for public education; and suggests pertinent legislative proposals. (LRW)

  20. Effecting IT infrastructure culture change: management by processes and metrics

    NASA Technical Reports Server (NTRS)

    Miller, R. L.

    2001-01-01

    This talk describes the processes and metrics used by Jet Propulsion Laboratory to bring about the required IT infrastructure culture change to update and certify, as Y2K compliant, thousands of computers and millions of lines of code.

  1. IEEE TRANSACTIONS ON CYBERNETICS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Craig R. RIeger; David H. Scheidt; William D. Smart

    2014-11-01

    MODERN societies depend on complex and critical infrastructures for energy, transportation, sustenance, medical care, emergency response, communications security. As computers, automation, and information technology (IT) have advanced, these technologies have been exploited to enhance the efficiency of operating the processes that make up these infrastructures

  2. Estuarine wetland evolution including sea-level rise and infrastructure effects.

    NASA Astrophysics Data System (ADS)

    Rodriguez, Jose Fernando; Trivisonno, Franco; Rojas, Steven Sandi; Riccardi, Gerardo; Stenta, Hernan; Saco, Patricia Mabel

    2015-04-01

    Estuarine wetlands are an extremely valuable resource in terms of biotic diversity, flood attenuation, storm surge protection, groundwater recharge, filtering of surface flows and carbon sequestration. On a large scale the survival of these systems depends on the slope of the land and a balance between the rates of accretion and sea-level rise, but local man-made flow disturbances can have comparable effects. Climate change predictions for most of Australia include an accelerated sea level rise, which may challenge the survival of estuarine wetlands. Furthermore, coastal infrastructure poses an additional constraint on the adaptive capacity of these ecosystems. Numerical models are increasingly being used to assess wetland dynamics and to help manage some of these situations. We present results of a wetland evolution model that is based on computed values of hydroperiod and tidal range that drive vegetation preference. Our first application simulates the long term evolution of an Australian wetland heavily constricted by infrastructure that is undergoing the effects of predicted accelerated sea level rise. The wetland presents a vegetation zonation sequence mudflats - mangrove - saltmarsh from the seaward margin and up the topographic gradient but is also affected by compartmentalization due to internal road embankments and culverts that effectively attenuates tidal input to the upstream compartments. For this reason, the evolution model includes a 2D hydrodynamic module which is able to handle man-made flow controls and spatially varying roughness. It continually simulates tidal inputs into the wetland and computes annual values of hydroperiod and tidal range to update vegetation distribution based on preference to hydrodynamic conditions of the different vegetation types. It also computes soil accretion rates and updates roughness coefficient values according to evolving vegetation types. In order to explore in more detail the magnitude of flow attenuation due to roughness and its effects on the computation of tidal range and hydroperiod, we performed numerical experiments simulating floodplain flow on the side of a tidal creek using different roughness values. Even though the values of roughness that produce appreciable changes in hydroperiod and tidal range are relatively high, they are within the range expected for some of the wetland vegetation. Both applications of the model show that flow attenuation can play a major role in wetland hydrodynamics and that its effects must be considered when predicting wetland evolution under climate change scenarios, particularly in situations where existing infrastructure affects the flow.

  3. The NCI High Performance Computing (HPC) and High Performance Data (HPD) Platform to Support the Analysis of Petascale Environmental Data Collections

    NASA Astrophysics Data System (ADS)

    Evans, B. J. K.; Pugh, T.; Wyborn, L. A.; Porter, D.; Allen, C.; Smillie, J.; Antony, J.; Trenham, C.; Evans, B. J.; Beckett, D.; Erwin, T.; King, E.; Hodge, J.; Woodcock, R.; Fraser, R.; Lescinsky, D. T.

    2014-12-01

    The National Computational Infrastructure (NCI) has co-located a priority set of national data assets within a HPC research platform. This powerful in-situ computational platform has been created to help serve and analyse the massive amounts of data across the spectrum of environmental collections - in particular the climate, observational data and geoscientific domains. This paper examines the infrastructure, innovation and opportunity for this significant research platform. NCI currently manages nationally significant data collections (10+ PB) categorised as 1) earth system sciences, climate and weather model data assets and products, 2) earth and marine observations and products, 3) geosciences, 4) terrestrial ecosystem, 5) water management and hydrology, and 6) astronomy, social science and biosciences. The data is largely sourced from the NCI partners (who include the custodians of many of the national scientific records), major research communities, and collaborating overseas organisations. By co-locating these large valuable data assets, new opportunities have arisen by harmonising the data collections, making a powerful transdisciplinary research platformThe data is accessible within an integrated HPC-HPD environment - a 1.2 PFlop supercomputer (Raijin), a HPC class 3000 core OpenStack cloud system and several highly connected large scale and high-bandwidth Lustre filesystems. New scientific software, cloud-scale techniques, server-side visualisation and data services have been harnessed and integrated into the platform, so that analysis is performed seamlessly across the traditional boundaries of the underlying data domains. Characterisation of the techniques along with performance profiling ensures scalability of each software component, all of which can either be enhanced or replaced through future improvements. A Development-to-Operations (DevOps) framework has also been implemented to manage the scale of the software complexity alone. This ensures that software is both upgradable and maintainable, and can be readily reused with complexly integrated systems and become part of the growing global trusted community tools for cross-disciplinary research.

  4. A modular (almost) automatic set-up for elastic multi-tenants cloud (micro)infrastructures

    NASA Astrophysics Data System (ADS)

    Amoroso, A.; Astorino, F.; Bagnasco, S.; Balashov, N. A.; Bianchi, F.; Destefanis, M.; Lusso, S.; Maggiora, M.; Pellegrino, J.; Yan, L.; Yan, T.; Zhang, X.; Zhao, X.

    2017-10-01

    An auto-installing tool on an usb drive can allow for a quick and easy automatic deployment of OpenNebula-based cloud infrastructures remotely managed by a central VMDIRAC instance. A single team, in the main site of an HEP Collaboration or elsewhere, can manage and run a relatively large network of federated (micro-)cloud infrastructures, making an highly dynamic and elastic use of computing resources. Exploiting such an approach can lead to modular systems of cloud-bursting infrastructures addressing complex real-life scenarios.

  5. Next Generation Distributed Computing for Cancer Research

    PubMed Central

    Agarwal, Pankaj; Owzar, Kouros

    2014-01-01

    Advances in next generation sequencing (NGS) and mass spectrometry (MS) technologies have provided many new opportunities and angles for extending the scope of translational cancer research while creating tremendous challenges in data management and analysis. The resulting informatics challenge is invariably not amenable to the use of traditional computing models. Recent advances in scalable computing and associated infrastructure, particularly distributed computing for Big Data, can provide solutions for addressing these challenges. In this review, the next generation of distributed computing technologies that can address these informatics problems is described from the perspective of three key components of a computational platform, namely computing, data storage and management, and networking. A broad overview of scalable computing is provided to set the context for a detailed description of Hadoop, a technology that is being rapidly adopted for large-scale distributed computing. A proof-of-concept Hadoop cluster, set up for performance benchmarking of NGS read alignment, is described as an example of how to work with Hadoop. Finally, Hadoop is compared with a number of other current technologies for distributed computing. PMID:25983539

  6. From photons to big-data applications: terminating terabits

    PubMed Central

    2016-01-01

    Computer architectures have entered a watershed as the quantity of network data generated by user applications exceeds the data-processing capacity of any individual computer end-system. It will become impossible to scale existing computer systems while a gap grows between the quantity of networked data and the capacity for per system data processing. Despite this, the growth in demand in both task variety and task complexity continues unabated. Networked computer systems provide a fertile environment in which new applications develop. As networked computer systems become akin to infrastructure, any limitation upon the growth in capacity and capabilities becomes an important constraint of concern to all computer users. Considering a networked computer system capable of processing terabits per second, as a benchmark for scalability, we critique the state of the art in commodity computing, and propose a wholesale reconsideration in the design of computer architectures and their attendant ecosystem. Our proposal seeks to reduce costs, save power and increase performance in a multi-scale approach that has potential application from nanoscale to data-centre-scale computers. PMID:26809573

  7. From photons to big-data applications: terminating terabits.

    PubMed

    Zilberman, Noa; Moore, Andrew W; Crowcroft, Jon A

    2016-03-06

    Computer architectures have entered a watershed as the quantity of network data generated by user applications exceeds the data-processing capacity of any individual computer end-system. It will become impossible to scale existing computer systems while a gap grows between the quantity of networked data and the capacity for per system data processing. Despite this, the growth in demand in both task variety and task complexity continues unabated. Networked computer systems provide a fertile environment in which new applications develop. As networked computer systems become akin to infrastructure, any limitation upon the growth in capacity and capabilities becomes an important constraint of concern to all computer users. Considering a networked computer system capable of processing terabits per second, as a benchmark for scalability, we critique the state of the art in commodity computing, and propose a wholesale reconsideration in the design of computer architectures and their attendant ecosystem. Our proposal seeks to reduce costs, save power and increase performance in a multi-scale approach that has potential application from nanoscale to data-centre-scale computers. © 2016 The Authors.

  8. Next generation distributed computing for cancer research.

    PubMed

    Agarwal, Pankaj; Owzar, Kouros

    2014-01-01

    Advances in next generation sequencing (NGS) and mass spectrometry (MS) technologies have provided many new opportunities and angles for extending the scope of translational cancer research while creating tremendous challenges in data management and analysis. The resulting informatics challenge is invariably not amenable to the use of traditional computing models. Recent advances in scalable computing and associated infrastructure, particularly distributed computing for Big Data, can provide solutions for addressing these challenges. In this review, the next generation of distributed computing technologies that can address these informatics problems is described from the perspective of three key components of a computational platform, namely computing, data storage and management, and networking. A broad overview of scalable computing is provided to set the context for a detailed description of Hadoop, a technology that is being rapidly adopted for large-scale distributed computing. A proof-of-concept Hadoop cluster, set up for performance benchmarking of NGS read alignment, is described as an example of how to work with Hadoop. Finally, Hadoop is compared with a number of other current technologies for distributed computing.

  9. Security Analysis of Smart Grid Cyber Physical Infrastructures Using Modeling and Game Theoretic Simulation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Abercrombie, Robert K; Sheldon, Frederick T.

    Cyber physical computing infrastructures typically consist of a number of sites are interconnected. Its operation critically depends both on cyber components and physical components. Both types of components are subject to attacks of different kinds and frequencies, which must be accounted for the initial provisioning and subsequent operation of the infrastructure via information security analysis. Information security analysis can be performed using game theory implemented in dynamic Agent Based Game Theoretic (ABGT) simulations. Such simulations can be verified with the results from game theory analysis and further used to explore larger scale, real world scenarios involving multiple attackers, defenders, andmore » information assets. We concentrated our analysis on the electric sector failure scenarios and impact analyses by the NESCOR Working Group Study, From the Section 5 electric sector representative failure scenarios; we extracted the four generic failure scenarios and grouped them into three specific threat categories (confidentiality, integrity, and availability) to the system. These specific failure scenarios serve as a demonstration of our simulation. The analysis using our ABGT simulation demonstrates how to model the electric sector functional domain using a set of rationalized game theoretic rules decomposed from the failure scenarios in terms of how those scenarios might impact the cyber physical infrastructure network with respect to CIA.« less

  10. In-Use and Emerging Disruptive Technology Trends

    DTIC Science & Technology

    2015-03-31

    blog/establishing-zero-trust- infrastructure / (accessed No- vember 7, 2014) Mobile Thin Client End Points In the early days of computing, the...companies are using their network infrastructure to break into the mobile broadband market. For example, Ca- blevision recently began providing a Wi-Fi...smartphones and mobile devic- es will be used within the Pentagon. A building-wide cellular infrastructure is not the an- swer to retrieving and sending

  11. Services and the National Information Infrastructure. Report of the Information Infrastructure Task Force Committee on Applications and Technology, Technology Policy Working Group. Draft for Public Comment.

    ERIC Educational Resources Information Center

    Office of Science and Technology Policy, Washington, DC.

    In this report, the National Information Infrastructure (NII) services issue is addressed, and activities to advance the development of NII services are recommended. The NII is envisioned to grow into a seamless web of communications networks, computers, databases, and consumer electronics that will put vast amounts of information at users'…

  12. Design and implementation of a remote UAV-based mobile health monitoring system

    NASA Astrophysics Data System (ADS)

    Li, Songwei; Wan, Yan; Fu, Shengli; Liu, Mushuang; Wu, H. Felix

    2017-04-01

    Unmanned aerial vehicles (UAVs) play increasing roles in structure health monitoring. With growing mobility in modern Internet-of-Things (IoT) applications, the health monitoring of mobile structures becomes an emerging application. In this paper, we develop a UAV-carried vision-based monitoring system that allows a UAV to continuously track and monitor a mobile infrastructure and transmit back the monitoring information in real- time from a remote location. The monitoring system uses a simple UAV-mounted camera and requires only a single feature located on the mobile infrastructure for target detection and tracking. The computation-effective vision-based tracking solution based on a single feature is an improvement over existing vision-based lead-follower tracking systems that either have poor tracking performance due to the use of a single feature, or have improved tracking performance at a cost of the usage of multiple features. In addition, a UAV-carried aerial networking infrastructure using directional antennas is used to enable robust real-time transmission of monitoring video streams over a long distance. Automatic heading control is used to self-align headings of directional antennas to enable robust communication in mobility. Compared to existing omni-communication systems, the directional communication solution significantly increases the operation range of remote monitoring systems. In this paper, we develop the integrated modeling framework of camera and mobile platforms, design the tracking algorithm, develop a testbed of UAVs and mobile platforms, and evaluate system performance through both simulation studies and field tests.

  13. New Trends in E-Science: Machine Learning and Knowledge Discovery in Databases

    NASA Astrophysics Data System (ADS)

    Brescia, Massimo

    2012-11-01

    Data mining, or Knowledge Discovery in Databases (KDD), while being the main methodology to extract the scientific information contained in Massive Data Sets (MDS), needs to tackle crucial problems since it has to orchestrate complex challenges posed by transparent access to different computing environments, scalability of algorithms, reusability of resources. To achieve a leap forward for the progress of e-science in the data avalanche era, the community needs to implement an infrastructure capable of performing data access, processing and mining in a distributed but integrated context. The increasing complexity of modern technologies carried out a huge production of data, whose related warehouse management and the need to optimize analysis and mining procedures lead to a change in concept on modern science. Classical data exploration, based on local user own data storage and limited computing infrastructures, is no more efficient in the case of MDS, worldwide spread over inhomogeneous data centres and requiring teraflop processing power. In this context modern experimental and observational science requires a good understanding of computer science, network infrastructures, Data Mining, etc. i.e. of all those techniques which fall into the domain of the so called e-science (recently assessed also by the Fourth Paradigm of Science). Such understanding is almost completely absent in the older generations of scientists and this reflects in the inadequacy of most academic and research programs. A paradigm shift is needed: statistical pattern recognition, object oriented programming, distributed computing, parallel programming need to become an essential part of scientific background. A possible practical solution is to provide the research community with easy-to understand, easy-to-use tools, based on the Web 2.0 technologies and Machine Learning methodology. Tools where almost all the complexity is hidden to the final user, but which are still flexible and able to produce efficient and reliable scientific results. All these considerations will be described in the detail in the chapter. Moreover, examples of modern applications offering to a wide variety of e-science communities a large spectrum of computational facilities to exploit the wealth of available massive data sets and powerful machine learning and statistical algorithms will be also introduced.

  14. Tracking the NGS revolution: managing life science research on shared high-performance computing clusters.

    PubMed

    Dahlö, Martin; Scofield, Douglas G; Schaal, Wesley; Spjuth, Ola

    2018-05-01

    Next-generation sequencing (NGS) has transformed the life sciences, and many research groups are newly dependent upon computer clusters to store and analyze large datasets. This creates challenges for e-infrastructures accustomed to hosting computationally mature research in other sciences. Using data gathered from our own clusters at UPPMAX computing center at Uppsala University, Sweden, where core hour usage of ∼800 NGS and ∼200 non-NGS projects is now similar, we compare and contrast the growth, administrative burden, and cluster usage of NGS projects with projects from other sciences. The number of NGS projects has grown rapidly since 2010, with growth driven by entry of new research groups. Storage used by NGS projects has grown more rapidly since 2013 and is now limited by disk capacity. NGS users submit nearly twice as many support tickets per user, and 11 more tools are installed each month for NGS projects than for non-NGS projects. We developed usage and efficiency metrics and show that computing jobs for NGS projects use more RAM than non-NGS projects, are more variable in core usage, and rarely span multiple nodes. NGS jobs use booked resources less efficiently for a variety of reasons. Active monitoring can improve this somewhat. Hosting NGS projects imposes a large administrative burden at UPPMAX due to large numbers of inexperienced users and diverse and rapidly evolving research areas. We provide a set of recommendations for e-infrastructures that host NGS research projects. We provide anonymized versions of our storage, job, and efficiency databases.

  15. An economic and financial exploratory

    NASA Astrophysics Data System (ADS)

    Cincotti, S.; Sornette, D.; Treleaven, P.; Battiston, S.; Caldarelli, G.; Hommes, C.; Kirman, A.

    2012-11-01

    This paper describes the vision of a European Exploratory for economics and finance using an interdisciplinary consortium of economists, natural scientists, computer scientists and engineers, who will combine their expertise to address the enormous challenges of the 21st century. This Academic Public facility is intended for economic modelling, investigating all aspects of risk and stability, improving financial technology, and evaluating proposed regulatory and taxation changes. The European Exploratory for economics and finance will be constituted as a network of infrastructure, observatories, data repositories, services and facilities and will foster the creation of a new cross-disciplinary research community of social scientists, complexity scientists and computing (ICT) scientists to collaborate in investigating major issues in economics and finance. It is also considered a cradle for training and collaboration with the private sector to spur spin-offs and job creations in Europe in the finance and economic sectors. The Exploratory will allow Social Scientists and Regulators as well as Policy Makers and the private sector to conduct realistic investigations with real economic, financial and social data. The Exploratory will (i) continuously monitor and evaluate the status of the economies of countries in their various components, (ii) use, extend and develop a large variety of methods including data mining, process mining, computational and artificial intelligence and every other computer and complex science techniques coupled with economic theory and econometric, and (iii) provide the framework and infrastructure to perform what-if analysis, scenario evaluations and computational, laboratory, field and web experiments to inform decision makers and help develop innovative policy, market and regulation designs.

  16. Tracking the NGS revolution: managing life science research on shared high-performance computing clusters

    PubMed Central

    2018-01-01

    Abstract Background Next-generation sequencing (NGS) has transformed the life sciences, and many research groups are newly dependent upon computer clusters to store and analyze large datasets. This creates challenges for e-infrastructures accustomed to hosting computationally mature research in other sciences. Using data gathered from our own clusters at UPPMAX computing center at Uppsala University, Sweden, where core hour usage of ∼800 NGS and ∼200 non-NGS projects is now similar, we compare and contrast the growth, administrative burden, and cluster usage of NGS projects with projects from other sciences. Results The number of NGS projects has grown rapidly since 2010, with growth driven by entry of new research groups. Storage used by NGS projects has grown more rapidly since 2013 and is now limited by disk capacity. NGS users submit nearly twice as many support tickets per user, and 11 more tools are installed each month for NGS projects than for non-NGS projects. We developed usage and efficiency metrics and show that computing jobs for NGS projects use more RAM than non-NGS projects, are more variable in core usage, and rarely span multiple nodes. NGS jobs use booked resources less efficiently for a variety of reasons. Active monitoring can improve this somewhat. Conclusions Hosting NGS projects imposes a large administrative burden at UPPMAX due to large numbers of inexperienced users and diverse and rapidly evolving research areas. We provide a set of recommendations for e-infrastructures that host NGS research projects. We provide anonymized versions of our storage, job, and efficiency databases. PMID:29659792

  17. Building analytical platform with Big Data solutions for log files of PanDA infrastructure

    NASA Astrophysics Data System (ADS)

    Alekseev, A. A.; Barreiro Megino, F. G.; Klimentov, A. A.; Korchuganova, T. A.; Maendo, T.; Padolski, S. V.

    2018-05-01

    The paper describes the implementation of a high-performance system for the processing and analysis of log files for the PanDA infrastructure of the ATLAS experiment at the Large Hadron Collider (LHC), responsible for the workload management of order of 2M daily jobs across the Worldwide LHC Computing Grid. The solution is based on the ELK technology stack, which includes several components: Filebeat, Logstash, ElasticSearch (ES), and Kibana. Filebeat is used to collect data from logs. Logstash processes data and export to Elasticsearch. ES are responsible for centralized data storage. Accumulated data in ES can be viewed using a special software Kibana. These components were integrated with the PanDA infrastructure and replaced previous log processing systems for increased scalability and usability. The authors will describe all the components and their configuration tuning for the current tasks, the scale of the actual system and give several real-life examples of how this centralized log processing and storage service is used to showcase the advantages for daily operations.

  18. Extensible Infrastructure for Browsing and Searching Abstracted Spacecraft Data

    NASA Technical Reports Server (NTRS)

    Wallick, Michael N.; Crockett, Thomas M.; Joswig, Joseph C.; Torres, Recaredo J.; Norris, Jeffrey S.; Fox, Jason M.; Powell, Mark W.; Mittman, David S.; Abramyan, Lucy; Shams, Khawaja S.; hide

    2009-01-01

    A computer program has been developed to provide a common interface for all space mission data, and allows different types of data to be displayed in the same context. This software provides an infrastructure for representing any type of mission data.

  19. Performance Analysis, Design Considerations, and Applications of Extreme-Scale In Situ Infrastructures

    DOE PAGES

    Ayachit, Utkarsh; Bauer, Andrew; Duque, Earl P. N.; ...

    2016-11-01

    A key trend facing extreme-scale computational science is the widening gap between computational and I/O rates, and the challenge that follows is how to best gain insight from simulation data when it is increasingly impractical to save it to persistent storage for subsequent visual exploration and analysis. One approach to this challenge is centered around the idea of in situ processing, where visualization and analysis processing is performed while data is still resident in memory. Our paper examines several key design and performance issues related to the idea of in situ processing at extreme scale on modern platforms: Scalability, overhead,more » performance measurement and analysis, comparison and contrast with a traditional post hoc approach, and interfacing with simulation codes. We illustrate these principles in practice with studies, conducted on large-scale HPC platforms, that include a miniapplication and multiple science application codes, one of which demonstrates in situ methods in use at greater than 1M-way concurrency.« less

  20. Redefining Tactical Operations for MER Using Cloud Computing

    NASA Technical Reports Server (NTRS)

    Joswig, Joseph C.; Shams, Khawaja S.

    2011-01-01

    The Mars Exploration Rover Mission (MER) includes the twin rovers, Spirit and Opportunity, which have been performing geological research and surface exploration since early 2004. The rovers' durability well beyond their original prime mission (90 sols or Martian days) has allowed them to be a valuable platform for scientific research for well over 2000 sols, but as a by-product it has produced new challenges in providing efficient and cost-effective tactical operational planning. An early stage process adaptation was the move to distributed operations as mission scientists returned to their places of work in the summer of 2004, but they would still came together via teleconference and connected software to plan rover activities a few times a week. This distributed model has worked well since, but it requires the purchase, operation, and maintenance of a dedicated infrastructure at the Jet Propulsion Laboratory. This server infrastructure is costly to operate and the periodic nature of its usage (typically heavy usage for 8 hours every 2 days) has made moving to a cloud based tactical infrastructure an extremely tempting proposition. In this paper we will review both past and current implementations of the tactical planning application focusing on remote plan saving and discuss the unique challenges present with long-latency, distributed operations. We then detail the motivations behind our move to cloud based computing services and as well as our system design and implementation. We will discuss security and reliability concerns and how they were addressed

  1. Towards sustainable infrastructure management: knowledge-based service-oriented computing framework for visual analytics

    NASA Astrophysics Data System (ADS)

    Vatcha, Rashna; Lee, Seok-Won; Murty, Ajeet; Tolone, William; Wang, Xiaoyu; Dou, Wenwen; Chang, Remco; Ribarsky, William; Liu, Wanqiu; Chen, Shen-en; Hauser, Edd

    2009-05-01

    Infrastructure management (and its associated processes) is complex to understand, perform and thus, hard to make efficient and effective informed decisions. The management involves a multi-faceted operation that requires the most robust data fusion, visualization and decision making. In order to protect and build sustainable critical assets, we present our on-going multi-disciplinary large-scale project that establishes the Integrated Remote Sensing and Visualization (IRSV) system with a focus on supporting bridge structure inspection and management. This project involves specific expertise from civil engineers, computer scientists, geographers, and real-world practitioners from industry, local and federal government agencies. IRSV is being designed to accommodate the essential needs from the following aspects: 1) Better understanding and enforcement of complex inspection process that can bridge the gap between evidence gathering and decision making through the implementation of ontological knowledge engineering system; 2) Aggregation, representation and fusion of complex multi-layered heterogeneous data (i.e. infrared imaging, aerial photos and ground-mounted LIDAR etc.) with domain application knowledge to support machine understandable recommendation system; 3) Robust visualization techniques with large-scale analytical and interactive visualizations that support users' decision making; and 4) Integration of these needs through the flexible Service-oriented Architecture (SOA) framework to compose and provide services on-demand. IRSV is expected to serve as a management and data visualization tool for construction deliverable assurance and infrastructure monitoring both periodically (annually, monthly, even daily if needed) as well as after extreme events.

  2. A Study of ATLAS Grid Performance for Distributed Analysis

    NASA Astrophysics Data System (ADS)

    Panitkin, Sergey; Fine, Valery; Wenaus, Torre

    2012-12-01

    In the past two years the ATLAS Collaboration at the LHC has collected a large volume of data and published a number of ground breaking papers. The Grid-based ATLAS distributed computing infrastructure played a crucial role in enabling timely analysis of the data. We will present a study of the performance and usage of the ATLAS Grid as platform for physics analysis in 2011. This includes studies of general properties as well as timing properties of user jobs (wait time, run time, etc). These studies are based on mining of data archived by the PanDA workload management system.

  3. OSiRIS: a distributed Ceph deployment using software defined networking for multi-institutional research

    NASA Astrophysics Data System (ADS)

    McKee, Shawn; Kissel, Ezra; Meekhof, Benjeman; Swany, Martin; Miller, Charles; Gregorowicz, Michael

    2017-10-01

    We report on the first year of the OSiRIS project (NSF Award #1541335, UM, IU, MSU and WSU) which is targeting the creation of a distributed Ceph storage infrastructure coupled together with software-defined networking to provide high-performance access for well-connected locations on any participating campus. The projects goal is to provide a single scalable, distributed storage infrastructure that allows researchers at each campus to read, write, manage and share data directly from their own computing locations. The NSF CC*DNI DIBBS program which funded OSiRIS is seeking solutions to the challenges of multi-institutional collaborations involving large amounts of data and we are exploring the creative use of Ceph and networking to address those challenges. While OSiRIS will eventually be serving a broad range of science domains, its first adopter will be the LHC ATLAS detector project via the ATLAS Great Lakes Tier-2 (AGLT2) jointly located at the University of Michigan and Michigan State University. Part of our presentation will cover how ATLAS is using the OSiRIS infrastructure and our experiences integrating our first user community. The presentation will also review the motivations for and goals of the project, the technical details of the OSiRIS infrastructure, the challenges in providing such an infrastructure, and the technical choices made to address those challenges. We will conclude with our plans for the remaining 4 years of the project and our vision for what we hope to deliver by the projects end.

  4. Multi-Objective Approach for Energy-Aware Workflow Scheduling in Cloud Computing Environments

    PubMed Central

    Kadima, Hubert; Granado, Bertrand

    2013-01-01

    We address the problem of scheduling workflow applications on heterogeneous computing systems like cloud computing infrastructures. In general, the cloud workflow scheduling is a complex optimization problem which requires considering different criteria so as to meet a large number of QoS (Quality of Service) requirements. Traditional research in workflow scheduling mainly focuses on the optimization constrained by time or cost without paying attention to energy consumption. The main contribution of this study is to propose a new approach for multi-objective workflow scheduling in clouds, and present the hybrid PSO algorithm to optimize the scheduling performance. Our method is based on the Dynamic Voltage and Frequency Scaling (DVFS) technique to minimize energy consumption. This technique allows processors to operate in different voltage supply levels by sacrificing clock frequencies. This multiple voltage involves a compromise between the quality of schedules and energy. Simulation results on synthetic and real-world scientific applications highlight the robust performance of the proposed approach. PMID:24319361

  5. Multi-objective approach for energy-aware workflow scheduling in cloud computing environments.

    PubMed

    Yassa, Sonia; Chelouah, Rachid; Kadima, Hubert; Granado, Bertrand

    2013-01-01

    We address the problem of scheduling workflow applications on heterogeneous computing systems like cloud computing infrastructures. In general, the cloud workflow scheduling is a complex optimization problem which requires considering different criteria so as to meet a large number of QoS (Quality of Service) requirements. Traditional research in workflow scheduling mainly focuses on the optimization constrained by time or cost without paying attention to energy consumption. The main contribution of this study is to propose a new approach for multi-objective workflow scheduling in clouds, and present the hybrid PSO algorithm to optimize the scheduling performance. Our method is based on the Dynamic Voltage and Frequency Scaling (DVFS) technique to minimize energy consumption. This technique allows processors to operate in different voltage supply levels by sacrificing clock frequencies. This multiple voltage involves a compromise between the quality of schedules and energy. Simulation results on synthetic and real-world scientific applications highlight the robust performance of the proposed approach.

  6. The Archive Solution for Distributed Workflow Management Agents of the CMS Experiment at LHC

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kuznetsov, Valentin; Fischer, Nils Leif; Guo, Yuyi

    The CMS experiment at the CERN LHC developed the Workflow Management Archive system to persistently store unstructured framework job report documents produced by distributed workflow management agents. In this paper we present its architecture, implementation, deployment, and integration with the CMS and CERN computing infrastructures, such as central HDFS and Hadoop Spark cluster. The system leverages modern technologies such as a document oriented database and the Hadoop eco-system to provide the necessary flexibility to reliably process, store, and aggregatemore » $$\\mathcal{O}$$(1M) documents on a daily basis. We describe the data transformation, the short and long term storage layers, the query language, along with the aggregation pipeline developed to visualize various performance metrics to assist CMS data operators in assessing the performance of the CMS computing system.« less

  7. The Archive Solution for Distributed Workflow Management Agents of the CMS Experiment at LHC

    DOE PAGES

    Kuznetsov, Valentin; Fischer, Nils Leif; Guo, Yuyi

    2018-03-19

    The CMS experiment at the CERN LHC developed the Workflow Management Archive system to persistently store unstructured framework job report documents produced by distributed workflow management agents. In this paper we present its architecture, implementation, deployment, and integration with the CMS and CERN computing infrastructures, such as central HDFS and Hadoop Spark cluster. The system leverages modern technologies such as a document oriented database and the Hadoop eco-system to provide the necessary flexibility to reliably process, store, and aggregatemore » $$\\mathcal{O}$$(1M) documents on a daily basis. We describe the data transformation, the short and long term storage layers, the query language, along with the aggregation pipeline developed to visualize various performance metrics to assist CMS data operators in assessing the performance of the CMS computing system.« less

  8. Framework for Deploying a Virtualized Computing Environment for Collaborative and Secure Data Analytics

    PubMed Central

    Meyer, Adrian; Green, Laura; Faulk, Ciearro; Galla, Stephen; Meyer, Anne-Marie

    2016-01-01

    Introduction: Large amounts of health data generated by a wide range of health care applications across a variety of systems have the potential to offer valuable insight into populations and health care systems, but robust and secure computing and analytic systems are required to leverage this information. Framework: We discuss our experiences deploying a Secure Data Analysis Platform (SeDAP), and provide a framework to plan, build and deploy a virtual desktop infrastructure (VDI) to enable innovation, collaboration and operate within academic funding structures. It outlines 6 core components: Security, Ease of Access, Performance, Cost, Tools, and Training. Conclusion: A platform like SeDAP is not simply successful through technical excellence and performance. It’s adoption is dependent on a collaborative environment where researchers and users plan and evaluate the requirements of all aspects. PMID:27683665

  9. Pavement Technology and Airport Infrastructure Expansion Impact

    NASA Astrophysics Data System (ADS)

    Sabib; Setiawan, M. I.; Kurniasih, N.; Ahmar, A. S.; Hasyim, C.

    2018-01-01

    This research aims for analyzing construction and infrastructure development activities potential contribution towards Airport Performance. This research is correlation study with variable research that includes Airport Performance as X variable and construction and infrastructure development activities as Y variable. The population in this research is 148 airports in Indonesia. The sampling technique uses total sampling, which means 148 airports that becomes the population unit then all of it become samples. The results of coefficient correlation (R) test showed that construction and infrastructure development activities variable have a relatively strong relationship with Airport Performance variable, but the value of Adjusted R Square shows that an increase in the construction and infrastructure development activities is influenced by factor other than Airport Performance.

  10. The diverse use of clouds by CMS

    DOE PAGES

    Andronis, Anastasios; Bauer, Daniela; Chaze, Olivier; ...

    2015-12-23

    The resources CMS is using are increasingly being offered as clouds. In Run 2 of the LHC the majority of CMS CERN resources, both in Meyrin and at the Wigner Computing Centre, will be presented as cloud resources on which CMS will have to build its own infrastructure. This infrastructure will need to run all of the CMS workflows including: Tier 0, production and user analysis. In addition, the CMS High Level Trigger will provide a compute resource comparable in scale to the total offered by the CMS Tier 1 sites, when it is not running as part of themore » trigger system. During these periods a cloud infrastructure will be overlaid on this resource, making it accessible for general CMS use. Finally, CMS is starting to utilise cloud resources being offered by individual institutes and is gaining experience to facilitate the use of opportunistically available cloud resources. Lastly, we present a snap shot of this infrastructure and its operation at the time of the CHEP2015 conference.« less

  11. Open source system OpenVPN in a function of Virtual Private Network

    NASA Astrophysics Data System (ADS)

    Skendzic, A.; Kovacic, B.

    2017-05-01

    Using of Virtual Private Networks (VPN) can establish high security level in network communication. VPN technology enables high security networking using distributed or public network infrastructure. VPN uses different security and managing rules inside networks. It can be set up using different communication channels like Internet or separate ISP communication infrastructure. VPN private network makes security communication channel over public network between two endpoints (computers). OpenVPN is an open source software product under GNU General Public License (GPL) that can be used to establish VPN communication between two computers inside business local network over public communication infrastructure. It uses special security protocols and 256-bit Encryption and it is capable of traversing network address translators (NATs) and firewalls. It allows computers to authenticate each other using a pre-shared secret key, certificates or username and password. This work gives review of VPN technology with a special accent on OpenVPN. This paper will also give comparison and financial benefits of using open source VPN software in business environment.

  12. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hale, M.A.; Craig, J.I.

    Integrated Product and Process Development (IPPD) embodies the simultaneous application to both system and quality engineering methods throughout an iterative design process. The use of IPPD results in the time-conscious, cost-saving development of engineering systems. To implement IPPD, a Decision-Based Design perspective is encapsulated in an approach that focuses on the role of the human designer in product development. The approach has two parts and is outlined in this paper. First, an architecture, called DREAMS, is being developed that facilitates design from a decision-based perspective. Second, a supporting computing infrastructure, called IMAGE, is being designed. Agents are used to implementmore » the overall infrastructure on the computer. Successful agent utilization requires that they be made of three components: the resource, the model, and the wrap. Current work is focused on the development of generalized agent schemes and associated demonstration projects. When in place, the technology independent computing infrastructure will aid the designer in systematically generating knowledge used to facilitate decision-making.« less

  13. Cloudweaver: Adaptive and Data-Driven Workload Manager for Generic Clouds

    NASA Astrophysics Data System (ADS)

    Li, Rui; Chen, Lei; Li, Wen-Syan

    Cloud computing denotes the latest trend in application development for parallel computing on massive data volumes. It relies on clouds of servers to handle tasks that used to be managed by an individual server. With cloud computing, software vendors can provide business intelligence and data analytic services for internet scale data sets. Many open source projects, such as Hadoop, offer various software components that are essential for building a cloud infrastructure. Current Hadoop (and many others) requires users to configure cloud infrastructures via programs and APIs and such configuration is fixed during the runtime. In this chapter, we propose a workload manager (WLM), called CloudWeaver, which provides automated configuration of a cloud infrastructure for runtime execution. The workload management is data-driven and can adapt to dynamic nature of operator throughput during different execution phases. CloudWeaver works for a single job and a workload consisting of multiple jobs running concurrently, which aims at maximum throughput using a minimum set of processors.

  14. Assessing the uptake of persistent identifiers by research infrastructure users

    PubMed Central

    Maull, Keith E.

    2017-01-01

    Significant progress has been made in the past few years in the development of recommendations, policies, and procedures for creating and promoting citations to data sets, software, and other research infrastructures like computing facilities. Open questions remain, however, about the extent to which referencing practices of authors of scholarly publications are changing in ways desired by these initiatives. This paper uses four focused case studies to evaluate whether research infrastructures are being increasingly identified and referenced in the research literature via persistent citable identifiers. The findings of the case studies show that references to such resources are increasing, but that the patterns of these increases are variable. In addition, the study suggests that citation practices for data sets may change more slowly than citation practices for software and research facilities, due to the inertia of existing practices for referencing the use of data. Similarly, existing practices for acknowledging computing support may slow the adoption of formal citations for computing resources. PMID:28394907

  15. The StratusLab cloud distribution: Use-cases and support for scientific applications

    NASA Astrophysics Data System (ADS)

    Floros, E.

    2012-04-01

    The StratusLab project is integrating an open cloud software distribution that enables organizations to setup and provide their own private or public IaaS (Infrastructure as a Service) computing clouds. StratusLab distribution capitalizes on popular infrastructure virtualization solutions like KVM, the OpenNebula virtual machine manager, Claudia service manager and SlipStream deployment platform, which are further enhanced and expanded with additional components developed within the project. The StratusLab distribution covers the core aspects of a cloud IaaS architecture, namely Computing (life-cycle management of virtual machines), Storage, Appliance management and Networking. The resulting software stack provides a packaged turn-key solution for deploying cloud computing services. The cloud computing infrastructures deployed using StratusLab can support a wide range of scientific and business use cases. Grid computing has been the primary use case pursued by the project and for this reason the initial priority has been the support for the deployment and operation of fully virtualized production-level grid sites; a goal that has already been achieved by operating such a site as part of EGI's (European Grid Initiative) pan-european grid infrastructure. In this area the project is currently working to provide non-trivial capabilities like elastic and autonomic management of grid site resources. Although grid computing has been the motivating paradigm, StratusLab's cloud distribution can support a wider range of use cases. Towards this direction, we have developed and currently provide support for setting up general purpose computing solutions like Hadoop, MPI and Torque clusters. For what concerns scientific applications the project is collaborating closely with the Bioinformatics community in order to prepare VM appliances and deploy optimized services for bioinformatics applications. In a similar manner additional scientific disciplines like Earth Science can take advantage of StratusLab cloud solutions. Interested users are welcomed to join StratusLab's user community by getting access to the reference cloud services deployed by the project and offered to the public.

  16. Infrastructure for Training and Partnershipes: California Water and Coastal Ocean Resources

    NASA Technical Reports Server (NTRS)

    Siegel, David A.; Dozier, Jeffrey; Gautier, Catherine; Davis, Frank; Dickey, Tommy; Dunne, Thomas; Frew, James; Keller, Arturo; MacIntyre, Sally; Melack, John

    2000-01-01

    The purpose of this project was to advance the existing ICESS/Bren School computing infrastructure to allow scientists, students, and research trainees the opportunity to interact with environmental data and simulations in near-real time. Improvements made with the funding from this project have helped to strengthen the research efforts within both units, fostered graduate research training, and helped fortify partnerships with government and industry. With this funding, we were able to expand our computational environment in which computer resources, software, and data sets are shared by ICESS/Bren School faculty researchers in all areas of Earth system science. All of the graduate and undergraduate students associated with the Donald Bren School of Environmental Science and Management and the Institute for Computational Earth System Science have benefited from the infrastructure upgrades accomplished by this project. Additionally, the upgrades fostered a significant number of research projects (attached is a list of the projects that benefited from the upgrades). As originally proposed, funding for this project provided the following infrastructure upgrades: 1) a modem file management system capable of interoperating UNIX and NT file systems that can scale to 6.7 TB, 2) a Qualstar 40-slot tape library with two AIT tape drives and Legato Networker backup/archive software, 3) previously unavailable import/export capability for data sets on Zip, Jaz, DAT, 8mm, CD, and DLT media in addition to a 622Mb/s Internet 2 connection, 4) network switches capable of 100 Mbps to 128 desktop workstations, 5) Portable Batch System (PBS) computational task scheduler, and vi) two Compaq/Digital Alpha XP1000 compute servers each with 1.5 GB of RAM along with an SGI Origin 2000 (purchased partially using funds from this project along with funding from various other sources) to be used for very large computations, as required for simulation of mesoscale meteorology or climate.

  17. Identifying the impact of G-quadruplexes on Affymetrix 3' arrays using cloud computing.

    PubMed

    Memon, Farhat N; Owen, Anne M; Sanchez-Graillet, Olivia; Upton, Graham J G; Harrison, Andrew P

    2010-01-15

    A tetramer quadruplex structure is formed by four parallel strands of DNA/ RNA containing runs of guanine. These quadruplexes are able to form because guanine can Hoogsteen hydrogen bond to other guanines, and a tetrad of guanines can form a stable arrangement. Recently we have discovered that probes on Affymetrix GeneChips that contain runs of guanine do not measure gene expression reliably. We associate this finding with the likelihood that quadruplexes are forming on the surface of GeneChips. In order to cope with the rapidly expanding size of GeneChip array datasets in the public domain, we are exploring the use of cloud computing to replicate our experiments on 3' arrays to look at the effect of the location of G-spots (runs of guanines). Cloud computing is a recently introduced high-performance solution that takes advantage of the computational infrastructure of large organisations such as Amazon and Google. We expect that cloud computing will become widely adopted because it enables bioinformaticians to avoid capital expenditure on expensive computing resources and to only pay a cloud computing provider for what is used. Moreover, as well as financial efficiency, cloud computing is an ecologically-friendly technology, it enables efficient data-sharing and we expect it to be faster for development purposes. Here we propose the advantageous use of cloud computing to perform a large data-mining analysis of public domain 3' arrays.

  18. Use of Terrestrial Laser Scanner for Rigid Airport Pavement Management.

    PubMed

    Barbarella, Maurizio; D'Amico, Fabrizio; De Blasiis, Maria Rosaria; Di Benedetto, Alessandro; Fiani, Margherita

    2017-12-26

    The evaluation of the structural efficiency of airport infrastructures is a complex task. Faulting is one of the most important indicators of rigid pavement performance. The aim of our study is to provide a new method for faulting detection and computation on jointed concrete pavements. Nowadays, the assessment of faulting is performed with the use of laborious and time-consuming measurements that strongly hinder aircraft traffic. We proposed a field procedure for Terrestrial Laser Scanner data acquisition and a computation flow chart in order to identify and quantify the fault size at each joint of apron slabs. The total point cloud has been used to compute the least square plane fitting those points. The best-fit plane for each slab has been computed too. The attitude of each slab plane with respect to both the adjacent ones and the apron reference plane has been determined by the normal vectors to the surfaces. Faulting has been evaluated as the difference in elevation between the slab planes along chosen sections. For a more accurate evaluation of the faulting value, we have then considered a few strips of data covering rectangular areas of different sizes across the joints. The accuracy of the estimated quantities has been computed too.

  19. Sector and Sphere: the design and implementation of a high-performance data cloud

    PubMed Central

    Gu, Yunhong; Grossman, Robert L.

    2009-01-01

    Cloud computing has demonstrated that processing very large datasets over commodity clusters can be done simply, given the right programming model and infrastructure. In this paper, we describe the design and implementation of the Sector storage cloud and the Sphere compute cloud. By contrast with the existing storage and compute clouds, Sector can manage data not only within a data centre, but also across geographically distributed data centres. Similarly, the Sphere compute cloud supports user-defined functions (UDFs) over data both within and across data centres. As a special case, MapReduce-style programming can be implemented in Sphere by using a Map UDF followed by a Reduce UDF. We describe some experimental studies comparing Sector/Sphere and Hadoop using the Terasort benchmark. In these studies, Sector is approximately twice as fast as Hadoop. Sector/Sphere is open source. PMID:19451100

  20. Physicists Get INSPIREd: INSPIRE Project and Grid Applications

    NASA Astrophysics Data System (ADS)

    Klem, Jukka; Iwaszkiewicz, Jan

    2011-12-01

    INSPIRE is the new high-energy physics scientific information system developed by CERN, DESY, Fermilab and SLAC. INSPIRE combines the curated and trusted contents of SPIRES database with Invenio digital library technology. INSPIRE contains the entire HEP literature with about one million records and in addition to becoming the reference HEP scientific information platform, it aims to provide new kinds of data mining services and metrics to assess the impact of articles and authors. Grid and cloud computing provide new opportunities to offer better services in areas that require large CPU and storage resources including document Optical Character Recognition (OCR) processing, full-text indexing of articles and improved metrics. D4Science-II is a European project that develops and operates an e-Infrastructure supporting Virtual Research Environments (VREs). It develops an enabling technology (gCube) which implements a mechanism for facilitating the interoperation of its e-Infrastructure with other autonomously running data e-Infrastructures. As a result, this creates the core of an e-Infrastructure ecosystem. INSPIRE is one of the e-Infrastructures participating in D4Science-II project. In the context of the D4Science-II project, the INSPIRE e-Infrastructure makes available some of its resources and services to other members of the resulting ecosystem. Moreover, it benefits from the ecosystem via a dedicated Virtual Organization giving access to an array of resources ranging from computing and storage resources of grid infrastructures to data and services.

  1. Understanding the Performance and Potential of Cloud Computing for Scientific Applications

    DOE PAGES

    Sadooghi, Iman; Martin, Jesus Hernandez; Li, Tonglin; ...

    2015-02-19

    In this paper, commercial clouds bring a great opportunity to the scientific computing area. Scientific applications usually require significant resources, however not all scientists have access to sufficient high-end computing systems, may of which can be found in the Top500 list. Cloud Computing has gained the attention of scientists as a competitive resource to run HPC applications at a potentially lower cost. But as a different infrastructure, it is unclear whether clouds are capable of running scientific applications with a reasonable performance per money spent. This work studies the performance of public clouds and places this performance in context tomore » price. We evaluate the raw performance of different services of AWS cloud in terms of the basic resources, such as compute, memory, network and I/O. We also evaluate the performance of the scientific applications running in the cloud. This paper aims to assess the ability of the cloud to perform well, as well as to evaluate the cost of the cloud running scientific applications. We developed a full set of metrics and conducted a comprehensive performance evlauation over the Amazon cloud. We evaluated EC2, S3, EBS and DynamoDB among the many Amazon AWS services. We evaluated the memory sub-system performance with CacheBench, the network performance with iperf, processor and network performance with the HPL benchmark application, and shared storage with NFS and PVFS in addition to S3. We also evaluated a real scientific computing application through the Swift parallel scripting system at scale. Armed with both detailed benchmarks to gauge expected performance and a detailed monetary cost analysis, we expect this paper will be a recipe cookbook for scientists to help them decide where to deploy and run their scientific applications between public clouds, private clouds, or hybrid clouds.« less

  2. Understanding the Performance and Potential of Cloud Computing for Scientific Applications

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sadooghi, Iman; Martin, Jesus Hernandez; Li, Tonglin

    In this paper, commercial clouds bring a great opportunity to the scientific computing area. Scientific applications usually require significant resources, however not all scientists have access to sufficient high-end computing systems, may of which can be found in the Top500 list. Cloud Computing has gained the attention of scientists as a competitive resource to run HPC applications at a potentially lower cost. But as a different infrastructure, it is unclear whether clouds are capable of running scientific applications with a reasonable performance per money spent. This work studies the performance of public clouds and places this performance in context tomore » price. We evaluate the raw performance of different services of AWS cloud in terms of the basic resources, such as compute, memory, network and I/O. We also evaluate the performance of the scientific applications running in the cloud. This paper aims to assess the ability of the cloud to perform well, as well as to evaluate the cost of the cloud running scientific applications. We developed a full set of metrics and conducted a comprehensive performance evlauation over the Amazon cloud. We evaluated EC2, S3, EBS and DynamoDB among the many Amazon AWS services. We evaluated the memory sub-system performance with CacheBench, the network performance with iperf, processor and network performance with the HPL benchmark application, and shared storage with NFS and PVFS in addition to S3. We also evaluated a real scientific computing application through the Swift parallel scripting system at scale. Armed with both detailed benchmarks to gauge expected performance and a detailed monetary cost analysis, we expect this paper will be a recipe cookbook for scientists to help them decide where to deploy and run their scientific applications between public clouds, private clouds, or hybrid clouds.« less

  3. SEED: A Suite of Instructional Laboratories for Computer Security Education

    ERIC Educational Resources Information Center

    Du, Wenliang; Wang, Ronghua

    2008-01-01

    The security and assurance of our computing infrastructure has become a national priority. To address this priority, higher education has gradually incorporated the principles of computer and information security into the mainstream undergraduate and graduate computer science curricula. To achieve effective education, learning security principles…

  4. Storing and using health data in a virtual private cloud.

    PubMed

    Regola, Nathan; Chawla, Nitesh V

    2013-03-13

    Electronic health records are being adopted at a rapid rate due to increased funding from the US federal government. Health data provide the opportunity to identify possible improvements in health care delivery by applying data mining and statistical methods to the data and will also enable a wide variety of new applications that will be meaningful to patients and medical professionals. Researchers are often granted access to health care data to assist in the data mining process, but HIPAA regulations mandate comprehensive safeguards to protect the data. Often universities (and presumably other research organizations) have an enterprise information technology infrastructure and a research infrastructure. Unfortunately, both of these infrastructures are generally not appropriate for sensitive research data such as HIPAA, as they require special accommodations on the part of the enterprise information technology (or increased security on the part of the research computing environment). Cloud computing, which is a concept that allows organizations to build complex infrastructures on leased resources, is rapidly evolving to the point that it is possible to build sophisticated network architectures with advanced security capabilities. We present a prototype infrastructure in Amazon's Virtual Private Cloud to allow researchers and practitioners to utilize the data in a HIPAA-compliant environment.

  5. Design and Development of a Run-Time Monitor for Multi-Core Architectures in Cloud Computing

    PubMed Central

    Kang, Mikyung; Kang, Dong-In; Crago, Stephen P.; Park, Gyung-Leen; Lee, Junghoon

    2011-01-01

    Cloud computing is a new information technology trend that moves computing and data away from desktops and portable PCs into large data centers. The basic principle of cloud computing is to deliver applications as services over the Internet as well as infrastructure. A cloud is a type of parallel and distributed system consisting of a collection of inter-connected and virtualized computers that are dynamically provisioned and presented as one or more unified computing resources. The large-scale distributed applications on a cloud require adaptive service-based software, which has the capability of monitoring system status changes, analyzing the monitored information, and adapting its service configuration while considering tradeoffs among multiple QoS features simultaneously. In this paper, we design and develop a Run-Time Monitor (RTM) which is a system software to monitor the application behavior at run-time, analyze the collected information, and optimize cloud computing resources for multi-core architectures. RTM monitors application software through library instrumentation as well as underlying hardware through a performance counter optimizing its computing configuration based on the analyzed data. PMID:22163811

  6. Design and development of a run-time monitor for multi-core architectures in cloud computing.

    PubMed

    Kang, Mikyung; Kang, Dong-In; Crago, Stephen P; Park, Gyung-Leen; Lee, Junghoon

    2011-01-01

    Cloud computing is a new information technology trend that moves computing and data away from desktops and portable PCs into large data centers. The basic principle of cloud computing is to deliver applications as services over the Internet as well as infrastructure. A cloud is a type of parallel and distributed system consisting of a collection of inter-connected and virtualized computers that are dynamically provisioned and presented as one or more unified computing resources. The large-scale distributed applications on a cloud require adaptive service-based software, which has the capability of monitoring system status changes, analyzing the monitored information, and adapting its service configuration while considering tradeoffs among multiple QoS features simultaneously. In this paper, we design and develop a Run-Time Monitor (RTM) which is a system software to monitor the application behavior at run-time, analyze the collected information, and optimize cloud computing resources for multi-core architectures. RTM monitors application software through library instrumentation as well as underlying hardware through a performance counter optimizing its computing configuration based on the analyzed data.

  7. An Open Computing Infrastructure that Facilitates Integrated Product and Process Development from a Decision-Based Perspective

    NASA Technical Reports Server (NTRS)

    Hale, Mark A.

    1996-01-01

    Computer applications for design have evolved rapidly over the past several decades, and significant payoffs are being achieved by organizations through reductions in design cycle times. These applications are overwhelmed by the requirements imposed during complex, open engineering systems design. Organizations are faced with a number of different methodologies, numerous legacy disciplinary tools, and a very large amount of data. Yet they are also faced with few interdisciplinary tools for design collaboration or methods for achieving the revolutionary product designs required to maintain a competitive advantage in the future. These organizations are looking for a software infrastructure that integrates current corporate design practices with newer simulation and solution techniques. Such an infrastructure must be robust to changes in both corporate needs and enabling technologies. In addition, this infrastructure must be user-friendly, modular and scalable. This need is the motivation for the research described in this dissertation. The research is focused on the development of an open computing infrastructure that facilitates product and process design. In addition, this research explicitly deals with human interactions during design through a model that focuses on the role of a designer as that of decision-maker. The research perspective here is taken from that of design as a discipline with a focus on Decision-Based Design, Theory of Languages, Information Science, and Integration Technology. Given this background, a Model of IPPD is developed and implemented along the lines of a traditional experimental procedure: with the steps of establishing context, formalizing a theory, building an apparatus, conducting an experiment, reviewing results, and providing recommendations. Based on this Model, Design Processes and Specification can be explored in a structured and implementable architecture. An architecture for exploring design called DREAMS (Developing Robust Engineering Analysis Models and Specifications) has been developed which supports the activities of both meta-design and actual design execution. This is accomplished through a systematic process which is comprised of the stages of Formulation, Translation, and Evaluation. During this process, elements from a Design Specification are integrated into Design Processes. In addition, a software infrastructure was developed and is called IMAGE (Intelligent Multidisciplinary Aircraft Generation Environment). This represents a virtual apparatus in the Design Experiment conducted in this research. IMAGE is an innovative architecture because it explicitly supports design-related activities. This is accomplished through a GUI driven and Agent-based implementation of DREAMS. A HSCT design has been adopted from the Framework for Interdisciplinary Design Optimization (FIDO) and is implemented in IMAGE. This problem shows how Design Processes and Specification interact in a design system. In addition, the problem utilizes two different solution models concurrently: optimal and satisfying. The satisfying model allows for more design flexibility and allows a designer to maintain design freedom. As a result of following this experimental procedure, this infrastructure is an open system that it is robust to changes in both corporate needs and computer technologies. The development of this infrastructure leads to a number of significant intellectual contributions: 1) A new approach to implementing IPPD with the aid of a computer; 2) A formal Design Experiment; 3) A combined Process and Specification architecture that is language-based; 4) An infrastructure for exploring design; 5) An integration strategy for implementing computer resources; and 6) A seamless modeling language. The need for these contributions is emphasized by the demand by industry and government agencies for the development of these technologies.

  8. Grid Computing at GSI for ALICE and FAIR - present and future

    NASA Astrophysics Data System (ADS)

    Schwarz, Kilian; Uhlig, Florian; Karabowicz, Radoslaw; Montiel-Gonzalez, Almudena; Zynovyev, Mykhaylo; Preuss, Carsten

    2012-12-01

    The future FAIR experiments CBM and PANDA have computing requirements that fall in a category that could currently not be satisfied by one single computing centre. One needs a larger, distributed computing infrastructure to cope with the amount of data to be simulated and analysed. Since 2002, GSI operates a tier2 center for ALICE@CERN. The central component of the GSI computing facility and hence the core of the ALICE tier2 centre is a LSF/SGE batch farm, currently split into three subclusters with a total of 15000 CPU cores shared by the participating experiments, and accessible both locally and soon also completely via Grid. In terms of data storage, a 5.5 PB Lustre file system, directly accessible from all worker nodes is maintained, as well as a 300 TB xrootd-based Grid storage element. Based on this existing expertise, and utilising ALICE's middleware ‘AliEn’, the Grid infrastructure for PANDA and CBM is being built. Besides a tier0 centre at GSI, the computing Grids of the two FAIR collaborations encompass now more than 17 sites in 11 countries and are constantly expanding. The operation of the distributed FAIR computing infrastructure benefits significantly from the experience gained with the ALICE tier2 centre. A close collaboration between ALICE Offline and FAIR provides mutual advantages. The employment of a common Grid middleware as well as compatible simulation and analysis software frameworks ensure significant synergy effects.

  9. Advanced Optical Burst Switched Network Concepts

    NASA Astrophysics Data System (ADS)

    Nejabati, Reza; Aracil, Javier; Castoldi, Piero; de Leenheer, Marc; Simeonidou, Dimitra; Valcarenghi, Luca; Zervas, Georgios; Wu, Jian

    In recent years, as the bandwidth and the speed of networks have increased significantly, a new generation of network-based applications using the concept of distributed computing and collaborative services is emerging (e.g., Grid computing applications). The use of the available fiber and DWDM infrastructure for these applications is a logical choice offering huge amounts of cheap bandwidth and ensuring global reach of computing resources [230]. Currently, there is a great deal of interest in deploying optical circuit (wavelength) switched network infrastructure for distributed computing applications that require long-lived wavelength paths and address the specific needs of a small number of well-known users. Typical users are particle physicists who, due to their international collaborations and experiments, generate enormous amounts of data (Petabytes per year). These users require a network infrastructures that can support processing and analysis of large datasets through globally distributed computing resources [230]. However, providing wavelength granularity bandwidth services is not an efficient and scalable solution for applications and services that address a wider base of user communities with different traffic profiles and connectivity requirements. Examples of such applications may be: scientific collaboration in smaller scale (e.g., bioinformatics, environmental research), distributed virtual laboratories (e.g., remote instrumentation), e-health, national security and defense, personalized learning environments and digital libraries, evolving broadband user services (i.e., high resolution home video editing, real-time rendering, high definition interactive TV). As a specific example, in e-health services and in particular mammography applications due to the size and quantity of images produced by remote mammography, stringent network requirements are necessary. Initial calculations have shown that for 100 patients to be screened remotely, the network would have to securely transport 1.2 GB of data every 30 s [230]. According to the above explanation it is clear that these types of applications need a new network infrastructure and transport technology that makes large amounts of bandwidth at subwavelength granularity, storage, computation, and visualization resources potentially available to a wide user base for specified time durations. As these types of collaborative and network-based applications evolve addressing a wide range and large number of users, it is infeasible to build dedicated networks for each application type or category. Consequently, there should be an adaptive network infrastructure able to support all application types, each with their own access, network, and resource usage patterns. This infrastructure should offer flexible and intelligent network elements and control mechanism able to deploy new applications quickly and efficiently.

  10. Facilitating NASA Earth Science Data Processing Using Nebula Cloud Computing

    NASA Astrophysics Data System (ADS)

    Chen, A.; Pham, L.; Kempler, S.; Theobald, M.; Esfandiari, A.; Campino, J.; Vollmer, B.; Lynnes, C.

    2011-12-01

    Cloud Computing technology has been used to offer high-performance and low-cost computing and storage resources for both scientific problems and business services. Several cloud computing services have been implemented in the commercial arena, e.g. Amazon's EC2 & S3, Microsoft's Azure, and Google App Engine. There are also some research and application programs being launched in academia and governments to utilize Cloud Computing. NASA launched the Nebula Cloud Computing platform in 2008, which is an Infrastructure as a Service (IaaS) to deliver on-demand distributed virtual computers. Nebula users can receive required computing resources as a fully outsourced service. NASA Goddard Earth Science Data and Information Service Center (GES DISC) migrated several GES DISC's applications to the Nebula as a proof of concept, including: a) The Simple, Scalable, Script-based Science Processor for Measurements (S4PM) for processing scientific data; b) the Atmospheric Infrared Sounder (AIRS) data process workflow for processing AIRS raw data; and c) the GES-DISC Interactive Online Visualization ANd aNalysis Infrastructure (GIOVANNI) for online access to, analysis, and visualization of Earth science data. This work aims to evaluate the practicability and adaptability of the Nebula. The initial work focused on the AIRS data process workflow to evaluate the Nebula. The AIRS data process workflow consists of a series of algorithms being used to process raw AIRS level 0 data and output AIRS level 2 geophysical retrievals. Migrating the entire workflow to the Nebula platform is challenging, but practicable. After installing several supporting libraries and the processing code itself, the workflow is able to process AIRS data in a similar fashion to its current (non-cloud) configuration. We compared the performance of processing 2 days of AIRS level 0 data through level 2 using a Nebula virtual computer and a local Linux computer. The result shows that Nebula has significantly better performance than the local machine. Much of the difference was due to newer equipment in the Nebula than the legacy computer, which is suggestive of a potential economic advantage beyond elastic power, i.e., access to up-to-date hardware vs. legacy hardware that must be maintained past its prime to amortize the cost. In addition to a trade study of advantages and challenges of porting complex processing to the cloud, a tutorial was developed to enable further progress in utilizing the Nebula for Earth Science applications and understanding better the potential for Cloud Computing in further data- and computing-intensive Earth Science research. In particular, highly bursty computing such as that experienced in the user-demand-driven Giovanni system may become more tractable in a Cloud environment. Our future work will continue to focus on migrating more GES DISC's applications/instances, e.g. Giovanni instances, to the Nebula platform and making matured migrated applications to be in operation on the Nebula.

  11. NGScloud: RNA-seq analysis of non-model species using cloud computing.

    PubMed

    Mora-Márquez, Fernando; Vázquez-Poletti, José Luis; López de Heredia, Unai

    2018-05-03

    RNA-seq analysis usually requires large computing infrastructures. NGScloud is a bioinformatic system developed to analyze RNA-seq data using the cloud computing services of Amazon that permit the access to ad hoc computing infrastructure scaled according to the complexity of the experiment, so its costs and times can be optimized. The application provides a user-friendly front-end to operate Amazon's hardware resources, and to control a workflow of RNA-seq analysis oriented to non-model species, incorporating the cluster concept, which allows parallel runs of common RNA-seq analysis programs in several virtual machines for faster analysis. NGScloud is freely available at https://github.com/GGFHF/NGScloud/. A manual detailing installation and how-to-use instructions is available with the distribution. unai.lopezdeheredia@upm.es.

  12. Bioinformatics clouds for big data manipulation.

    PubMed

    Dai, Lin; Gao, Xin; Guo, Yan; Xiao, Jingfa; Zhang, Zhang

    2012-11-28

    As advances in life sciences and information technology bring profound influences on bioinformatics due to its interdisciplinary nature, bioinformatics is experiencing a new leap-forward from in-house computing infrastructure into utility-supplied cloud computing delivered over the Internet, in order to handle the vast quantities of biological data generated by high-throughput experimental technologies. Albeit relatively new, cloud computing promises to address big data storage and analysis issues in the bioinformatics field. Here we review extant cloud-based services in bioinformatics, classify them into Data as a Service (DaaS), Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS), and present our perspectives on the adoption of cloud computing in bioinformatics. This article was reviewed by Frank Eisenhaber, Igor Zhulin, and Sandor Pongor.

  13. A high performance computing framework for physics-based modeling and simulation of military ground vehicles

    NASA Astrophysics Data System (ADS)

    Negrut, Dan; Lamb, David; Gorsich, David

    2011-06-01

    This paper describes a software infrastructure made up of tools and libraries designed to assist developers in implementing computational dynamics applications running on heterogeneous and distributed computing environments. Together, these tools and libraries compose a so called Heterogeneous Computing Template (HCT). The heterogeneous and distributed computing hardware infrastructure is assumed herein to be made up of a combination of CPUs and Graphics Processing Units (GPUs). The computational dynamics applications targeted to execute on such a hardware topology include many-body dynamics, smoothed-particle hydrodynamics (SPH) fluid simulation, and fluid-solid interaction analysis. The underlying theme of the solution approach embraced by HCT is that of partitioning the domain of interest into a number of subdomains that are each managed by a separate core/accelerator (CPU/GPU) pair. Five components at the core of HCT enable the envisioned distributed computing approach to large-scale dynamical system simulation: (a) the ability to partition the problem according to the one-to-one mapping; i.e., spatial subdivision, discussed above (pre-processing); (b) a protocol for passing data between any two co-processors; (c) algorithms for element proximity computation; and (d) the ability to carry out post-processing in a distributed fashion. In this contribution the components (a) and (b) of the HCT are demonstrated via the example of the Discrete Element Method (DEM) for rigid body dynamics with friction and contact. The collision detection task required in frictional-contact dynamics (task (c) above), is shown to benefit on the GPU of a two order of magnitude gain in efficiency when compared to traditional sequential implementations. Note: Reference herein to any specific commercial products, process, or service by trade name, trademark, manufacturer, or otherwise, does not imply its endorsement, recommendation, or favoring by the United States Army. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Army, and shall not be used for advertising or product endorsement purposes.

  14. Securing the Data Storage and Processing in Cloud Computing Environment

    ERIC Educational Resources Information Center

    Owens, Rodney

    2013-01-01

    Organizations increasingly utilize cloud computing architectures to reduce costs and energy consumption both in the data warehouse and on mobile devices by better utilizing the computing resources available. However, the security and privacy issues with publicly available cloud computing infrastructures have not been studied to a sufficient depth…

  15. Building Efficient Wireless Infrastructures for Pervasive Computing Environments

    ERIC Educational Resources Information Center

    Sheng, Bo

    2010-01-01

    Pervasive computing is an emerging concept that thoroughly brings computing devices and the consequent technology into people's daily life and activities. Most of these computing devices are very small, sometimes even "invisible", and often embedded into the objects surrounding people. In addition, these devices usually are not isolated, but…

  16. Federated data storage and management infrastructure

    NASA Astrophysics Data System (ADS)

    Zarochentsev, A.; Kiryanov, A.; Klimentov, A.; Krasnopevtsev, D.; Hristov, P.

    2016-10-01

    The Large Hadron Collider (LHC)’ operating at the international CERN Laboratory in Geneva, Switzerland, is leading Big Data driven scientific explorations. Experiments at the LHC explore the fundamental nature of matter and the basic forces that shape our universe. Computing models for the High Luminosity LHC era anticipate a growth of storage needs of at least orders of magnitude; it will require new approaches in data storage organization and data handling. In our project we address the fundamental problem of designing of architecture to integrate a distributed heterogeneous disk resources for LHC experiments and other data- intensive science applications and to provide access to data from heterogeneous computing facilities. We have prototyped a federated storage for Russian T1 and T2 centers located in Moscow, St.-Petersburg and Gatchina, as well as Russian / CERN federation. We have conducted extensive tests of underlying network infrastructure and storage endpoints with synthetic performance measurement tools as well as with HENP-specific workloads, including the ones running on supercomputing platform, cloud computing and Grid for ALICE and ATLAS experiments. We will present our current accomplishments with running LHC data analysis remotely and locally to demonstrate our ability to efficiently use federated data storage experiment wide within National Academic facilities for High Energy and Nuclear Physics as well as for other data-intensive science applications, such as bio-informatics.

  17. DIRAC universal pilots

    NASA Astrophysics Data System (ADS)

    Stagni, F.; McNab, A.; Luzzi, C.; Krzemien, W.; Consortium, DIRAC

    2017-10-01

    In the last few years, new types of computing models, such as IAAS (Infrastructure as a Service) and IAAC (Infrastructure as a Client), gained popularity. New resources may come as part of pledged resources, while others are in the form of opportunistic ones. Most but not all of these new infrastructures are based on virtualization techniques. In addition, some of them, present opportunities for multi-processor computing slots to the users. Virtual Organizations are therefore facing heterogeneity of the available resources and the use of an Interware software like DIRAC to provide the transparent, uniform interface has become essential. The transparent access to the underlying resources is realized by implementing the pilot model. DIRAC’s newest generation of generic pilots (the so-called Pilots 2.0) are the “pilots for all the skies”, and have been successfully released in production more than a year ago. They use a plugin mechanism that makes them easily adaptable. Pilots 2.0 have been used for fetching and running jobs on every type of resource, being it a Worker Node (WN) behind a CREAM/ARC/HTCondor/DIRAC Computing element, a Virtual Machine running on IaaC infrastructures like Vac or BOINC, on IaaS cloud resources managed by Vcycle, the LHCb High Level Trigger farm nodes, and any type of opportunistic computing resource. Make a machine a “Pilot Machine”, and all diversities between them will disappear. This contribution describes how pilots are made suitable for different resources, and the recent steps taken towards a fully unified framework, including monitoring. Also, the cases of multi-processor computing slots either on real or virtual machines, with the whole node or a partition of it, is discussed.

  18. NiftyNet: a deep-learning platform for medical imaging.

    PubMed

    Gibson, Eli; Li, Wenqi; Sudre, Carole; Fidon, Lucas; Shakir, Dzhoshkun I; Wang, Guotai; Eaton-Rosen, Zach; Gray, Robert; Doel, Tom; Hu, Yipeng; Whyntie, Tom; Nachev, Parashkev; Modat, Marc; Barratt, Dean C; Ourselin, Sébastien; Cardoso, M Jorge; Vercauteren, Tom

    2018-05-01

    Medical image analysis and computer-assisted intervention problems are increasingly being addressed with deep-learning-based solutions. Established deep-learning platforms are flexible but do not provide specific functionality for medical image analysis and adapting them for this domain of application requires substantial implementation effort. Consequently, there has been substantial duplication of effort and incompatible infrastructure developed across many research groups. This work presents the open-source NiftyNet platform for deep learning in medical imaging. The ambition of NiftyNet is to accelerate and simplify the development of these solutions, and to provide a common mechanism for disseminating research outputs for the community to use, adapt and build upon. The NiftyNet infrastructure provides a modular deep-learning pipeline for a range of medical imaging applications including segmentation, regression, image generation and representation learning applications. Components of the NiftyNet pipeline including data loading, data augmentation, network architectures, loss functions and evaluation metrics are tailored to, and take advantage of, the idiosyncracies of medical image analysis and computer-assisted intervention. NiftyNet is built on the TensorFlow framework and supports features such as TensorBoard visualization of 2D and 3D images and computational graphs by default. We present three illustrative medical image analysis applications built using NiftyNet infrastructure: (1) segmentation of multiple abdominal organs from computed tomography; (2) image regression to predict computed tomography attenuation maps from brain magnetic resonance images; and (3) generation of simulated ultrasound images for specified anatomical poses. The NiftyNet infrastructure enables researchers to rapidly develop and distribute deep learning solutions for segmentation, regression, image generation and representation learning applications, or extend the platform to new applications. Copyright © 2018 The Authors. Published by Elsevier B.V. All rights reserved.

  19. DICOMGrid: a middleware to integrate PACS and EELA-2 grid infrastructure

    NASA Astrophysics Data System (ADS)

    Moreno, Ramon A.; de Sá Rebelo, Marina; Gutierrez, Marco A.

    2010-03-01

    Medical images provide lots of information for physicians, but the huge amount of data produced by medical image equipments in a modern Health Institution is not completely explored in its full potential yet. Nowadays medical images are used in hospitals mostly as part of routine activities while its intrinsic value for research is underestimated. Medical images can be used for the development of new visualization techniques, new algorithms for patient care and new image processing techniques. These research areas usually require the use of huge volumes of data to obtain significant results, along with enormous computing capabilities. Such qualities are characteristics of grid computing systems such as EELA-2 infrastructure. The grid technologies allow the sharing of data in large scale in a safe and integrated environment and offer high computing capabilities. In this paper we describe the DicomGrid to store and retrieve medical images, properly anonymized, that can be used by researchers to test new processing techniques, using the computational power offered by grid technology. A prototype of the DicomGrid is under evaluation and permits the submission of jobs into the EELA-2 grid infrastructure while offering a simple interface that requires minimal understanding of the grid operation.

  20. Optimization of tomographic reconstruction workflows on geographically distributed resources

    DOE PAGES

    Bicer, Tekin; Gursoy, Doga; Kettimuthu, Rajkumar; ...

    2016-01-01

    New technological advancements in synchrotron light sources enable data acquisitions at unprecedented levels. This emergent trend affects not only the size of the generated data but also the need for larger computational resources. Although beamline scientists and users have access to local computational resources, these are typically limited and can result in extended execution times. Applications that are based on iterative processing as in tomographic reconstruction methods require high-performance compute clusters for timely analysis of data. Here, time-sensitive analysis and processing of Advanced Photon Source data on geographically distributed resources are focused on. Two main challenges are considered: (i) modelingmore » of the performance of tomographic reconstruction workflows and (ii) transparent execution of these workflows on distributed resources. For the former, three main stages are considered: (i) data transfer between storage and computational resources, (i) wait/queue time of reconstruction jobs at compute resources, and (iii) computation of reconstruction tasks. These performance models allow evaluation and estimation of the execution time of any given iterative tomographic reconstruction workflow that runs on geographically distributed resources. For the latter challenge, a workflow management system is built, which can automate the execution of workflows and minimize the user interaction with the underlying infrastructure. The system utilizes Globus to perform secure and efficient data transfer operations. The proposed models and the workflow management system are evaluated by using three high-performance computing and two storage resources, all of which are geographically distributed. Workflows were created with different computational requirements using two compute-intensive tomographic reconstruction algorithms. Experimental evaluation shows that the proposed models and system can be used for selecting the optimum resources, which in turn can provide up to 3.13× speedup (on experimented resources). Furthermore, the error rates of the models range between 2.1 and 23.3% (considering workflow execution times), where the accuracy of the model estimations increases with higher computational demands in reconstruction tasks.« less

  1. Optimization of tomographic reconstruction workflows on geographically distributed resources

    PubMed Central

    Bicer, Tekin; Gürsoy, Doǧa; Kettimuthu, Rajkumar; De Carlo, Francesco; Foster, Ian T.

    2016-01-01

    New technological advancements in synchrotron light sources enable data acquisitions at unprecedented levels. This emergent trend affects not only the size of the generated data but also the need for larger computational resources. Although beamline scientists and users have access to local computational resources, these are typically limited and can result in extended execution times. Applications that are based on iterative processing as in tomographic reconstruction methods require high-performance compute clusters for timely analysis of data. Here, time-sensitive analysis and processing of Advanced Photon Source data on geographically distributed resources are focused on. Two main challenges are considered: (i) modeling of the performance of tomographic reconstruction workflows and (ii) transparent execution of these workflows on distributed resources. For the former, three main stages are considered: (i) data transfer between storage and computational resources, (i) wait/queue time of reconstruction jobs at compute resources, and (iii) computation of reconstruction tasks. These performance models allow evaluation and estimation of the execution time of any given iterative tomographic reconstruction workflow that runs on geographically distributed resources. For the latter challenge, a workflow management system is built, which can automate the execution of workflows and minimize the user interaction with the underlying infrastructure. The system utilizes Globus to perform secure and efficient data transfer operations. The proposed models and the workflow management system are evaluated by using three high-performance computing and two storage resources, all of which are geographically distributed. Workflows were created with different computational requirements using two compute-intensive tomographic reconstruction algorithms. Experimental evaluation shows that the proposed models and system can be used for selecting the optimum resources, which in turn can provide up to 3.13× speedup (on experimented resources). Moreover, the error rates of the models range between 2.1 and 23.3% (considering workflow execution times), where the accuracy of the model estimations increases with higher computational demands in reconstruction tasks. PMID:27359149

  2. Optimization of tomographic reconstruction workflows on geographically distributed resources

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bicer, Tekin; Gursoy, Doga; Kettimuthu, Rajkumar

    New technological advancements in synchrotron light sources enable data acquisitions at unprecedented levels. This emergent trend affects not only the size of the generated data but also the need for larger computational resources. Although beamline scientists and users have access to local computational resources, these are typically limited and can result in extended execution times. Applications that are based on iterative processing as in tomographic reconstruction methods require high-performance compute clusters for timely analysis of data. Here, time-sensitive analysis and processing of Advanced Photon Source data on geographically distributed resources are focused on. Two main challenges are considered: (i) modelingmore » of the performance of tomographic reconstruction workflows and (ii) transparent execution of these workflows on distributed resources. For the former, three main stages are considered: (i) data transfer between storage and computational resources, (i) wait/queue time of reconstruction jobs at compute resources, and (iii) computation of reconstruction tasks. These performance models allow evaluation and estimation of the execution time of any given iterative tomographic reconstruction workflow that runs on geographically distributed resources. For the latter challenge, a workflow management system is built, which can automate the execution of workflows and minimize the user interaction with the underlying infrastructure. The system utilizes Globus to perform secure and efficient data transfer operations. The proposed models and the workflow management system are evaluated by using three high-performance computing and two storage resources, all of which are geographically distributed. Workflows were created with different computational requirements using two compute-intensive tomographic reconstruction algorithms. Experimental evaluation shows that the proposed models and system can be used for selecting the optimum resources, which in turn can provide up to 3.13× speedup (on experimented resources). Furthermore, the error rates of the models range between 2.1 and 23.3% (considering workflow execution times), where the accuracy of the model estimations increases with higher computational demands in reconstruction tasks.« less

  3. Cyber-Physical Correlations for Infrastructure Resilience: A Game-Theoretic Approach

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rao, Nageswara S; He, Fei; Ma, Chris Y. T.

    In several critical infrastructures, the cyber and physical parts are correlated so that disruptions to one affect the other and hence the whole system. These correlations may be exploited to strategically launch components attacks, and hence must be accounted for ensuring the infrastructure resilience, specified by its survival probability. We characterize the cyber-physical interactions at two levels: (i) the failure correlation function specifies the conditional survival probability of cyber sub-infrastructure given the physical sub-infrastructure as a function of their marginal probabilities, and (ii) the individual survival probabilities of both sub-infrastructures are characterized by first-order differential conditions. We formulate a resiliencemore » problem for infrastructures composed of discrete components as a game between the provider and attacker, wherein their utility functions consist of an infrastructure survival probability term and a cost term expressed in terms of the number of components attacked and reinforced. We derive Nash Equilibrium conditions and sensitivity functions that highlight the dependence of infrastructure resilience on the cost term, correlation function and sub-infrastructure survival probabilities. These results generalize earlier ones based on linear failure correlation functions and independent component failures. We apply the results to models of cloud computing infrastructures and energy grids.« less

  4. Note on evaluating safety performance of road infrastructure to motivate safety competition.

    PubMed

    Han, Sangjin

    2016-01-01

    Road infrastructures are usually developed and maintained by governments or public sectors. There is no competitor in the market of their jurisdiction. This monopolic feature discourages road authorities from improving the level of safety with proactive motivation. This study suggests how to apply a principle of competition for roads, in particular by means of performance evaluation. It first discusses why road infrastructure has been slow in safety oriented development and management in respect of its business model. Then it suggests some practical ways of how to promote road safety between road authorities, particularly by evaluating safety performance of road infrastructure. These are summarized as decision of safety performance indicators, classification of spatial boundaries, data collection, evaluation, and reporting. Some consideration points are also discussed to make safety performance evaluation on road infrastructure lead to better road safety management.

  5. Accessible high performance computing solutions for near real-time image processing for time critical applications

    NASA Astrophysics Data System (ADS)

    Bielski, Conrad; Lemoine, Guido; Syryczynski, Jacek

    2009-09-01

    High Performance Computing (HPC) hardware solutions such as grid computing and General Processing on a Graphics Processing Unit (GPGPU) are now accessible to users with general computing needs. Grid computing infrastructures in the form of computing clusters or blades are becoming common place and GPGPU solutions that leverage the processing power of the video card are quickly being integrated into personal workstations. Our interest in these HPC technologies stems from the need to produce near real-time maps from a combination of pre- and post-event satellite imagery in support of post-disaster management. Faster processing provides a twofold gain in this situation: 1. critical information can be provided faster and 2. more elaborate automated processing can be performed prior to providing the critical information. In our particular case, we test the use of the PANTEX index which is based on analysis of image textural measures extracted using anisotropic, rotation-invariant GLCM statistics. The use of this index, applied in a moving window, has been shown to successfully identify built-up areas in remotely sensed imagery. Built-up index image masks are important input to the structuring of damage assessment interpretation because they help optimise the workload. The performance of computing the PANTEX workflow is compared on two different HPC hardware architectures: (1) a blade server with 4 blades, each having dual quad-core CPUs and (2) a CUDA enabled GPU workstation. The reference platform is a dual CPU-quad core workstation and the PANTEX workflow total computing time is measured. Furthermore, as part of a qualitative evaluation, the differences in setting up and configuring various hardware solutions and the related software coding effort is presented.

  6. Challenges in Real-Time Prediction of Infectious Disease: A Case Study of Dengue in Thailand

    PubMed Central

    Lauer, Stephen A.; Sakrejda, Krzysztof; Iamsirithaworn, Sopon; Hinjoy, Soawapak; Suangtho, Paphanij; Suthachana, Suthanun; Clapham, Hannah E.; Salje, Henrik; Cummings, Derek A. T.; Lessler, Justin

    2016-01-01

    Epidemics of communicable diseases place a huge burden on public health infrastructures across the world. Producing accurate and actionable forecasts of infectious disease incidence at short and long time scales will improve public health response to outbreaks. However, scientists and public health officials face many obstacles in trying to create such real-time forecasts of infectious disease incidence. Dengue is a mosquito-borne virus that annually infects over 400 million people worldwide. We developed a real-time forecasting model for dengue hemorrhagic fever in the 77 provinces of Thailand. We created a practical computational infrastructure that generated multi-step predictions of dengue incidence in Thai provinces every two weeks throughout 2014. These predictions show mixed performance across provinces, out-performing seasonal baseline models in over half of provinces at a 1.5 month horizon. Additionally, to assess the degree to which delays in case reporting make long-range prediction a challenging task, we compared the performance of our real-time predictions with predictions made with fully reported data. This paper provides valuable lessons for the implementation of real-time predictions in the context of public health decision making. PMID:27304062

  7. GraphMeta: Managing HPC Rich Metadata in Graphs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dai, Dong; Chen, Yong; Carns, Philip

    High-performance computing (HPC) systems face increasingly critical metadata management challenges, especially in the approaching exascale era. These challenges arise not only from exploding metadata volumes, but also from increasingly diverse metadata, which contains data provenance and arbitrary user-defined attributes in addition to traditional POSIX metadata. This ‘rich’ metadata is becoming critical to supporting advanced data management functionality such as data auditing and validation. In our prior work, we identified a graph-based model as a promising solution to uniformly manage HPC rich metadata due to its flexibility and generality. However, at the same time, graph-based HPC rich metadata anagement also introducesmore » significant challenges to the underlying infrastructure. In this study, we first identify the challenges on the underlying infrastructure to support scalable, high-performance rich metadata management. Based on that, we introduce GraphMeta, a graphbased engine designed for this use case. It achieves performance scalability by introducing a new graph partitioning algorithm and a write-optimal storage engine. We evaluate GraphMeta under both synthetic and real HPC metadata workloads, compare it with other approaches, and demonstrate its advantages in terms of efficiency and usability for rich metadata management in HPC systems.« less

  8. Challenges in Real-Time Prediction of Infectious Disease: A Case Study of Dengue in Thailand.

    PubMed

    Reich, Nicholas G; Lauer, Stephen A; Sakrejda, Krzysztof; Iamsirithaworn, Sopon; Hinjoy, Soawapak; Suangtho, Paphanij; Suthachana, Suthanun; Clapham, Hannah E; Salje, Henrik; Cummings, Derek A T; Lessler, Justin

    2016-06-01

    Epidemics of communicable diseases place a huge burden on public health infrastructures across the world. Producing accurate and actionable forecasts of infectious disease incidence at short and long time scales will improve public health response to outbreaks. However, scientists and public health officials face many obstacles in trying to create such real-time forecasts of infectious disease incidence. Dengue is a mosquito-borne virus that annually infects over 400 million people worldwide. We developed a real-time forecasting model for dengue hemorrhagic fever in the 77 provinces of Thailand. We created a practical computational infrastructure that generated multi-step predictions of dengue incidence in Thai provinces every two weeks throughout 2014. These predictions show mixed performance across provinces, out-performing seasonal baseline models in over half of provinces at a 1.5 month horizon. Additionally, to assess the degree to which delays in case reporting make long-range prediction a challenging task, we compared the performance of our real-time predictions with predictions made with fully reported data. This paper provides valuable lessons for the implementation of real-time predictions in the context of public health decision making.

  9. Use of information and communication technology and retention of health workers in rural post-war conflict Northern Uganda: findings from a qualitative study.

    PubMed

    Yagos, Walter Onen; Tabo Olok, Geoffrey; Ovuga, Emilio

    2017-01-10

    Information and communication technologies have become a vital infrastructural asset for use in the retention of rural health workers. However, little is known about the potential influence of ICT use, perceptions of health workers on ICT in healthcare delivery, and contribution of ICT to health care providers' retention in rural and remote areas in rural post-war and conflict situations of northern Uganda. Data from interviews were transcribed, coded and thematically analysed. Participants generally exhibited low confidence, knowledge and low ICT skills. Majority of participants, however, perceived ICT as beneficial in relation to job performance and health care provider retention in rural areas. Common barriers for the implementation and use of ICT in health centres were inadequate ICT knowledge and skills, poor Internet networks, inadequate computers, inadequate power supply, lack of Internet Modems and expensive access to outside computer centres. This qualitative study showed low confidence, poor knowledge and skills in ICT usage but positive perceptions about the benefits and contributions of ICT. These findings suggest the need for specific investment in ICT infrastructural development for health care providers in remote rural areas of northern Uganda.

  10. SLEEC: Semantics-Rich Libraries for Effective Exascale Computation. Final Technical Report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Milind, Kulkarni

    SLEEC (Semantics-rich Libraries for Effective Exascale Computation) was a project funded by the Department of Energy X-Stack Program, award number DE-SC0008629. The initial project period was September 2012–August 2015. The project was renewed for an additional year, expiring August 2016. Finally, the project received a no-cost extension, leading to a final expiry date of August 2017. Modern applications, especially those intended to run at exascale, are not written from scratch. Instead, they are built by stitching together various carefully-written, hand-tuned libraries. Correctly composing these libraries is difficult, but traditional compilers are unable to effectively analyze and transform across abstraction layers.more » Domain specific compilers integrate semantic knowledge into compilers, allowing them to transform applications that use particular domain-specific languages, or domain libraries. But they do not help when new domains are developed, or applications span multiple domains. SLEEC aims to fix these problems. To do so, we are building generic compiler and runtime infrastructures that are semantics-aware but not domain-specific. By performing optimizations related to the semantics of a domain library, the same infrastructure can be made generic and apply across multiple domains.« less

  11. DICOM relay over the cloud.

    PubMed

    Silva, Luís A Bastião; Costa, Carlos; Oliveira, José Luis

    2013-05-01

    Healthcare institutions worldwide have adopted picture archiving and communication system (PACS) for enterprise access to images, relying on Digital Imaging Communication in Medicine (DICOM) standards for data exchange. However, communication over a wider domain of independent medical institutions is not well standardized. A DICOM-compliant bridge was developed for extending and sharing DICOM services across healthcare institutions without requiring complex network setups or dedicated communication channels. A set of DICOM routers interconnected through a public cloud infrastructure was implemented to support medical image exchange among institutions. Despite the advantages of cloud computing, new challenges were encountered regarding data privacy, particularly when medical data are transmitted over different domains. To address this issue, a solution was introduced by creating a ciphered data channel between the entities sharing DICOM services. Two main DICOM services were implemented in the bridge: Storage and Query/Retrieve. The performance measures demonstrated it is quite simple to exchange information and processes between several institutions. The solution can be integrated with any currently installed PACS-DICOM infrastructure. This method works transparently with well-known cloud service providers. Cloud computing was introduced to augment enterprise PACS by providing standard medical imaging services across different institutions, offering communication privacy and enabling creation of wider PACS scenarios with suitable technical solutions.

  12. Nbody Simulations and Weak Gravitational Lensing using new HPC-Grid resources: the PI2S2 project

    NASA Astrophysics Data System (ADS)

    Becciani, U.; Antonuccio-Delogu, V.; Costa, A.; Comparato, M.

    2008-08-01

    We present the main project of the new grid infrastructure and the researches, that have been already started in Sicily and will be completed by next year. The PI2S2 project of the COMETA consortium is funded by the Italian Ministry of University and Research and will be completed in 2009. Funds are from the European Union Structural Funds for Objective 1 regions. The project, together with a similar project called Trinacria GRID Virtual Laboratory (Trigrid VL), aims to create in Sicily a computational grid for e-science and e-commerce applications with the main goal of increasing the technological innovation of local enterprises and their competition on the global market. PI2S2 project aims to build and develop an e-Infrastructure in Sicily, based on the grid paradigm, mainly for research activity using the grid environment and High Performance Computer systems. As an example we present the first results of a new grid version of FLY a tree Nbody code developed by INAF Astrophysical Observatory of Catania, already published in the CPC program Library, that will be used in the Weak Gravitational Lensing field.

  13. Standard requirements for GCP-compliant data management in multinational clinical trials.

    PubMed

    Ohmann, Christian; Kuchinke, Wolfgang; Canham, Steve; Lauritsen, Jens; Salas, Nader; Schade-Brittinger, Carmen; Wittenberg, Michael; McPherson, Gladys; McCourt, John; Gueyffier, Francois; Lorimer, Andrea; Torres, Ferràn

    2011-03-22

    A recent survey has shown that data management in clinical trials performed by academic trial units still faces many difficulties (e.g. heterogeneity of software products, deficits in quality management, limited human and financial resources and the complexity of running a local computer centre). Unfortunately, no specific, practical and open standard for both GCP-compliant data management and the underlying IT-infrastructure is available to improve the situation. For that reason the "Working Group on Data Centres" of the European Clinical Research Infrastructures Network (ECRIN) has developed a standard specifying the requirements for high quality GCP-compliant data management in multinational clinical trials. International, European and national regulations and guidelines relevant to GCP, data security and IT infrastructures, as well as ECRIN documents produced previously, were evaluated to provide a starting point for the development of standard requirements. The requirements were produced by expert consensus of the ECRIN Working group on Data Centres, using a structured and standardised process. The requirements were divided into two main parts: an IT part covering standards for the underlying IT infrastructure and computer systems in general, and a Data Management (DM) part covering requirements for data management applications in clinical trials. The standard developed includes 115 IT requirements, split into 15 separate sections, 107 DM requirements (in 12 sections) and 13 other requirements (2 sections). Sections IT01 to IT05 deal with the basic IT infrastructure while IT06 and IT07 cover validation and local software development. IT08 to IT015 concern the aspects of IT systems that directly support clinical trial management. Sections DM01 to DM03 cover the implementation of a specific clinical data management application, i.e. for a specific trial, whilst DM04 to DM12 address the data management of trials across the unit. Section IN01 is dedicated to international aspects and ST01 to the competence of a trials unit's staff. The standard is intended to provide an open and widely used set of requirements for GCP-compliant data management, particularly in academic trial units. It is the intention that ECRIN will use these requirements as the basis for the certification of ECRIN data centres.

  14. Integration of Russian Tier-1 Grid Center with High Performance Computers at NRC-KI for LHC experiments and beyond HENP

    NASA Astrophysics Data System (ADS)

    Belyaev, A.; Berezhnaya, A.; Betev, L.; Buncic, P.; De, K.; Drizhuk, D.; Klimentov, A.; Lazin, Y.; Lyalin, I.; Mashinistov, R.; Novikov, A.; Oleynik, D.; Polyakov, A.; Poyda, A.; Ryabinkin, E.; Teslyuk, A.; Tkachenko, I.; Yasnopolskiy, L.

    2015-12-01

    The LHC experiments are preparing for the precision measurements and further discoveries that will be made possible by higher LHC energies from April 2015 (LHC Run2). The need for simulation, data processing and analysis would overwhelm the expected capacity of grid infrastructure computing facilities deployed by the Worldwide LHC Computing Grid (WLCG). To meet this challenge the integration of the opportunistic resources into LHC computing model is highly important. The Tier-1 facility at Kurchatov Institute (NRC-KI) in Moscow is a part of WLCG and it will process, simulate and store up to 10% of total data obtained from ALICE, ATLAS and LHCb experiments. In addition Kurchatov Institute has supercomputers with peak performance 0.12 PFLOPS. The delegation of even a fraction of supercomputing resources to the LHC Computing will notably increase total capacity. In 2014 the development a portal combining a Tier-1 and a supercomputer in Kurchatov Institute was started to provide common interfaces and storage. The portal will be used not only for HENP experiments, but also by other data- and compute-intensive sciences like biology with genome sequencing analysis; astrophysics with cosmic rays analysis, antimatter and dark matter search, etc.

  15. Managing competing elastic Grid and Cloud scientific computing applications using OpenNebula

    NASA Astrophysics Data System (ADS)

    Bagnasco, S.; Berzano, D.; Lusso, S.; Masera, M.; Vallero, S.

    2015-12-01

    Elastic cloud computing applications, i.e. applications that automatically scale according to computing needs, work on the ideal assumption of infinite resources. While large public cloud infrastructures may be a reasonable approximation of this condition, scientific computing centres like WLCG Grid sites usually work in a saturated regime, in which applications compete for scarce resources through queues, priorities and scheduling policies, and keeping a fraction of the computing cores idle to allow for headroom is usually not an option. In our particular environment one of the applications (a WLCG Tier-2 Grid site) is much larger than all the others and cannot autoscale easily. Nevertheless, other smaller applications can benefit of automatic elasticity; the implementation of this property in our infrastructure, based on the OpenNebula cloud stack, will be described and the very first operational experiences with a small number of strategies for timely allocation and release of resources will be discussed.

  16. ENFIN--A European network for integrative systems biology.

    PubMed

    Kahlem, Pascal; Clegg, Andrew; Reisinger, Florian; Xenarios, Ioannis; Hermjakob, Henning; Orengo, Christine; Birney, Ewan

    2009-11-01

    Integration of biological data of various types and the development of adapted bioinformatics tools represent critical objectives to enable research at the systems level. The European Network of Excellence ENFIN is engaged in developing an adapted infrastructure to connect databases, and platforms to enable both the generation of new bioinformatics tools and the experimental validation of computational predictions. With the aim of bridging the gap existing between standard wet laboratories and bioinformatics, the ENFIN Network runs integrative research projects to bring the latest computational techniques to bear directly on questions dedicated to systems biology in the wet laboratory environment. The Network maintains internally close collaboration between experimental and computational research, enabling a permanent cycling of experimental validation and improvement of computational prediction methods. The computational work includes the development of a database infrastructure (EnCORE), bioinformatics analysis methods and a novel platform for protein function analysis FuncNet.

  17. A Big Data Platform for Storing, Accessing, Mining and Learning Geospatial Data

    NASA Astrophysics Data System (ADS)

    Yang, C. P.; Bambacus, M.; Duffy, D.; Little, M. M.

    2017-12-01

    Big Data is becoming a norm in geoscience domains. A platform that is capable to effiently manage, access, analyze, mine, and learn the big data for new information and knowledge is desired. This paper introduces our latest effort on developing such a platform based on our past years' experiences on cloud and high performance computing, analyzing big data, comparing big data containers, and mining big geospatial data for new information. The platform includes four layers: a) the bottom layer includes a computing infrastructure with proper network, computer, and storage systems; b) the 2nd layer is a cloud computing layer based on virtualization to provide on demand computing services for upper layers; c) the 3rd layer is big data containers that are customized for dealing with different types of data and functionalities; d) the 4th layer is a big data presentation layer that supports the effient management, access, analyses, mining and learning of big geospatial data.

  18. The cost of getting CCS wrong: Uncertainty, infrastructure design, and stranded CO 2

    DOE PAGES

    Middleton, Richard Stephen; Yaw, Sean Patrick

    2018-01-11

    Carbon capture, and storage (CCS) infrastructure will require industry—such as fossil-fuel power, ethanol production, and oil and gas extraction—to make massive investment in infrastructure. The cost of getting these investments wrong will be substantial and will impact the success of CCS technology. Multiple factors can and will impact the success of commercial-scale CCS, including significant uncertainties regarding capture, transport, and injection-storage decisions. Uncertainties throughout the CCS supply chain include policy, technology, engineering performance, economics, and market forces. In particular, large uncertainties exist for the injection and storage of CO 2. Even taking into account upfront investment in site characterization, themore » final performance of the storage phase is largely unknown until commercial-scale injection has started. We explore and quantify the impact of getting CCS infrastructure decisions wrong based on uncertain injection rates and uncertain CO 2 storage capacities using a case study managing CO 2 emissions from the Canadian oil sands industry in Alberta. We use SimCCS, a widely used CCS infrastructure design framework, to develop multiple CCS infrastructure scenarios. Each scenario consists of a CCS infrastructure network that connects CO 2 sources (oil sands extraction and processing) with CO 2 storage reservoirs (acid gas storage reservoirs) using a dedicated CO 2 pipeline network. Each scenario is analyzed under a range of uncertain storage estimates and infrastructure performance is assessed and quantified in terms of cost to build additional infrastructure to store all CO 2. We also include the role of stranded CO 2, CO 2 that a source was expecting to but cannot capture due substandard performance in the transport and storage infrastructure. Results show that the cost of getting the original infrastructure design wrong are significant and that comprehensive planning will be required to ensure that CCS becomes a successful climate mitigation technology. Here, we show that the concept of stranded CO 2 can transform a seemingly high-performing infrastructure design into the worst case scenario.« less

  19. The cost of getting CCS wrong: Uncertainty, infrastructure design, and stranded CO 2

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Middleton, Richard Stephen; Yaw, Sean Patrick

    Carbon capture, and storage (CCS) infrastructure will require industry—such as fossil-fuel power, ethanol production, and oil and gas extraction—to make massive investment in infrastructure. The cost of getting these investments wrong will be substantial and will impact the success of CCS technology. Multiple factors can and will impact the success of commercial-scale CCS, including significant uncertainties regarding capture, transport, and injection-storage decisions. Uncertainties throughout the CCS supply chain include policy, technology, engineering performance, economics, and market forces. In particular, large uncertainties exist for the injection and storage of CO 2. Even taking into account upfront investment in site characterization, themore » final performance of the storage phase is largely unknown until commercial-scale injection has started. We explore and quantify the impact of getting CCS infrastructure decisions wrong based on uncertain injection rates and uncertain CO 2 storage capacities using a case study managing CO 2 emissions from the Canadian oil sands industry in Alberta. We use SimCCS, a widely used CCS infrastructure design framework, to develop multiple CCS infrastructure scenarios. Each scenario consists of a CCS infrastructure network that connects CO 2 sources (oil sands extraction and processing) with CO 2 storage reservoirs (acid gas storage reservoirs) using a dedicated CO 2 pipeline network. Each scenario is analyzed under a range of uncertain storage estimates and infrastructure performance is assessed and quantified in terms of cost to build additional infrastructure to store all CO 2. We also include the role of stranded CO 2, CO 2 that a source was expecting to but cannot capture due substandard performance in the transport and storage infrastructure. Results show that the cost of getting the original infrastructure design wrong are significant and that comprehensive planning will be required to ensure that CCS becomes a successful climate mitigation technology. Here, we show that the concept of stranded CO 2 can transform a seemingly high-performing infrastructure design into the worst case scenario.« less

  20. The SCEC Community Modeling Environment(SCEC/CME): A Collaboratory for Seismic Hazard Analysis

    NASA Astrophysics Data System (ADS)

    Maechling, P. J.; Jordan, T. H.; Minster, J. B.; Moore, R.; Kesselman, C.

    2005-12-01

    The SCEC Community Modeling Environment (SCEC/CME) Project is an NSF-supported Geosciences/IT partnership that is actively developing an advanced information infrastructure for system-level earthquake science in Southern California. This partnership includes SCEC, USC's Information Sciences Institute (ISI), the San Diego Supercomputer Center (SDSC), the Incorporated Institutions for Research in Seismology (IRIS), and the U.S. Geological Survey. The goal of the SCEC/CME is to develop seismological applications and information technology (IT) infrastructure to support the development of Seismic Hazard Analysis (SHA) programs and other geophysical simulations. The SHA application programs developed on the Project include a Probabilistic Seismic Hazard Analysis system called OpenSHA. OpenSHA computational elements that are currently available include a collection of attenuation relationships, and several Earthquake Rupture Forecasts (ERFs). Geophysicists in the collaboration have also developed Anelastic Wave Models (AWMs) using both finite-difference and finite-element approaches. Earthquake simulations using these codes have been run for a variety of earthquake sources. Rupture Dynamic Model (RDM) codes have also been developed that simulate friction-based fault slip. The SCEC/CME collaboration has also developed IT software and hardware infrastructure to support the development, execution, and analysis of these SHA programs. To support computationally expensive simulations, we have constructed a grid-based scientific workflow system. Using the SCEC grid, project collaborators can submit computations from the SCEC/CME servers to High Performance Computers at USC and TeraGrid High Performance Computing Centers. Data generated and archived by the SCEC/CME is stored in a digital library system, the Storage Resource Broker (SRB). This system provides a robust and secure system for maintaining the association between the data seta and their metadata. To provide an easy-to-use system for constructing SHA computations, a browser-based workflow assembly web portal has been developed. Users can compose complex SHA calculations, specifying SCEC/CME data sets as inputs to calculations, and calling SCEC/CME computational programs to process the data and the output. Knowledge-based software tools have been implemented that utilize ontological descriptions of SHA software and data can validate workflows created with this pathway assembly tool. Data visualization software developed by the collaboration supports analysis and validation of data sets. Several programs have been developed to visualize SCEC/CME data including GMT-based map making software for PSHA codes, 4D wavefield propagation visualization software based on OpenGL, and 3D Geowall-based visualization of earthquakes, faults, and seismic wave propagation. The SCEC/CME Project also helps to sponsor the SCEC UseIT Intern program. The UseIT Intern Program provides research opportunities in both Geosciences and Information Technology to undergraduate students in a variety of fields. The UseIT group has developed a 3D data visualization tool, called SCEC-VDO, as a part of this undergraduate research program.

  1. Geocomputation over Hybrid Computer Architecture and Systems: Prior Works and On-going Initiatives at UARK

    NASA Astrophysics Data System (ADS)

    Shi, X.

    2015-12-01

    As NSF indicated - "Theory and experimentation have for centuries been regarded as two fundamental pillars of science. It is now widely recognized that computational and data-enabled science forms a critical third pillar." Geocomputation is the third pillar of GIScience and geosciences. With the exponential growth of geodata, the challenge of scalable and high performance computing for big data analytics become urgent because many research activities are constrained by the inability of software or tool that even could not complete the computation process. Heterogeneous geodata integration and analytics obviously magnify the complexity and operational time frame. Many large-scale geospatial problems may be not processable at all if the computer system does not have sufficient memory or computational power. Emerging computer architectures, such as Intel's Many Integrated Core (MIC) Architecture and Graphics Processing Unit (GPU), and advanced computing technologies provide promising solutions to employ massive parallelism and hardware resources to achieve scalability and high performance for data intensive computing over large spatiotemporal and social media data. Exploring novel algorithms and deploying the solutions in massively parallel computing environment to achieve the capability for scalable data processing and analytics over large-scale, complex, and heterogeneous geodata with consistent quality and high-performance has been the central theme of our research team in the Department of Geosciences at the University of Arkansas (UARK). New multi-core architectures combined with application accelerators hold the promise to achieve scalability and high performance by exploiting task and data levels of parallelism that are not supported by the conventional computing systems. Such a parallel or distributed computing environment is particularly suitable for large-scale geocomputation over big data as proved by our prior works, while the potential of such advanced infrastructure remains unexplored in this domain. Within this presentation, our prior and on-going initiatives will be summarized to exemplify how we exploit multicore CPUs, GPUs, and MICs, and clusters of CPUs, GPUs and MICs, to accelerate geocomputation in different applications.

  2. Resilient and Robust High Performance Computing Platforms for Scientific Computing Integrity

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jin, Yier

    As technology advances, computer systems are subject to increasingly sophisticated cyber-attacks that compromise both their security and integrity. High performance computing platforms used in commercial and scientific applications involving sensitive, or even classified data, are frequently targeted by powerful adversaries. This situation is made worse by a lack of fundamental security solutions that both perform efficiently and are effective at preventing threats. Current security solutions fail to address the threat landscape and ensure the integrity of sensitive data. As challenges rise, both private and public sectors will require robust technologies to protect its computing infrastructure. The research outcomes from thismore » project try to address all these challenges. For example, we present LAZARUS, a novel technique to harden kernel Address Space Layout Randomization (KASLR) against paging-based side-channel attacks. In particular, our scheme allows for fine-grained protection of the virtual memory mappings that implement the randomization. We demonstrate the effectiveness of our approach by hardening a recent Linux kernel with LAZARUS, mitigating all of the previously presented side-channel attacks on KASLR. Our extensive evaluation shows that LAZARUS incurs only 0.943% overhead for standard benchmarks, and is therefore highly practical. We also introduced HA2lloc, a hardware-assisted allocator that is capable of leveraging an extended memory management unit to detect memory errors in the heap. We also perform testing using HA2lloc in a simulation environment and find that the approach is capable of preventing common memory vulnerabilities.« less

  3. Leveraging the Power of High Performance Computing for Next Generation Sequencing Data Analysis: Tricks and Twists from a High Throughput Exome Workflow

    PubMed Central

    Wonczak, Stephan; Thiele, Holger; Nieroda, Lech; Jabbari, Kamel; Borowski, Stefan; Sinha, Vishal; Gunia, Wilfried; Lang, Ulrich; Achter, Viktor; Nürnberg, Peter

    2015-01-01

    Next generation sequencing (NGS) has been a great success and is now a standard method of research in the life sciences. With this technology, dozens of whole genomes or hundreds of exomes can be sequenced in rather short time, producing huge amounts of data. Complex bioinformatics analyses are required to turn these data into scientific findings. In order to run these analyses fast, automated workflows implemented on high performance computers are state of the art. While providing sufficient compute power and storage to meet the NGS data challenge, high performance computing (HPC) systems require special care when utilized for high throughput processing. This is especially true if the HPC system is shared by different users. Here, stability, robustness and maintainability are as important for automated workflows as speed and throughput. To achieve all of these aims, dedicated solutions have to be developed. In this paper, we present the tricks and twists that we utilized in the implementation of our exome data processing workflow. It may serve as a guideline for other high throughput data analysis projects using a similar infrastructure. The code implementing our solutions is provided in the supporting information files. PMID:25942438

  4. Experience of public procurement of Open Compute servers

    NASA Astrophysics Data System (ADS)

    Bärring, Olof; Guerri, Marco; Bonfillou, Eric; Valsan, Liviu; Grigore, Alexandru; Dore, Vincent; Gentit, Alain; Clement, Benoît; Grossir, Anthony

    2015-12-01

    The Open Compute Project. OCP (http://www.opencompute.org/). was launched by Facebook in 2011 with the objective of building efficient computing infrastructures at the lowest possible cost. The technologies are released as open hardware. with the goal to develop servers and data centres following the model traditionally associated with open source software projects. In 2013 CERN acquired a few OCP servers in order to compare performance and power consumption with standard hardware. The conclusions were that there are sufficient savings to motivate an attempt to procure a large scale installation. One objective is to evaluate if the OCP market is sufficiently mature and broad enough to meet the constraints of a public procurement. This paper summarizes this procurement. which started in September 2014 and involved the Request for information (RFI) to qualify bidders and Request for Tender (RFT).

  5. The Gain of Resource Delegation in Distributed Computing Environments

    NASA Astrophysics Data System (ADS)

    Fölling, Alexander; Grimme, Christian; Lepping, Joachim; Papaspyrou, Alexander

    In this paper, we address job scheduling in Distributed Computing Infrastructures, that is a loosely coupled network of autonomous acting High Performance Computing systems. In contrast to the common approach of mutual workload exchange, we consider the more intuitive operator's viewpoint of load-dependent resource reconfiguration. In case of a site's over-utilization, the scheduling system is able to lease resources from other sites to keep up service quality for its local user community. Contrary, the granting of idle resources can increase utilization in times of low local workload and thus ensure higher efficiency. The evaluation considers real workload data and is done with respect to common service quality indicators. For two simple resource exchange policies and three basic setups we show the possible gain of this approach and analyze the dynamics in workload-adaptive reconfiguration behavior.

  6. Unmet needs for analyzing biological big data: A survey of 704 NSF principal investigators

    PubMed Central

    2017-01-01

    In a 2016 survey of 704 National Science Foundation (NSF) Biological Sciences Directorate principal investigators (BIO PIs), nearly 90% indicated they are currently or will soon be analyzing large data sets. BIO PIs considered a range of computational needs important to their work, including high performance computing (HPC), bioinformatics support, multistep workflows, updated analysis software, and the ability to store, share, and publish data. Previous studies in the United States and Canada emphasized infrastructure needs. However, BIO PIs said the most pressing unmet needs are training in data integration, data management, and scaling analyses for HPC—acknowledging that data science skills will be required to build a deeper understanding of life. This portends a growing data knowledge gap in biology and challenges institutions and funding agencies to redouble their support for computational training in biology. PMID:29049281

  7. Forecasting techno-social systems: how physics and computing help to fight off global pandemics

    NASA Astrophysics Data System (ADS)

    Vespignani, Alessandro

    2010-03-01

    The crucial issue when planning for adequate public health interventions to mitigate the spread and impact of epidemics is risk evaluation and forecast. This amount to the anticipation of where, when and how strong the epidemic will strike. In the last decade advances in performance in computer technology, data acquisition, statistical physics and complex networks theory allow the generation of sophisticated simulations on supercomputer infrastructures to anticipate the spreading pattern of a pandemic. For the first time we are in the position of generating real time forecast of epidemic spreading. I will review the history of the current H1N1 pandemic, the major road-blocks the community has faced in its containment and mitigation and how physics and computing provide predictive tools that help us to battle epidemics.

  8. Unmet needs for analyzing biological big data: A survey of 704 NSF principal investigators.

    PubMed

    Barone, Lindsay; Williams, Jason; Micklos, David

    2017-10-01

    In a 2016 survey of 704 National Science Foundation (NSF) Biological Sciences Directorate principal investigators (BIO PIs), nearly 90% indicated they are currently or will soon be analyzing large data sets. BIO PIs considered a range of computational needs important to their work, including high performance computing (HPC), bioinformatics support, multistep workflows, updated analysis software, and the ability to store, share, and publish data. Previous studies in the United States and Canada emphasized infrastructure needs. However, BIO PIs said the most pressing unmet needs are training in data integration, data management, and scaling analyses for HPC-acknowledging that data science skills will be required to build a deeper understanding of life. This portends a growing data knowledge gap in biology and challenges institutions and funding agencies to redouble their support for computational training in biology.

  9. Integration and Exposure of Large Scale Computational Resources Across the Earth System Grid Federation (ESGF)

    NASA Astrophysics Data System (ADS)

    Duffy, D.; Maxwell, T. P.; Doutriaux, C.; Williams, D. N.; Chaudhary, A.; Ames, S.

    2015-12-01

    As the size of remote sensing observations and model output data grows, the volume of the data has become overwhelming, even to many scientific experts. As societies are forced to better understand, mitigate, and adapt to climate changes, the combination of Earth observation data and global climate model projects is crucial to not only scientists but to policy makers, downstream applications, and even the public. Scientific progress on understanding climate is critically dependent on the availability of a reliable infrastructure that promotes data access, management, and provenance. The Earth System Grid Federation (ESGF) has created such an environment for the Intergovernmental Panel on Climate Change (IPCC). ESGF provides a federated global cyber infrastructure for data access and management of model outputs generated for the IPCC Assessment Reports (AR). The current generation of the ESGF federated grid allows consumers of the data to find and download data with limited capabilities for server-side processing. Since the amount of data for future AR is expected to grow dramatically, ESGF is working on integrating server-side analytics throughout the federation. The ESGF Compute Working Team (CWT) has created a Web Processing Service (WPS) Application Programming Interface (API) to enable access scalable computational resources. The API is the exposure point to high performance computing resources across the federation. Specifically, the API allows users to execute simple operations, such as maximum, minimum, average, and anomalies, on ESGF data without having to download the data. These operations are executed at the ESGF data node site with access to large amounts of parallel computing capabilities. This presentation will highlight the WPS API, its capabilities, provide implementation details, and discuss future developments.

  10. Scaling to diversity: The DERECHOS distributed infrastructure for analyzing and sharing data

    NASA Astrophysics Data System (ADS)

    Rilee, M. L.; Kuo, K. S.; Clune, T.; Oloso, A.; Brown, P. G.

    2016-12-01

    Integrating Earth Science data from diverse sources such as satellite imagery and simulation output can be expensive and time-consuming, limiting scientific inquiry and the quality of our analyses. Reducing these costs will improve innovation and quality in science. The current Earth Science data infrastructure focuses on downloading data based on requests formed from the search and analysis of associated metadata. And while the data products provided by archives may use the best available data sharing technologies, scientist end-users generally do not have such resources (including staff) available to them. Furthermore, only once an end-user has received the data from multiple diverse sources and has integrated them can the actual analysis and synthesis begin. The cost of getting from idea to where synthesis can start dramatically slows progress. In this presentation we discuss a distributed computational and data storage framework that eliminates much of the aforementioned cost. The SciDB distributed array database is central as it is optimized for scientific computing involving very large arrays, performing better than less specialized frameworks like Spark. Adding spatiotemporal functions to the SciDB creates a powerful platform for analyzing and integrating massive, distributed datasets. SciDB allows Big Earth Data analysis to be performed "in place" without the need for expensive downloads and end-user resources. Spatiotemporal indexing technologies such as the hierarchical triangular mesh enable the compute and storage affinity needed to efficiently perform co-located and conditional analyses minimizing data transfers. These technologies automate the integration of diverse data sources using the framework, a critical step beyond current metadata search and analysis. Instead of downloading data into their idiosyncratic local environments, end-users can generate and share data products integrated from diverse multiple sources using a common shared environment, turning distributed active archive centers (DAACs) from warehouses into distributed active analysis centers.

  11. High Temporal Resolution Mapping of Seismic Noise Sources Using Heterogeneous Supercomputers

    NASA Astrophysics Data System (ADS)

    Paitz, P.; Gokhberg, A.; Ermert, L. A.; Fichtner, A.

    2017-12-01

    The time- and space-dependent distribution of seismic noise sources is becoming a key ingredient of modern real-time monitoring of various geo-systems like earthquake fault zones, volcanoes, geothermal and hydrocarbon reservoirs. We present results of an ongoing research project conducted in collaboration with the Swiss National Supercomputing Centre (CSCS). The project aims at building a service providing seismic noise source maps for Central Europe with high temporal resolution. We use source imaging methods based on the cross-correlation of seismic noise records from all seismic stations available in the region of interest. The service is hosted on the CSCS computing infrastructure; all computationally intensive processing is performed on the massively parallel heterogeneous supercomputer "Piz Daint". The solution architecture is based on the Application-as-a-Service concept to provide the interested researchers worldwide with regular access to the noise source maps. The solution architecture includes the following sub-systems: (1) data acquisition responsible for collecting, on a periodic basis, raw seismic records from the European seismic networks, (2) high-performance noise source mapping application responsible for the generation of source maps using cross-correlation of seismic records, (3) back-end infrastructure for the coordination of various tasks and computations, (4) front-end Web interface providing the service to the end-users and (5) data repository. The noise source mapping itself rests on the measurement of logarithmic amplitude ratios in suitably pre-processed noise correlations, and the use of simplified sensitivity kernels. During the implementation we addressed various challenges, in particular, selection of data sources and transfer protocols, automation and monitoring of daily data downloads, ensuring the required data processing performance, design of a general service-oriented architecture for coordination of various sub-systems, and engineering an appropriate data storage solution. The present pilot version of the service implements noise source maps for Switzerland. Extension of the solution to Central Europe is planned for the next project phase.

  12. High-performance parallel approaches for three-dimensional light detection and ranging point clouds gridding

    NASA Astrophysics Data System (ADS)

    Rizki, Permata Nur Miftahur; Lee, Heezin; Lee, Minsu; Oh, Sangyoon

    2017-01-01

    With the rapid advance of remote sensing technology, the amount of three-dimensional point-cloud data has increased extraordinarily, requiring faster processing in the construction of digital elevation models. There have been several attempts to accelerate the computation using parallel methods; however, little attention has been given to investigating different approaches for selecting the most suited parallel programming model for a given computing environment. We present our findings and insights identified by implementing three popular high-performance parallel approaches (message passing interface, MapReduce, and GPGPU) on time demanding but accurate kriging interpolation. The performances of the approaches are compared by varying the size of the grid and input data. In our empirical experiment, we demonstrate the significant acceleration by all three approaches compared to a C-implemented sequential-processing method. In addition, we also discuss the pros and cons of each method in terms of usability, complexity infrastructure, and platform limitation to give readers a better understanding of utilizing those parallel approaches for gridding purposes.

  13. Mediator infrastructure for information integration and semantic data integration environment for biomedical research.

    PubMed

    Grethe, Jeffrey S; Ross, Edward; Little, David; Sanders, Brian; Gupta, Amarnath; Astakhov, Vadim

    2009-01-01

    This paper presents current progress in the development of semantic data integration environment which is a part of the Biomedical Informatics Research Network (BIRN; http://www.nbirn.net) project. BIRN is sponsored by the National Center for Research Resources (NCRR), a component of the National Institutes of Health (NIH). A goal is the development of a cyberinfrastructure for biomedical research that supports advance data acquisition, data storage, data management, data integration, data mining, data visualization, and other computing and information processing services over the Internet. Each participating institution maintains storage of their experimental or computationally derived data. Mediator-based data integration system performs semantic integration over the databases to enable researchers to perform analyses based on larger and broader datasets than would be available from any single institution's data. This paper describes recent revision of the system architecture, implementation, and capabilities of the semantically based data integration environment for BIRN.

  14. Secure Enclaves: An Isolation-centric Approach for Creating Secure High Performance Computing Environments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Aderholdt, Ferrol; Caldwell, Blake A.; Hicks, Susan Elaine

    High performance computing environments are often used for a wide variety of workloads ranging from simulation, data transformation and analysis, and complex workflows to name just a few. These systems may process data at various security levels but in so doing are often enclaved at the highest security posture. This approach places significant restrictions on the users of the system even when processing data at a lower security level and exposes data at higher levels of confidentiality to a much broader population than otherwise necessary. The traditional approach of isolation, while effective in establishing security enclaves poses significant challenges formore » the use of shared infrastructure in HPC environments. This report details current state-of-the-art in virtualization, reconfigurable network enclaving via Software Defined Networking (SDN), and storage architectures and bridging techniques for creating secure enclaves in HPC environments.« less

  15. Fuzzy Logic Based Anomaly Detection for Embedded Network Security Cyber Sensor

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ondrej Linda; Todd Vollmer; Jason Wright

    Resiliency and security in critical infrastructure control systems in the modern world of cyber terrorism constitute a relevant concern. Developing a network security system specifically tailored to the requirements of such critical assets is of a primary importance. This paper proposes a novel learning algorithm for anomaly based network security cyber sensor together with its hardware implementation. The presented learning algorithm constructs a fuzzy logic rule based model of normal network behavior. Individual fuzzy rules are extracted directly from the stream of incoming packets using an online clustering algorithm. This learning algorithm was specifically developed to comply with the constrainedmore » computational requirements of low-cost embedded network security cyber sensors. The performance of the system was evaluated on a set of network data recorded from an experimental test-bed mimicking the environment of a critical infrastructure control system.« less

  16. XRootD popularity on hadoop clusters

    NASA Astrophysics Data System (ADS)

    Meoni, Marco; Boccali, Tommaso; Magini, Nicolò; Menichetti, Luca; Giordano, Domenico; CMS Collaboration

    2017-10-01

    Performance data and metadata of the computing operations at the CMS experiment are collected through a distributed monitoring infrastructure, currently relying on a traditional Oracle database system. This paper shows how to harness Big Data architectures in order to improve the throughput and the efficiency of such monitoring. A large set of operational data - user activities, job submissions, resources, file transfers, site efficiencies, software releases, network traffic, machine logs - is being injected into a readily available Hadoop cluster, via several data streamers. The collected metadata is further organized running fast arbitrary queries; this offers the ability to test several Map&Reduce-based frameworks and measure the system speed-up when compared to the original database infrastructure. By leveraging a quality Hadoop data store and enabling an analytics framework on top, it is possible to design a mining platform to predict dataset popularity and discover patterns and correlations.

  17. Investigations into Gravitational Wave Emission from Compact Body Inspiral Into Massive Black Holes

    NASA Technical Reports Server (NTRS)

    Hughes, Scott A.

    2004-01-01

    Much of the grant's support (and associated time) was used in developmental activity, building infrastructure for the core of the work that the grant supports. Though infrastructure development was the bulk of the activity supported this year, important progress was made in research as well. The two most important "infrastructure" items were in computing hardware and personnel. Research activities were primarily focused on improving and extending. Hughes' Teukolsky-equation-based gravitational-wave generator. Several improvements have been incorporated into this generator.

  18. X-ray-induced acoustic computed tomography of concrete infrastructure

    NASA Astrophysics Data System (ADS)

    Tang, Shanshan; Ramseyer, Chris; Samant, Pratik; Xiang, Liangzhong

    2018-02-01

    X-ray-induced Acoustic Computed Tomography (XACT) takes advantage of both X-ray absorption contrast and high ultrasonic resolution in a single imaging modality by making use of the thermoacoustic effect. In XACT, X-ray absorption by defects and other structures in concrete create thermally induced pressure jumps that launch ultrasonic waves, which are then received by acoustic detectors to form images. In this research, XACT imaging was used to non-destructively test and identify defects in concrete. For concrete structures, we conclude that XACT imaging allows multiscale imaging at depths ranging from centimeters to meters, with spatial resolutions from sub-millimeter to centimeters. XACT imaging also holds promise for single-side testing of concrete infrastructure and provides an optimal solution for nondestructive inspection of existing bridges, pavement, nuclear power plants, and other concrete infrastructure.

  19. Models of Educational Computing @ Home: New Frontiers for Research on Technology in Learning.

    ERIC Educational Resources Information Center

    Kafai, Yasmin B.; Fishman, Barry J.; Bruckman, Amy S.; Rockman, Saul

    2002-01-01

    Discusses models of home educational computing that are linked to learning in school and recommends the need for research that addresses the home as a computer-based learning environment. Topics include a history of research on educational computing at home; technological infrastructure, including software and compatibility; Internet access;…

  20. !CHAOS: A cloud of controls

    NASA Astrophysics Data System (ADS)

    Angius, S.; Bisegni, C.; Ciuffetti, P.; Di Pirro, G.; Foggetta, L. G.; Galletti, F.; Gargana, R.; Gioscio, E.; Maselli, D.; Mazzitelli, G.; Michelotti, A.; Orrù, R.; Pistoni, M.; Spagnoli, F.; Spigone, D.; Stecchi, A.; Tonto, T.; Tota, M. A.; Catani, L.; Di Giulio, C.; Salina, G.; Buzzi, P.; Checcucci, B.; Lubrano, P.; Piccini, M.; Fattibene, E.; Michelotto, M.; Cavallaro, S. R.; Diana, B. F.; Enrico, F.; Pulvirenti, S.

    2016-01-01

    The paper is aimed to present the !CHAOS open source project aimed to develop a prototype of a national private Cloud Computing infrastructure, devoted to accelerator control systems and large experiments of High Energy Physics (HEP). The !CHAOS project has been financed by MIUR (Italian Ministry of Research and Education) and aims to develop a new concept of control system and data acquisition framework by providing, with a high level of aaabstraction, all the services needed for controlling and managing a large scientific, or non-scientific, infrastructure. A beta version of the !CHAOS infrastructure will be released at the end of December 2015 and will run on private Cloud infrastructures based on OpenStack.

  1. DREAMS and IMAGE: A Model and Computer Implementation for Concurrent, Life-Cycle Design of Complex Systems

    NASA Technical Reports Server (NTRS)

    Hale, Mark A.; Craig, James I.; Mistree, Farrokh; Schrage, Daniel P.

    1995-01-01

    Computing architectures are being assembled that extend concurrent engineering practices by providing more efficient execution and collaboration on distributed, heterogeneous computing networks. Built on the successes of initial architectures, requirements for a next-generation design computing infrastructure can be developed. These requirements concentrate on those needed by a designer in decision-making processes from product conception to recycling and can be categorized in two areas: design process and design information management. A designer both designs and executes design processes throughout design time to achieve better product and process capabilities while expanding fewer resources. In order to accomplish this, information, or more appropriately design knowledge, needs to be adequately managed during product and process decomposition as well as recomposition. A foundation has been laid that captures these requirements in a design architecture called DREAMS (Developing Robust Engineering Analysis Models and Specifications). In addition, a computing infrastructure, called IMAGE (Intelligent Multidisciplinary Aircraft Generation Environment), is being developed that satisfies design requirements defined in DREAMS and incorporates enabling computational technologies.

  2. Integration of a neuroimaging processing pipeline into a pan-canadian computing grid

    NASA Astrophysics Data System (ADS)

    Lavoie-Courchesne, S.; Rioux, P.; Chouinard-Decorte, F.; Sherif, T.; Rousseau, M.-E.; Das, S.; Adalat, R.; Doyon, J.; Craddock, C.; Margulies, D.; Chu, C.; Lyttelton, O.; Evans, A. C.; Bellec, P.

    2012-02-01

    The ethos of the neuroimaging field is quickly moving towards the open sharing of resources, including both imaging databases and processing tools. As a neuroimaging database represents a large volume of datasets and as neuroimaging processing pipelines are composed of heterogeneous, computationally intensive tools, such open sharing raises specific computational challenges. This motivates the design of novel dedicated computing infrastructures. This paper describes an interface between PSOM, a code-oriented pipeline development framework, and CBRAIN, a web-oriented platform for grid computing. This interface was used to integrate a PSOM-compliant pipeline for preprocessing of structural and functional magnetic resonance imaging into CBRAIN. We further tested the capacity of our infrastructure to handle a real large-scale project. A neuroimaging database including close to 1000 subjects was preprocessed using our interface and publicly released to help the participants of the ADHD-200 international competition. This successful experiment demonstrated that our integrated grid-computing platform is a powerful solution for high-throughput pipeline analysis in the field of neuroimaging.

  3. GOES-R GS Product Generation Infrastructure Operations

    NASA Astrophysics Data System (ADS)

    Blanton, M.; Gundy, J.

    2012-12-01

    GOES-R GS Product Generation Infrastructure Operations: The GOES-R Ground System (GS) will produce a much larger set of products with higher data density than previous GOES systems. This requires considerably greater compute and memory resources to achieve the necessary latency and availability for these products. Over time, new algorithms could be added and existing ones removed or updated, but the GOES-R GS cannot go down during this time. To meet these GOES-R GS processing needs, the Harris Corporation will implement a Product Generation (PG) infrastructure that is scalable, extensible, extendable, modular and reliable. The primary parts of the PG infrastructure are the Service Based Architecture (SBA), which includes the Distributed Data Fabric (DDF). The SBA is the middleware that encapsulates and manages science algorithms that generate products. The SBA is divided into three parts, the Executive, which manages and configures the algorithm as a service, the Dispatcher, which provides data to the algorithm, and the Strategy, which determines when the algorithm can execute with the available data. The SBA is a distributed architecture, with services connected to each other over a compute grid and is highly scalable. This plug-and-play architecture allows algorithms to be added, removed, or updated without affecting any other services or software currently running and producing data. Algorithms require product data from other algorithms, so a scalable and reliable messaging is necessary. The SBA uses the DDF to provide this data communication layer between algorithms. The DDF provides an abstract interface over a distributed and persistent multi-layered storage system (memory based caching above disk-based storage) and an event system that allows algorithm services to know when data is available and to get the data that they need to begin processing when they need it. Together, the SBA and the DDF provide a flexible, high performance architecture that can meet the needs of product processing now and as they grow in the future.

  4. Toward real-time Monte Carlo simulation using a commercial cloud computing infrastructure

    NASA Astrophysics Data System (ADS)

    Wang, Henry; Ma, Yunzhi; Pratx, Guillem; Xing, Lei

    2011-09-01

    Monte Carlo (MC) methods are the gold standard for modeling photon and electron transport in a heterogeneous medium; however, their computational cost prohibits their routine use in the clinic. Cloud computing, wherein computing resources are allocated on-demand from a third party, is a new approach for high performance computing and is implemented to perform ultra-fast MC calculation in radiation therapy. We deployed the EGS5 MC package in a commercial cloud environment. Launched from a single local computer with Internet access, a Python script allocates a remote virtual cluster. A handshaking protocol designates master and worker nodes. The EGS5 binaries and the simulation data are initially loaded onto the master node. The simulation is then distributed among independent worker nodes via the message passing interface, and the results aggregated on the local computer for display and data analysis. The described approach is evaluated for pencil beams and broad beams of high-energy electrons and photons. The output of cloud-based MC simulation is identical to that produced by single-threaded implementation. For 1 million electrons, a simulation that takes 2.58 h on a local computer can be executed in 3.3 min on the cloud with 100 nodes, a 47× speed-up. Simulation time scales inversely with the number of parallel nodes. The parallelization overhead is also negligible for large simulations. Cloud computing represents one of the most important recent advances in supercomputing technology and provides a promising platform for substantially improved MC simulation. In addition to the significant speed up, cloud computing builds a layer of abstraction for high performance parallel computing, which may change the way dose calculations are performed and radiation treatment plans are completed. This work was presented in part at the 2010 Annual Meeting of the American Association of Physicists in Medicine (AAPM), Philadelphia, PA.

  5. Bioinformatics clouds for big data manipulation

    PubMed Central

    2012-01-01

    Abstract As advances in life sciences and information technology bring profound influences on bioinformatics due to its interdisciplinary nature, bioinformatics is experiencing a new leap-forward from in-house computing infrastructure into utility-supplied cloud computing delivered over the Internet, in order to handle the vast quantities of biological data generated by high-throughput experimental technologies. Albeit relatively new, cloud computing promises to address big data storage and analysis issues in the bioinformatics field. Here we review extant cloud-based services in bioinformatics, classify them into Data as a Service (DaaS), Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS), and present our perspectives on the adoption of cloud computing in bioinformatics. Reviewers This article was reviewed by Frank Eisenhaber, Igor Zhulin, and Sandor Pongor. PMID:23190475

  6. Blast2GO goes grid: developing a grid-enabled prototype for functional genomics analysis.

    PubMed

    Aparicio, G; Götz, S; Conesa, A; Segrelles, D; Blanquer, I; García, J M; Hernandez, V; Robles, M; Talon, M

    2006-01-01

    The vast amount in complexity of data generated in Genomic Research implies that new dedicated and powerful computational tools need to be developed to meet their analysis requirements. Blast2GO (B2G) is a bioinformatics tool for Gene Ontology-based DNA or protein sequence annotation and function-based data mining. The application has been developed with the aim of affering an easy-to-use tool for functional genomics research. Typical B2G users are middle size genomics labs carrying out sequencing, ETS and microarray projects, handling datasets up to several thousand sequences. In the current version of B2G. The power and analytical potential of both annotation and function data-mining is somehow restricted to the computational power behind each particular installation. In order to be able to offer the possibility of an enhanced computational capacity within this bioinformatics application, a Grid component is being developed. A prototype has been conceived for the particular problem of speeding up the Blast searches to obtain fast results for large datasets. Many efforts have been done in the literature concerning the speeding up of Blast searches, but few of them deal with the use of large heterogeneous production Grid Infrastructures. These are the infrastructures that could reach the largest number of resources and the best load balancing for data access. The Grid Service under development will analyse requests based on the number of sequences, splitting them accordingly to the available resources. Lower-level computation will be performed through MPIBLAST. The software architecture is based on the WSRF standard.

  7. Rapid prototyping of soil moisture estimates using the NASA Land Information System

    NASA Astrophysics Data System (ADS)

    Anantharaj, V.; Mostovoy, G.; Li, B.; Peters-Lidard, C.; Houser, P.; Moorhead, R.; Kumar, S.

    2007-12-01

    The Land Information System (LIS), developed at the NASA Goddard Space Flight Center, is a functional Land Data Assimilation System (LDAS) that incorporates a suite of land models in an interoperable computational framework. LIS has been integrated into a computational Rapid Prototyping Capabilities (RPC) infrastructure. LIS consists of a core, a number of community land models, data servers, and visualization systems - integrated in a high-performance computing environment. The land surface models (LSM) in LIS incorporate surface and atmospheric parameters of temperature, snow/water, vegetation, albedo, soil conditions, topography, and radiation. Many of these parameters are available from in-situ observations, numerical model analysis, and from NASA, NOAA, and other remote sensing satellite platforms at various spatial and temporal resolutions. The computational resources, available to LIS via the RPC infrastructure, support e- Science experiments involving the global modeling of land-atmosphere studies at 1km spatial resolutions as well as regional studies at finer resolutions. The Noah Land Surface Model, available with-in the LIS is being used to rapidly prototype soil moisture estimates in order to evaluate the viability of other science applications for decision making purposes. For example, LIS has been used to further extend the utility of the USDA Soil Climate Analysis Network of in-situ soil moisture observations. In addition, LIS also supports data assimilation capabilities that are used to assimilate remotely sensed soil moisture retrievals from the AMSR-E instrument onboard the Aqua satellite. The rapid prototyping of soil moisture estimates using LIS and their applications will be illustrated during the presentation.

  8. Three-Dimensional Space to Assess Cloud Interoperability

    DTIC Science & Technology

    2013-03-01

    12 1. Portability and Mobility ...collection of network-enabled services that guarantees to provide a scalable, easy accessible, reliable, and personalized computing infrastructure , based on...are used in research to describe cloud models, such as SaaS (Software as a Service), PaaS (Platform as a service), IaaS ( Infrastructure as a Service

  9. Branch Campus Librarianship with Minimal Infrastructure: Rewards and Challenges

    ERIC Educational Resources Information Center

    Knickman, Elena; Walton, Kerry

    2014-01-01

    Delaware County Community College provides library services to its branch campus community members by stationing a librarian at a campus 5 to 20 hours each week, without any more library infrastructure than an Internet-enabled computer on the school network. Faculty and students have reacted favorably to the increased presence of librarians.…

  10. Quantifying Security Threats and Their Impact

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Aissa, Anis Ben; Abercrombie, Robert K; Sheldon, Frederick T

    In earlier works, we present a computational infrastructure that allows an analyst to estimate the security of a system in terms of the loss that each stakeholder stands to sustain as a result of security breakdowns. In this paper we illustrate this infrastructure by means of a sample example involving an e-commerce application.

  11. Quantifying Security Threats and Their Potential Impacts: A Case Study

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Aissa, Anis Ben; Abercrombie, Robert K; Sheldon, Frederick T

    In earlier works, we present a computational infrastructure that allows an analyst to estimate the security of a system in terms of the loss that each stakeholder stands to sustain as a result of security breakdowns. In this paper, we illustrate this infrastructure by means of an e-commerce application.

  12. WLCG scale testing during CMS data challenges

    NASA Astrophysics Data System (ADS)

    Gutsche, O.; Hajdu, C.

    2008-07-01

    The CMS computing model to process and analyze LHC collision data follows a data-location driven approach and is using the WLCG infrastructure to provide access to GRID resources. As a preparation for data taking, CMS tests its computing model during dedicated data challenges. An important part of the challenges is the test of the user analysis which poses a special challenge for the infrastructure with its random distributed access patterns. The CMS Remote Analysis Builder (CRAB) handles all interactions with the WLCG infrastructure transparently for the user. During the 2006 challenge, CMS set its goal to test the infrastructure at a scale of 50,000 user jobs per day using CRAB. Both direct submissions by individual users and automated submissions by robots were used to achieve this goal. A report will be given about the outcome of the user analysis part of the challenge using both the EGEE and OSG parts of the WLCG. In particular, the difference in submission between both GRID middlewares (resource broker vs. direct submission) will be discussed. In the end, an outlook for the 2007 data challenge is given.

  13. Defense strategies for cloud computing multi-site server infrastructures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rao, Nageswara S.; Ma, Chris Y. T.; He, Fei

    We consider cloud computing server infrastructures for big data applications, which consist of multiple server sites connected over a wide-area network. The sites house a number of servers, network elements and local-area connections, and the wide-area network plays a critical, asymmetric role of providing vital connectivity between them. We model this infrastructure as a system of systems, wherein the sites and wide-area network are represented by their cyber and physical components. These components can be disabled by cyber and physical attacks, and also can be protected against them using component reinforcements. The effects of attacks propagate within the systems, andmore » also beyond them via the wide-area network.We characterize these effects using correlations at two levels using: (a) aggregate failure correlation function that specifies the infrastructure failure probability given the failure of an individual site or network, and (b) first-order differential conditions on system survival probabilities that characterize the component-level correlations within individual systems. We formulate a game between an attacker and a provider using utility functions composed of survival probability and cost terms. At Nash Equilibrium, we derive expressions for the expected capacity of the infrastructure given by the number of operational servers connected to the network for sum-form, product-form and composite utility functions.« less

  14. Storing and Using Health Data in a Virtual Private Cloud

    PubMed Central

    Regola, Nathan

    2013-01-01

    Electronic health records are being adopted at a rapid rate due to increased funding from the US federal government. Health data provide the opportunity to identify possible improvements in health care delivery by applying data mining and statistical methods to the data and will also enable a wide variety of new applications that will be meaningful to patients and medical professionals. Researchers are often granted access to health care data to assist in the data mining process, but HIPAA regulations mandate comprehensive safeguards to protect the data. Often universities (and presumably other research organizations) have an enterprise information technology infrastructure and a research infrastructure. Unfortunately, both of these infrastructures are generally not appropriate for sensitive research data such as HIPAA, as they require special accommodations on the part of the enterprise information technology (or increased security on the part of the research computing environment). Cloud computing, which is a concept that allows organizations to build complex infrastructures on leased resources, is rapidly evolving to the point that it is possible to build sophisticated network architectures with advanced security capabilities. We present a prototype infrastructure in Amazon’s Virtual Private Cloud to allow researchers and practitioners to utilize the data in a HIPAA-compliant environment. PMID:23485880

  15. Constructing Pairing-Friendly Elliptic Curves under Embedding Degree 1 for Securing Critical Infrastructures.

    PubMed

    Wang, Maocai; Dai, Guangming; Choo, Kim-Kwang Raymond; Jayaraman, Prem Prakash; Ranjan, Rajiv

    2016-01-01

    Information confidentiality is an essential requirement for cyber security in critical infrastructure. Identity-based cryptography, an increasingly popular branch of cryptography, is widely used to protect the information confidentiality in the critical infrastructure sector due to the ability to directly compute the user's public key based on the user's identity. However, computational requirements complicate the practical application of Identity-based cryptography. In order to improve the efficiency of identity-based cryptography, this paper presents an effective method to construct pairing-friendly elliptic curves with low hamming weight 4 under embedding degree 1. Based on the analysis of the Complex Multiplication(CM) method, the soundness of our method to calculate the characteristic of the finite field is proved. And then, three relative algorithms to construct pairing-friendly elliptic curve are put forward. 10 elliptic curves with low hamming weight 4 under 160 bits are presented to demonstrate the utility of our approach. Finally, the evaluation also indicates that it is more efficient to compute Tate pairing with our curves, than that of Bertoni et al.

  16. Constructing Pairing-Friendly Elliptic Curves under Embedding Degree 1 for Securing Critical Infrastructures

    PubMed Central

    Dai, Guangming

    2016-01-01

    Information confidentiality is an essential requirement for cyber security in critical infrastructure. Identity-based cryptography, an increasingly popular branch of cryptography, is widely used to protect the information confidentiality in the critical infrastructure sector due to the ability to directly compute the user’s public key based on the user’s identity. However, computational requirements complicate the practical application of Identity-based cryptography. In order to improve the efficiency of identity-based cryptography, this paper presents an effective method to construct pairing-friendly elliptic curves with low hamming weight 4 under embedding degree 1. Based on the analysis of the Complex Multiplication(CM) method, the soundness of our method to calculate the characteristic of the finite field is proved. And then, three relative algorithms to construct pairing-friendly elliptic curve are put forward. 10 elliptic curves with low hamming weight 4 under 160 bits are presented to demonstrate the utility of our approach. Finally, the evaluation also indicates that it is more efficient to compute Tate pairing with our curves, than that of Bertoni et al. PMID:27564373

  17. Changing the batch system in a Tier 1 computing center: why and how

    NASA Astrophysics Data System (ADS)

    Chierici, Andrea; Dal Pra, Stefano

    2014-06-01

    At the Italian Tierl Center at CNAF we are evaluating the possibility to change the current production batch system. This activity is motivated mainly because we are looking for a more flexible licensing model as well as to avoid vendor lock-in. We performed a technology tracking exercise and among many possible solutions we chose to evaluate Grid Engine as an alternative because its adoption is increasing in the HEPiX community and because it's supported by the EMI middleware that we currently use on our computing farm. Another INFN site evaluated Slurm and we will compare our results in order to understand pros and cons of the two solutions. We will present the results of our evaluation of Grid Engine, in order to understand if it can fit the requirements of a Tier 1 center, compared to the solution we adopted long ago. We performed a survey and a critical re-evaluation of our farming infrastructure: many production softwares (accounting and monitoring on top of all) rely on our current solution and changing it required us to write new wrappers and adapt the infrastructure to the new system. We believe the results of this investigation can be very useful to other Tier-ls and Tier-2s centers in a similar situation, where the effort of switching may appear too hard to stand. We will provide guidelines in order to understand how difficult this operation can be and how long the change may take.

  18. Large-Scale Data Collection Metadata Management at the National Computation Infrastructure

    NASA Astrophysics Data System (ADS)

    Wang, J.; Evans, B. J. K.; Bastrakova, I.; Ryder, G.; Martin, J.; Duursma, D.; Gohar, K.; Mackey, T.; Paget, M.; Siddeswara, G.

    2014-12-01

    Data Collection management has become an essential activity at the National Computation Infrastructure (NCI) in Australia. NCI's partners (CSIRO, Bureau of Meteorology, Australian National University, and Geoscience Australia), supported by the Australian Government and Research Data Storage Infrastructure (RDSI), have established a national data resource that is co-located with high-performance computing. This paper addresses the metadata management of these data assets over their lifetime. NCI manages 36 data collections (10+ PB) categorised as earth system sciences, climate and weather model data assets and products, earth and marine observations and products, geosciences, terrestrial ecosystem, water management and hydrology, astronomy, social science and biosciences. The data is largely sourced from NCI partners, the custodians of many of the national scientific records, and major research community organisations. The data is made available in a HPC and data-intensive environment - a ~56000 core supercomputer, virtual labs on a 3000 core cloud system, and data services. By assembling these large national assets, new opportunities have arisen to harmonise the data collections, making a powerful cross-disciplinary resource.To support the overall management, a Data Management Plan (DMP) has been developed to record the workflows, procedures, the key contacts and responsibilities. The DMP has fields that can be exported to the ISO19115 schema and to the collection level catalogue of GeoNetwork. The subset or file level metadata catalogues are linked with the collection level through parent-child relationship definition using UUID. A number of tools have been developed that support interactive metadata management, bulk loading of data, and support for computational workflows or data pipelines. NCI creates persistent identifiers for each of the assets. The data collection is tracked over its lifetime, and the recognition of the data providers, data owners, data generators and data aggregators are updated. A Digital Object Identifier is assigned using the Australian National Data Service (ANDS). Once the data has been quality assured, a DOI is minted and the metadata record updated. NCI's data citation policy establishes the relationship between research outcomes, data providers, and the data.

  19. The GENIUS Grid Portal and robot certificates: a new tool for e-Science

    PubMed Central

    Barbera, Roberto; Donvito, Giacinto; Falzone, Alberto; La Rocca, Giuseppe; Milanesi, Luciano; Maggi, Giorgio Pietro; Vicario, Saverio

    2009-01-01

    Background Grid technology is the computing model which allows users to share a wide pletora of distributed computational resources regardless of their geographical location. Up to now, the high security policy requested in order to access distributed computing resources has been a rather big limiting factor when trying to broaden the usage of Grids into a wide community of users. Grid security is indeed based on the Public Key Infrastructure (PKI) of X.509 certificates and the procedure to get and manage those certificates is unfortunately not straightforward. A first step to make Grids more appealing for new users has recently been achieved with the adoption of robot certificates. Methods Robot certificates have recently been introduced to perform automated tasks on Grids on behalf of users. They are extremely useful for instance to automate grid service monitoring, data processing production, distributed data collection systems. Basically these certificates can be used to identify a person responsible for an unattended service or process acting as client and/or server. Robot certificates can be installed on a smart card and used behind a portal by everyone interested in running the related applications in a Grid environment using a user-friendly graphic interface. In this work, the GENIUS Grid Portal, powered by EnginFrame, has been extended in order to support the new authentication based on the adoption of these robot certificates. Results The work carried out and reported in this manuscript is particularly relevant for all users who are not familiar with personal digital certificates and the technical aspects of the Grid Security Infrastructure (GSI). The valuable benefits introduced by robot certificates in e-Science can so be extended to users belonging to several scientific domains, providing an asset in raising Grid awareness to a wide number of potential users. Conclusion The adoption of Grid portals extended with robot certificates, can really contribute to creating transparent access to computational resources of Grid Infrastructures, enhancing the spread of this new paradigm in researchers' working life to address new global scientific challenges. The evaluated solution can of course be extended to other portals, applications and scientific communities. PMID:19534747

  20. The GENIUS Grid Portal and robot certificates: a new tool for e-Science.

    PubMed

    Barbera, Roberto; Donvito, Giacinto; Falzone, Alberto; La Rocca, Giuseppe; Milanesi, Luciano; Maggi, Giorgio Pietro; Vicario, Saverio

    2009-06-16

    Grid technology is the computing model which allows users to share a wide pletora of distributed computational resources regardless of their geographical location. Up to now, the high security policy requested in order to access distributed computing resources has been a rather big limiting factor when trying to broaden the usage of Grids into a wide community of users. Grid security is indeed based on the Public Key Infrastructure (PKI) of X.509 certificates and the procedure to get and manage those certificates is unfortunately not straightforward. A first step to make Grids more appealing for new users has recently been achieved with the adoption of robot certificates. Robot certificates have recently been introduced to perform automated tasks on Grids on behalf of users. They are extremely useful for instance to automate grid service monitoring, data processing production, distributed data collection systems. Basically these certificates can be used to identify a person responsible for an unattended service or process acting as client and/or server. Robot certificates can be installed on a smart card and used behind a portal by everyone interested in running the related applications in a Grid environment using a user-friendly graphic interface. In this work, the GENIUS Grid Portal, powered by EnginFrame, has been extended in order to support the new authentication based on the adoption of these robot certificates. The work carried out and reported in this manuscript is particularly relevant for all users who are not familiar with personal digital certificates and the technical aspects of the Grid Security Infrastructure (GSI). The valuable benefits introduced by robot certificates in e-Science can so be extended to users belonging to several scientific domains, providing an asset in raising Grid awareness to a wide number of potential users. The adoption of Grid portals extended with robot certificates, can really contribute to creating transparent access to computational resources of Grid Infrastructures, enhancing the spread of this new paradigm in researchers' working life to address new global scientific challenges. The evaluated solution can of course be extended to other portals, applications and scientific communities.

Top