Sample records for high-performance computing infrastructure

  1. The High-Performance Computing and Communications program, the national information infrastructure and health care.

    PubMed Central

    Lindberg, D A; Humphreys, B L

    1995-01-01

    The High-Performance Computing and Communications (HPCC) program is a multiagency federal effort to advance the state of computing and communications and to provide the technologic platform on which the National Information Infrastructure (NII) can be built. The HPCC program supports the development of high-speed computers, high-speed telecommunications, related software and algorithms, education and training, and information infrastructure technology and applications. The vision of the NII is to extend access to high-performance computing and communications to virtually every U.S. citizen so that the technology can be used to improve the civil infrastructure, lifelong learning, energy management, health care, etc. Development of the NII will require resolution of complex economic and social issues, including information privacy. Health-related applications supported under the HPCC program and NII initiatives include connection of health care institutions to the Internet; enhanced access to gene sequence data; the "Visible Human" Project; and test-bed projects in telemedicine, electronic patient records, shared informatics tool development, and image systems. PMID:7614116

  2. Performance measurement and modeling of component applications in a high performance computing environment: a case study.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Armstrong, Robert C.; Ray, Jaideep; Malony, A.

    2003-11-01

    We present a case study of performance measurement and modeling of a CCA (Common Component Architecture) component-based application in a high performance computing environment. We explore issues peculiar to component-based HPC applications and propose a performance measurement infrastructure for HPC based loosely on recent work done for Grid environments. A prototypical implementation of the infrastructure is used to collect data for three components in a scientific application and construct performance models for two of them. Both computational and message-passing performance are addressed.
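
    As an illustration of this kind of measurement infrastructure (a minimal sketch, not the paper's CCA tooling), one can time a component's invocations at several problem sizes and fit a simple polynomial performance model:

```python
# Minimal sketch (not the paper's CCA infrastructure): time a component at
# several problem sizes and fit a polynomial performance model.
import time
import numpy as np

def measure(component, sizes, repeats=3):
    samples = []
    for n in sizes:
        t0 = time.perf_counter()
        for _ in range(repeats):
            component(n)
        samples.append((n, (time.perf_counter() - t0) / repeats))
    return samples

def fit_model(samples, degree=2):
    sizes, times = zip(*samples)
    return np.polyfit(sizes, times, degree)   # coefficients of t(n)

# toy "component": an O(n^2) kernel standing in for a component invocation
kernel = lambda n: np.ones((n, n)).dot(np.ones(n))
print(fit_model(measure(kernel, [100, 200, 400, 800])))
```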

  3. [Design and study of parallel computing environment of Monte Carlo simulation for particle therapy planning using a public cloud-computing infrastructure].

    PubMed

    Yokohama, Noriya

    2013-07-01

    This report presents the design of architectures and performance measurements for a parallel computing environment running Monte Carlo simulations for particle therapy planning on a high performance computing (HPC) instance within a public cloud-computing infrastructure. Performance measurements showed a speed approximately 28 times faster than a single-thread architecture, combined with improved stability. A study of methods for optimizing system operations also indicated lower cost.
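
    A minimal sketch of the parallelization pattern the report describes, with a toy tally standing in for a real particle-transport kernel; histories are split across worker processes much as they would be across cloud HPC cores:

```python
# Minimal sketch (toy kernel, not a real therapy-planning code): splitting
# Monte Carlo particle histories across worker processes.
import random
from multiprocessing import Pool

def simulate_histories(args):
    seed, n = args
    rng = random.Random(seed)
    # Toy "energy deposited" tally standing in for a real transport kernel.
    return sum(rng.random() for _ in range(n))

if __name__ == "__main__":
    total, workers = 1_000_000, 8
    chunks = [(seed, total // workers) for seed in range(workers)]
    with Pool(workers) as pool:
        tally = sum(pool.map(simulate_histories, chunks))
    print(f"mean deposit per history: {tally / total:.6f}")
```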

  4. Distributed Accounting on the Grid

    NASA Technical Reports Server (NTRS)

    Thigpen, William; Hacker, Thomas J.; McGinnis, Laura F.; Athey, Brian D.

    2001-01-01

    By the late 1990s, the Internet was adequately equipped to move vast amounts of data between HPC (High Performance Computing) systems, and efforts were initiated to link the national infrastructure of high performance computational and data storage resources together into a general computational utility 'grid', analogous to the national electrical power grid infrastructure. The purpose of the Computational grid is to provide dependable, consistent, pervasive, and inexpensive access to computational resources for the computing community in the form of a computing utility. This paper presents a fully distributed view of Grid usage accounting and a methodology for allocating Grid computational resources for use on a Grid computing system.
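
    As a hedged illustration of distributed usage accounting (an invented schema, not the paper's methodology), each site might keep its own ledger of usage records and debit jobs against a local allocation:

```python
# Illustrative sketch only: one way to represent distributed Grid usage
# accounting records and debit them against a site-local allocation.
from dataclasses import dataclass

@dataclass
class UsageRecord:
    user: str
    site: str          # each site keeps its own ledger (fully distributed view)
    cpu_hours: float

class SiteLedger:
    def __init__(self, allocation_cpu_hours: float):
        self.remaining = allocation_cpu_hours
        self.records = []

    def charge(self, rec: UsageRecord) -> bool:
        if rec.cpu_hours > self.remaining:
            return False          # job exceeds remaining allocation
        self.remaining -= rec.cpu_hours
        self.records.append(rec)
        return True

ledger = SiteLedger(allocation_cpu_hours=10_000)
print(ledger.charge(UsageRecord("alice", "nasa-ames", 250.0)))  # True
```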

  5. Cooperative high-performance storage in the accelerated strategic computing initiative

    NASA Technical Reports Server (NTRS)

    Gary, Mark; Howard, Barry; Louis, Steve; Minuzzo, Kim; Seager, Mark

    1996-01-01

    The use and acceptance of new high-performance, parallel computing platforms will be impeded by the absence of an infrastructure capable of supporting orders-of-magnitude improvement in hierarchical storage and high-speed I/O (Input/Output). The distribution of these high-performance platforms and supporting infrastructures across a wide-area network further compounds this problem. We describe an architectural design and phased implementation plan for a distributed, Cooperative Storage Environment (CSE) to achieve the necessary performance, user transparency, site autonomy, communication, and security features needed to support the Accelerated Strategic Computing Initiative (ASCI). ASCI is a Department of Energy (DOE) program attempting to apply terascale platforms and Problem-Solving Environments (PSEs) toward real-world computational modeling and simulation problems. The ASCI mission must be carried out through a unified, multilaboratory effort, and will require highly secure, efficient access to vast amounts of data. The CSE provides a logically simple, geographically distributed, storage infrastructure of semi-autonomous cooperating sites to meet the strategic ASCI PSE goal of high-performance data storage and access at the user desktop.

  6. RAPPORT: running scientific high-performance computing applications on the cloud.

    PubMed

    Cohen, Jeremy; Filippis, Ioannis; Woodbridge, Mark; Bauer, Daniela; Hong, Neil Chue; Jackson, Mike; Butcher, Sarah; Colling, David; Darlington, John; Fuchs, Brian; Harvey, Matt

    2013-01-28

    Cloud computing infrastructure is now widely used in many domains, but one area where there has been more limited adoption is research computing, in particular for running scientific high-performance computing (HPC) software. The Robust Application Porting for HPC in the Cloud (RAPPORT) project took advantage of existing links between computing researchers and application scientists in the fields of bioinformatics, high-energy physics (HEP) and digital humanities, to investigate running a set of scientific HPC applications from these domains on cloud infrastructure. In this paper, we focus on the bioinformatics and HEP domains, describing the applications and target cloud platforms. We conclude that, while there are many factors that need consideration, there is no fundamental impediment to the use of cloud infrastructure for running many types of HPC applications and, in some cases, there is potential for researchers to benefit significantly from the flexibility offered by cloud platforms.

  7. Defense of Cyber Infrastructures Against Cyber-Physical Attacks Using Game-Theoretic Models

    DOE PAGES

    Rao, Nageswara S. V.; Poole, Stephen W.; Ma, Chris Y. T.; ...

    2015-04-06

    The operation of cyber infrastructures relies on both cyber and physical components, which are subject to incidental and intentional degradations of different kinds. Within the context of network and computing infrastructures, we study the strategic interactions between an attacker and a defender using game-theoretic models that take into account both cyber and physical components. The attacker and defender optimize their individual utilities expressed as sums of cost and system terms. First, we consider a Boolean attack-defense model, wherein the cyber and physical sub-infrastructures may be attacked and reinforced as individual units. Second, we consider a component attack-defense model wherein their components may be attacked and defended, and the infrastructure requires minimum numbers of both to function. We show that the Nash equilibrium under uniform costs in both cases is computable in polynomial time, and it provides high-level deterministic conditions for the infrastructure survival. When probabilities of successful attack and defense, and of incidental failures are incorporated into the models, the results favor the attacker but otherwise remain qualitatively similar. This approach has been motivated and validated by our experiences with UltraScience Net infrastructure, which was built to support high-performance network experiments. The analytical results, however, are more general, and we apply them to simplified models of cloud and high-performance computing infrastructures.

  8. Defense of Cyber Infrastructures Against Cyber-Physical Attacks Using Game-Theoretic Models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rao, Nageswara S. V.; Poole, Stephen W.; Ma, Chris Y. T.

    The operation of cyber infrastructures relies on both cyber and physical components, which are subject to incidental and intentional degradations of different kinds. Within the context of network and computing infrastructures, we study the strategic interactions between an attacker and a defender using game-theoretic models that take into account both cyber and physical components. The attacker and defender optimize their individual utilities expressed as sums of cost and system terms. First, we consider a Boolean attack-defense model, wherein the cyber and physical sub-infrastructures may be attacked and reinforced as individual units. Second, we consider a component attack-defense model wherein their components may be attacked and defended, and the infrastructure requires minimum numbers of both to function. We show that the Nash equilibrium under uniform costs in both cases is computable in polynomial time, and it provides high-level deterministic conditions for the infrastructure survival. When probabilities of successful attack and defense, and of incidental failures are incorporated into the models, the results favor the attacker but otherwise remain qualitatively similar. This approach has been motivated and validated by our experiences with UltraScience Net infrastructure, which was built to support high-performance network experiments. The analytical results, however, are more general, and we apply them to simplified models of cloud and high-performance computing infrastructures.

  9. NCI's High Performance Computing (HPC) and High Performance Data (HPD) Computing Platform for Environmental and Earth System Data Science

    NASA Astrophysics Data System (ADS)

    Evans, Ben; Allen, Chris; Antony, Joseph; Bastrakova, Irina; Gohar, Kashif; Porter, David; Pugh, Tim; Santana, Fabiana; Smillie, Jon; Trenham, Claire; Wang, Jingbo; Wyborn, Lesley

    2015-04-01

    The National Computational Infrastructure (NCI) has established a powerful and flexible in-situ petascale computational environment to enable both high performance computing and Data-intensive Science across a wide spectrum of national environmental and earth science data collections - in particular climate, observational data and geoscientific assets. This paper examines 1) the computational environments that support the modelling and data processing pipelines, 2) the analysis environments and methods to support data analysis, and 3) the progress so far to harmonise the underlying data collections for future interdisciplinary research across these large-volume data collections. NCI has established 10+ PBytes of major national and international data collections from both the government and research sectors based on six themes: 1) weather, climate, and earth system science model simulations, 2) marine and earth observations, 3) geosciences, 4) terrestrial ecosystems, 5) water and hydrology, and 6) astronomy, social and biosciences. Collectively they span the lithosphere, crust, biosphere, hydrosphere, troposphere, and stratosphere. The data is largely sourced from NCI's partners (which include the custodians of many of the major Australian national-scale scientific collections), leading research communities, and collaborating overseas organisations. New infrastructures created at NCI mean the data collections are now accessible within an integrated High Performance Computing and Data (HPC-HPD) environment - a 1.2 PFlop supercomputer (Raijin), an HPC-class 3000-core OpenStack cloud system and several highly connected large-scale high-bandwidth Lustre filesystems. The hardware was designed at inception to ensure that it would allow the layered software environment to flexibly accommodate the advancement of future data science. New approaches to software technology and data models have also had to be developed to enable access to these large and exponentially increasing data volumes at NCI. Traditional HPC and data environments are still made available in a way that flexibly provides the tools, services and supporting software systems on these new petascale infrastructures. But to enable the research to take place at this scale, the data, metadata and software now need to evolve together - creating a new integrated high performance infrastructure. The new infrastructure at NCI currently supports a catalogue of integrated, reusable software and workflows from earth system and ecosystem modelling, weather research, satellite and other observed data processing and analysis. One of the challenges for NCI has been to support existing techniques and methods, while carefully preparing the underlying infrastructure for the transition needed for the next class of Data-intensive Science. In doing so, a flexible range of techniques and software can be made available for application across the corpus of data collections available, and to provide a new infrastructure for future interdisciplinary research.

  10. Parallel high-performance grid computing: capabilities and opportunities of a novel demanding service and business class allowing highest resource efficiency.

    PubMed

    Kepper, Nick; Ettig, Ramona; Dickmann, Frank; Stehr, Rene; Grosveld, Frank G; Wedemann, Gero; Knoch, Tobias A

    2010-01-01

    Especially in the life-science and health-care sectors, huge IT requirements are evident due to the large and complex systems to be analysed and simulated. Grid infrastructures play a rapidly increasing role here for research, diagnostics, and treatment, since they provide the necessary large-scale resources efficiently. Whereas grids were first used for huge number crunching of trivially parallelizable problems, parallel high-performance computing is increasingly required. Here, we show for the prime example of molecular dynamics simulations how the presence of large grid clusters with very fast network interconnects within grid infrastructures now allows efficient parallel high-performance grid computing and thus combines the benefits of dedicated supercomputing centres and grid infrastructures. The demands of this service class are the highest, since the user group has very heterogeneous requirements: i) two to many thousands of CPUs, ii) different memory architectures, iii) huge storage capabilities, and iv) fast communication via network interconnects are all needed in different combinations and must be handled in a highly dedicated manner to reach the highest performance efficiency. Beyond this, advanced and dedicated i) interaction with users, ii) management of jobs, iii) accounting, and iv) billing not only combines classic with parallel high-performance grid usage, but more importantly can also increase the efficiency of IT resource providers. Consequently, the mere "yes-we-can" becomes a huge opportunity for, e.g., the life-science and health-care sectors as well as grid infrastructures, by reaching a higher level of resource efficiency.
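
    A hedged sketch of the tightly coupled workload class discussed above, using mpi4py; the per-step global reduction is exactly the kind of operation that demands the fast network interconnects the authors emphasize (the molecular-dynamics "energy" here is a toy stand-in):

```python
# Hedged sketch of a tightly coupled parallel step: an mpi4py global
# reduction across ranks (run with e.g. `mpirun -n 8 python md.py`).
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# each rank computes a toy local quantity for its own slice of particles
local_energy = np.sum(np.random.default_rng(rank).random(100_000) ** 2)

# the global reduction every step is what stresses the interconnect
total = comm.allreduce(local_energy, op=MPI.SUM)
if rank == 0:
    print(f"total energy across {size} ranks: {total:.2f}")
```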

  11. Defense of Cyber Infrastructures Against Cyber-Physical Attacks Using Game-Theoretic Models.

    PubMed

    Rao, Nageswara S V; Poole, Stephen W; Ma, Chris Y T; He, Fei; Zhuang, Jun; Yau, David K Y

    2016-04-01

    The operation of cyber infrastructures relies on both cyber and physical components, which are subject to incidental and intentional degradations of different kinds. Within the context of network and computing infrastructures, we study the strategic interactions between an attacker and a defender using game-theoretic models that take into account both cyber and physical components. The attacker and defender optimize their individual utilities, expressed as sums of cost and system terms. First, we consider a Boolean attack-defense model, wherein the cyber and physical subinfrastructures may be attacked and reinforced as individual units. Second, we consider a component attack-defense model wherein their components may be attacked and defended, and the infrastructure requires minimum numbers of both to function. We show that the Nash equilibrium under uniform costs in both cases is computable in polynomial time, and it provides high-level deterministic conditions for the infrastructure survival. When probabilities of successful attack and defense, and of incidental failures, are incorporated into the models, the results favor the attacker but otherwise remain qualitatively similar. This approach has been motivated and validated by our experiences with UltraScience Net infrastructure, which was built to support high-performance network experiments. The analytical results, however, are more general, and we apply them to simplified models of cloud and high-performance computing infrastructures. © 2015 Society for Risk Analysis.
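
    A small sketch of the Boolean attack-defense idea under uniform costs (a simplified 2x2 toy, not the paper's model): with one cyber and one physical unit, the mixed-strategy Nash equilibrium follows from the players' indifference conditions:

```python
# A simplified toy, not the paper's model: a 2x2 Boolean attack-defense game.
# The attacker targets one sub-infrastructure (0=cyber, 1=physical); a unit
# survives only if the defender reinforced it. Utilities = value - cost.
ATTACK_COST, DEFENSE_COST, VALUE = 2.0, 1.5, 10.0

def attacker_u(a, d):
    return (VALUE if a != d else 0.0) - ATTACK_COST

def defender_u(a, d):
    return (VALUE if a == d else 0.0) - DEFENSE_COST

A = [[attacker_u(a, d) for d in (0, 1)] for a in (0, 1)]
D = [[defender_u(a, d) for d in (0, 1)] for a in (0, 1)]

# Mixed-strategy equilibrium from indifference conditions:
# p = Pr(defender reinforces cyber), q = Pr(attacker targets cyber).
p = (A[1][1] - A[0][1]) / (A[0][0] - A[1][0] + A[1][1] - A[0][1])
q = (D[1][1] - D[1][0]) / (D[0][0] - D[0][1] + D[1][1] - D[1][0])
print(f"defend cyber with p={p:.2f}, attack cyber with q={q:.2f}")
```

    With symmetric values and uniform costs the indifference mix is fifty-fifty; asymmetric costs shift the equilibrium toward the cheaper-to-defend or more valuable unit.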

  12. Exploring Cloud Computing for Large-scale Scientific Applications

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lin, Guang; Han, Binh; Yin, Jian

    This paper explores cloud computing for large-scale data-intensive scientific applications. Cloud computing is attractive because it provides hardware and software resources on demand, which relieves the burden of acquiring and maintaining a huge amount of resources that may be used only once by a scientific application. However, unlike typical commercial applications that often just require a moderate amount of ordinary resources, large-scale scientific applications often need to process enormous amounts of data in the terabyte or even petabyte range and require special high performance hardware with low latency connections to complete computation in a reasonable amount of time. To address these challenges, we build an infrastructure that can dynamically select high performance computing hardware across institutions and dynamically adapt the computation to the selected resources to achieve high performance. We have also demonstrated the effectiveness of our infrastructure by building a system biology application and an uncertainty quantification application for carbon sequestration, which can efficiently utilize data and computation resources across several institutions.

  13. NASA's Participation in the National Computational Grid

    NASA Technical Reports Server (NTRS)

    Feiereisen, William J.; Zornetzer, Steve F. (Technical Monitor)

    1998-01-01

    Over the last several years it has become evident that the character of NASA's supercomputing needs has changed. One of the major missions of the agency is to support the design and manufacture of aero- and space-vehicles with technologies that will significantly reduce their cost. It is becoming clear that improvements in the process of aerospace design and manufacturing will require a high performance information infrastructure that allows geographically dispersed teams to draw upon resources that are broader than traditional supercomputing. A computational grid draws together our information resources into one system. We can foresee the time when a Grid will allow engineers and scientists to use the tools of supercomputers, databases and on-line experimental devices in a virtual environment to collaborate with distant colleagues. The concept of a computational grid has been spoken of for many years, but several events in recent times are conspiring to allow us to actually build one. In late 1997 the National Science Foundation initiated the Partnerships for Advanced Computational Infrastructure (PACI), which is built around the idea of distributed high performance computing. The Alliance, led by the National Computational Science Alliance (NCSA), and the National Partnership for Advanced Computational Infrastructure (NPACI), led by the San Diego Supercomputing Center, have been instrumental in drawing together the "Grid Community" to identify the technology bottlenecks and propose a research agenda to address them. During the same period NASA has begun to reformulate parts of two major high performance computing research programs to concentrate on distributed high performance computing and has banded together with the PACI centers to address the research agenda in common.

  14. Business Models of High Performance Computing Centres in Higher Education in Europe

    ERIC Educational Resources Information Center

    Eurich, Markus; Calleja, Paul; Boutellier, Roman

    2013-01-01

    High performance computing (HPC) service centres are a vital part of the academic infrastructure of higher education organisations. However, despite their importance for research and the necessary high capital expenditures, business research on HPC service centres is mostly missing. From a business perspective, it is important to find an answer to…

  15. Towards Portable Large-Scale Image Processing with High-Performance Computing.

    PubMed

    Huo, Yuankai; Blaber, Justin; Damon, Stephen M; Boyd, Brian D; Bao, Shunxing; Parvathaneni, Prasanna; Noguera, Camilo Bermudez; Chaganti, Shikha; Nath, Vishwesh; Greer, Jasmine M; Lyu, Ilwoo; French, William R; Newton, Allen T; Rogers, Baxter P; Landman, Bennett A

    2018-05-03

    High-throughput, large-scale medical image computing demands tight integration of high-performance computing (HPC) infrastructure for data storage, job distribution, and image processing. The Vanderbilt University Institute for Imaging Science (VUIIS) Center for Computational Imaging (CCI) has constructed a large-scale image storage and processing infrastructure that is composed of (1) a large-scale image database using the eXtensible Neuroimaging Archive Toolkit (XNAT), (2) a content-aware job scheduling platform using the Distributed Automation for XNAT pipeline automation tool (DAX), and (3) a wide variety of encapsulated image processing pipelines called "spiders." The VUIIS CCI medical image data storage and processing infrastructure has housed and processed nearly half a million medical image volumes with the Vanderbilt Advanced Computing Center for Research and Education (ACCRE), the HPC facility at Vanderbilt University. The initial deployment was natively deployed (i.e., direct installations on a bare-metal server) within the ACCRE hardware and software environments, which led to issues of portability and sustainability. First, it could be laborious to deploy the entire VUIIS CCI medical image data storage and processing infrastructure to another HPC center with varying hardware infrastructure, library availability, and software permission policies. Second, the spiders were not developed in an isolated manner, which has led to software dependency issues during system upgrades or remote software installation. To address such issues, herein, we describe recent innovations using containerization techniques with XNAT/DAX which are used to isolate the VUIIS CCI medical image data storage and processing infrastructure from the underlying hardware and software environments. The newly presented XNAT/DAX solution has the following new features: (1) multi-level portability from the system level to the application level, (2) flexible and dynamic software development and expansion, and (3) scalable spider deployment compatible with HPC clusters and local workstations.
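
    A hypothetical sketch of the containerization idea (not the actual DAX API; the image name and paths are placeholders): running a "spider" inside a container isolates it from the host's libraries and environment:

```python
# Hypothetical sketch of the containerization idea (not the actual DAX API):
# run an image-processing "spider" inside a container so the pipeline is
# isolated from the host's libraries and environment.
import subprocess

def run_spider(image: str, in_dir: str, out_dir: str) -> None:
    subprocess.run(
        ["docker", "run", "--rm",
         "-v", f"{in_dir}:/input:ro",      # read-only input volume
         "-v", f"{out_dir}:/output",       # results written here
         image],
        check=True,
    )

# placeholder image name and paths for illustration only
run_spider("vuiis/example-spider:latest", "/data/scans", "/data/results")
```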

  16. SAME4HPC: A Promising Approach in Building a Scalable and Mobile Environment for High-Performance Computing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Karthik, Rajasekar

    2014-01-01

    In this paper, an architecture for building a Scalable And Mobile Environment For High-Performance Computing with spatial capabilities, called SAME4HPC, is described using cutting-edge technologies and standards such as Node.js, HTML5, ECMAScript 6, and PostgreSQL 9.4. Mobile devices are increasingly becoming powerful enough to run high-performance apps. At the same time, there exists a significant number of low-end and older devices that rely heavily on the server or the cloud infrastructure to do the heavy lifting. Our architecture aims to support both of these types of devices to provide high performance and a rich user experience. A cloud infrastructure consisting of OpenStack with Ubuntu, GeoServer, and high-performance JavaScript frameworks are some of the key open-source and industry-standard practices that have been adopted in this architecture.

  17. High-performance computing with quantum processing units

    DOE PAGES

    Britt, Keith A.; Oak Ridge National Lab.; Humble, Travis S.; ...

    2017-03-01

    The prospects of quantum computing have driven efforts to realize fully functional quantum processing units (QPUs). Recent success in developing proof-of-principle QPUs has prompted the question of how to integrate these emerging processors into modern high-performance computing (HPC) systems. We examine how QPUs can be integrated into current and future HPC system architectures by accounting for functional and physical design requirements. We identify two integration pathways that are differentiated by infrastructure constraints on the QPU and the use cases expected for the HPC system. This includes a tight integration that assumes infrastructure bottlenecks can be overcome as well as a loose integration that assumes they cannot. We find that the performance of both approaches is likely to depend on the quantum interconnect that serves to entangle multiple QPUs. As a result, we also identify several challenges in assessing QPU performance for HPC, and we consider new metrics that capture the interplay between system architecture and the quantum parallelism underlying computational performance.

  18. High-performance computing with quantum processing units

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Britt, Keith A.; Oak Ridge National Lab.; Humble, Travis S.

    The prospects of quantum computing have driven efforts to realize fully functional quantum processing units (QPUs). Recent success in developing proof-of-principle QPUs has prompted the question of how to integrate these emerging processors into modern high-performance computing (HPC) systems. We examine how QPUs can be integrated into current and future HPC system architectures by accounting for functional and physical design requirements. We identify two integration pathways that are differentiated by infrastructure constraints on the QPU and the use cases expected for the HPC system. This includes a tight integration that assumes infrastructure bottlenecks can be overcome as well as a loose integration that assumes they cannot. We find that the performance of both approaches is likely to depend on the quantum interconnect that serves to entangle multiple QPUs. As a result, we also identify several challenges in assessing QPU performance for HPC, and we consider new metrics that capture the interplay between system architecture and the quantum parallelism underlying computational performance.

  19. High-performance integrated virtual environment (HIVE): a robust infrastructure for next-generation sequence data analysis

    PubMed Central

    Simonyan, Vahan; Chumakov, Konstantin; Dingerdissen, Hayley; Faison, William; Goldweber, Scott; Golikov, Anton; Gulzar, Naila; Karagiannis, Konstantinos; Vinh Nguyen Lam, Phuc; Maudru, Thomas; Muravitskaja, Olesja; Osipova, Ekaterina; Pan, Yang; Pschenichnov, Alexey; Rostovtsev, Alexandre; Santana-Quintero, Luis; Smith, Krista; Thompson, Elaine E.; Tkachenko, Valery; Torcivia-Rodriguez, John; Wan, Quan; Wang, Jing; Wu, Tsung-Jung; Wilson, Carolyn; Mazumder, Raja

    2016-01-01

    The High-performance Integrated Virtual Environment (HIVE) is a distributed storage and compute environment designed primarily to handle next-generation sequencing (NGS) data. This multicomponent cloud infrastructure provides secure web access for authorized users to deposit, retrieve, annotate and compute on NGS data, and to analyse the outcomes using web interface visual environments appropriately built in collaboration with research and regulatory scientists and other end users. Unlike many massively parallel computing environments, HIVE uses a cloud control server which virtualizes services, not processes. It is both very robust and flexible due to the abstraction layer introduced between computational requests and operating system processes. The novel paradigm of moving computations to the data, instead of moving data to computational nodes, has proven to be significantly less taxing for both hardware and network infrastructure. The honeycomb data model developed for HIVE integrates metadata into an object-oriented model. Its distinction from other object-oriented databases is in the additional implementation of a unified application program interface to search, view and manipulate data of all types. This model simplifies the introduction of new data types, thereby minimizing the need for database restructuring and streamlining the development of new integrated information systems. The honeycomb model employs a highly secure hierarchical access control and permission system, allowing determination of data access privileges in a finely granular manner without flooding the security subsystem with a multiplicity of rules. HIVE infrastructure will allow engineers and scientists to perform NGS analysis in a manner that is both efficient and secure. HIVE is actively supported in public and private domains, and project collaborations are welcomed. Database URL: https://hive.biochemistry.gwu.edu PMID:26989153

  20. High-performance integrated virtual environment (HIVE): a robust infrastructure for next-generation sequence data analysis.

    PubMed

    Simonyan, Vahan; Chumakov, Konstantin; Dingerdissen, Hayley; Faison, William; Goldweber, Scott; Golikov, Anton; Gulzar, Naila; Karagiannis, Konstantinos; Vinh Nguyen Lam, Phuc; Maudru, Thomas; Muravitskaja, Olesja; Osipova, Ekaterina; Pan, Yang; Pschenichnov, Alexey; Rostovtsev, Alexandre; Santana-Quintero, Luis; Smith, Krista; Thompson, Elaine E; Tkachenko, Valery; Torcivia-Rodriguez, John; Voskanian, Alin; Wan, Quan; Wang, Jing; Wu, Tsung-Jung; Wilson, Carolyn; Mazumder, Raja

    2016-01-01

    The High-performance Integrated Virtual Environment (HIVE) is a distributed storage and compute environment designed primarily to handle next-generation sequencing (NGS) data. This multicomponent cloud infrastructure provides secure web access for authorized users to deposit, retrieve, annotate and compute on NGS data, and to analyse the outcomes using web interface visual environments appropriately built in collaboration with research and regulatory scientists and other end users. Unlike many massively parallel computing environments, HIVE uses a cloud control server which virtualizes services, not processes. It is both very robust and flexible due to the abstraction layer introduced between computational requests and operating system processes. The novel paradigm of moving computations to the data, instead of moving data to computational nodes, has proven to be significantly less taxing for both hardware and network infrastructure. The honeycomb data model developed for HIVE integrates metadata into an object-oriented model. Its distinction from other object-oriented databases is in the additional implementation of a unified application program interface to search, view and manipulate data of all types. This model simplifies the introduction of new data types, thereby minimizing the need for database restructuring and streamlining the development of new integrated information systems. The honeycomb model employs a highly secure hierarchical access control and permission system, allowing determination of data access privileges in a finely granular manner without flooding the security subsystem with a multiplicity of rules. HIVE infrastructure will allow engineers and scientists to perform NGS analysis in a manner that is both efficient and secure. HIVE is actively supported in public and private domains, and project collaborations are welcomed. Database URL: https://hive.biochemistry.gwu.edu. © The Author(s) 2016. Published by Oxford University Press.

  21. Ubiquitous Green Computing Techniques for High Demand Applications in Smart Environments

    PubMed Central

    Zapater, Marina; Sanchez, Cesar; Ayala, Jose L.; Moya, Jose M.; Risco-Martín, José L.

    2012-01-01

    Ubiquitous sensor network deployments, such as the ones found in Smart cities and Ambient intelligence applications, impose constantly increasing computational demands in order to process data and offer services to users. The nature of these applications implies the usage of data centers. Research has paid much attention to the energy consumption of sensor nodes in WSN infrastructures. However, supercomputing facilities present a far higher economic and environmental impact due to their very high power consumption, a problem that has so far been disregarded in the field of smart environment services. This paper proposes an energy-minimization workload assignment technique, based on heterogeneity and application-awareness, that redistributes low-demand computational tasks from high-performance facilities to idle nodes with low and medium resources in the WSN infrastructure. These non-optimal allocation policies reduce the energy consumed by the whole infrastructure and the total execution time. PMID:23112621

  22. Ubiquitous green computing techniques for high demand applications in Smart environments.

    PubMed

    Zapater, Marina; Sanchez, Cesar; Ayala, Jose L; Moya, Jose M; Risco-Martín, José L

    2012-01-01

    Ubiquitous sensor network deployments, such as the ones found in Smart cities and Ambient intelligence applications, impose constantly increasing computational demands in order to process data and offer services to users. The nature of these applications implies the usage of data centers. Research has paid much attention to the energy consumption of sensor nodes in WSN infrastructures. However, supercomputing facilities present a far higher economic and environmental impact due to their very high power consumption, a problem that has so far been disregarded in the field of smart environment services. This paper proposes an energy-minimization workload assignment technique, based on heterogeneity and application-awareness, that redistributes low-demand computational tasks from high-performance facilities to idle nodes with low and medium resources in the WSN infrastructure. These non-optimal allocation policies reduce the energy consumed by the whole infrastructure and the total execution time.
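
    A minimal sketch of such an application-aware assignment policy, under an assumed simplified model (node capacities and per-unit power figures are illustrative only): low-demand tasks go to the cheapest idle nodes first, with the HPC facility as fallback:

```python
# A minimal sketch, assuming a simplified model: assign low-demand tasks to
# idle low/medium-resource nodes first, falling back to the HPC facility.
# Capacities and per-unit power numbers below are illustrative only.
def assign(tasks, nodes, hpc_power_per_unit=5.0):
    """tasks: list of demand units; nodes: list of (capacity, power_per_unit)."""
    nodes = sorted(nodes, key=lambda n: n[1])       # cheapest energy first
    plan, energy = [], 0.0
    for demand in sorted(tasks):                    # low-demand tasks first
        for i, (cap, power) in enumerate(nodes):
            if cap >= demand:
                nodes[i] = (cap - demand, power)
                plan.append(("wsn-node", demand))
                energy += demand * power
                break
        else:
            plan.append(("hpc", demand))            # no idle node fits
            energy += demand * hpc_power_per_unit
    return plan, energy

plan, energy = assign(tasks=[1, 2, 4, 8], nodes=[(4, 1.0), (3, 1.5)])
print(plan, energy)
```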

  23. Using high-performance networks to enable computational aerosciences applications

    NASA Technical Reports Server (NTRS)

    Johnson, Marjory J.

    1992-01-01

    One component of the U.S. Federal High Performance Computing and Communications Program (HPCCP) is the establishment of a gigabit network to provide a communications infrastructure for researchers across the nation. This gigabit network will provide new services and capabilities, in addition to increased bandwidth, to enable future applications. An understanding of these applications is necessary to guide the development of the gigabit network and other high-performance networks of the future. In this paper we focus on computational aerosciences applications run remotely using the Numerical Aerodynamic Simulation (NAS) facility located at NASA Ames Research Center. We characterize these applications in terms of network-related parameters and relate user experiences that reveal limitations imposed by the current wide-area networking infrastructure. Then we investigate how the development of a nationwide gigabit network would enable users of the NAS facility to work in new, more productive ways.

  24. Large-scale parallel genome assembler over cloud computing environment.

    PubMed

    Das, Arghya Kusum; Koppa, Praveen Kumar; Goswami, Sayan; Platania, Richard; Park, Seung-Jong

    2017-06-01

    The size of high throughput DNA sequencing data has already reached the terabyte scale. To manage this huge volume of data, many downstream sequencing applications have started using locality-based computing over different cloud infrastructures to take advantage of elastic (pay as you go) resources at a lower cost. However, the locality-based programming model (e.g. MapReduce) is relatively new. Consequently, developing scalable data-intensive bioinformatics applications using this model and understanding the hardware environment that these applications require for good performance both require further research. In this paper, we present a de Bruijn graph oriented Parallel Giraph-based Genome Assembler (GiGA), as well as the hardware platform required for its optimal performance. GiGA uses the power of Hadoop (MapReduce) and Giraph (large-scale graph analysis) to achieve high scalability over hundreds of compute nodes by collocating the computation and data. GiGA achieves significantly higher scalability with competitive assembly quality compared to contemporary parallel assemblers (e.g. ABySS and Contrail) over a traditional HPC cluster. Moreover, we show that the performance of GiGA is significantly improved by using an SSD-based private cloud infrastructure rather than a traditional HPC cluster. We observe that the performance of GiGA on 256 cores of this SSD-based cloud infrastructure closely matches that of 512 cores of a traditional HPC cluster.
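
    A toy sketch of the de Bruijn graph construction underlying assemblers like GiGA (real assemblers distribute this graph across Hadoop/Giraph workers rather than building it in one process):

```python
# Toy sketch of the de Bruijn graph idea behind assemblers like GiGA:
# k-mers become nodes and each read chains consecutive k-mers with edges.
from collections import defaultdict

def de_bruijn(reads, k=4):
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k):
            graph[read[i:i + k]].add(read[i + 1:i + k + 1])
    return graph

for kmer, succs in de_bruijn(["ACGTACGA", "CGTACGAT"]).items():
    print(kmer, "->", sorted(succs))
```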

  25. The iPlant collaborative: cyberinfrastructure for enabling data to discovery for the life sciences

    USDA-ARS?s Scientific Manuscript database

    The iPlant Collaborative provides life science research communities access to comprehensive, scalable, and cohesive computational infrastructure for data management; identity management; collaboration tools; and cloud, high-performance, high-throughput computing. iPlant provides training, learning m...

  26. A cyber infrastructure for the SKA Telescope Manager

    NASA Astrophysics Data System (ADS)

    Barbosa, Domingos; Barraca, João P.; Carvalho, Bruno; Maia, Dalmiro; Gupta, Yashwant; Natarajan, Swaminathan; Le Roux, Gerhard; Swart, Paul

    2016-07-01

    The Square Kilometre Array Telescope Manager (SKA TM) will be responsible for assisting the SKA Operations and Observation Management, carrying out system diagnosis and collecting Monitoring and Control data from the SKA subsystems and components. To provide adequate compute resources, scalability, operation continuity and high availability, as well as strict Quality of Service, the TM cyber-infrastructure (embodied in the Local Infrastructure - LINFRA) consists of COTS hardware and infrastructural software (for example: server monitoring software, host operating system, virtualization software, device firmware), providing a specially tailored Infrastructure as a Service (IaaS) and Platform as a Service (PaaS) solution. The TM infrastructure provides services in the form of computational power, software defined networking, power, storage abstractions, and high level, state of the art IaaS and PaaS management interfaces. This cyber platform will be tailored to each of the two SKA Phase 1 telescope instances (SKA_MID in South Africa and SKA_LOW in Australia), each presenting different computational and storage infrastructures and conditioned by location. This cyber platform will provide a compute model enabling TM to manage the deployment and execution of its multiple components (observation scheduler, proposal submission tools, M&C components, forensic tools, several databases, etc.). In this sense, the TM LINFRA is primarily focused towards the provision of isolated instances, mostly resorting to virtualization technologies, while defaulting to bare hardware if specifically required due to performance, security, availability, or other requirements.

  27. Security and Cloud Outsourcing Framework for Economic Dispatch

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sarker, Mushfiqur R.; Wang, Jianhui; Li, Zuyi

    The computational complexity and problem sizes of power grid applications have increased significantly with the advent of renewable resources and smart grid technologies. The current paradigm of solving these issues consists of in-house high performance computing infrastructures, which have drawbacks of high capital expenditures, maintenance, and limited scalability. Cloud computing is an ideal alternative due to its powerful computational capacity, rapid scalability, and high cost-effectiveness. A major challenge, however, remains in that the highly confidential grid data is susceptible to potential cyberattacks when outsourced to the cloud. In this work, a security and cloud outsourcing framework is developed for the Economic Dispatch (ED) linear programming application. As a result, the security framework transforms the ED linear program into a confidentiality-preserving linear program that masks both the data and problem structure, thus enabling secure outsourcing to the cloud. Results show that for large grid test cases the cloud's performance and cost outperform those of the in-house infrastructure.

  28. Security and Cloud Outsourcing Framework for Economic Dispatch

    DOE PAGES

    Sarker, Mushfiqur R.; Wang, Jianhui; Li, Zuyi; ...

    2017-04-24

    The computational complexity and problem sizes of power grid applications have increased significantly with the advent of renewable resources and smart grid technologies. The current paradigm of solving these issues consists of in-house high performance computing infrastructures, which have drawbacks of high capital expenditures, maintenance, and limited scalability. Cloud computing is an ideal alternative due to its powerful computational capacity, rapid scalability, and high cost-effectiveness. A major challenge, however, remains in that the highly confidential grid data is susceptible to potential cyberattacks when outsourced to the cloud. In this work, a security and cloud outsourcing framework is developed for the Economic Dispatch (ED) linear programming application. As a result, the security framework transforms the ED linear program into a confidentiality-preserving linear program that masks both the data and problem structure, thus enabling secure outsourcing to the cloud. Results show that for large grid test cases the cloud's performance and cost outperform those of the in-house infrastructure.
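
    A hedged sketch of the masking idea (not the paper's exact transform, which also protects the right-hand side and handles feasibility): a random invertible change of variables hides the constraint matrix and cost vector from the cloud:

```python
# Hedged sketch of confidentiality-preserving outsourcing (not the paper's
# exact transform): mask an LP  min c^T x  s.t.  Ax <= b  via x = M y with a
# random invertible M, so the cloud sees (c^T M, A M, b) but not A or c.
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.0, 2.0], [3.0, 1.0]])   # confidential constraint data
b = np.array([4.0, 5.0])
c = np.array([1.0, 1.0])

M = rng.uniform(0.5, 1.5, (2, 2))
while abs(np.linalg.det(M)) < 1e-6:      # ensure M is invertible
    M = rng.uniform(0.5, 1.5, (2, 2))

A_masked, c_masked = A @ M, c @ M        # outsourced problem: min c_masked^T y
# ... cloud solves the masked LP for y* ...
y_star = np.array([0.7, 0.9])            # placeholder for the cloud's answer
x_star = M @ y_star                      # owner recovers the true solution
print(x_star)
```

    A production scheme would also randomize b and argue that the masking leaks nothing about the grid topology; the sketch only shows the change-of-variables step.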

  29. Scientific Services on the Cloud

    NASA Astrophysics Data System (ADS)

    Chapman, David; Joshi, Karuna P.; Yesha, Yelena; Halem, Milt; Yesha, Yaacov; Nguyen, Phuong

    Scientific Computing was one of the first ever applications for parallel and distributed computation. To this day, scientific applications remain some of the most compute intensive, and have inspired the creation of petaflop compute infrastructure such as the Oak Ridge Jaguar and Los Alamos RoadRunner. Large dedicated hardware infrastructure has become both a blessing and a curse to the scientific community. Scientists are interested in cloud computing for much the same reason as businesses and other professionals. The hardware is provided, maintained, and administrated by a third party. Software abstraction and virtualization provide reliability and fault tolerance. Graduated fees allow for multi-scale prototyping and execution. Cloud computing resources are only a few clicks away, and by far the easiest high performance distributed platform to gain access to. There may still be dedicated infrastructure for ultra-scale science, but the cloud can easily play a major part of the scientific computing initiative.

  30. Monitoring performance of a highly distributed and complex computing infrastructure in LHCb

    NASA Astrophysics Data System (ADS)

    Mathe, Z.; Haen, C.; Stagni, F.

    2017-10-01

    In order to ensure an optimal performance of the LHCb Distributed Computing, based on LHCbDIRAC, it is necessary to be able to inspect the behavior over time of many components: firstly the agents and services on which the infrastructure is built, but also all the computing tasks and data transfers that are managed by this infrastructure. This consists of recording and then analyzing time series of a large number of observables, for which the usage of SQL relational databases is far from optimal. Therefore within DIRAC we have been studying novel possibilities based on NoSQL databases (ElasticSearch, OpenTSDB and InfluxDB); as a result of this study we developed a new monitoring system based on ElasticSearch. It has been deployed on the LHCb Distributed Computing infrastructure, for which it collects data from all the components (agents, services, jobs) and allows creating reports through Kibana and a web user interface, which is based on the DIRAC web framework. In this paper we describe this new implementation of the DIRAC monitoring system. We give details on the ElasticSearch implementation within the DIRAC general framework, as well as an overview of the advantages of the pipeline aggregation used for creating a dynamic bucketing of the time series. We present the advantages of using the ElasticSearch DSL high-level library for creating and running queries. Finally, we present the performance of the system.
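
    A hedged sketch using the elasticsearch-dsl Python library (the index and field names here are hypothetical, not LHCbDIRAC's schema): a date-histogram aggregation buckets job records over time, in the spirit of the dynamic bucketing described above:

```python
# Hedged sketch with elasticsearch-dsl; index and field names are
# hypothetical placeholders, not the actual LHCbDIRAC schema.
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search

client = Elasticsearch("http://localhost:9200")
s = Search(using=client, index="jobs").filter("term", status="Running")
s.aggs.bucket("per_hour", "date_histogram",
              field="timestamp", fixed_interval="1h") \
      .metric("jobs", "value_count", field="job_id")

response = s.execute()
for bucket in response.aggregations.per_hour.buckets:
    print(bucket.key_as_string, bucket.jobs.value)
```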

  31. Big data computing: Building a vision for ARS information management

    USDA-ARS?s Scientific Manuscript database

    Improvements are needed within the ARS to increase scientific capacity and keep pace with new developments in computer technologies that support data acquisition and analysis. Enhancements in computing power and IT infrastructure are needed to provide scientists better access to high performance com...

  32. In Situ Methods, Infrastructures, and Applications on High Performance Computing Platforms, a State-of-the-art (STAR) Report

    DOE PAGES

    Bethel, EW; Bauer, A; Abbasi, H; ...

    2016-06-10

    The considerable interest in the high performance computing (HPC) community regarding analyzing and visualizing data without first writing to disk, i.e., in situ processing, is due to several factors. First is an I/O cost savings, where data is analyzed/visualized while being generated, without first storing to a filesystem. Second is the potential for increased accuracy, where fine temporal sampling of transient analysis might expose some complex behavior missed in coarse temporal sampling. Third is the ability to use all available resources, CPUs and accelerators, in the computation of analysis products. This STAR paper brings together researchers, developers and practitioners using in situ methods in extreme-scale HPC with the goal to present existing methods, infrastructures, and a range of computational science and engineering applications using in situ analysis and visualization.
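
    A minimal sketch of the in situ pattern (illustrative only): the solver hands each new timestep to analysis callbacks instead of writing raw data to disk:

```python
# Minimal sketch of the in situ idea: the simulation hands each new
# timestep to an analysis callback instead of writing raw data to disk.
import numpy as np

def analyze(step, field):
    # lightweight in situ reduction: track the global max each step
    print(f"step {step}: max={field.max():.4f}")

def simulate(steps=5, n=1_000_000, callbacks=(analyze,)):
    rng = np.random.default_rng(42)
    field = rng.random(n)
    for step in range(steps):
        field += 0.01 * rng.standard_normal(n)   # stand-in for a solver update
        for cb in callbacks:                     # analysis runs in situ,
            cb(step, field)                      # no intermediate I/O

simulate()
```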

  33. Analysis of Pervasive Mobile Ad Hoc Routing Protocols

    NASA Astrophysics Data System (ADS)

    Qadri, Nadia N.; Liotta, Antonio

    Mobile ad hoc networks (MANETs) are a fundamental element of pervasive networks and therefore of pervasive systems that truly support pervasive computing, where users can communicate anywhere, anytime and on-the-fly. In fact, future advances in pervasive computing rely on advancements in mobile communication, which includes both infrastructure-based wireless networks and non-infrastructure-based MANETs. MANETs introduce a new communication paradigm, which does not require a fixed infrastructure - they rely on wireless terminals for routing and transport services. Due to the highly dynamic topology, the absence of an established infrastructure for centralized administration, bandwidth-constrained wireless links, and limited resources in MANETs, it is challenging to design an efficient and reliable routing protocol. This chapter reviews the key studies carried out so far on the performance of mobile ad hoc routing protocols. We discuss performance issues and metrics required for the evaluation of ad hoc routing protocols. This leads to a survey of existing work, which captures the performance of ad hoc routing algorithms and their behaviour from different perspectives and highlights avenues for future research.

  34. A Queue Simulation Tool for a High Performance Scientific Computing Center

    NASA Technical Reports Server (NTRS)

    Spear, Carrie; McGalliard, James

    2007-01-01

    The NASA Center for Computational Sciences (NCCS) at the Goddard Space Flight Center provides high-performance, highly parallel processors, mass storage, and supporting infrastructure to a community of computational Earth and space scientists. Long-running (days) and highly parallel (hundreds of CPUs) jobs are common in the workload. NCCS management structures batch queues and allocates resources to optimize system use and prioritize workloads. NCCS technical staff use a locally developed discrete event simulation tool to model the impacts of evolving workloads, potential system upgrades, alternative queue structures and resource allocation policies.
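
    An illustrative discrete-event sketch (not the NCCS tool): jobs are served FIFO from a batch queue, each waiting until enough CPUs are free, and the simulation reports the mean wait time:

```python
# Illustrative discrete-event sketch (not the NCCS tool): FIFO batch queue.
# Assumes each job's CPU request fits within total_cpus.
import heapq

def mean_wait(jobs, total_cpus):
    """jobs: (submit_time, cpus, runtime) tuples, served FIFO."""
    running, free, waits = [], total_cpus, []   # running: (finish_time, cpus)
    for submit, cpus, runtime in sorted(jobs):
        now = submit
        # reclaim CPUs from jobs already finished by `now`
        while running and running[0][0] <= now:
            free += heapq.heappop(running)[1]
        # if not enough CPUs, block until completions free them up
        while free < cpus:
            finish, c = heapq.heappop(running)
            now = max(now, finish)
            free += c
        waits.append(now - submit)
        heapq.heappush(running, (now + runtime, cpus))
        free -= cpus
    return sum(waits) / len(waits)

print(mean_wait([(0, 64, 10.0), (1, 64, 10.0), (2, 128, 5.0)], total_cpus=128))
```

    Running the example prints a mean wait of 3.0: the 128-CPU job must wait for both 64-CPU jobs to finish.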

  35. A Grid Infrastructure for Supporting Space-based Science Operations

    NASA Technical Reports Server (NTRS)

    Bradford, Robert N.; Redman, Sandra H.; McNair, Ann R. (Technical Monitor)

    2002-01-01

    Emerging technologies for computational grid infrastructures have the potential for revolutionizing the way computers are used in all aspects of our lives. Computational grids are currently being implemented to provide large-scale, dynamic, and secure research and engineering environments based on standards and next-generation reusable software, enabling greater science and engineering productivity through shared resources and distributed computing for less cost than traditional architectures. Combined with the emerging technologies of high-performance networks, grids provide researchers, scientists and engineers the first real opportunity for an effective distributed collaborative environment with access to resources such as computational and storage systems, instruments, and software tools and services for the most computationally challenging applications.

  36. Galaxy CloudMan: delivering cloud compute clusters.

    PubMed

    Afgan, Enis; Baker, Dannon; Coraor, Nate; Chapman, Brad; Nekrutenko, Anton; Taylor, James

    2010-12-21

    Widespread adoption of high-throughput sequencing has greatly increased the scale and sophistication of computational infrastructure needed to perform genomic research. An alternative to building and maintaining local infrastructure is "cloud computing", which, in principle, offers on demand access to flexible computational infrastructure. However, cloud computing resources are not yet suitable for immediate "as is" use by experimental biologists. We present a cloud resource management system that makes it possible for individual researchers to compose and control an arbitrarily sized compute cluster on Amazon's EC2 cloud infrastructure without any informatics requirements. Within this system, an entire suite of biological tools packaged by the NERC Bio-Linux team (http://nebc.nerc.ac.uk/tools/bio-linux) is available for immediate consumption. The provided solution makes it possible, using only a web browser, to create a completely configured compute cluster ready to perform analysis in less than five minutes. Moreover, we provide an automated method for building custom deployments of cloud resources. This approach promotes reproducibility of results and, if desired, allows individuals and labs to add or customize an otherwise available cloud system to better meet their needs. The expected knowledge and associated effort with deploying a compute cluster in the Amazon EC2 cloud is not trivial. The solution presented in this paper eliminates these barriers, making it possible for researchers to deploy exactly the amount of computing power they need, combined with a wealth of existing analysis software, to handle the ongoing data deluge.

  37. GSDC: A Unique Data Center in Korea for HEP research

    NASA Astrophysics Data System (ADS)

    Ahn, Sang-Un

    2017-04-01

    Global Science experimental Data hub Center (GSDC) at Korea Institute of Science and Technology Information (KISTI) is a unique data center in South Korea established to promote fundamental research fields by supporting them with expertise in Information and Communication Technology (ICT) and infrastructure for High Performance Computing (HPC), High Throughput Computing (HTC) and Networking. GSDC has supported various research fields in South Korea dealing with large-scale data, e.g. the RENO experiment for neutrino research, the LIGO experiment for gravitational wave detection, a genome sequencing project for bio-medical research, and HEP experiments such as CDF at FNAL, Belle at KEK, and STAR at BNL. In particular, GSDC has run a Tier-1 center for the ALICE experiment using the LHC at CERN since 2013. In this talk, we present an overview of the computing infrastructure that GSDC runs for these research fields and discuss the data center infrastructure management system deployed at GSDC.

  38. International Symposium on Grids and Clouds (ISGC) 2016

    NASA Astrophysics Data System (ADS)

    The International Symposium on Grids and Clouds (ISGC) 2016 will be held at Academia Sinica in Taipei, Taiwan from 13-18 March 2016, with co-located events and workshops. The conference is hosted by the Academia Sinica Grid Computing Centre (ASGC). The theme of ISGC 2016 focuses on “Ubiquitous e-infrastructures and Applications”. Contemporary research is impossible without a strong IT component - researchers rely on the existence of stable and widely available e-infrastructures and their higher level functions and properties. As a result of these expectations, e-Infrastructures are becoming ubiquitous, providing an environment that supports large scale collaborations that deal with global challenges as well as smaller and temporary research communities focusing on particular scientific problems. To support those diversified communities and their needs, the e-Infrastructures themselves are becoming more layered and multifaceted, supporting larger groups of applications. Following the call of last year's conference, ISGC 2016 continues its aim to bring together users and application developers with those responsible for the development and operation of multi-purpose ubiquitous e-Infrastructures. Topics of discussion include Physics (including HEP) and Engineering Applications, Biomedicine & Life Sciences Applications, Earth & Environmental Sciences & Biodiversity Applications, Humanities, Arts, and Social Sciences (HASS) Applications, Virtual Research Environment (including Middleware, tools, services, workflow, etc.), Data Management, Big Data, Networking & Security, Infrastructure & Operations, Infrastructure Clouds and Virtualisation, Interoperability, Business Models & Sustainability, Highly Distributed Computing Systems, and High Performance & Technical Computing (HPTC), etc.

  39. Galaxy CloudMan: delivering cloud compute clusters

    PubMed Central

    2010-01-01

    Background Widespread adoption of high-throughput sequencing has greatly increased the scale and sophistication of computational infrastructure needed to perform genomic research. An alternative to building and maintaining local infrastructure is “cloud computing”, which, in principle, offers on demand access to flexible computational infrastructure. However, cloud computing resources are not yet suitable for immediate “as is” use by experimental biologists. Results We present a cloud resource management system that makes it possible for individual researchers to compose and control an arbitrarily sized compute cluster on Amazon’s EC2 cloud infrastructure without any informatics requirements. Within this system, an entire suite of biological tools packaged by the NERC Bio-Linux team (http://nebc.nerc.ac.uk/tools/bio-linux) is available for immediate consumption. The provided solution makes it possible, using only a web browser, to create a completely configured compute cluster ready to perform analysis in less than five minutes. Moreover, we provide an automated method for building custom deployments of cloud resources. This approach promotes reproducibility of results and, if desired, allows individuals and labs to add or customize an otherwise available cloud system to better meet their needs. Conclusions The expected knowledge and associated effort with deploying a compute cluster in the Amazon EC2 cloud is not trivial. The solution presented in this paper eliminates these barriers, making it possible for researchers to deploy exactly the amount of computing power they need, combined with a wealth of existing analysis software, to handle the ongoing data deluge. PMID:21210983
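
    As a hedged illustration of the provisioning primitive beneath systems like CloudMan (not CloudMan's own API; the AMI ID and key name are placeholders, and AWS credentials are assumed), boto3 can request a small cluster on EC2:

```python
# Hedged illustration of the provisioning primitive beneath systems like
# CloudMan (not CloudMan's own API). AMI ID and key name are placeholders;
# valid AWS credentials are assumed to be configured.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder AMI with the tool stack
    InstanceType="c5.xlarge",
    MinCount=4, MaxCount=4,            # a four-node compute cluster
    KeyName="my-keypair",              # placeholder SSH key pair
)
print([i["InstanceId"] for i in response["Instances"]])
```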

  20. Grids, virtualization, and clouds at Fermilab

    DOE PAGES

    Timm, S.; Chadwick, K.; Garzoglio, G.; ...

    2014-06-11

Fermilab supports a scientific program that includes experiments and scientists located across the globe. To better serve this community, in 2004, the (then) Computing Division undertook the strategy of placing all of the High Throughput Computing (HTC) resources in a Campus Grid known as FermiGrid, supported by common shared services. In 2007, the FermiGrid Services group deployed a service infrastructure that utilized Xen virtualization, LVS network routing and MySQL circular replication to deliver highly available services that offered significant performance, reliability and serviceability improvements. This deployment was further enhanced through the deployment of a distributed redundant network core architecture and the physical distribution of the systems that host the virtual machines across multiple buildings on the Fermilab Campus. In 2010, building on the experience pioneered by FermiGrid in delivering production services in a virtual infrastructure, the Computing Sector commissioned the FermiCloud, General Physics Computing Facility and Virtual Services projects to serve as platforms for support of scientific computing (FermiCloud & GPCF) and core computing (Virtual Services). Lastly, this work will present the evolution of the Fermilab Campus Grid, Virtualization and Cloud Computing infrastructure together with plans for the future.

  1. Grids, virtualization, and clouds at Fermilab

    NASA Astrophysics Data System (ADS)

    Timm, S.; Chadwick, K.; Garzoglio, G.; Noh, S.

    2014-06-01

Fermilab supports a scientific program that includes experiments and scientists located across the globe. To better serve this community, in 2004, the (then) Computing Division undertook the strategy of placing all of the High Throughput Computing (HTC) resources in a Campus Grid known as FermiGrid, supported by common shared services. In 2007, the FermiGrid Services group deployed a service infrastructure that utilized Xen virtualization, LVS network routing and MySQL circular replication to deliver highly available services that offered significant performance, reliability and serviceability improvements. This deployment was further enhanced through the deployment of a distributed redundant network core architecture and the physical distribution of the systems that host the virtual machines across multiple buildings on the Fermilab Campus. In 2010, building on the experience pioneered by FermiGrid in delivering production services in a virtual infrastructure, the Computing Sector commissioned the FermiCloud, General Physics Computing Facility and Virtual Services projects to serve as platforms for support of scientific computing (FermiCloud & GPCF) and core computing (Virtual Services). This work will present the evolution of the Fermilab Campus Grid, Virtualization and Cloud Computing infrastructure together with plans for the future.
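
    As an aside, the high-availability pattern described here ultimately rests on simple liveness checks of replicated services behind a load-balanced address. A minimal sketch of such a probe, with purely illustrative hostnames and ports:

    ```python
    # Minimal sketch of a liveness probe for replicated services behind a
    # load-balanced virtual IP; hostnames and ports are illustrative only.
    import socket

    REPLICAS = [("service-a.example.org", 3306),
                ("service-b.example.org", 3306)]

    def is_alive(host, port, timeout=2.0):
        """Return True if a TCP connection to host:port succeeds."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    for host, port in REPLICAS:
        state = "UP" if is_alive(host, port) else "DOWN"
        print(f"{host}:{port} {state}")
    ```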

  2. Architectural Principles and Experimentation of Distributed High Performance Virtual Clusters

    ERIC Educational Resources Information Center

    Younge, Andrew J.

    2016-01-01

    With the advent of virtualization and Infrastructure-as-a-Service (IaaS), the broader scientific computing community is considering the use of clouds for their scientific computing needs. This is due to the relative scalability, ease of use, advanced user environment customization abilities, and the many novel computing paradigms available for…

  3. Autonomic Management of Application Workflows on Hybrid Computing Infrastructure

    DOE PAGES

    Kim, Hyunjoo; el-Khamra, Yaakoub; Rodero, Ivan; ...

    2011-01-01

In this paper, we present a programming and runtime framework that enables the autonomic management of complex application workflows on hybrid computing infrastructures. The framework is designed to address system and application heterogeneity and dynamics to ensure that application objectives and constraints are satisfied. The need for such autonomic system and application management is becoming critical as computing infrastructures become increasingly heterogeneous, integrating different classes of resources from high-end HPC systems to commodity clusters and clouds. For example, the framework presented in this paper can be used to provision the appropriate mix of resources based on application requirements and constraints. The framework also monitors the system/application state and adapts the application and/or resources to respond to changing requirements or environment. To demonstrate the operation of the framework and to evaluate its ability, we employ a workflow used to characterize an oil reservoir executing on a hybrid infrastructure composed of TeraGrid nodes and Amazon EC2 instances of various types. Specifically, we show how different application objectives such as acceleration, conservation and resilience can be effectively achieved while satisfying deadline and budget constraints, using an appropriate mix of dynamically provisioned resources. Our evaluations also demonstrate that public clouds can be used to complement and reinforce the scheduling and usage of traditional high performance computing infrastructure.
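
    To make the resource-mix idea concrete, here is a toy sketch of deadline- and budget-driven provisioning across a fixed HPC allocation and on-demand cloud nodes. All rates and throughputs are invented numbers; the paper's framework is far more general.

    ```python
    # Toy sketch of deadline/budget-driven provisioning across an HPC
    # allocation and on-demand cloud instances. Rates and throughputs are
    # made-up numbers for illustration only.

    def choose_mix(tasks, deadline_h, budget_usd,
                   hpc_nodes=16, hpc_tasks_per_node_h=10,
                   cloud_usd_per_node_h=0.50, cloud_tasks_per_node_h=6):
        # Work the fixed HPC allocation can finish before the deadline.
        hpc_capacity = hpc_nodes * hpc_tasks_per_node_h * deadline_h
        overflow = max(0, tasks - hpc_capacity)
        if overflow == 0:
            return hpc_nodes, 0, 0.0
        # Cloud nodes needed to absorb the overflow within the deadline.
        cloud_nodes = -(-overflow // int(cloud_tasks_per_node_h * deadline_h))
        cost = cloud_nodes * cloud_usd_per_node_h * deadline_h
        if cost > budget_usd:
            raise ValueError("deadline unreachable within budget")
        return hpc_nodes, cloud_nodes, cost

    # 3000 tasks, 12-hour deadline, $100 budget -> (16 HPC, 15 cloud, $90)
    print(choose_mix(tasks=3000, deadline_h=12, budget_usd=100))
    ```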

  4. A Framework for Debugging Geoscience Projects in a High Performance Computing Environment

    NASA Astrophysics Data System (ADS)

    Baxter, C.; Matott, L.

    2012-12-01

High performance computing (HPC) infrastructure has become ubiquitous in today's world with the emergence of commercial cloud computing and academic supercomputing centers. Teams of geoscientists, hydrologists and engineers can take advantage of this infrastructure to undertake large research projects - for example, linking one or more site-specific environmental models with soft computing algorithms, such as heuristic global search procedures, to perform parameter estimation and predictive uncertainty analysis, and/or design least-cost remediation systems. However, the size, complexity and distributed nature of these projects can make identifying failures in the associated numerical experiments using conventional ad-hoc approaches both time-consuming and ineffective. To address these problems a multi-tiered debugging framework has been developed. The framework allows for quickly isolating and remedying a number of potential experimental failures, including: failures in the HPC scheduler; bugs in the soft computing code; bugs in the modeling code; and permissions and access control errors. The utility of the framework is demonstrated via application to a series of over 200,000 numerical experiments involving a suite of 5 heuristic global search algorithms and 15 mathematical test functions serving as cheap analogues for the simulation-based optimization of pump-and-treat subsurface remediation systems.
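
    A flavor of the tiered-triage idea in miniature: scan a failed experiment's log and attribute the failure to a layer. The patterns below are illustrative assumptions, not the framework's actual rules.

    ```python
    # Sketch of multi-tier failure triage: scan a failed experiment's log
    # and attribute the failure to a layer. Patterns are illustrative.
    import re

    TIERS = [
        ("scheduler",   re.compile(r"PBS|slurmstepd|job killed|walltime", re.I)),
        ("permissions", re.compile(r"permission denied|access control", re.I)),
        ("model code",  re.compile(r"segmentation fault|NaN|convergence fail", re.I)),
        ("search code", re.compile(r"optimizer exception|bad objective", re.I)),
    ]

    def classify(log_text):
        """Return the first tier whose signature appears in the log."""
        for tier, pattern in TIERS:
            if pattern.search(log_text):
                return tier
        return "unclassified"

    print(classify("slurmstepd: error: job killed: walltime exceeded"))
    ```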

  5. Cloud computing applications for biomedical science: A perspective.

    PubMed

    Navale, Vivek; Bourne, Philip E

    2018-06-01

    Biomedical research has become a digital data-intensive endeavor, relying on secure and scalable computing, storage, and network infrastructure, which has traditionally been purchased, supported, and maintained locally. For certain types of biomedical applications, cloud computing has emerged as an alternative to locally maintained traditional computing approaches. Cloud computing offers users pay-as-you-go access to services such as hardware infrastructure, platforms, and software for solving common biomedical computational problems. Cloud computing services offer secure on-demand storage and analysis and are differentiated from traditional high-performance computing by their rapid availability and scalability of services. As such, cloud services are engineered to address big data problems and enhance the likelihood of data and analytics sharing, reproducibility, and reuse. Here, we provide an introductory perspective on cloud computing to help the reader determine its value to their own research.

  6. Cloud computing applications for biomedical science: A perspective

    PubMed Central

    2018-01-01

    Biomedical research has become a digital data–intensive endeavor, relying on secure and scalable computing, storage, and network infrastructure, which has traditionally been purchased, supported, and maintained locally. For certain types of biomedical applications, cloud computing has emerged as an alternative to locally maintained traditional computing approaches. Cloud computing offers users pay-as-you-go access to services such as hardware infrastructure, platforms, and software for solving common biomedical computational problems. Cloud computing services offer secure on-demand storage and analysis and are differentiated from traditional high-performance computing by their rapid availability and scalability of services. As such, cloud services are engineered to address big data problems and enhance the likelihood of data and analytics sharing, reproducibility, and reuse. Here, we provide an introductory perspective on cloud computing to help the reader determine its value to their own research. PMID:29902176

  7. Model-as-a-service (MaaS) using the cloud service innovation platform (CSIP)

    USDA-ARS?s Scientific Manuscript database

Cloud infrastructures for modelling activities such as data processing, performing environmental simulations, or conducting model calibrations/optimizations provide a cost-effective alternative to traditional high performance computing approaches. Cloud-based modelling examples emerged into the more...

  8. New project to support scientific collaboration electronically

    NASA Astrophysics Data System (ADS)

    Clauer, C. R.; Rasmussen, C. E.; Niciejewski, R. J.; Killeen, T. L.; Kelly, J. D.; Zambre, Y.; Rosenberg, T. J.; Stauning, P.; Friis-Christensen, E.; Mende, S. B.; Weymouth, T. E.; Prakash, A.; McDaniel, S. E.; Olson, G. M.; Finholt, T. A.; Atkins, D. E.

    A new multidisciplinary effort is linking research in the upper atmospheric and space, computer, and behavioral sciences to develop a prototype electronic environment for conducting team science worldwide. A real-world electronic collaboration testbed has been established to support scientific work centered around the experimental operations being conducted with instruments from the Sondrestrom Upper Atmospheric Research Facility in Kangerlussuaq, Greenland. Such group computing environments will become an important component of the National Information Infrastructure initiative, which is envisioned as the high-performance communications infrastructure to support national scientific research.

  9. Perm State University HPC-hardware and software services: capabilities for aircraft engine aeroacoustics problems solving

    NASA Astrophysics Data System (ADS)

    Demenev, A. G.

    2018-02-01

The present work analyzes the high-performance computing (HPC) infrastructure capabilities available at Perm State University for solving aircraft engine aeroacoustics problems. We explore the ability to develop new computational aeroacoustics methods/solvers for computer-aided engineering (CAE) systems that can handle complicated industrial problems of engine noise prediction. Leading aircraft engine engineering companies, including “UEC-Aviadvigatel” JSC (our industrial partner in Perm, Russia), require such methods/solvers to optimize aircraft engine geometry for fan noise reduction. We analysed Perm State University's HPC hardware resources and software services with a view to their efficient use. The results demonstrate that the Perm State University HPC infrastructure is mature enough to face industrial-scale problems in developing a CAE system with HPC methods and CFD solvers.

  10. User-level framework for performance monitoring of HPC applications

    NASA Astrophysics Data System (ADS)

    Hristova, R.; Goranov, G.

    2013-10-01

HP-SEE is an infrastructure that links the existing HPC facilities in South East Europe into a common infrastructure. Analysis of the performance monitoring of High-Performance Computing (HPC) applications in the infrastructure can be useful to the end user as a diagnostic of the overall performance of their applications. The existing monitoring tools for HP-SEE provide the end user with only aggregated information for all applications. Usually, the user does not have permission to select only the information relevant to them and their applications. In this article we present a framework for performance monitoring of the HPC applications in the HP-SEE infrastructure. The framework provides standardized performance metrics, which every user can use in order to monitor their applications. Furthermore, a programming interface has been developed as part of the framework. The interface allows the user to publish metrics data from their application and to read and analyze the gathered information. Publishing and reading through the framework are possible only with a grid certificate valid for the infrastructure. Therefore the user is authorized to access only the data for their own applications.
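
    A hypothetical sketch of what such a publish interface could look like from the user side; the endpoint, certificate paths, and field names are assumptions, not the HP-SEE framework's real API.

    ```python
    # Hypothetical shape of a user-level metrics publish call, authenticated
    # with a grid certificate. Endpoint, paths, and field names are assumed.
    import json
    import ssl
    import time
    import urllib.request

    ENDPOINT = "https://monitoring.example.org/metrics"   # placeholder URL

    # Authenticate with the user's grid certificate (placeholder paths).
    ctx = ssl.create_default_context()
    ctx.load_cert_chain("/path/to/usercert.pem", "/path/to/userkey.pem")

    def publish(app_id, name, value):
        """Publish one metric sample for the user's application."""
        record = {"app": app_id, "metric": name,
                  "value": value, "ts": time.time()}
        req = urllib.request.Request(
            ENDPOINT, data=json.dumps(record).encode(),
            headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req, context=ctx) as resp:
            return resp.status

    publish("my-app-42", "wallclock_seconds", 1234.5)
    ```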

  11. Using Cloud Computing infrastructure with CloudBioLinux, CloudMan and Galaxy

    PubMed Central

    Afgan, Enis; Chapman, Brad; Jadan, Margita; Franke, Vedran; Taylor, James

    2012-01-01

    Cloud computing has revolutionized availability and access to computing and storage resources; making it possible to provision a large computational infrastructure with only a few clicks in a web browser. However, those resources are typically provided in the form of low-level infrastructure components that need to be procured and configured before use. In this protocol, we demonstrate how to utilize cloud computing resources to perform open-ended bioinformatics analyses, with fully automated management of the underlying cloud infrastructure. By combining three projects, CloudBioLinux, CloudMan, and Galaxy into a cohesive unit, we have enabled researchers to gain access to more than 100 preconfigured bioinformatics tools and gigabytes of reference genomes on top of the flexible cloud computing infrastructure. The protocol demonstrates how to setup the available infrastructure and how to use the tools via a graphical desktop interface, a parallel command line interface, and the web-based Galaxy interface. PMID:22700313

  12. Using cloud computing infrastructure with CloudBioLinux, CloudMan, and Galaxy.

    PubMed

    Afgan, Enis; Chapman, Brad; Jadan, Margita; Franke, Vedran; Taylor, James

    2012-06-01

    Cloud computing has revolutionized availability and access to computing and storage resources, making it possible to provision a large computational infrastructure with only a few clicks in a Web browser. However, those resources are typically provided in the form of low-level infrastructure components that need to be procured and configured before use. In this unit, we demonstrate how to utilize cloud computing resources to perform open-ended bioinformatic analyses, with fully automated management of the underlying cloud infrastructure. By combining three projects, CloudBioLinux, CloudMan, and Galaxy, into a cohesive unit, we have enabled researchers to gain access to more than 100 preconfigured bioinformatics tools and gigabytes of reference genomes on top of the flexible cloud computing infrastructure. The protocol demonstrates how to set up the available infrastructure and how to use the tools via a graphical desktop interface, a parallel command-line interface, and the Web-based Galaxy interface.
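
    For readers who prefer to drive Galaxy programmatically rather than through the Web interface, BioBlend, a separate Python client for the Galaxy API that is not part of the protocol itself, offers one route; the URL and API key below are placeholders.

    ```python
    # One way to drive a Galaxy instance like the one CloudMan launches.
    # BioBlend is a separate Python client for the Galaxy API; the URL and
    # API key here are placeholders.
    from bioblend.galaxy import GalaxyInstance

    gi = GalaxyInstance(url="http://my-cloudman-galaxy.example.org",
                        key="<api-key>")
    history = gi.histories.create_history(name="cloud-analysis")
    upload = gi.tools.upload_file("reads.fastq", history["id"])
    print("uploaded dataset:", upload["outputs"][0]["id"])
    ```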

  13. Building a Community Infrastructure for Scalable On-Line Performance Analysis Tools around Open|Speedshop

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Miller, Barton

    2014-06-30

Peta-scale computing environments pose significant challenges for both system and application developers, and addressing them requires more than simply scaling up existing tera-scale solutions. Performance analysis tools play an important role in gaining this understanding, but previous monolithic tools with fixed feature sets have not sufficed. Instead, this project worked on the design, implementation, and evaluation of a general, flexible tool infrastructure supporting the construction of performance tools as “pipelines” of high-quality tool building blocks. These tool building blocks provide common performance tool functionality, and are designed for scalability, lightweight data acquisition and analysis, and interoperability. For this project, we built on Open|SpeedShop, a modular and extensible open source performance analysis tool set. The design and implementation of such a general and reusable infrastructure targeted at petascale systems required us to address several challenging research issues. All components needed to be designed for scale, a task made more difficult by the need to provide general modules. The infrastructure needed to support online data aggregation to cope with the large amounts of performance and debugging data. We needed to be able to map any combination of tool components to each target architecture. And we needed to design interoperable tool APIs and workflows that were concrete enough to support the required functionality, yet provide the necessary flexibility to address a wide range of tools. A major result of this project is the ability to use this scalable infrastructure to quickly create tools that match a machine architecture with a performance problem that needs to be understood. Another benefit is the ability for application engineers to use the highly scalable, interoperable version of Open|SpeedShop, which is reassembled from the tool building blocks into a flexible, multi-user set of tools. This set of tools is targeted at Office of Science Leadership Class computer systems and selected Office of Science application codes. We describe the contributions made by the team at the University of Wisconsin. The project built on the efforts in Open|SpeedShop funded by DOE/NNSA and the DOE/NNSA Tri-Lab community, extended Open|SpeedShop to the Office of Science Leadership Class Computing Facilities, and addressed new challenges found on these cutting-edge systems. Work done under this project at Wisconsin can be divided into two categories: new algorithms and techniques for debugging, and foundation infrastructure work on our Dyninst binary analysis and instrumentation toolkits and MRNet scalability infrastructure.

  14. Global information infrastructure.

    PubMed

    Lindberg, D A

    1994-01-01

    The High Performance Computing and Communications Program (HPCC) is a multiagency federal initiative under the leadership of the White House Office of Science and Technology Policy, established by the High Performance Computing Act of 1991. It has been assigned a critical role in supporting the international collaboration essential to science and to health care. Goals of the HPCC are to extend USA leadership in high performance computing and networking technologies; to improve technology transfer for economic competitiveness, education, and national security; and to provide a key part of the foundation for the National Information Infrastructure. The first component of the National Institutes of Health to participate in the HPCC, the National Library of Medicine (NLM), recently issued a solicitation for proposals to address a range of issues, from privacy to 'testbed' networks, 'virtual reality,' and more. These efforts will build upon the NLM's extensive outreach program and other initiatives, including the Unified Medical Language System (UMLS), MEDLARS, and Grateful Med. New Internet search tools are emerging, such as Gopher and 'Knowbots'. Medicine will succeed in developing future intelligent agents to assist in utilizing computer networks. Our ability to serve patients is so often restricted by lack of information and knowledge at the time and place of medical decision-making. The new technologies, properly employed, will also greatly enhance our ability to serve the patient.

  15. Signal and image processing algorithm performance in a virtual and elastic computing environment

    NASA Astrophysics Data System (ADS)

    Bennett, Kelly W.; Robertson, James

    2013-05-01

The U.S. Army Research Laboratory (ARL) supports the development of classification, detection, tracking, and localization algorithms using multiple sensing modalities including acoustic, seismic, E-field, magnetic field, PIR, and visual and IR imaging. Multimodal sensors collect large amounts of data in support of algorithm development. The resulting large amount of data, and its associated high-performance computing needs, increases demands on and challenges existing computing infrastructures. Purchasing computer power as a commodity using a Cloud service offers low-cost, pay-as-you-go pricing models, scalability, and elasticity that may provide solutions for developing and optimizing algorithms without having to procure additional hardware and resources. This paper provides a detailed look at using a commercial cloud service provider, such as Amazon Web Services (AWS), to develop and deploy simple signal and image processing algorithms in a cloud and run the algorithms on a large set of data archived in the ARL Multimodal Signatures Database (MMSDB). Analytical results will provide performance comparisons with existing infrastructure. A discussion of using cloud computing with government data will cover the security best practices that exist within cloud services, such as AWS.

  16. Design and implementation of a reliable and cost-effective cloud computing infrastructure: the INFN Napoli experience

    NASA Astrophysics Data System (ADS)

    Capone, V.; Esposito, R.; Pardi, S.; Taurino, F.; Tortone, G.

    2012-12-01

    Over the last few years we have seen an increasing number of services and applications needed to manage and maintain cloud computing facilities. This is particularly true for computing in high energy physics, which often requires complex configurations and distributed infrastructures. In this scenario a cost effective rationalization and consolidation strategy is the key to success in terms of scalability and reliability. In this work we describe an IaaS (Infrastructure as a Service) cloud computing system, with high availability and redundancy features, which is currently in production at INFN-Naples and ATLAS Tier-2 data centre. The main goal we intended to achieve was a simplified method to manage our computing resources and deliver reliable user services, reusing existing hardware without incurring heavy costs. A combined usage of virtualization and clustering technologies allowed us to consolidate our services on a small number of physical machines, reducing electric power costs. As a result of our efforts we developed a complete solution for data and computing centres that can be easily replicated using commodity hardware. Our architecture consists of 2 main subsystems: a clustered storage solution, built on top of disk servers running GlusterFS file system, and a virtual machines execution environment. GlusterFS is a network file system able to perform parallel writes on multiple disk servers, providing this way live replication of data. High availability is also achieved via a network configuration using redundant switches and multiple paths between hypervisor hosts and disk servers. We also developed a set of management scripts to easily perform basic system administration tasks such as automatic deployment of new virtual machines, adaptive scheduling of virtual machines on hypervisor hosts, live migration and automated restart in case of hypervisor failures.
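
    The "automated restart in case of hypervisor failures" idea can be sketched with the libvirt Python bindings: restart any defined-but-stopped guests on a host. The connection URI and the naive policy are illustrative; the INFN management scripts themselves are not reproduced here.

    ```python
    # Sketch of an automated-restart pass using the libvirt Python bindings:
    # boot any defined-but-inactive guests on a hypervisor. The URI and the
    # blanket restart policy are illustrative only.
    import libvirt

    conn = libvirt.open("qemu:///system")   # or another hypervisor URI
    try:
        for dom in conn.listAllDomains():
            if not dom.isActive():
                print("restarting", dom.name())
                dom.create()                # boot the defined domain
    finally:
        conn.close()
    ```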

  17. Jungle Computing: Distributed Supercomputing Beyond Clusters, Grids, and Clouds

    NASA Astrophysics Data System (ADS)

    Seinstra, Frank J.; Maassen, Jason; van Nieuwpoort, Rob V.; Drost, Niels; van Kessel, Timo; van Werkhoven, Ben; Urbani, Jacopo; Jacobs, Ceriel; Kielmann, Thilo; Bal, Henri E.

In recent years, the application of high-performance and distributed computing in scientific practice has become increasingly widespread. Among the most widely available platforms to scientists are clusters, grids, and cloud systems. Such infrastructures are currently undergoing revolutionary change due to the integration of many-core technologies, providing orders-of-magnitude speed improvements for selected compute kernels. With high-performance and distributed computing systems thus becoming more heterogeneous and hierarchical, programming complexity is vastly increased. Further complexities arise because the urgent desire for scalability and issues including data distribution, software heterogeneity, and ad hoc hardware availability commonly force scientists into simultaneous use of multiple platforms (e.g., clusters, grids, and clouds used concurrently). A true computing jungle.

  18. The International Symposium on Grids and Clouds

    NASA Astrophysics Data System (ADS)

    The International Symposium on Grids and Clouds (ISGC) 2012 will be held at Academia Sinica in Taipei from 26 February to 2 March 2012, with co-located events and workshops. The conference is hosted by the Academia Sinica Grid Computing Centre (ASGC). 2012 is the decennium anniversary of the ISGC which over the last decade has tracked the convergence, collaboration and innovation of individual researchers across the Asia Pacific region to a coherent community. With the continuous support and dedication from the delegates, ISGC has provided the primary international distributed computing platform where distinguished researchers and collaboration partners from around the world share their knowledge and experiences. The last decade has seen the wide-scale emergence of e-Infrastructure as a critical asset for the modern e-Scientist. The emergence of large-scale research infrastructures and instruments that has produced a torrent of electronic data is forcing a generational change in the scientific process and the mechanisms used to analyse the resulting data deluge. No longer can the processing of these vast amounts of data and production of relevant scientific results be undertaken by a single scientist. Virtual Research Communities that span organisations around the world, through an integrated digital infrastructure that connects the trust and administrative domains of multiple resource providers, have become critical in supporting these analyses. Topics covered in ISGC 2012 include: High Energy Physics, Biomedicine & Life Sciences, Earth Science, Environmental Changes and Natural Disaster Mitigation, Humanities & Social Sciences, Operations & Management, Middleware & Interoperability, Security and Networking, Infrastructure Clouds & Virtualisation, Business Models & Sustainability, Data Management, Distributed Volunteer & Desktop Grid Computing, High Throughput Computing, and High Performance, Manycore & GPU Computing.

  19. A scalable infrastructure for CMS data analysis based on OpenStack Cloud and Gluster file system

    NASA Astrophysics Data System (ADS)

    Toor, S.; Osmani, L.; Eerola, P.; Kraemer, O.; Lindén, T.; Tarkoma, S.; White, J.

    2014-06-01

    The challenge of providing a resilient and scalable computational and data management solution for massive scale research environments requires continuous exploration of new technologies and techniques. In this project the aim has been to design a scalable and resilient infrastructure for CERN HEP data analysis. The infrastructure is based on OpenStack components for structuring a private Cloud with the Gluster File System. We integrate the state-of-the-art Cloud technologies with the traditional Grid middleware infrastructure. Our test results show that the adopted approach provides a scalable and resilient solution for managing resources without compromising on performance and high availability.

  20. Scalable Analysis Methods and In Situ Infrastructure for Extreme Scale Knowledge Discovery

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Duque, Earl P.N.; Whitlock, Brad J.

High performance computers have for many years been on a trajectory that gives them extraordinary compute power through the addition of more and more compute cores. At the same time, other system parameters such as the amount of memory per core and bandwidth to storage have remained constant or have barely increased. This creates an imbalance in the computer, giving it the ability to compute a lot of data that it cannot reasonably save out due to time and storage constraints. While technologies have been invented to mitigate this problem (burst buffers, etc.), software has been adapting to employ in situ libraries which perform data analysis and visualization on simulation data while it is still resident in memory. This avoids ever having to pay the cost of writing many terabytes of data files. Instead, in situ enables the creation of more concentrated data products such as statistics, plots, and data extracts, which are all far smaller than the full-sized volume data. With the increasing popularity of in situ, multiple in situ infrastructures have been created, each with its own mechanism for integrating with a simulation. To make it easier to instrument a simulation with multiple in situ infrastructures and include custom analysis algorithms, this project created the SENSEI framework.
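
    The in situ pattern in miniature: reduce each timestep to a small data product while the full field is still resident in memory, instead of writing whole arrays to disk. This is an illustrative sketch of the concept only, not SENSEI's actual interface.

    ```python
    # In situ reduction in miniature: keep per-step statistics, never write
    # the full field to disk. Illustrative of the concept, not of SENSEI.
    import numpy as np

    def simulate_step(step, shape=(256, 256, 256)):
        """Stand-in for one timestep of a simulation (~64 MB of float32)."""
        rng = np.random.default_rng(step)
        return rng.standard_normal(shape, dtype=np.float32)

    summary = []
    for step in range(10):
        field = simulate_step(step)      # resident in memory, never written
        summary.append((step, float(field.min()),
                        float(field.mean()), float(field.max())))

    # The concentrated data product: a few hundred bytes instead of ~640 MB.
    np.savetxt("stats.csv", summary, header="step min mean max")
    ```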

  1. Development of a SaaS application probe to the physical properties of the Earth's interior: An attempt at moving HPC to the cloud

    NASA Astrophysics Data System (ADS)

    Huang, Qian

    2014-09-01

Scientific computing often requires the availability of a massive number of computers for performing large-scale simulations, and computing in mineral physics is no exception. In order to investigate physical properties of minerals at extreme conditions in computational mineral physics, parallel computing technology is used to speed up performance by utilizing multiple computer resources to process a computational task simultaneously, thereby greatly reducing computation time. Traditionally, parallel computing has been addressed by using High Performance Computing (HPC) solutions and installed facilities such as clusters and supercomputers. Today, there is tremendous growth in cloud computing. Infrastructure as a Service (IaaS), with its on-demand and pay-as-you-go model, creates a flexible and cost-effective means of accessing computing resources. In this paper, a feasibility report of HPC on a cloud infrastructure is presented. It is found that current cloud services at the IaaS layer still need to improve performance to be useful to research projects. On the other hand, Software as a Service (SaaS), another type of cloud computing, is introduced into an HPC system for computing in mineral physics, and an application based on it has been developed. This paper presents an overall description of this SaaS application. This contribution can promote cloud application development in computational mineral physics and cross-disciplinary studies.

  2. Challenges and opportunities of cloud computing for atmospheric sciences

    NASA Astrophysics Data System (ADS)

    Pérez Montes, Diego A.; Añel, Juan A.; Pena, Tomás F.; Wallom, David C. H.

    2016-04-01

Cloud computing is an emerging technological solution widely used in many fields. Initially developed as a flexible way of managing peak demand, it has begun to make its way into scientific research. One of the greatest advantages of cloud computing for scientific research is that a research project no longer depends on access to a large local cyberinfrastructure. Cloud computing can avoid the maintenance expenses of large supercomputers and has the potential to 'democratize' access to high-performance computing, giving funding bodies flexibility in allocating budgets for the computational costs associated with a project. Two of the most challenging problems in atmospheric sciences are computational cost and uncertainty in meteorological forecasting and climate projections. Both problems are closely related. Usually uncertainty can be reduced with the availability of computational resources to better reproduce a phenomenon or to perform a larger number of experiments. Here we present results of the application of cloud computing resources for climate modeling using cloud computing infrastructures of three major vendors and two climate models. We show how the cloud infrastructure compares in performance to traditional supercomputers and how it provides the capability to complete experiments in shorter periods of time. The associated monetary cost is also analyzed. Finally we discuss the future potential of this technology for meteorological and climatological applications, both from the point of view of operational use and research.

  3. Resilient workflows for computational mechanics platforms

    NASA Astrophysics Data System (ADS)

    Nguyên, Toàn; Trifan, Laurentiu; Désidéri, Jean-Antoine

    2010-06-01

Workflow management systems have recently been the focus of much interest and of many research and deployment efforts for scientific applications worldwide [26, 27]. Their ability to abstract applications by wrapping application codes has also demonstrated the usefulness of such systems for multidiscipline applications [23, 24]. When complex applications need to provide seamless interfaces hiding the technicalities of the computing infrastructures, their high-level modeling, monitoring and execution functionalities help give production teams seamless and effective facilities [25, 31, 33]. Software integration infrastructures based on programming paradigms such as Python, Matlab and Scilab have also provided evidence of the usefulness of such approaches for the tight coupling of multidiscipline application codes [22, 24]. Also, high-performance computing based on multi-core multi-cluster infrastructures opens new opportunities for more accurate, more extensive and effective robust multi-discipline simulations for the decades to come [28]. This supports the goal of full flight dynamics simulation for 3D aircraft models within the next decade, opening the way to virtual flight tests and certification of aircraft in the future [23, 24, 29].

  4. National research and education network

    NASA Technical Reports Server (NTRS)

    Villasenor, Tony

    1991-01-01

Some goals of this network are as follows: Extend U.S. technological leadership in high performance computing and computer communications; Provide wide dissemination and application of the technologies both to speed the pace of innovation and to serve the national economy, national security, education, and the global environment; and Spur gains in U.S. productivity and industrial competitiveness by making high performance computing and networking technologies an integral part of the design and production process. Strategies for achieving these goals are as follows: Support solutions to important scientific and technical challenges through a vigorous R and D effort; Reduce the uncertainties to industry for R and D and use of this technology through increased cooperation between government, industry, and universities and by the continued use of government and government-funded facilities as a prototype user for early commercial HPCC products; and Support underlying research, network, and computational infrastructures on which U.S. high performance computing technology is based.

  5. A parallel-processing approach to computing for the geographic sciences; applications and systems enhancements

    USGS Publications Warehouse

    Crane, Michael; Steinwand, Dan; Beckmann, Tim; Krpan, Greg; Liu, Shu-Guang; Nichols, Erin; Haga, Jim; Maddox, Brian; Bilderback, Chris; Feller, Mark; Homer, George

    2001-01-01

The overarching goal of this project is to build a spatially distributed infrastructure for information science research by forming a team of information science researchers and providing them with similar hardware and software tools to perform collaborative research. Four geographically distributed Centers of the U.S. Geological Survey (USGS) are developing their own clusters of low-cost personal computers into parallel computing environments that provide a cost-effective way for the USGS to increase participation in the high-performance computing community. Referred to as Beowulf clusters, these hybrid systems provide the robust computing power required for conducting information science research into parallel computing systems and applications.

  6. Evolving Storage and Cyber Infrastructure at the NASA Center for Climate Simulation

    NASA Technical Reports Server (NTRS)

    Salmon, Ellen; Duffy, Daniel; Spear, Carrie; Sinno, Scott; Vaughan, Garrison; Bowen, Michael

    2018-01-01

This talk will describe recent developments at the NASA Center for Climate Simulation, which is funded by NASA's Science Mission Directorate, and supports the specialized data storage and computational needs of weather, ocean, and climate researchers, as well as astrophysicists, heliophysicists, and planetary scientists. To meet requirements for higher-resolution, higher-fidelity simulations, the NCCS augments its High Performance Computing (HPC) and storage/retrieval environment. As the petabytes of model and observational data grow, the NCCS is broadening its data services offerings and deploying and expanding virtualization resources for high performance analytics.

  7. SCALING AN URBAN EMERGENCY EVACUATION FRAMEWORK: CHALLENGES AND PRACTICES

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Karthik, Rajasekar; Lu, Wei

    2014-01-01

Critical infrastructure disruption, caused by severe weather events, natural disasters, terrorist attacks, etc., has significant impacts on urban transportation systems. We built a computational framework to simulate urban transportation systems under critical infrastructure disruption in order to aid real-time emergency evacuation. This framework will use large scale datasets to provide a scalable tool for emergency planning and management. Our framework, World-Wide Emergency Evacuation (WWEE), integrates population distribution and urban infrastructure networks to model travel demand in emergency situations at global level. Also, a computational model of agent-based traffic simulation is used to provide an optimal evacuation plan for traffic operation purposes [1]. In addition, our framework provides a web-based high resolution visualization tool for emergency evacuation modelers and practitioners. We have successfully tested our framework with scenarios in both the United States (Alexandria, VA) and Europe (Berlin, Germany) [2]. However, there are still some major drawbacks for scaling this framework to handle big data workloads in real time. On the back-end, lack of proper infrastructure limits our ability to process large amounts of data, run the simulation efficiently and quickly, and provide fast retrieval and serving of data. On the front-end, the visualization performance of microscopic evacuation results is still not efficient enough due to the high volume of data communication between server and client. We are addressing these drawbacks by using cloud computing and next-generation web technologies, namely Node.js, NoSQL, WebGL, OpenLayers 3 and HTML5 technologies. We will describe briefly each one and how we are using and leveraging these technologies to provide an efficient tool for emergency management organizations. Our early experimentation demonstrates that using the above technologies is a promising approach to building a scalable and high performance urban emergency evacuation framework that can improve traffic mobility and safety under critical infrastructure disruption in today's socially connected world.

  8. SBSI: an extensible distributed software infrastructure for parameter estimation in systems biology.

    PubMed

    Adams, Richard; Clark, Allan; Yamaguchi, Azusa; Hanlon, Neil; Tsorman, Nikos; Ali, Shakir; Lebedeva, Galina; Goltsov, Alexey; Sorokin, Anatoly; Akman, Ozgur E; Troein, Carl; Millar, Andrew J; Goryanin, Igor; Gilmore, Stephen

    2013-03-01

Complex computational experiments in Systems Biology, such as fitting model parameters to experimental data, can be challenging to perform. Not only do they frequently require a high level of computational power, but the software needed to run the experiment also needs to be usable by scientists with varying levels of computational expertise, and modellers need to be able to obtain up-to-date experimental data resources easily. We have developed a software suite, the Systems Biology Software Infrastructure (SBSI), to facilitate the parameter-fitting process. SBSI is a modular software suite composed of three major components: SBSINumerics, a high-performance library containing parallelized algorithms for performing parameter fitting; SBSIDispatcher, a middleware application to track experiments and submit jobs to back-end servers; and SBSIVisual, an extensible client application used to configure optimization experiments and view results. Furthermore, we have created a plugin infrastructure to enable project-specific modules to be easily installed. Plugin developers can take advantage of the existing user interface and application framework to customize SBSI for their own uses, facilitated by SBSI's use of standard data formats. All SBSI binaries and source code are freely available from http://sourceforge.net/projects/sbsi under an Apache 2 open-source license. The server-side SBSINumerics runs on any Unix-based operating system; both SBSIVisual and SBSIDispatcher are written in Java and are platform independent, allowing use on Windows, Linux and Mac OS X. The SBSI project website at http://www.sbsi.ed.ac.uk provides documentation and tutorials.
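
    The kind of parameter-fitting task SBSI parallelizes can be sketched in a few lines: fit the parameters of a toy exponential-decay model to noisy observations by least squares. This illustrates the problem class only, not SBSI's own algorithms.

    ```python
    # Toy version of the parameter-fitting problem class SBSI addresses:
    # recover (a, k) in a*exp(-k*t) from noisy data by least squares.
    import numpy as np
    from scipy.optimize import least_squares

    t = np.linspace(0, 10, 50)
    true_a, true_k = 2.0, 0.7
    rng = np.random.default_rng(0)
    data = true_a * np.exp(-true_k * t) + 0.05 * rng.standard_normal(t.size)

    def residuals(params):
        a, k = params
        return a * np.exp(-k * t) - data

    fit = least_squares(residuals, x0=[1.0, 1.0])
    print("fitted a, k:", fit.x)   # close to (2.0, 0.7)
    ```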

  9. Modeling the Cloud to Enhance Capabilities for Crises and Catastrophe Management

    DTIC Science & Technology

    2016-11-16

...in order for cloud computing infrastructures to be successfully deployed in real world scenarios as tools for crisis and catastrophe management, where... Statement of the Problem Studied: As cloud computing becomes the dominant computational infrastructure [1] and cloud technologies make a transition to hosting... 1. Formulate rigorous mathematical models representing technological capabilities and resources in cloud computing for performance modeling and...

  10. Quantum Accelerators for High-performance Computing Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Humble, Travis S.; Britt, Keith A.; Mohiyaddin, Fahd A.

We define some of the programming and system-level challenges facing the application of quantum processing to high-performance computing. Alongside barriers to physical integration, prominent differences in the execution of quantum and conventional programs challenge the intersection of these computational models. Following a brief overview of the state of the art, we discuss recent advances in programming and execution models for hybrid quantum-classical computing. We discuss a novel quantum-accelerator framework that uses specialized kernels to offload select workloads while integrating with existing computing infrastructure. We elaborate on the role of the host operating system in managing these unique accelerator resources, the prospects for deploying quantum modules, and the requirements placed on the language hierarchy connecting these different system components. We draw on recent advances in the modeling and simulation of quantum computing systems with the development of architectures for hybrid high-performance computing systems and the realization of software stacks for controlling quantum devices. Finally, we present simulation results that describe the expected system-level behavior of high-performance computing systems composed from compute nodes with quantum processing units. We describe performance for these hybrid systems in terms of time-to-solution, accuracy, and energy consumption, and we use simple application examples to estimate the performance advantage of quantum acceleration.

  11. The iPlant Collaborative: Cyberinfrastructure for Enabling Data to Discovery for the Life Sciences.

    PubMed

    Merchant, Nirav; Lyons, Eric; Goff, Stephen; Vaughn, Matthew; Ware, Doreen; Micklos, David; Antin, Parker

    2016-01-01

    The iPlant Collaborative provides life science research communities access to comprehensive, scalable, and cohesive computational infrastructure for data management; identity management; collaboration tools; and cloud, high-performance, high-throughput computing. iPlant provides training, learning material, and best practice resources to help all researchers make the best use of their data, expand their computational skill set, and effectively manage their data and computation when working as distributed teams. iPlant's platform permits researchers to easily deposit and share their data and deploy new computational tools and analysis workflows, allowing the broader community to easily use and reuse those data and computational analyses.

  12. Integrating Reconfigurable Hardware-Based Grid for High Performance Computing

    PubMed Central

    Dondo Gazzano, Julio; Sanchez Molina, Francisco; Rincon, Fernando; López, Juan Carlos

    2015-01-01

FPGAs have shown several characteristics that make them very attractive for high performance computing (HPC). The impressive speed-up factors that they are able to achieve, the reduced power consumption, and the ease and flexibility of the design process, with fast iterations between consecutive versions, are examples of benefits obtained with their use. However, there are still some difficulties that need to be addressed when using reconfigurable platforms as accelerators: the need for an in-depth application study to identify potential acceleration, the lack of tools for the deployment of computational problems on distributed hardware platforms, and the low portability of components, among others. This work proposes a complete grid infrastructure for distributed high performance computing based on dynamically reconfigurable FPGAs. Besides, a set of services designed to facilitate application deployment is described. An example application and a comparison with other hardware and software implementations are shown. Experimental results show that the proposed architecture offers encouraging advantages for the deployment of high performance distributed applications, simplifying the development process. PMID:25874241

  13. PRACE - The European HPC Infrastructure

    NASA Astrophysics Data System (ADS)

    Stadelmeyer, Peter

    2014-05-01

The mission of PRACE (Partnership for Advanced Computing in Europe) is to enable high impact scientific discovery and engineering research and development across all disciplines to enhance European competitiveness for the benefit of society. PRACE seeks to realize this mission by offering world-class computing and data management resources and services through a peer review process. This talk gives a general overview of PRACE and the PRACE research infrastructure (RI). PRACE is established as an international not-for-profit association, and the PRACE RI is a pan-European supercomputing infrastructure which offers access to computing and data management resources at partner sites distributed throughout Europe. Besides a short summary of the organization, history, and activities of PRACE, it is explained how scientists and researchers from academia and industry from around the world can access PRACE systems and which education and training activities are offered by PRACE. The overview also contains a selection of PRACE contributions to societal challenges and ongoing activities. Examples of the latter include, among others, petascaling, an application benchmark suite, best-practice guides for efficient use of key architectures, application enabling/scaling, new programming models, and industrial applications. The Partnership for Advanced Computing in Europe (PRACE) is an international non-profit association with its seat in Brussels. The PRACE Research Infrastructure provides a persistent world-class high performance computing service for scientists and researchers from academia and industry in Europe. The computer systems and their operations accessible through PRACE are provided by 4 PRACE members (BSC representing Spain, CINECA representing Italy, GCS representing Germany and GENCI representing France). The Implementation Phase of PRACE receives funding from the EU's Seventh Framework Programme (FP7/2007-2013) under grant agreements RI-261557, RI-283493 and RI-312763. For more information, see www.prace-ri.eu

  14. Early experiences in developing and managing the neuroscience gateway.

    PubMed

    Sivagnanam, Subhashini; Majumdar, Amit; Yoshimoto, Kenneth; Astakhov, Vadim; Bandrowski, Anita; Martone, MaryAnn; Carnevale, Nicholas T

    2015-02-01

The last few decades have seen the emergence of computational neuroscience as a mature field in which researchers are interested in modeling complex and large neuronal systems and require access to high performance computing machines and the associated cyberinfrastructure to manage computational workflows and data. The neuronal simulation tools used in this research field are also implemented for parallel computers and are suitable for high performance computing machines. But using these tools on complex high performance computing machines remains a challenge because of issues with acquiring computer time on machines located at national supercomputer centers, with the complex user interfaces of these machines, and with data management and retrieval. The Neuroscience Gateway is being developed to alleviate and/or hide these barriers to entry for computational neuroscientists. It hides or eliminates, from the point of view of the users, all the administrative and technical barriers and makes parallel neuronal simulation tools easily available and accessible on complex high performance computing machines. It handles the running of jobs and data management and retrieval. This paper shares early experiences in bringing up this gateway and describes the software architecture it is based on, how it is implemented, and how users can use it for computational neuroscience research with high performance computing at the back end. We also look at parallel scaling of some publicly available neuronal models and analyze recent usage data of the neuroscience gateway.

  15. Early experiences in developing and managing the neuroscience gateway

    PubMed Central

    Sivagnanam, Subhashini; Majumdar, Amit; Yoshimoto, Kenneth; Astakhov, Vadim; Bandrowski, Anita; Martone, MaryAnn; Carnevale, Nicholas. T.

    2015-01-01

The last few decades have seen the emergence of computational neuroscience as a mature field in which researchers are interested in modeling complex and large neuronal systems and require access to high performance computing machines and the associated cyberinfrastructure to manage computational workflows and data. The neuronal simulation tools used in this research field are also implemented for parallel computers and are suitable for high performance computing machines. But using these tools on complex high performance computing machines remains a challenge because of issues with acquiring computer time on machines located at national supercomputer centers, with the complex user interfaces of these machines, and with data management and retrieval. The Neuroscience Gateway is being developed to alleviate and/or hide these barriers to entry for computational neuroscientists. It hides or eliminates, from the point of view of the users, all the administrative and technical barriers and makes parallel neuronal simulation tools easily available and accessible on complex high performance computing machines. It handles the running of jobs and data management and retrieval. This paper shares early experiences in bringing up this gateway and describes the software architecture it is based on, how it is implemented, and how users can use it for computational neuroscience research with high performance computing at the back end. We also look at parallel scaling of some publicly available neuronal models and analyze recent usage data of the neuroscience gateway. PMID:26523124

  16. Multi-objective optimization of GENIE Earth system models.

    PubMed

    Price, Andrew R; Myerscough, Richard J; Voutchkov, Ivan I; Marsh, Robert; Cox, Simon J

    2009-07-13

    The tuning of parameters in climate models is essential to provide reliable long-term forecasts of Earth system behaviour. We apply a multi-objective optimization algorithm to the problem of parameter estimation in climate models. This optimization process involves the iterative evaluation of response surface models (RSMs), followed by the execution of multiple Earth system simulations. These computations require an infrastructure that provides high-performance computing for building and searching the RSMs and high-throughput computing for the concurrent evaluation of a large number of models. Grid computing technology is therefore essential to make this algorithm practical for members of the GENIE project.
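
    A minimal sketch of such a response-surface loop, reduced to one parameter and one objective for brevity (GENIE tunes many of each): fit a cheap surrogate to the evaluated points, then spend the expensive simulation only on the surrogate's most promising candidate.

    ```python
    # Sketch of a response-surface-model (RSM) optimization loop: fit a
    # cheap quadratic surrogate, then evaluate the expensive model only at
    # the surrogate's best candidate. One parameter/objective for brevity.
    import numpy as np

    def expensive_model(x):           # stand-in for an Earth system run
        return (x - 0.3) ** 2 + 0.1 * np.sin(8 * x)

    X = list(np.linspace(0, 1, 5))    # initial design points
    Y = [expensive_model(x) for x in X]

    for _ in range(4):
        coeffs = np.polyfit(X, Y, deg=2)            # build the RSM
        candidates = np.linspace(0, 1, 201)         # cheap surrogate search
        best = candidates[np.argmin(np.polyval(coeffs, candidates))]
        X.append(best)                              # one expensive evaluation
        Y.append(expensive_model(best))

    print("best parameter found:", X[int(np.argmin(Y))])
    ```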

  17. Elastic Cloud Computing Infrastructures in the Open Cirrus Testbed Implemented via Eucalyptus

    NASA Astrophysics Data System (ADS)

    Baun, Christian; Kunze, Marcel

Cloud computing realizes the advantages and overcomes some restrictions of the grid computing paradigm. Elastic infrastructures can easily be created and managed by cloud users. In order to accelerate research on data center management and cloud services, the OpenCirrus™ research testbed has been started by HP, Intel and Yahoo!. Although commercial cloud offerings are proprietary, Open Source solutions exist in the field of IaaS with Eucalyptus, PaaS with AppScale and at the applications layer with Hadoop MapReduce. This paper examines the I/O performance of cloud computing infrastructures implemented with Eucalyptus in contrast to Amazon S3.

  18. A simple grid implementation with Berkeley Open Infrastructure for Network Computing using BLAST as a model

    PubMed Central

    Pinthong, Watthanai; Muangruen, Panya

    2016-01-01

Development of high-throughput technologies, such as Next-generation sequencing, allows thousands of experiments to be performed simultaneously while reducing resource requirements. Consequently, a massive amount of experimental data is now rapidly generated. Nevertheless, the data are not readily usable or meaningful until they are further analysed and interpreted. Due to the size of the data, a high performance computer (HPC) is required for the analysis and interpretation. However, an HPC is expensive and difficult to access. Other means have been developed to allow researchers to acquire the power of HPC without needing to purchase and maintain one, such as cloud computing services and grid computing systems. In this study, we implemented grid computing in a computer training center environment using the Berkeley Open Infrastructure for Network Computing (BOINC) as a job distributor and data manager, combining all the desktop computers to virtualize the HPC. Fifty desktop computers were used for setting up a grid system during the off-hours. In order to test the performance of the grid system, we adapted the Basic Local Alignment Search Tool (BLAST) to the BOINC system. Sequencing results from the Illumina platform were aligned to the human genome database by BLAST on the grid system. The results and processing times were compared to those from a single desktop computer and an HPC. The estimated durations of BLAST analysis for 4 million sequence reads on a desktop PC, an HPC and the grid system were 568, 24 and 5 days, respectively. Thus, the grid implementation of BLAST by BOINC is an efficient alternative to the HPC for sequence alignment. The grid implementation with BOINC also helped tap unused computing resources during the off-hours and could be easily modified for other available bioinformatics software. PMID:27547555
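
    The fan-out step of such a grid can be sketched simply: split a FASTA query file into fixed-size work units of the kind a BOINC server would distribute to hosts. File names and the unit size below are illustrative.

    ```python
    # Sketch of work-unit generation for a BOINC-style grid: split a FASTA
    # query file into fixed-size chunks. Names and sizes are illustrative.
    def split_fasta(path, reads_per_unit=100_000):
        unit, count, idx = [], 0, 0
        with open(path) as fh:
            for line in fh:
                # Flush a full work unit before starting the next record.
                if line.startswith(">") and count == reads_per_unit:
                    with open(f"workunit_{idx:04d}.fa", "w") as out:
                        out.writelines(unit)
                    unit, count, idx = [], 0, idx + 1
                if line.startswith(">"):
                    count += 1
                unit.append(line)
        if unit:                       # flush the final, partial unit
            with open(f"workunit_{idx:04d}.fa", "w") as out:
                out.writelines(unit)

    split_fasta("reads.fa")
    ```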

  19. National Laboratory for Advanced Scientific Visualization at UNAM - Mexico

    NASA Astrophysics Data System (ADS)

    Manea, Marina; Constantin Manea, Vlad; Varela, Alfredo

    2016-04-01

In 2015, the National Autonomous University of Mexico (UNAM) joined the family of universities and research centers where advanced visualization and computing play a key role in promoting and advancing missions in research, education, community outreach, as well as business-oriented consulting. This initiative provides access to a great variety of advanced hardware and software resources and offers a range of consulting services that spans a variety of areas related to scientific visualization, among which are: neuroanatomy, embryonic development, genome-related studies, geosciences, geography, and physics- and mathematics-related disciplines. The National Laboratory for Advanced Scientific Visualization delivers services through three main infrastructure environments: the 3D fully immersive display system Cave, the high-resolution parallel visualization system Powerwall, and the high-resolution spherical display Earth Simulator. The entire visualization infrastructure is interconnected to a high-performance computing cluster (HPCC) called ADA, in honor of Ada Lovelace, considered to be the first computer programmer. The Cave is an extra-large, 3.6 m wide room with images projected on the front, left, and right walls as well as the floor. Specialized crystal-eyes LCD-shutter glasses provide strong stereo depth perception, and a variety of tracking devices allow software to track the position of a user's hand, head and wand. The Powerwall is designed to bring large amounts of complex data together through parallel computing for team interaction and collaboration. This system is composed of 24 (6x4) high-resolution ultra-thin (2 mm) bezel monitors connected to a high-performance GPU cluster. The Earth Simulator is a large (60") high-resolution spherical display used for global-scale data visualization, such as geophysical, meteorological, climate and ecology data. The HPCC-ADA is a 1000+ computing core system which offers parallel computing resources to applications that require large quantities of memory as well as large and fast parallel storage systems. The entire system's temperature is controlled by an energy- and space-efficient cooling solution based on large rear-door liquid-cooled heat exchangers. This state-of-the-art infrastructure will boost research activities in the region, offer a powerful scientific tool for teaching at undergraduate and graduate levels, and enhance association and cooperation with business-oriented organizations.

  20. High-Performance Integrated Virtual Environment (HIVE) Tools and Applications for Big Data Analysis.

    PubMed

    Simonyan, Vahan; Mazumder, Raja

    2014-09-30

    The High-performance Integrated Virtual Environment (HIVE) is a high-throughput cloud-based infrastructure developed for the storage and analysis of genomic and associated biological data. HIVE consists of a web-accessible interface for authorized users to deposit, retrieve, share, annotate, compute and visualize Next-generation Sequencing (NGS) data in a scalable and highly efficient fashion. The platform contains a distributed storage library and a distributed computational powerhouse linked seamlessly. Resources available through the interface include algorithms, tools and applications developed exclusively for the HIVE platform, as well as commonly used external tools adapted to operate within the parallel architecture of the system. HIVE is composed of a flexible infrastructure, which allows for simple implementation of new algorithms and tools. Currently, available HIVE tools include sequence alignment and nucleotide variation profiling tools, metagenomic analyzers, phylogenetic tree-building tools using NGS data, clone discovery algorithms, and recombination analysis algorithms. In addition to tools, HIVE also provides knowledgebases that can be used in conjunction with the tools for NGS sequence and metadata analysis.

  1. High-Performance Integrated Virtual Environment (HIVE) Tools and Applications for Big Data Analysis

    PubMed Central

    Simonyan, Vahan; Mazumder, Raja

    2014-01-01

    The High-performance Integrated Virtual Environment (HIVE) is a high-throughput cloud-based infrastructure developed for the storage and analysis of genomic and associated biological data. HIVE consists of a web-accessible interface for authorized users to deposit, retrieve, share, annotate, compute and visualize Next-generation Sequencing (NGS) data in a scalable and highly efficient fashion. The platform contains a distributed storage library and a distributed computational powerhouse linked seamlessly. Resources available through the interface include algorithms, tools and applications developed exclusively for the HIVE platform, as well as commonly used external tools adapted to operate within the parallel architecture of the system. HIVE is composed of a flexible infrastructure, which allows for simple implementation of new algorithms and tools. Currently, available HIVE tools include sequence alignment and nucleotide variation profiling tools, metagenomic analyzers, phylogenetic tree-building tools using NGS data, clone discovery algorithms, and recombination analysis algorithms. In addition to tools, HIVE also provides knowledgebases that can be used in conjunction with the tools for NGS sequence and metadata analysis. PMID:25271953

  2. Low cost, high performance processing of single particle cryo-electron microscopy data in the cloud.

    PubMed

    Cianfrocco, Michael A; Leschziner, Andres E

    2015-05-08

    The advent of a new generation of electron microscopes and direct electron detectors has realized the potential of single particle cryo-electron microscopy (cryo-EM) as a technique to generate high-resolution structures. Calculating these structures requires high performance computing clusters, a resource that may be limiting to many likely cryo-EM users. To address this limitation and facilitate the spread of cryo-EM, we developed a publicly available 'off-the-shelf' computing environment on Amazon's elastic cloud computing infrastructure. This environment provides users with single particle cryo-EM software packages and the ability to create computing clusters with 16-480+ CPUs. We tested our computing environment using a publicly available 80S yeast ribosome dataset and estimate that laboratories could determine high-resolution cryo-EM structures for $50 to $1500 per structure within a timeframe comparable to local clusters. Our analysis shows that Amazon's cloud computing environment may offer a viable computing environment for cryo-EM.

  3. Heads in the Cloud: A Primer on Neuroimaging Applications of High Performance Computing.

    PubMed

    Shatil, Anwar S; Younas, Sohail; Pourreza, Hossein; Figley, Chase R

    2015-01-01

    With larger data sets and more sophisticated analyses, it is becoming increasingly common for neuroimaging researchers to push (or exceed) the limitations of standalone computer workstations. Nonetheless, although high-performance computing platforms such as clusters, grids and clouds are already in routine use by a small handful of neuroimaging researchers to increase their storage and/or computational power, the adoption of such resources by the broader neuroimaging community remains relatively uncommon. Therefore, the goal of the current manuscript is to: 1) inform prospective users about the similarities and differences between computing clusters, grids and clouds; 2) highlight their main advantages; 3) discuss when it may (and may not) be advisable to use them; 4) review some of their potential problems and barriers to access; and finally 5) give a few practical suggestions for how interested new users can start analyzing their neuroimaging data using cloud resources. Although the aim of cloud computing is to hide most of the complexity of the infrastructure management from end-users, we recognize that this can still be an intimidating area for cognitive neuroscientists, psychologists, neurologists, radiologists, and other neuroimaging researchers lacking a strong computational background. Therefore, with this in mind, we have aimed to provide a basic introduction to cloud computing in general (including some of the basic terminology, computer architectures, infrastructure and service models, etc.), a practical overview of the benefits and drawbacks, and a specific focus on how cloud resources can be used for various neuroimaging applications.

  4. Heads in the Cloud: A Primer on Neuroimaging Applications of High Performance Computing

    PubMed Central

    Shatil, Anwar S.; Younas, Sohail; Pourreza, Hossein; Figley, Chase R.

    2015-01-01

    With larger data sets and more sophisticated analyses, it is becoming increasingly common for neuroimaging researchers to push (or exceed) the limitations of standalone computer workstations. Nonetheless, although high-performance computing platforms such as clusters, grids and clouds are already in routine use by a small handful of neuroimaging researchers to increase their storage and/or computational power, the adoption of such resources by the broader neuroimaging community remains relatively uncommon. Therefore, the goal of the current manuscript is to: 1) inform prospective users about the similarities and differences between computing clusters, grids and clouds; 2) highlight their main advantages; 3) discuss when it may (and may not) be advisable to use them; 4) review some of their potential problems and barriers to access; and finally 5) give a few practical suggestions for how interested new users can start analyzing their neuroimaging data using cloud resources. Although the aim of cloud computing is to hide most of the complexity of the infrastructure management from end-users, we recognize that this can still be an intimidating area for cognitive neuroscientists, psychologists, neurologists, radiologists, and other neuroimaging researchers lacking a strong computational background. Therefore, with this in mind, we have aimed to provide a basic introduction to cloud computing in general (including some of the basic terminology, computer architectures, infrastructure and service models, etc.), a practical overview of the benefits and drawbacks, and a specific focus on how cloud resources can be used for various neuroimaging applications. PMID:27279746

  5. Modeling, Simulation and Analysis of Public Key Infrastructure

    NASA Technical Reports Server (NTRS)

    Liu, Yuan-Kwei; Tuey, Richard; Ma, Paul (Technical Monitor)

    1998-01-01

    Security is an essential part of network communication. Advances in cryptography have provided solutions to many network security requirements. Public Key Infrastructure (PKI) is the foundation of cryptography applications. The main objective of this research is to design a model to simulate a reliable, scalable, manageable, and high-performance public key infrastructure. We built a model to simulate the NASA public key infrastructure using SimProcess and MATLAB software. The simulation spans from the top level all the way down to the computation needed for encryption, decryption, digital signatures, and a secure web server. The secure web server application could be utilized in wireless communications. The results of the simulation are analyzed and confirmed using queueing theory.
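
    Queueing theory underpins the validation step described above. The following minimal sketch, with invented arrival and service rates, shows the standard M/M/1 formulas one might use to sanity-check the simulated throughput of, say, a signature-verification stage.

```python
"""Illustrative M/M/1 queueing estimate for a secure server's
signature-verification stage, in the spirit of the queueing-theory
validation described above. The rates are made-up placeholders."""

def mm1_stats(arrival_rate, service_rate):
    """Return utilization, mean jobs in system, and mean response time
    for an M/M/1 queue (requires arrival_rate < service_rate)."""
    if arrival_rate >= service_rate:
        raise ValueError("queue is unstable: arrivals exceed service capacity")
    rho = arrival_rate / service_rate           # utilization
    mean_in_system = rho / (1 - rho)            # L = rho / (1 - rho)
    mean_response = 1 / (service_rate - arrival_rate)
    return rho, mean_in_system, mean_response

# e.g. 80 handshakes/s arriving at a server that verifies 100 signatures/s
rho, L, W = mm1_stats(80.0, 100.0)
print(f"utilization={rho:.2f}, mean jobs in system={L:.1f}, "
      f"mean response={W * 1000:.1f} ms")
```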

  6. Irregular Applications: Architectures & Algorithms

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Feo, John T.; Villa, Oreste; Tumeo, Antonino

    Irregular applications are characterized by irregular data structures and irregular control and communication patterns. Novel irregular high performance applications that deal with large data sets have recently appeared. Unfortunately, current high performance systems and software infrastructures execute irregular algorithms poorly. Only coordinated efforts by end users, domain specialists and computer scientists that consider both the architecture and the software stack may be able to provide solutions to the challenges of modern irregular applications.

  7. Technology Policy: Information Infrastructure [Information Superhighways and High Performance Computing]. Hearings before the Subcommittee on Technology, Environment and Aviation of the Committee on Science, Space, and Technology. House of Representatives, One Hundred Third Congress, First Session (March 23 and 25, 1993).

    ERIC Educational Resources Information Center

    Congress of the U.S., Washington, DC. House Committee on Science, Space and Technology.

    This document contains transcriptions of testimony and prepared statements on national technology policy with a focus on President Clinton and Vice President Gore's initiatives to support the development of a national information infrastructure. On the first day of the hearing testimony was received from Edward H. Salmon, Chairman, New Jersey…

  8. Developing an Index to Measure Health System Performance: Measurement for Districts of Nepal.

    PubMed

    Kandel, N; Fric, A; Lamichhane, J

    2014-01-01

    Various frameworks for measuring health system performance have been proposed and discussed. The scope of performance indicators is broad, ranging from examining a national health system to individual patients at various levels of the health system. Development of an innovative and easy-to-compute index is essential to measure the multidimensionality of health systems. We used indicators that also serve as proxies for the set of activities whose primary goal is to maintain and improve health. We used eleven MDG indicators, which represent all dimensions of health, to develop the index. These indicators are computed with a methodology similar to that of the human development index. As an illustration, we used published data from Nepal to compute the index for the districts of Nepal. To validate our findings, we compared the indices of these districts with other development indices of Nepal. An index for each district was computed from the eleven indicators. The indices were then compared with the human development, socio-economic and infrastructure development indices, and the findings showed a similar distribution of districts. Districts categorized as low- or high-performing on health system performance also had correspondingly low or high human development, socio-economic and infrastructure indices. This methodology of computing an index from various indicators could assist policy makers and program managers in prioritizing activities based on performance. Validation of the findings against other development indicators shows that this can be one of the tools for assessing health system performance for policy makers, program managers and others.
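
    The HDI-style computation the authors describe normalizes each indicator to [0, 1] against its bounds and averages the results. Below is a minimal sketch under that reading; the indicator names, bounds, and values are invented for illustration only.

```python
"""Sketch of an HDI-style composite index: each indicator is rescaled to
[0, 1] against its min/max bounds, then averaged. All indicator names,
bounds and values below are invented, not the paper's data."""

def dimension_index(value, lo, hi, higher_is_better=True):
    """Rescale one indicator to [0, 1], as in the human development index."""
    x = (value - lo) / (hi - lo)
    return x if higher_is_better else 1.0 - x

def district_index(indicators, bounds):
    """Arithmetic mean of the rescaled indicators for one district."""
    scores = [dimension_index(v, *bounds[name]) for name, v in indicators.items()]
    return sum(scores) / len(scores)

bounds = {"immunization_pct": (40, 100, True),    # (min, max, higher_is_better)
          "under5_mortality": (20, 120, False)}   # lower mortality is better
district = {"immunization_pct": 85, "under5_mortality": 45}
print(f"composite index = {district_index(district, bounds):.3f}")
```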

  9. A Case Study on Neural Inspired Dynamic Memory Management Strategies for High Performance Computing.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Vineyard, Craig Michael; Verzi, Stephen Joseph

    As high performance computing architectures pursue more computational power there is a need for increased memory capacity and bandwidth as well. A multi-level memory (MLM) architecture addresses this need by combining multiple memory types with different characteristics as varying levels of the same architecture. How to efficiently utilize this memory infrastructure is an unknown challenge, and in this research we sought to investigate whether neural inspired approaches can meaningfully help with memory management. In particular we explored neurogenesis inspired resource allocation, and were able to show a neural inspired mixed controller policy can beneficially impact how MLM architectures utilize memory.

  10. A Cloud-based Infrastructure and Architecture for Environmental System Research

    NASA Astrophysics Data System (ADS)

    Wang, D.; Wei, Y.; Shankar, M.; Quigley, J.; Wilson, B. E.

    2016-12-01

    The present availability of high-capacity networks, low-cost computers and storage devices, and the widespread adoption of hardware virtualization and service-oriented architecture provide a great opportunity to enable data and computing infrastructure sharing between closely related research activities. By taking advantage of these approaches, along with the world-class high-performance computing and data infrastructure located at Oak Ridge National Laboratory, a cloud-based infrastructure and architecture has been developed to efficiently deliver essential data and informatics services and utilities to the environmental system research community, and to provide unique capabilities that allow terrestrial ecosystem research projects to share their software utilities (tools), data and even data submission workflows in a straightforward fashion. The infrastructure will minimize large disruptions to current project-based data submission workflows for better acceptance by existing projects, since many ecosystem research projects already have their own requirements or preferences for data submission and collection. The infrastructure will eliminate scalability problems with current project silos by providing unified data services and infrastructure. The infrastructure consists of two key components: (1) a collection of configurable virtual computing environments and user management systems that expedite data submission and collection from the environmental system research community, and (2) scalable data management services and systems, originated and developed by ORNL data centers.

  11. Quantifying the Digital Divide: A Scientific Overview of Network Connectivity and Grid Infrastructure in South Asian Countries

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Khan, Shahryar Muhammad (SLAC; NUST, Rawalpindi); Cottrell, R. Les

    2007-10-30

    The future of computing in High Energy Physics (HEP) applications depends on both the network and Grid infrastructure. South Asian countries such as India and Pakistan are making significant progress by building clusters as well as improving their network infrastructure. However, to facilitate the use of these resources, they need to manage the issues of network connectivity to be among the leading participants in computing for HEP experiments. In this paper we classify the connectivity for academic and research institutions of South Asia. The quantitative measurements are carried out using the PingER methodology, an approach that induces minimal ICMP traffic to gather active end-to-end network statistics. The PingER project has been measuring Internet performance for the last decade. Currently the measurement infrastructure comprises over 700 hosts in more than 130 countries, which collectively represent approximately 99% of the world's Internet-connected population. Thus, we are well positioned to characterize the world's connectivity. Here we present the current state of the National Research and Educational Networks (NRENs) and Grid infrastructure in the South Asian countries and identify the areas of concern. We also present comparisons between South Asia and other developing as well as developed regions. We show that there is a strong correlation between network performance and several human development indices.
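
    A PingER-style measurement reduces a train of ICMP echoes to loss and round-trip-time statistics. The sketch below computes such a summary from a hypothetical list of RTT samples (None marking a lost packet); it does not reproduce the project's actual tooling.

```python
"""Sketch of PingER-style end-to-end metrics from a set of ping samples:
packet loss and round-trip-time statistics. The sample RTTs (ms, None =
lost packet) are invented; real PingER probes remote hosts with ICMP."""

import statistics

def pinger_summary(rtts_ms):
    received = [r for r in rtts_ms if r is not None]
    loss_pct = 100.0 * (len(rtts_ms) - len(received)) / len(rtts_ms)
    return {
        "loss_pct": loss_pct,
        "min_ms": min(received),
        "avg_ms": statistics.mean(received),
        "jitter_ms": statistics.pstdev(received),  # simple dispersion measure
    }

samples = [112.0, 118.5, None, 121.2, 115.8, 119.9, None, 117.3, 114.6, 120.1]
print(pinger_summary(samples))
```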

  12. The Computing and Data Grid Approach: Infrastructure for Distributed Science Applications

    NASA Technical Reports Server (NTRS)

    Johnston, William E.

    2002-01-01

    With the advent of Grids - infrastructure for using and managing widely distributed computing and data resources in the science environment - there is now an opportunity to provide a standard, large-scale, computing, data, instrument, and collaboration environment for science that spans many different projects and provides the required infrastructure and services in a relatively uniform and supportable way. Grid technology has evolved over the past several years to provide the services and infrastructure needed for building 'virtual' systems and organizations. We argue that Grid technology provides an excellent basis for the creation of the integrated environments that can combine the resources needed to support the large- scale science projects located at multiple laboratories and universities. We present some science case studies that indicate that a paradigm shift in the process of science will come about as a result of Grids providing transparent and secure access to advanced and integrated information and technologies infrastructure: powerful computing systems, large-scale data archives, scientific instruments, and collaboration tools. These changes will be in the form of services that can be integrated with the user's work environment, and that enable uniform and highly capable access to these computers, data, and instruments, regardless of the location or exact nature of these resources. These services will integrate transient-use resources like computing systems, scientific instruments, and data caches (e.g., as they are needed to perform a simulation or analyze data from a single experiment); persistent-use resources. such as databases, data catalogues, and archives, and; collaborators, whose involvement will continue for the lifetime of a project or longer. While we largely address large-scale science in this paper, Grids, particularly when combined with Web Services, will address a broad spectrum of science scenarios. both large and small scale.

  13. Infrastructures for Distributed Computing: the case of BESIII

    NASA Astrophysics Data System (ADS)

    Pellegrino, J.

    2018-05-01

    BESIII is an electron-positron collision experiment hosted at BEPCII in Beijing and aimed at investigating Tau-Charm physics. BESIII has now been running for several years and has gathered more than 1 PB of raw data. In order to analyze these data and perform massive Monte Carlo simulations, a large amount of computing and storage resources is needed. The distributed computing system is based upon DIRAC and has been in production since 2012. It integrates computing and storage resources from different institutes and a variety of resource types such as cluster, grid, cloud and volunteer computing. About 15 sites from the BESIII Collaboration from all over the world have joined this distributed computing infrastructure, giving a significant contribution to the IHEP computing facility. Nowadays cloud computing is playing a key role in the HEP computing field, due to its scalability and elasticity. Cloud infrastructures take advantage of several tools, such as VMDirac, to manage virtual machines through cloud managers according to the job requirements. With the virtually unlimited resources from commercial clouds, the computing capacity could scale accordingly in order to deal with any burst demands. General computing models have been discussed in the talk and are addressed herewith, with particular focus on the BESIII infrastructure. Moreover, new computing tools and upcoming infrastructures will be addressed.

  14. Building Nationally-Focussed, Globally Federated, High Performance Earth Science Platforms to Solve Next Generation Social and Economic Issues.

    NASA Astrophysics Data System (ADS)

    Wyborn, Lesley; Evans, Ben; Foster, Clinton; Pugh, Timothy; Uhlherr, Alfred

    2015-04-01

    Digital geoscience data and information are integral to informing decisions on the social, economic and environmental management of natural resources. Traditionally, such decisions were focused on regional or national viewpoints only, but it is increasingly being recognised that global perspectives are required to meet new challenges such as predicting impacts of climate change; sustainably exploiting scarce water, mineral and energy resources; and protecting our communities through better prediction of the behaviour of natural hazards. In recent years, technical advances in scientific instruments have resulted in a surge in data volumes, with data now being collected at unprecedented rates and at ever increasing resolutions. The size of many earth science data sets now exceeds the computational capacity of many government and academic organisations to locally store and dynamically access the data sets; to internally process and analyse them to high resolutions; and then to deliver them online to clients, partners and stakeholders. Fortunately, at the same time, computational capacities have commensurately increased (both cloud and HPC): these can now provide the capability to effectively access the ever-growing data assets within realistic time frames. However, to achieve this, data and computing need to be co-located: bandwidth limits the capacity to move the large data sets; the data transfers are too slow; and latencies to access them are too high. These scenarios are driving the move towards more centralised High Performance (HP) infrastructures. The rapidly increasing scale of data and the growing complexity of software and hardware environments, combined with the energy costs of running such infrastructures, are creating a compelling economic argument for having just one or two major national (or continental) HP facilities that can be federated internationally to enable earth and environmental issues to be tackled at global scales. At the same time, if properly constructed, these infrastructures can also service very small-scale research projects. The National Computational Infrastructure (NCI) at the Australian National University (ANU) has built such an HP infrastructure as part of the Australian Government's National Collaborative Research Infrastructure Strategy. NCI operates as a formal partnership between the ANU and the three major Australian national government scientific agencies: the Commonwealth Scientific and Industrial Research Organisation (CSIRO), the Bureau of Meteorology and Geoscience Australia. The government partners agreed to explore the new opportunities offered within the partnership with NCI, rather than each running their own separate agenda independently. The data from these national agencies, as well as from collaborating overseas organisations (e.g., NASA, NOAA, USGS, CMIP), are either replicated to, or produced at, NCI. By co-locating and harmonising these vast data collections within the integrated HP computing environments at NCI, new opportunities have arisen for data-intensive interdisciplinary science at scales and resolutions not hitherto possible. The new NCI infrastructure has also enabled the blending of research by the university sector with the more operational business of government science agencies, with the fundamental shift being that researchers from both sectors work and collaborate within a federated data and computational environment that contains both national and international data collections.

  15. Partnership in Computational Science

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Huray, Paul G.

    1999-02-24

    This is the final report for the "Partnership in Computational Science" (PICS) award, in the amount of $500,000, for the period January 1, 1993 through December 31, 1993. A copy of the proposal with its budget is attached as Appendix A. This report first describes the significance of the DOE award in building high-performance computing infrastructure in the Southeast, then describes the work accomplished under this grant and lists the publications resulting from it.

  16. The iPlant Collaborative: Cyberinfrastructure for Enabling Data to Discovery for the Life Sciences

    PubMed Central

    Merchant, Nirav; Lyons, Eric; Goff, Stephen; Vaughn, Matthew; Ware, Doreen; Micklos, David; Antin, Parker

    2016-01-01

    The iPlant Collaborative provides life science research communities access to comprehensive, scalable, and cohesive computational infrastructure for data management; identity management; collaboration tools; and cloud, high-performance, high-throughput computing. iPlant provides training, learning material, and best practice resources to help all researchers make the best use of their data, expand their computational skill set, and effectively manage their data and computation when working as distributed teams. iPlant’s platform permits researchers to easily deposit and share their data and deploy new computational tools and analysis workflows, allowing the broader community to easily use and reuse those data and computational analyses. PMID:26752627

  17. A Screen Space GPGPU Surface LIC Algorithm for Distributed Memory Data Parallel Sort Last Rendering Infrastructures

    NASA Astrophysics Data System (ADS)

    Loring, B.; Karimabadi, H.; Rortershteyn, V.

    2015-10-01

    The surface line integral convolution (LIC) visualization technique produces dense visualization of vector fields on arbitrary surfaces. We present a screen space surface LIC algorithm for use in distributed memory data parallel sort last rendering infrastructures. The motivations for our work are to support analysis of datasets that are too large to fit in the main memory of a single computer and compatibility with prevalent parallel scientific visualization tools such as ParaView and VisIt. By working in screen space using OpenGL we can leverage the computational power of GPUs when they are available and run without them when they are not. We address efficiency and performance issues that arise from the transformation of data from physical to screen space by selecting an alternate screen space domain decomposition. We analyze the algorithm's scaling behavior with and without GPUs on two high performance computing systems using data from turbulent plasma simulations.
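
    To make the underlying technique concrete, here is a minimal serial NumPy sketch of line integral convolution: each pixel averages a noise texture along a short streamline traced through the vector field. It illustrates only the core convolution, not the paper's screen-space, distributed, or GPU machinery.

```python
"""Minimal line integral convolution (LIC) sketch in NumPy. Each output
pixel averages white noise along a short streamline of the vector field.
A simplification: the seed pixel is sampled once per trace direction."""

import numpy as np

def lic(vx, vy, noise, length=15, step=0.5):
    h, w = noise.shape
    out = np.zeros_like(noise)
    for i in range(h):
        for j in range(w):
            acc, n = 0.0, 0
            for sign in (+1.0, -1.0):            # trace both directions
                x, y = float(j), float(i)
                for _ in range(length):
                    ii, jj = int(round(y)), int(round(x))
                    if not (0 <= ii < h and 0 <= jj < w):
                        break
                    acc += noise[ii, jj]
                    n += 1
                    u, v = vx[ii, jj], vy[ii, jj]
                    norm = np.hypot(u, v) or 1.0  # avoid division by zero
                    x += sign * step * u / norm
                    y += sign * step * v / norm
            out[i, j] = acc / max(n, 1)
    return out

rng = np.random.default_rng(0)
noise = rng.random((64, 64))
yy, xx = np.mgrid[0:64, 0:64]
vx, vy = -(yy - 32.0), (xx - 32.0)               # circular vector field
image = lic(vx, vy, noise)
```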

  18. A Screen Space GPGPU Surface LIC Algorithm for Distributed Memory Data Parallel Sort Last Rendering Infrastructures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Loring, Burlen; Karimabadi, Homa; Rortershteyn, Vadim

    2014-07-01

    The surface line integral convolution (LIC) visualization technique produces dense visualization of vector fields on arbitrary surfaces. We present a screen space surface LIC algorithm for use in distributed memory data parallel sort last rendering infrastructures. The motivations for our work are to support analysis of datasets that are too large to fit in the main memory of a single computer and compatibility with prevalent parallel scientific visualization tools such as ParaView and VisIt. By working in screen space using OpenGL we can leverage the computational power of GPUs when they are available and run without them when they are not. We address efficiency and performance issues that arise from the transformation of data from physical to screen space by selecting an alternate screen space domain decomposition. We analyze the algorithm's scaling behavior with and without GPUs on two high performance computing systems using data from turbulent plasma simulations.

  19. The Reduced Basis Method in Geosciences: Practical examples for numerical forward simulations

    NASA Astrophysics Data System (ADS)

    Degen, D.; Veroy, K.; Wellmann, F.

    2017-12-01

    Due to the highly heterogeneous character of the earth's subsurface, the complex coupling of thermal, hydrological, mechanical, and chemical processes, and the limited accessibility of the subsurface, we face high-dimensional problems associated with high uncertainties in geosciences. Performing the obviously necessary uncertainty quantifications with a reasonable number of parameters is often not possible due to the high-dimensional character of the problem. Therefore, we present the reduced basis (RB) method, a model order reduction (MOR) technique that constructs low-order approximations to, for instance, the finite element (FE) space. We use the RB method to address these computationally challenging simulations because it significantly reduces the degrees of freedom. The RB method is decomposed into an offline and an online stage, allowing the expensive pre-computations to be made beforehand so that real-time results are available during field campaigns. Generally, the RB approach is most beneficial in the many-query and real-time contexts. We illustrate the advantages of the RB method for the field of geosciences through two examples of numerical forward simulations. The first example, a geothermal conduction problem, demonstrates the implementation of the RB method for a steady-state case. The second example, a Darcy flow problem, shows the benefits for transient scenarios. In both cases, a quality evaluation of the approximations is given. Additionally, the runtimes for both the FE and the RB simulations are compared. We emphasize the advantages of this method for repetitive simulations by showing the speed-up of the RB solution in contrast to the FE solution. Finally, we demonstrate how the implementation can be used on high-performance computing (HPC) infrastructures and evaluate its performance for such infrastructures, with particular attention to its scalability, yielding optimal usage on both HPC infrastructures and normal workstations.
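
    The offline/online split at the heart of the RB method can be illustrated with a small proper orthogonal decomposition example: expensive snapshot solves and an SVD happen offline, and each parameter query then solves only a tiny projected system. The matrix below is a generic SPD stand-in for a finite element operator, not one of the authors' geoscience models.

```python
"""Sketch of the offline/online split in a reduced-basis (POD) method.
Offline: solve the full model at training parameters and compress the
snapshots with an SVD. Online: solve the small projected system only.
The SPD matrix below stands in for a finite element discretization."""

import numpy as np

rng = np.random.default_rng(1)
n, n_snapshots, n_rb = 200, 30, 8

# --- offline stage (expensive, done once) ---
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)                  # SPD stand-in for a FE matrix
f = rng.standard_normal(n)
snapshots = np.column_stack([np.linalg.solve(A + mu * np.eye(n), f)
                             for mu in np.linspace(0.1, 10.0, n_snapshots)])
V, _, _ = np.linalg.svd(snapshots, full_matrices=False)
V = V[:, :n_rb]                              # reduced basis (n x n_rb)
A_rb, I_rb, f_rb = V.T @ A @ V, V.T @ V, V.T @ f

# --- online stage (cheap, per parameter query) ---
def rb_solve(mu):
    return V @ np.linalg.solve(A_rb + mu * I_rb, f_rb)

mu = 3.7
err = np.linalg.norm(rb_solve(mu) - np.linalg.solve(A + mu * np.eye(n), f))
print(f"RB error at mu={mu}: {err:.2e}")
```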

  20. Cloud Computing and Its Applications in GIS

    NASA Astrophysics Data System (ADS)

    Kang, Cao

    2011-12-01

    Cloud computing is a novel computing paradigm that offers highly scalable and highly available distributed computing services. The objectives of this research are to: 1. analyze and understand cloud computing and its potential for GIS; 2. assess the feasibility of migrating truly spatial GIS algorithms to distributed computing infrastructures; 3. explore a solution to host and serve large volumes of raster GIS data efficiently and speedily. These objectives thus form the basis for three professional articles. The first article is entitled "Cloud Computing and Its Applications in GIS". This paper introduces the concept, structure, and features of cloud computing. Features of cloud computing such as scalability, parallelization, and high availability make it a very capable computing paradigm. Unlike High Performance Computing (HPC), cloud computing uses inexpensive commodity computers. The uniform administration systems in cloud computing make it easier to use than GRID computing. Potential advantages of cloud-based GIS systems, such as a lower barrier to entry, are consequently presented. Three cloud-based GIS system architectures are proposed: public cloud-based GIS systems, private cloud-based GIS systems and hybrid cloud-based GIS systems. Public cloud-based GIS systems provide the lowest entry barriers for users among these three architectures, but their advantages are offset by data security and privacy related issues. Private cloud-based GIS systems provide the best data protection, though they have the highest entry barriers. Hybrid cloud-based GIS systems provide a compromise between these extremes. The second article is entitled "A cloud computing algorithm for the calculation of Euclidian distance for raster GIS". Euclidean distance is a truly spatial GIS algorithm. Classical algorithms such as the pushbroom and growth ring techniques require computational propagation through the entire raster image, which makes them incompatible with the distributed nature of cloud computing. This paper presents a parallel Euclidean distance algorithm that works seamlessly with the distributed nature of cloud computing infrastructures. The mechanism of this algorithm is to subdivide a raster image into sub-images and wrap them with a one-pixel-deep edge layer of individually computed distance information. Each sub-image is then processed by a separate node, after which the resulting sub-images are reassembled into the final output. It is shown that while any rectangular sub-image shape can be used, those approximating squares are computationally optimal. This study also serves as a demonstration of this subdivide-and-layer-wrap strategy, which would enable the migration of many truly spatial GIS algorithms to cloud computing infrastructures. However, this research also indicates that certain spatial GIS algorithms, such as cost distance, cannot be migrated by adopting this mechanism, which presents significant challenges for the development of cloud-based GIS systems. The third article is entitled "A Distributed Storage Schema for Cloud Computing based Raster GIS Systems". This paper proposes a NoSQL Database Management System (NDDBMS) based raster GIS data storage schema. NDDBMS has good scalability and is able to use distributed commodity computers, which makes it superior to Relational Database Management Systems (RDBMS) in a cloud computing environment. In order to provide optimized data service performance, the proposed storage schema analyzes the nature of commonly used raster GIS data sets. It discriminates two categories of commonly used data sets, and then designs corresponding data storage models for both categories. As a result, the proposed storage schema is capable of hosting and serving enormous volumes of raster GIS data speedily and efficiently on cloud computing infrastructures. In addition, the scheme also takes advantage of the data compression characteristics of quadtrees, thus promoting efficient data storage. Through this assessment of cloud computing technology, the exploration of the challenges and solutions to the migration of GIS algorithms to cloud computing infrastructures, and the examination of strategies for serving large amounts of GIS data in a cloud computing infrastructure, this dissertation lends support to the feasibility of building a cloud-based GIS system. However, there are still challenges that need to be addressed before a full-scale functional cloud-based GIS system can be successfully implemented. (Abstract shortened by UMI.)
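
    The subdivide-and-layer-wrap strategy described above can be sketched generically: cut the raster into square tiles, wrap each tile with a one-pixel halo carrying neighbour information, process tiles independently, and reassemble. The sketch below shows only the tiling harness, with an identity "filter" standing in for the distance computation; the tile size and data are arbitrary.

```python
"""Sketch of the subdivide-and-layer-wrap idea: split a raster into
square tiles with a one-pixel halo, process each tile independently
(here: an identity pass standing in for the distance computation),
then strip the halos and reassemble the result."""

import numpy as np

def tiles_with_halo(raster, tile=64):
    h, w = raster.shape
    padded = np.pad(raster, 1, mode="edge")
    for i in range(0, h, tile):
        for j in range(0, w, tile):
            # +2 accounts for the one-pixel halo on every side
            yield (i, j), padded[i:i + tile + 2, j:j + tile + 2]

def reassemble(shape, processed):
    out = np.empty(shape, dtype=float)
    for (i, j), block in processed:
        core = block[1:-1, 1:-1]                 # strip the halo
        out[i:i + core.shape[0], j:j + core.shape[1]] = core
    return out

raster = np.random.default_rng(2).random((256, 256))
# Each tile could be shipped to a separate node; here we "process" locally.
processed = [((i, j), block.copy()) for (i, j), block in tiles_with_halo(raster)]
result = reassemble(raster.shape, processed)
assert np.allclose(result, raster)               # identity filter round-trips
```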

  1. EGI-EUDAT integration activity - Pair data and high-throughput computing resources together

    NASA Astrophysics Data System (ADS)

    Scardaci, Diego; Viljoen, Matthew; Vitlacil, Dejan; Fiameni, Giuseppe; Chen, Yin; Sipos, Gergely; Ferrari, Tiziana

    2016-04-01

    EGI (www.egi.eu) is a publicly funded e-infrastructure that gives scientists access to more than 530,000 logical CPUs, 200 PB of disk capacity and 300 PB of tape storage to drive research and innovation in Europe. The infrastructure provides both high throughput computing and cloud compute/storage capabilities. Resources are provided by about 350 resource centres distributed across 56 countries in Europe, the Asia-Pacific region, Canada and Latin America. EUDAT (www.eudat.eu) is a collaborative pan-European infrastructure providing research data services, training and consultancy for researchers, research communities, research infrastructures and data centres. EUDAT's vision is to enable European researchers and practitioners from any research discipline to preserve, find, access, and process data in a trusted environment, as part of a Collaborative Data Infrastructure (CDI) conceived as a network of collaborating, cooperating centres, combining the richness of numerous community-specific data repositories with the permanence and persistence of some of Europe's largest scientific data centres. In the context of their flagship projects, EGI-Engage and EUDAT2020, EGI and EUDAT started a collaboration in March 2015 to harmonise the two infrastructures, covering technical interoperability, authentication, authorisation and identity management, policy and operations. The main objective of this work is to provide end-users with seamless access to an integrated infrastructure offering both EGI and EUDAT services, thereby pairing data and high-throughput computing resources together. To define the roadmap of this collaboration, EGI and EUDAT selected a set of relevant user communities, already collaborating with both infrastructures, which could bring requirements and help to assign the right priorities to each of them. In this way, from the beginning, this activity has been driven by the end users. The identified user communities are relevant European research infrastructures in the fields of Earth Science (EPOS and ICOS), Bioinformatics (BBMRI and ELIXIR) and Space Physics (EISCAT-3D). The first outcome of this activity has been the definition of a generic use case that captures the typical user scenario with respect to the integrated use of the EGI and EUDAT infrastructures. This generic use case allows a user to instantiate a set of virtual machine images on the EGI Federated Cloud to perform computational jobs that analyse data previously stored on EUDAT long-term storage systems. The results of such analyses can be staged back to EUDAT storage and, if needed, assigned Persistent Identifiers (PIDs) for future use. The implementation of this generic use case requires the following integration activities between EGI and EUDAT: (1) harmonisation of the user authentication and authorisation models, and (2) implementation of interface connectors between the relevant EGI and EUDAT services, particularly the EGI cloud compute facilities and the EUDAT long-term storage and PID systems. In the presentation, the collected user requirements and the implementation status of the generic use case will be shown. Furthermore, we will describe how the generic use case is currently applied to satisfy EPOS and ICOS needs.

  2. A hybrid computational strategy to address WGS variant analysis in >5000 samples.

    PubMed

    Huang, Zhuoyi; Rustagi, Navin; Veeraraghavan, Narayanan; Carroll, Andrew; Gibbs, Richard; Boerwinkle, Eric; Venkata, Manjunath Gorentla; Yu, Fuli

    2016-09-10

    The decreasing costs of sequencing are driving the need for cost-effective and real-time variant calling of whole genome sequencing data. The scale of these projects is far beyond the capacity of typical computing resources available to most research labs. Other infrastructures, like the cloud AWS environment and supercomputers, also have limitations that make large-scale joint variant calling infeasible, and infrastructure-specific variant calling strategies either fail to scale up to large datasets or abandon joint calling strategies. We present a high throughput framework including multiple variant callers for single nucleotide variant (SNV) calling, which leverages a hybrid computing infrastructure consisting of cloud AWS, supercomputers and local high performance computing infrastructures. We present a novel binning approach for large-scale joint variant calling and imputation which can scale up to over 10,000 samples while producing SNV callsets with high sensitivity and specificity. As a proof of principle, we present results of analysis on the Cohorts for Heart And Aging Research in Genomic Epidemiology (CHARGE) WGS freeze 3 dataset, in which joint calling, imputation and phasing of over 5300 whole genome samples was produced in under 6 weeks using four state-of-the-art callers: SNPTools, GATK-HaplotypeCaller, GATK-UnifiedGenotyper and GotCloud. We used Amazon AWS, a 4000-core in-house cluster at Baylor College of Medicine, the IBM PowerPC Blue BioU at Rice and Rhea at Oak Ridge National Laboratory (ORNL) for the computation. AWS was used for joint calling of 180 TB of BAM files, and the ORNL and Rice supercomputers were used for the imputation and phasing step. All other steps were carried out on the local compute cluster. The entire operation used 5.2 million core hours and transferred only a total of 6 TB of data across the platforms. Even with increasing sizes of whole genome datasets, ensemble joint calling of SNVs for low coverage data can be accomplished in a scalable, cost effective and fast manner by using heterogeneous computing platforms without compromising on the quality of variants.
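
    The binning idea behind such large-scale joint calling can be sketched as cutting the genome into fixed-width, slightly overlapping regions that different platforms can process independently. The bin width, overlap, and chromosome lengths below are illustrative, not the values used in the CHARGE analysis.

```python
"""Sketch of a binning scheme for large-scale joint variant calling:
the genome is cut into fixed-width, slightly overlapping bins that can
be farmed out to different platforms. All numbers are illustrative."""

def make_bins(chrom_lengths, bin_size=3_000_000, overlap=10_000):
    """Yield (chrom, start, end) work units, 1-based inclusive."""
    for chrom, length in chrom_lengths.items():
        start = 1
        while start <= length:
            end = min(start + bin_size - 1, length)
            yield chrom, start, end
            if end == length:
                break
            start = end - overlap + 1            # overlap avoids edge effects

chroms = {"chr20": 64_444_167, "chr21": 46_709_983}
bins = list(make_bins(chroms))
print(f"{len(bins)} bins, e.g. {bins[0]} ... {bins[-1]}")
```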

  3. Low cost, high performance processing of single particle cryo-electron microscopy data in the cloud

    PubMed Central

    Cianfrocco, Michael A; Leschziner, Andres E

    2015-01-01

    The advent of a new generation of electron microscopes and direct electron detectors has realized the potential of single particle cryo-electron microscopy (cryo-EM) as a technique to generate high-resolution structures. Calculating these structures requires high performance computing clusters, a resource that may be limiting to many likely cryo-EM users. To address this limitation and facilitate the spread of cryo-EM, we developed a publicly available ‘off-the-shelf’ computing environment on Amazon's elastic cloud computing infrastructure. This environment provides users with single particle cryo-EM software packages and the ability to create computing clusters with 16–480+ CPUs. We tested our computing environment using a publicly available 80S yeast ribosome dataset and estimate that laboratories could determine high-resolution cryo-EM structures for $50 to $1500 per structure within a timeframe comparable to local clusters. Our analysis shows that Amazon's cloud computing environment may offer a viable computing environment for cryo-EM. DOI: http://dx.doi.org/10.7554/eLife.06664.001 PMID:25955969

  4. High-Performance Java Codes for Computational Fluid Dynamics

    NASA Technical Reports Server (NTRS)

    Riley, Christopher; Chatterjee, Siddhartha; Biswas, Rupak; Biegel, Bryan (Technical Monitor)

    2001-01-01

    The computational science community is reluctant to write large-scale computationally intensive applications in Java due to concerns over Java's poor performance, despite the claimed software engineering advantages of its object-oriented features. Naive Java implementations of numerical algorithms can perform poorly compared to corresponding Fortran or C implementations. To achieve high performance, Java applications must be designed with good performance as a primary goal. This paper presents the object-oriented design and implementation of two real-world applications from the field of Computational Fluid Dynamics (CFD): a finite-volume fluid flow solver (LAURA, from NASA Langley Research Center), and an unstructured mesh adaptation algorithm (2D_TAG, from NASA Ames Research Center). This work builds on our previous experience with the design of high-performance numerical libraries in Java. We examine the performance of the applications using the currently available Java infrastructure and show that the Java version of the flow solver LAURA performs almost within a factor of 2 of the original procedural version. Our Java version of the mesh adaptation algorithm 2D_TAG performs within a factor of 1.5 of its original procedural version on certain platforms. Our results demonstrate that object-oriented software design principles are not necessarily inimical to high performance.

  5. SBSI: an extensible distributed software infrastructure for parameter estimation in systems biology

    PubMed Central

    Adams, Richard; Clark, Allan; Yamaguchi, Azusa; Hanlon, Neil; Tsorman, Nikos; Ali, Shakir; Lebedeva, Galina; Goltsov, Alexey; Sorokin, Anatoly; Akman, Ozgur E.; Troein, Carl; Millar, Andrew J.; Goryanin, Igor; Gilmore, Stephen

    2013-01-01

    Summary: Complex computational experiments in Systems Biology, such as fitting model parameters to experimental data, can be challenging to perform. Not only do they frequently require a high level of computational power, but the software needed to run the experiment needs to be usable by scientists with varying levels of computational expertise, and modellers need to be able to obtain up-to-date experimental data resources easily. We have developed a software suite, the Systems Biology Software Infrastructure (SBSI), to facilitate the parameter-fitting process. SBSI is a modular software suite composed of three major components: SBSINumerics, a high-performance library containing parallelized algorithms for performing parameter fitting; SBSIDispatcher, a middleware application to track experiments and submit jobs to back-end servers; and SBSIVisual, an extensible client application used to configure optimization experiments and view results. Furthermore, we have created a plugin infrastructure to enable project-specific modules to be easily installed. Plugin developers can take advantage of the existing user-interface and application framework to customize SBSI for their own uses, facilitated by SBSI’s use of standard data formats. Availability and implementation: All SBSI binaries and source-code are freely available from http://sourceforge.net/projects/sbsi under an Apache 2 open-source license. The server-side SBSINumerics runs on any Unix-based operating system; both SBSIVisual and SBSIDispatcher are written in Java and are platform independent, allowing use on Windows, Linux and Mac OS X. The SBSI project website at http://www.sbsi.ed.ac.uk provides documentation and tutorials. Contact: stg@inf.ed.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23329415
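
    At its core, the parameter-fitting task SBSI distributes is a least-squares minimization of the mismatch between model output and data. The toy sketch below fits a two-parameter exponential-decay model with SciPy; the model and synthetic "measurements" are invented, and SBSI itself targets far larger models and parallel back-ends.

```python
"""Toy sketch of the parameter-fitting task described above: minimize
the distance between model output and experimental data. The model and
synthetic measurements are invented for illustration only."""

import numpy as np
from scipy.optimize import least_squares

def model(t, k, x0):
    return x0 * np.exp(-k * t)                   # toy kinetic model

t = np.linspace(0.0, 10.0, 25)
rng = np.random.default_rng(3)
data = model(t, k=0.4, x0=2.0) + 0.05 * rng.standard_normal(t.size)

def residuals(params):
    k, x0 = params
    return model(t, k, x0) - data

fit = least_squares(residuals, x0=[1.0, 1.0], bounds=([0, 0], [10, 10]))
print(f"fitted k={fit.x[0]:.3f}, x0={fit.x[1]:.3f}")
```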

  6. High-Performance Compute Infrastructure in Astronomy: 2020 Is Only Months Away

    NASA Astrophysics Data System (ADS)

    Berriman, B.; Deelman, E.; Juve, G.; Rynge, M.; Vöckler, J. S.

    2012-09-01

    By 2020, astronomy will be awash with as much as 60 PB of public data. Full scientific exploitation of such massive volumes of data will require high-performance computing on server farms co-located with the data. Development of this computing model will be a community-wide enterprise that has profound cultural and technical implications. Astronomers must be prepared to develop environment-agnostic applications that support parallel processing. The community must investigate the applicability and cost-benefit of emerging technologies such as cloud computing to astronomy, and must engage the Computer Science community to develop science-driven cyberinfrastructure such as workflow schedulers and optimizers. We report here the results of collaborations between a science center, IPAC, and a Computer Science research institute, ISI. These collaborations may be considered pathfinders in developing a high-performance compute infrastructure in astronomy. They investigated two exemplar large-scale science-driver workflow applications: 1) calculation of an infrared atlas of the Galactic Plane at 18 different wavelengths by placing data from multiple surveys on a common plate scale and co-registering all the pixels; 2) calculation of an atlas of periodicities present in the public Kepler data sets, which currently contain 380,000 light curves. These products have been generated with two workflow applications, written in C for performance and designed to support parallel processing on multiple environments and platforms, but with different compute resource needs: the Montage image mosaic engine is I/O-bound, and the NASA Star and Exoplanet Database periodogram code is CPU-bound. Our presentation will report cost and performance metrics and lessons learned for continuing development. Applicability of Cloud Computing: commercial cloud providers generally charge for all operations, including processing, transfer of input and output data, and storage of data, so the costs of running applications vary widely according to how they use resources. The cloud is well suited to processing CPU-bound (and memory-bound) workflows such as the periodogram code, given the relatively low cost of processing in comparison with I/O operations. I/O-bound applications such as Montage perform best on high-performance clusters with fast networks and parallel file systems. Science-driven Cyberinfrastructure: Montage has been widely used as a driver application to develop workflow management services, such as task scheduling in distributed environments, designing fault tolerance techniques for job schedulers, and developing workflow orchestration techniques. Running Parallel Applications Across Distributed Cloud Environments: data processing will eventually take place in parallel, distributed across cyberinfrastructure environments having different architectures. We have used the Pegasus Work Management System (WMS) to successfully run applications across three very different environments: TeraGrid, OSG (Open Science Grid), and FutureGrid. Provisioning resources across different grids and clouds (also referred to as Sky Computing) involves establishing a distributed environment where issues of, e.g., remote job submission, data management, and security need to be addressed. This environment also requires building virtual machine images that can run in different environments. Usually, each cloud provides basic images that can be customized with additional software and services. In most of our work, we provisioned compute resources using a custom application called Wrangler. Pegasus WMS abstracts the architectures of the compute environments away from the end-user, and can be considered a first-generation tool suitable for scientists to run their applications on disparate environments.
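
    The CPU-bound versus I/O-bound distinction drives cloud cost in this analysis. A back-of-envelope model like the sketch below makes the trade-off explicit; all prices and workload figures are hypothetical placeholders, not actual provider rates or the paper's measurements.

```python
"""Back-of-envelope cloud cost model echoing the I/O-bound vs CPU-bound
distinction above. All prices and workload numbers are hypothetical
placeholders, not actual provider rates."""

def cloud_cost(cpu_hours, data_out_gb, storage_gb_months,
               price_cpu_hr=0.10, price_gb_out=0.09, price_gb_month=0.02):
    return (cpu_hours * price_cpu_hr
            + data_out_gb * price_gb_out
            + storage_gb_months * price_gb_month)

# CPU-bound periodogram-style run: lots of compute, little data movement
print(f"CPU-bound: ${cloud_cost(2000, 50, 100):,.2f}")
# I/O-bound mosaic-style run: modest compute, heavy data transfer/storage
print(f"I/O-bound: ${cloud_cost(300, 4000, 2000):,.2f}")
```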

  7. Integration of High-Performance Computing into Cloud Computing Services

    NASA Astrophysics Data System (ADS)

    Vouk, Mladen A.; Sills, Eric; Dreher, Patrick

    High-Performance Computing (HPC) projects span a spectrum of computer hardware implementations, ranging from peta-flop supercomputers and high-end tera-flop facilities running a variety of operating systems and applications, to mid-range and smaller computational clusters used for HPC application development, pilot runs and prototype staging. What they all have in common is that they operate as stand-alone systems rather than as scalable, shared, user re-configurable resources. The advent of cloud computing has changed the traditional HPC implementation. In this article, we discuss a very successful production-level architecture and policy framework for supporting HPC services within a more general cloud computing infrastructure. This integrated environment, called the Virtual Computing Lab (VCL), has been operating at NC State since fall 2004. Nearly 8,500,000 HPC CPU-hours were delivered by this environment to NC State faculty and students during 2009. In addition, we present and discuss operational data showing that integration of HPC and non-HPC (or general VCL) services in a cloud can substantially reduce the cost of delivering cloud services (down to cents per CPU hour).

  8. Public-Private Partnership: Joint recommendations to improve downloads of large Earth observation data

    NASA Astrophysics Data System (ADS)

    Ramachandran, R.; Murphy, K. J.; Baynes, K.; Lynnes, C.

    2016-12-01

    With the volume of Earth observation data expanding rapidly, cloud computing is quickly changing the way Earth observation data is processed, analyzed, and visualized. The cloud infrastructure provides the flexibility to scale up to large volumes of data and handle high velocity data streams efficiently. Having freely available Earth observation data collocated on a cloud infrastructure creates opportunities for innovation and value-added data re-use in ways unforeseen by the original data provider. These innovations spur new industries and applications and spawn new scientific pathways that were previously limited due to data volume and computational infrastructure issues. NASA, in collaboration with Amazon, Google, and Microsoft, have jointly developed a set of recommendations to enable efficient transfer of Earth observation data from existing data systems to a cloud computing infrastructure. The purpose of these recommendations is to provide guidelines against which all data providers can evaluate existing data systems and be used to improve any issues uncovered to enable efficient search, access, and use of large volumes of data. Additionally, these guidelines ensure that all cloud providers utilize a common methodology for bulk-downloading data from data providers thus preventing the data providers from building custom capabilities to meet the needs of individual cloud providers. The intent is to share these recommendations with other Federal agencies and organizations that serve Earth observation to enable efficient search, access, and use of large volumes of data. Additionally, the adoption of these recommendations will benefit data users interested in moving large volumes of data from data systems to any other location. These data users include the cloud providers, cloud users such as scientists, and other users working in a high performance computing environment who need to move large volumes of data.

  9. High-Performance I/O: HDF5 for Lattice QCD

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kurth, Thorsten; Pochinsky, Andrew; Sarje, Abhinav

    2015-01-01

    Practitioners of lattice QCD/QFT have been some of the primary pioneer users of the state-of-the-art high-performance-computing systems, and contribute towards the stress tests of such new machines as soon as they become available. As with all aspects of high-performance-computing, I/O is becoming an increasingly specialized component of these systems. In order to take advantage of the latest available high-performance I/O infrastructure, to ensure reliability and backwards compatibility of data files, and to help unify the data structures used in lattice codes, we have incorporated parallel HDF5 I/O into the SciDAC supported USQCD software stack. Here we present the design and implementation of this I/O framework. Our HDF5 implementation outperforms optimized QIO at the 10-20% level and leaves room for further improvement by utilizing appropriate dataset chunking.
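
    Dataset chunking, which the text flags as the remaining headroom, is exposed directly by HDF5 libraries. The sketch below shows serial chunked output with h5py (parallel HDF5 additionally needs an MPI-enabled build); the lattice and chunk shapes are arbitrary examples, not USQCD's layout.

```python
"""Sketch of chunked HDF5 output with h5py, illustrating the
dataset-chunking knob mentioned above. Serial I/O only; the lattice
shape and chunk shape are arbitrary examples, not USQCD's layout."""

import numpy as np
import h5py

lattice = np.random.default_rng(4).standard_normal((32, 32, 32, 64))

with h5py.File("field.h5", "w") as f:
    dset = f.create_dataset(
        "field", data=lattice,
        chunks=(8, 8, 8, 16),                  # chunk shape tuned to access pattern
        compression="gzip", compression_opts=4)
    dset.attrs["description"] = "toy scalar field on a 32^3 x 64 lattice"

with h5py.File("field.h5", "r") as f:
    timeslice = f["field"][:, :, :, 0]         # reads only the chunks touched
print(timeslice.shape)
```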

  10. High-Performance I/O: HDF5 for Lattice QCD

    DOE PAGES

    Kurth, Thorsten; Pochinsky, Andrew; Sarje, Abhinav; ...

    2017-05-09

    Practitioners of lattice QCD/QFT have been some of the primary pioneer users of the state-of-the-art high-performance-computing systems, and contribute towards the stress tests of such new machines as soon as they become available. As with all aspects of high-performance-computing, I/O is becoming an increasingly specialized component of these systems. In order to take advantage of the latest available high-performance I/O infrastructure, to ensure reliability and backwards compatibility of data files, and to help unify the data structures used in lattice codes, we have incorporated parallel HDF5 I/O into the SciDAC supported USQCD software stack. Here we present the design andmore » implementation of this I/O framework. Our HDF5 implementation outperforms optimized QIO at the 10-20% level and leaves room for further improvement by utilizing appropriate dataset chunking.« less

  11. Dynamic Collaboration Infrastructure for Hydrologic Science

    NASA Astrophysics Data System (ADS)

    Tarboton, D. G.; Idaszak, R.; Castillo, C.; Yi, H.; Jiang, F.; Jones, N.; Goodall, J. L.

    2016-12-01

    Data and modeling infrastructure is becoming increasingly accessible to water scientists. HydroShare is a collaborative environment that currently offers water scientists the ability to access modeling and data infrastructure in support of data intensive modeling and analysis. It supports the sharing of and collaboration around "resources", which are social objects defined to include both data and models in a structured standardized format. Users collaborate around these objects via comments, ratings, and groups. HydroShare also supports web services and cloud based computation for the execution of hydrologic models and the analysis and visualization of hydrologic data. However, the quantity and variety of data and modeling infrastructure that can be accessed from environments like HydroShare is increasing. Storage infrastructure can range from one's local PC to campus or organizational storage to storage in the cloud. Modeling or computing infrastructure can range from one's desktop to departmental clusters to national HPC resources to grid and cloud computing resources. How does one orchestrate this vast array of data and computing infrastructure without having to learn each new system? A common limitation across these systems is the lack of efficient integration between data transport mechanisms and the corresponding high-level services to support large distributed data and compute operations. A scientist running a hydrology model from their desktop may need to process a large collection of files across the aforementioned storage and compute resources and various national databases. To address these community challenges, a proof-of-concept prototype was created integrating HydroShare with RADII (Resource Aware Data-centric collaboration Infrastructure) to provide software infrastructure enabling the comprehensive and rapid dynamic deployment of what we refer to as "collaborative infrastructure". In this presentation we discuss the results of this proof-of-concept prototype, which enabled HydroShare users to readily instantiate virtual infrastructure marshaling arbitrary combinations, varieties, and quantities of distributed data and computing infrastructure in addressing big problems in hydrology.

  12. Grids and clouds in the Czech NGI

    NASA Astrophysics Data System (ADS)

    Kundrát, Jan; Adam, Martin; Adamová, Dagmar; Chudoba, Jiří; Kouba, Tomáš; Lokajíček, Miloš; Mikula, Alexandr; Říkal, Václav; Švec, Jan; Vohnout, Rudolf

    2016-09-01

    There are several infrastructure operators within the Czech Republic NGI (National Grid Initiative) which provide users with access to high-performance computing facilities over a grid and cloud interface. This article focuses on those where the primary author has personal first-hand experience. We cover some operational issues as well as the history of these facilities.

  13. A Cross-Platform Infrastructure for Scalable Runtime Application Performance Analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jack Dongarra; Shirley Moore; Bart Miller; Jeffrey Hollingsworth

    2005-03-15

    The purpose of this project was to build an extensible cross-platform infrastructure to facilitate the development of accurate and portable performance analysis tools for current and future high performance computing (HPC) architectures. Major accomplishments include tools and techniques for multidimensional performance analysis, as well as improved support for dynamic performance monitoring of multithreaded and multiprocess applications. Previous performance tool development has been limited by the burden of having to re-write a platform-dependent low-level substrate for each architecture/operating system pair in order to obtain the necessary performance data from the system. Manual interpretation of performance data is not scalable for large-scale, long-running applications. The infrastructure developed by this project provides a foundation for building portable and scalable performance analysis tools, with the end goal being to provide application developers with the information they need to analyze, understand, and tune the performance of terascale applications on HPC architectures. The backend portion of the infrastructure provides runtime instrumentation capability and access to hardware performance counters, with thread-safety for shared memory environments and a communication substrate to support instrumentation of multiprocess and distributed programs. Front-end interfaces provide tool developers with a well-defined, platform-independent set of calls for requesting performance data. End-user tools have been developed that demonstrate runtime data collection, on-line and off-line analysis of performance data, and multidimensional performance analysis. The infrastructure is based on two underlying performance instrumentation technologies. These technologies are the PAPI cross-platform library interface to hardware performance counters and the cross-platform Dyninst library interface for runtime modification of executable images. The Paradyn and KOJAK projects have made use of this infrastructure to build performance measurement and analysis tools that scale to long-running programs on large parallel and distributed systems and that automate much of the search for performance bottlenecks.
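
    As a purely illustrative sketch of the front-end idea described above (the class and method names below are hypothetical, not the project's actual API): tools request performance data through a platform-independent interface, and a backend maps those requests onto whatever substrate, such as PAPI counters or Dyninst-inserted probes, the platform provides.

```python
# Hypothetical sketch of a platform-independent performance front end.
# Tools code against PerfBackend; each platform supplies its own substrate.
from abc import ABC, abstractmethod

class PerfBackend(ABC):
    @abstractmethod
    def start(self, events: list[str]) -> None: ...
    @abstractmethod
    def stop(self) -> dict[str, int]: ...

class DummyBackend(PerfBackend):
    """Stand-in for a PAPI- or Dyninst-based substrate."""
    def start(self, events):
        self._events = events
    def stop(self):
        return {e: 0 for e in self._events}   # a real backend returns counts

def measure(backend: PerfBackend, fn, events=("PAPI_TOT_CYC", "PAPI_L2_TCM")):
    # The tool never sees platform details, only the event names it requested.
    backend.start(list(events))
    fn()
    return backend.stop()

print(measure(DummyBackend(), lambda: sum(range(10_000))))
```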

  14. Research infrastructure support to address ecosystem dynamics

    NASA Astrophysics Data System (ADS)

    Los, Wouter

    2014-05-01

    Predicting the evolution of ecosystems under climate change or human pressures is a challenge. Even understanding past or current processes is complicated as a result of the many interactions and feedbacks that occur within and between components of the system. This talk will present an example of current research on changes in landscape evolution, hydrology, soil biogeochemical processes, zoological food webs, and plant community succession, and how these affect feedbacks to components of the systems, including the climate system. Multiple observations, experiments, and simulations provide a wealth of data, but not necessarily understanding. Model development on the coupled processes at different spatial and temporal scales is sensitive to variations in data and to parameter changes. Fast high-performance computing may help to visualize the effect of these changes and the potential stability (and reliability) of the models. This may then allow for iteration between data production and models, converging towards stable models that reduce uncertainty and improve the prediction of change. The role of research infrastructures becomes crucial in overcoming barriers to such research. Environmental infrastructures cover physical site facilities, dedicated instrumentation, and e-infrastructure. The LifeWatch infrastructure for biodiversity and ecosystem research will provide services for data integration, analysis, and modeling. But it has to cooperate intensively with the other kinds of infrastructures in order to support the iteration between data production and model computation. The cooperation in the ENVRI project (Common Operations of Environmental Research Infrastructures) is one of the initiatives to foster such multidisciplinary research.

  15. The multi-modal Australian ScienceS Imaging and Visualization Environment (MASSIVE) high performance computing infrastructure: applications in neuroscience and neuroinformatics research

    PubMed Central

    Goscinski, Wojtek J.; McIntosh, Paul; Felzmann, Ulrich; Maksimenko, Anton; Hall, Christopher J.; Gureyev, Timur; Thompson, Darren; Janke, Andrew; Galloway, Graham; Killeen, Neil E. B.; Raniga, Parnesh; Kaluza, Owen; Ng, Amanda; Poudel, Govinda; Barnes, David G.; Nguyen, Toan; Bonnington, Paul; Egan, Gary F.

    2014-01-01

    The Multi-modal Australian ScienceS Imaging and Visualization Environment (MASSIVE) is a national imaging and visualization facility established by Monash University, the Australian Synchrotron, the Commonwealth Scientific Industrial Research Organization (CSIRO), and the Victorian Partnership for Advanced Computing (VPAC), with funding from the National Computational Infrastructure and the Victorian Government. The MASSIVE facility provides hardware, software, and expertise to drive research in the biomedical sciences, particularly advanced brain imaging research using synchrotron x-ray and infrared imaging, functional and structural magnetic resonance imaging (MRI), x-ray computed tomography (CT), electron microscopy and optical microscopy. The development of MASSIVE has been based on best practice in system integration methodologies, frameworks, and architectures. The facility has: (i) integrated multiple different neuroimaging analysis software components, (ii) enabled cross-platform and cross-modality integration of neuroinformatics tools, and (iii) brought together neuroimaging databases and analysis workflows. MASSIVE is now operational as a nationally distributed and integrated facility for neuroinformatics and brain imaging research. PMID:24734019

  16. A proto-Data Processing Center for LISA

    NASA Astrophysics Data System (ADS)

    Cavet, Cécile; Petiteau, Antoine; Le Jeune, Maude; Plagnol, Eric; Marin-Martholaz, Etienne; Bayle, Jean-Baptiste

    2017-05-01

    Preparation for the LISA project requires studying and defining a new data analysis framework, capable of dealing with highly heterogeneous CPU needs and of exploiting emerging information technologies. In this context, a prototype of the mission’s Data Processing Center (DPC) has been initiated. The DPC is designed to efficiently manage computing constraints and to offer a common infrastructure where the whole collaboration can contribute to development work. Several tools such as continuous integration (CI) have already been delivered to the collaboration and are presently used for simulations and performance studies. This article presents the progress made on this collaborative environment and also discusses possible next steps towards an on-demand computing infrastructure. This activity is supported by CNES as part of the French contribution to LISA.

  17. VMEbus based computer and real-time UNIX as infrastructure of DAQ

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yasu, Y.; Fujii, H.; Nomachi, M.

    1994-12-31

    This paper describes what the authors have constructed as the infrastructure of a data acquisition system (DAQ). The paper reports recent developments concerning the HP VME board computer with LynxOS (HP742rt/HP-RT) and Alpha/OSF1 with a VMEbus adapter. The paper also reports the current status of developing a Benchmark Suite for Data Acquisition (DAQBENCH) for measuring not only the performance of VME/CAMAC access but also that of context switching, inter-process communication, and so on, for various computers including workstation-based systems and VME board computers.

  18. Science of Security Lablet - Scalability and Usability

    DTIC Science & Technology

    2014-12-16

    mobile computing [19]. However, the high-level infrastructure design and our own implementation (both described throughout this paper) can easily...critical and infrastructural systems demands high levels of sophistication in the technical aspects of cybersecurity, software and hardware design...Forget, S. Komanduri, Alessandro Acquisti, Nicolas Christin, Lorrie Cranor, Rahul Telang. "Security Behavior Observatory: Infrastructure for Long-term

  19. Advancing Cyberinfrastructure to support high resolution water resources modeling

    NASA Astrophysics Data System (ADS)

    Tarboton, D. G.; Ogden, F. L.; Jones, N.; Horsburgh, J. S.

    2012-12-01

    Addressing the problem of how the availability and quality of water resources at large scales are sensitive to climate variability, watershed alterations and management activities requires computational resources that combine data from multiple sources and support integrated modeling. Related cyberinfrastructure challenges include: 1) how can we best structure data and computer models to address this scientific problem through the use of high-performance and data-intensive computing, and 2) how can we do this in a way that discipline scientists without extensive computational and algorithmic knowledge and experience can take advantage of advances in cyberinfrastructure? This presentation will describe a new system called CI-WATER that is being developed to address these challenges and advance high resolution water resources modeling in the Western U.S. We are building on existing tools that enable collaboration to develop model and data interfaces that link integrated system models running within an HPC environment to multiple data sources. Our goal is to enhance the use of computational simulation and data-intensive modeling to better understand water resources. Addressing water resource problems in the Western U.S. requires simulation of natural and engineered systems, as well as representation of legal (water rights) and institutional constraints alongside the representation of physical processes. We are establishing data services to represent the engineered infrastructure and legal and institutional systems in a way that they can be used with high resolution multi-physics watershed modeling at high spatial resolution. These services will enable incorporation of location-specific information on water management infrastructure and systems into the assessment of regional water availability in the face of growing demands, uncertain future meteorological forcings, and existing prior-appropriations water rights. This presentation will discuss the informatics challenges involved in data management and in providing easy-to-use access to high-performance computing that are being tackled in this project.

  20. Advances in Spatial Data Infrastructure, Acquisition, Analysis, Archiving and Dissemination

    NASA Technical Reports Server (NTRS)

    Ramapriyan, Hampapuran K.; Rochon, Gilbert L.; Duerr, Ruth; Rank, Robert; Nativi, Stefano; Stocker, Erich Franz

    2010-01-01

    The authors review recent contributions to the state-of-the-science and benign proliferation of satellite remote sensing, spatial data infrastructure, near-real-time data acquisition, analysis on high performance computing platforms, sapient archiving, multi-modal dissemination and utilization for a wide array of scientific applications. The authors also address advances in Geoinformatics and its growing ubiquity, as evidenced by its inclusion as a focus area within the American Geophysical Union (AGU) and the European Geosciences Union (EGU), as well as by the evolution of the IEEE Geoscience and Remote Sensing Society's (GRSS) Data Archiving and Distribution Technical Committee (DAD TC).

  1. The EPOS Vision for the Open Science Cloud

    NASA Astrophysics Data System (ADS)

    Jeffery, Keith; Harrison, Matt; Cocco, Massimo

    2016-04-01

    Cloud computing offers dynamic elastic scalability for data processing on demand. For much research activity, demand for computing is uneven over time, so cloud computing offers both cost-effectiveness and capacity advantages. However, as reported repeatedly by the EC Cloud Expert Group, there are barriers to the uptake of cloud computing: (1) security and privacy; (2) interoperability (avoidance of lock-in); (3) lack of appropriate systems development environments for application programmers to characterise their applications so that cloud middleware can optimize their deployment and execution. From CERN, the Helix-Nebula group has proposed the architecture for the European Open Science Cloud. They are discussing with other e-Infrastructure groups such as EGI (GRIDs), EUDAT (data curation), AARC (network authentication and authorisation) and also with the EIROFORUM group of 'international treaty' RIs (Research Infrastructures) and the ESFRI (European Strategic Forum for Research Infrastructures) RIs, including EPOS. Many of these RIs are either e-RIs (electronic RIs) or have an e-RI interface for access and use. The EPOS architecture is centred on a portal: ICS (Integrated Core Services). The architectural design already allows for access to e-RIs (which may include any or all of data, software, users and resources such as computers or instruments). Those within any one domain (subject area) of EPOS are considered within the TCS (Thematic Core Services). Those outside, or available across multiple domains of EPOS, are ICS-d (Integrated Core Services-Distributed), since the intention is that they will be used by any or all of the TCS via the ICS. Another such service type is CES (Computational Earth Science), effectively an ICS-d specializing in high-performance computation, analytics, simulation or visualization offered by a TCS for others to use. Discussions are already underway between EPOS and EGI, EUDAT, AARC and Helix-Nebula for those offerings to be considered as ICS-ds by EPOS. Provision of access to ICS-ds from ICS-C concerns several aspects: (a) technical: it may be more or less difficult to connect and pass the 'package' (probably a virtual machine) of data and software from ICS-C to the ICS-d/CES; (b) security/privacy: including passing personal information, e.g. related to AAAI (authentication, authorization and accounting infrastructure); (c) financial and legal: such as payment and licence conditions. Appropriate interfaces from ICS-C to ICS-d are being designed to accommodate these aspects. The Open Science Cloud is timely because it provides a framework to discuss governance and sustainability for computational resource provision as well as an effective interpretation of a federated approach to HPC (high-performance computing) and HTC (high-throughput computing). It will be a unique opportunity to share and adopt procurement policies to provide access to computational resources for RIs. The current state of discussions and the expected roadmap for the EPOS-Open Science Cloud relationship are presented.

  2. The Human Physiome: how standards, software and innovative service infrastructures are providing the building blocks to make it achievable

    PubMed Central

    2016-01-01

    Reconstructing and understanding the Human Physiome virtually is a complex mathematical problem, and a highly demanding computational challenge. Mathematical models spanning from the molecular level through to whole populations of individuals must be integrated, then personalized. This requires interoperability with multiple disparate and geographically separated data sources, and myriad computational software tools. Extracting and producing knowledge from such sources, even when the databases and software are readily available, is a challenging task. Despite the difficulties, researchers must frequently perform these tasks so that available knowledge can be continually integrated into the common framework required to realize the Human Physiome. Software and infrastructures that support the communities that generate these, together with their underlying standards to format, describe and interlink the corresponding data and computer models, are pivotal to the Human Physiome being realized. They provide the foundations for integrating, exchanging and re-using data and models efficiently, and correctly, while also supporting the dissemination of growing knowledge in these forms. In this paper, we explore the standards, software tooling, repositories and infrastructures that support this work, and detail what makes them vital to realizing the Human Physiome. PMID:27051515

  3. The Human Physiome: how standards, software and innovative service infrastructures are providing the building blocks to make it achievable.

    PubMed

    Nickerson, David; Atalag, Koray; de Bono, Bernard; Geiger, Jörg; Goble, Carole; Hollmann, Susanne; Lonien, Joachim; Müller, Wolfgang; Regierer, Babette; Stanford, Natalie J; Golebiewski, Martin; Hunter, Peter

    2016-04-06

    Reconstructing and understanding the Human Physiome virtually is a complex mathematical problem, and a highly demanding computational challenge. Mathematical models spanning from the molecular level through to whole populations of individuals must be integrated, then personalized. This requires interoperability with multiple disparate and geographically separated data sources, and myriad computational software tools. Extracting and producing knowledge from such sources, even when the databases and software are readily available, is a challenging task. Despite the difficulties, researchers must frequently perform these tasks so that available knowledge can be continually integrated into the common framework required to realize the Human Physiome. Software and infrastructures that support the communities that generate these, together with their underlying standards to format, describe and interlink the corresponding data and computer models, are pivotal to the Human Physiome being realized. They provide the foundations for integrating, exchanging and re-using data and models efficiently, and correctly, while also supporting the dissemination of growing knowledge in these forms. In this paper, we explore the standards, software tooling, repositories and infrastructures that support this work, and detail what makes them vital to realizing the Human Physiome.

  4. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kurth, Thorsten; Pochinsky, Andrew; Sarje, Abhinav

    Practitioners of lattice QCD/QFT have been some of the primary pioneer users of the state-of-the-art high-performance-computing systems, and contribute towards the stress tests of such new machines as soon as they become available. As with all aspects of high-performance-computing, I/O is becoming an increasingly specialized component of these systems. In order to take advantage of the latest available high-performance I/O infrastructure, to ensure reliability and backwards compatibility of data files, and to help unify the data structures used in lattice codes, we have incorporated parallel HDF5 I/O into the SciDAC supported USQCD software stack. Here we present the design andmore » implementation of this I/O framework. Our HDF5 implementation outperforms optimized QIO at the 10-20% level and leaves room for further improvement by utilizing appropriate dataset chunking.« less

  5. Lessons learned from implementing a national infrastructure in Sweden for storage and analysis of next-generation sequencing data

    PubMed Central

    2013-01-01

    Analyzing and storing data and results from next-generation sequencing (NGS) experiments is a challenging task, hampered by ever-increasing data volumes and frequent updates of analysis methods and tools. Storage and computation have grown beyond the capacity of personal computers and there is a need for suitable e-infrastructures for processing. Here we describe UPPNEX, an implementation of such an infrastructure, tailored to the needs of data storage and analysis of NGS data in Sweden serving various labs and multiple instruments from the major sequencing technology platforms. UPPNEX comprises resources for high-performance computing, large-scale and high-availability storage, an extensive bioinformatics software suite, up-to-date reference genomes and annotations, a support function with system and application experts as well as a web portal and support ticket system. UPPNEX applications are numerous and diverse, and include whole genome-, de novo- and exome sequencing, targeted resequencing, SNP discovery, RNASeq, and methylation analysis. There are over 300 projects that utilize UPPNEX and include large undertakings such as the sequencing of the flycatcher and Norwegian spruce. We describe the strategic decisions made when investing in hardware, setting up maintenance and support, allocating resources, and illustrate major challenges such as managing data growth. We conclude with summarizing our experiences and observations with UPPNEX to date, providing insights into the successful and less successful decisions made. PMID:23800020

  6. Cyberdyn supercomputer - a tool for imaging geodinamic processes

    NASA Astrophysics Data System (ADS)

    Pomeran, Mihai; Manea, Vlad; Besutiu, Lucian; Zlagnean, Luminita

    2014-05-01

    More and more physical processes that develop within the deep interior of our planet, yet have significant impact on the Earth's shape and structure, are becoming subject to numerical modelling using high-performance computing facilities. Nowadays, an increasing number of research centers worldwide decide to make use of such powerful and fast computers for simulating complex phenomena involving fluid dynamics and to gain deeper insight into intricate problems of Earth's evolution. With the CYBERDYN cybernetic infrastructure (CCI), the Solid Earth Dynamics Department in the Institute of Geodynamics of the Romanian Academy boldly steps into the 21st century by entering the research area of computational geodynamics. The project that made this advancement possible was jointly supported by the EU and the Romanian Government through the Structural and Cohesion Funds; it lasted about three years, ending in October 2013. CCI is basically a modern high-performance Beowulf-type supercomputer (HPCC), combined with a high-performance visualization cluster (HPVC) and a GeoWall. The infrastructure is mainly structured around 1344 cores and 3 TB of RAM. The high-speed interconnect is provided by a Qlogic InfiniBand switch, able to transfer up to 40 Gbps. The CCI storage component is a 40 TB Panasas NAS. The operating system is Linux (CentOS). For control and maintenance, the Bright Cluster Manager package is used. The SGE job scheduler manages the job queues. CCI has been designed for a theoretical peak performance of up to 11.2 TFlops. Speed tests showed that a high-resolution numerical model (256 × 256 × 128 FEM elements) could be advanced at a mean speed of one time step per 30 seconds while employing only a fraction (20%) of the computing power. After passing the mandatory tests, the CCI has been involved in numerical modelling of various scenarios related to the East Carpathians tectonic and geodynamic evolution, including the Neogene magmatic activity and the intriguing intermediate-depth seismicity within the so-called Vrancea zone. The CFD code for numerical modelling is CitcomS, a widely employed open-source package specifically developed for earth sciences. Several preliminary 3D geodynamic models simulating an assumed subduction or the effect of a mantle plume will be presented and discussed.

  7. ERDC MSRC Resource. High Performance Computing for the Warfighter. Spring 2006

    DTIC Science & Technology

    2006-01-01

    named Ruby, and the HP/Compaq SC45, named Emerald, continue to add their unique sparkle to the ERDC MSRC computer infrastructure. ERDC invited the...configuration on B-52H purchased additional memory for the login nodes so that this part of the solution process could be done as a preprocessing step. On...application and system services. Of the service nodes, 10 are login nodes and 23 are input/output (I/O) server nodes for the Lustre file system (i.e., the

  8. Sandia QIS Capabilities.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Muller, Richard P.

    2017-07-01

    Sandia National Laboratories has developed a broad set of capabilities in quantum information science (QIS), including elements of quantum computing, quantum communications, and quantum sensing. The Sandia QIS program is built atop unique DOE investments at the laboratories, including the MESA microelectronics fabrication facility, the Center for Integrated Nanotechnologies (CINT) facilities (joint with LANL), the Ion Beam Laboratory, and ASC High Performance Computing (HPC) facilities. Sandia has invested $75 M of LDRD funding over 12 years to develop unique, differentiating capabilities that leverage these DOE infrastructure investments.

  9. CLIMB (the Cloud Infrastructure for Microbial Bioinformatics): an online resource for the medical microbiology community

    PubMed Central

    Smith, Andy; Southgate, Joel; Poplawski, Radoslaw; Bull, Matthew J.; Richardson, Emily; Ismail, Matthew; Thompson, Simon Elwood-; Kitchen, Christine; Guest, Martyn; Bakke, Marius

    2016-01-01

    The increasing availability and decreasing cost of high-throughput sequencing has transformed academic medical microbiology, delivering an explosion in available genomes while also driving advances in bioinformatics. However, many microbiologists are unable to exploit the resulting large genomics datasets because they do not have access to relevant computational resources and to an appropriate bioinformatics infrastructure. Here, we present the Cloud Infrastructure for Microbial Bioinformatics (CLIMB) facility, a shared computing infrastructure that has been designed from the ground up to provide an environment where microbiologists can share and reuse methods and data. PMID:28785418

  10. CLIMB (the Cloud Infrastructure for Microbial Bioinformatics): an online resource for the medical microbiology community.

    PubMed

    Connor, Thomas R; Loman, Nicholas J; Thompson, Simon; Smith, Andy; Southgate, Joel; Poplawski, Radoslaw; Bull, Matthew J; Richardson, Emily; Ismail, Matthew; Thompson, Simon Elwood-; Kitchen, Christine; Guest, Martyn; Bakke, Marius; Sheppard, Samuel K; Pallen, Mark J

    2016-09-01

    The increasing availability and decreasing cost of high-throughput sequencing has transformed academic medical microbiology, delivering an explosion in available genomes while also driving advances in bioinformatics. However, many microbiologists are unable to exploit the resulting large genomics datasets because they do not have access to relevant computational resources and to an appropriate bioinformatics infrastructure. Here, we present the Cloud Infrastructure for Microbial Bioinformatics (CLIMB) facility, a shared computing infrastructure that has been designed from the ground up to provide an environment where microbiologists can share and reuse methods and data.

  11. A parallel-processing approach to computing for the geographic sciences

    USGS Publications Warehouse

    Crane, Michael; Steinwand, Dan; Beckmann, Tim; Krpan, Greg; Haga, Jim; Maddox, Brian; Feller, Mark

    2001-01-01

    The overarching goal of this project is to build a spatially distributed infrastructure for information science research by forming a team of information science researchers and providing them with similar hardware and software tools to perform collaborative research. Four geographically distributed Centers of the U.S. Geological Survey (USGS) are developing their own clusters of low-cost personal computers into parallel computing environments that provide a cost-effective way for the USGS to increase participation in the high-performance computing community. Referred to as Beowulf clusters, these hybrid systems provide the robust computing power required for conducting research into various areas, such as advanced computer architecture, algorithms to meet the processing needs for real-time image and data processing, the creation of custom datasets from seamless source data, rapid turn-around of products for emergency response, and support for computationally intense spatial and temporal modeling.

  12. Optimization and parallelization of the thermal–hydraulic subchannel code CTF for high-fidelity multi-physics applications

    DOE PAGES

    Salko, Robert K.; Schmidt, Rodney C.; Avramova, Maria N.

    2014-11-23

    This study describes major improvements to the computational infrastructure of the CTF subchannel code so that full-core, pincell-resolved (i.e., one computational subchannel per real bundle flow channel) simulations can now be performed in much shorter run-times, either in stand-alone mode or as part of coupled-code multi-physics calculations. These improvements support the goals of the Department of Energy Consortium for Advanced Simulation of Light Water Reactors (CASL) Energy Innovation Hub to develop high fidelity multi-physics simulation tools for nuclear energy design and analysis.

  13. Benchmarking infrastructure for mutation text mining

    PubMed Central

    2014-01-01

    Background Experimental research on the automatic extraction of information about mutations from texts is greatly hindered by the lack of consensus evaluation infrastructure for the testing and benchmarking of mutation text mining systems. Results We propose a community-oriented annotation and benchmarking infrastructure to support development, testing, benchmarking, and comparison of mutation text mining systems. The design is based on semantic standards, where RDF is used to represent annotations, an OWL ontology provides an extensible schema for the data and SPARQL is used to compute various performance metrics, so that in many cases no programming is needed to analyze results from a text mining system. While large benchmark corpora for biological entity and relation extraction are focused mostly on genes, proteins, diseases, and species, our benchmarking infrastructure fills the gap for mutation information. The core infrastructure comprises (1) an ontology for modelling annotations, (2) SPARQL queries for computing performance metrics, and (3) a sizeable collection of manually curated documents, that can support mutation grounding and mutation impact extraction experiments. Conclusion We have developed the principal infrastructure for the benchmarking of mutation text mining tasks. The use of RDF and OWL as the representation for corpora ensures extensibility. The infrastructure is suitable for out-of-the-box use in several important scenarios and is ready, in its current state, for initial community adoption. PMID:24568600

  14. Benchmarking infrastructure for mutation text mining.

    PubMed

    Klein, Artjom; Riazanov, Alexandre; Hindle, Matthew M; Baker, Christopher Jo

    2014-02-25

    Experimental research on the automatic extraction of information about mutations from texts is greatly hindered by the lack of consensus evaluation infrastructure for the testing and benchmarking of mutation text mining systems. We propose a community-oriented annotation and benchmarking infrastructure to support development, testing, benchmarking, and comparison of mutation text mining systems. The design is based on semantic standards, where RDF is used to represent annotations, an OWL ontology provides an extensible schema for the data and SPARQL is used to compute various performance metrics, so that in many cases no programming is needed to analyze results from a text mining system. While large benchmark corpora for biological entity and relation extraction are focused mostly on genes, proteins, diseases, and species, our benchmarking infrastructure fills the gap for mutation information. The core infrastructure comprises (1) an ontology for modelling annotations, (2) SPARQL queries for computing performance metrics, and (3) a sizeable collection of manually curated documents, that can support mutation grounding and mutation impact extraction experiments. We have developed the principal infrastructure for the benchmarking of mutation text mining tasks. The use of RDF and OWL as the representation for corpora ensures extensibility. The infrastructure is suitable for out-of-the-box use in several important scenarios and is ready, in its current state, for initial community adoption.
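
    To illustrate the "no programming needed" claim, a metric such as precision can be computed with a single SPARQL query over RDF annotations; the vocabulary below is invented for the sketch and is not the paper's actual ontology.

```python
# Hedged sketch: a precision-style metric over RDF annotations via SPARQL.
# The ex: vocabulary is illustrative only, not the benchmarking ontology.
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/")
g = Graph()
# Two system annotations; only the first matches the gold standard.
g.add((EX.sys1, EX.annotates, EX.mutA))
g.add((EX.sys1, EX.inGold, EX.true))
g.add((EX.sys2, EX.annotates, EX.mutB))

q = """
PREFIX ex: <http://example.org/>
SELECT (COUNT(?a) AS ?total) (COUNT(?gold) AS ?tp)
WHERE {
  ?a ex:annotates ?m .
  OPTIONAL { ?a ex:inGold ?gold }   # bound only for gold-matching annotations
}
"""
row = next(iter(g.query(q)))
precision = int(row.tp) / int(row.total)
print(f"precision = {precision:.2f}")   # -> 0.50
```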

  15. Parallel, distributed and GPU computing technologies in single-particle electron microscopy

    PubMed Central

    Schmeisser, Martin; Heisen, Burkhard C.; Luettich, Mario; Busche, Boris; Hauer, Florian; Koske, Tobias; Knauber, Karl-Heinz; Stark, Holger

    2009-01-01

    Most known methods for the determination of the structure of macromolecular complexes are limited or at least restricted at some point by their computational demands. Recent developments in information technology such as multicore, parallel and GPU processing can be used to overcome these limitations. In particular, graphics processing units (GPUs), which were originally developed for rendering real-time effects in computer games, are now ubiquitous and provide unprecedented computational power for scientific applications. Each parallel-processing paradigm alone can improve overall performance; the increased computational performance obtained by combining all paradigms, unleashing the full power of today’s technology, makes certain applications feasible that were previously virtually impossible. In this article, state-of-the-art paradigms are introduced, the tools and infrastructure needed to apply these paradigms are presented and a state-of-the-art infrastructure and solution strategy for moving scientific applications to the next generation of computer hardware is outlined. PMID:19564686

  16. Parallel, distributed and GPU computing technologies in single-particle electron microscopy.

    PubMed

    Schmeisser, Martin; Heisen, Burkhard C; Luettich, Mario; Busche, Boris; Hauer, Florian; Koske, Tobias; Knauber, Karl-Heinz; Stark, Holger

    2009-07-01

    Most known methods for the determination of the structure of macromolecular complexes are limited or at least restricted at some point by their computational demands. Recent developments in information technology such as multicore, parallel and GPU processing can be used to overcome these limitations. In particular, graphics processing units (GPUs), which were originally developed for rendering real-time effects in computer games, are now ubiquitous and provide unprecedented computational power for scientific applications. Each parallel-processing paradigm alone can improve overall performance; the increased computational performance obtained by combining all paradigms, unleashing the full power of today's technology, makes certain applications feasible that were previously virtually impossible. In this article, state-of-the-art paradigms are introduced, the tools and infrastructure needed to apply these paradigms are presented and a state-of-the-art infrastructure and solution strategy for moving scientific applications to the next generation of computer hardware is outlined.

  17. Role of the ATLAS Grid Information System (AGIS) in Distributed Data Analysis and Simulation

    NASA Astrophysics Data System (ADS)

    Anisenkov, A. V.

    2018-03-01

    In modern high-energy physics experiments, particular attention is paid to the global integration of information and computing resources into a unified system for efficient storage and processing of experimental data. Annually, the ATLAS experiment performed at the Large Hadron Collider at the European Organization for Nuclear Research (CERN) produces tens of petabytes of raw data from the recording electronics and several petabytes of data from the simulation system. For processing and storage of such super-large volumes of data, the computing model of the ATLAS experiment is based on a heterogeneous, geographically distributed computing environment, which includes the Worldwide LHC Computing Grid (WLCG) infrastructure and is able to meet the requirements of the experiment for processing huge data sets and provide a high degree of their accessibility (hundreds of petabytes). The paper considers the ATLAS Grid Information System (AGIS) used by the ATLAS collaboration to describe the topology and resources of the computing infrastructure, to configure and connect the high-level software systems of computer centers, and to describe and store all possible parameters, control, configuration, and other auxiliary information required for the effective operation of the ATLAS distributed computing applications and services. The role of the AGIS system in developing a unified description of the computing resources provided by grid sites, supercomputer centers, and cloud computing into a consistent information model for the ATLAS experiment is outlined. This approach has allowed the collaboration to extend the computing capabilities of the WLCG project and integrate supercomputers and cloud computing platforms into the software components of the production and distributed analysis workload management system (PanDA, ATLAS).

  18. Model development, testing and experimentation in a CyberWorkstation for Brain-Machine Interface research.

    PubMed

    Rattanatamrong, Prapaporn; Matsunaga, Andrea; Raiturkar, Pooja; Mesa, Diego; Zhao, Ming; Mahmoudi, Babak; Digiovanna, Jack; Principe, Jose; Figueiredo, Renato; Sanchez, Justin; Fortes, Jose

    2010-01-01

    The CyberWorkstation (CW) is an advanced cyber-infrastructure for Brain-Machine Interface (BMI) research. It allows the development, configuration and execution of BMI computational models using high-performance computing resources. The CW's concept is implemented using a software structure in which an "experiment engine" is used to coordinate all software modules needed to capture, communicate and process brain signals and motor-control commands. A generic BMI-model template, which specifies a common interface to the CW's experiment engine, and a common communication protocol enable easy addition, removal or replacement of models without disrupting system operation. This paper reviews the essential components of the CW and shows how templates can facilitate the processes of BMI model development, testing and incorporation into the CW. It also discusses the ongoing work towards making this process infrastructure independent.
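
    A minimal sketch of what such a generic model template might look like (the names here are hypothetical, not the CW's actual interface): because the experiment engine only sees the common interface, models can be added, removed, or replaced without disrupting system operation.

```python
# Hypothetical sketch of a generic BMI-model template: the experiment
# engine drives any model through one common interface.
from typing import Protocol
import numpy as np

class BMIModel(Protocol):
    def configure(self, params: dict) -> None: ...
    def update(self, neural_sample: np.ndarray) -> np.ndarray: ...

class LinearDecoder:
    """Toy model: maps a neural sample to motor commands via a fixed matrix."""
    def configure(self, params: dict) -> None:
        self.w = np.asarray(params["weights"])
    def update(self, neural_sample: np.ndarray) -> np.ndarray:
        return self.w @ neural_sample

def experiment_engine(model: BMIModel, samples):
    # The engine is agnostic to the concrete model behind the interface.
    model.configure({"weights": np.eye(3)})
    return [model.update(s) for s in samples]

print(experiment_engine(LinearDecoder(), [np.ones(3)]))
```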

  19. Qualifying for the Green500: Experience with the newest generation of supercomputers at LANL

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yilk, Todd

    The High Performance Computing Division of Los Alamos National Laboratory recently brought four new supercomputing platforms on line: Trinity, with separate partitions built around the Haswell and Knights Landing CPU architectures for capability computing, and Grizzly, Fire, and Ice for capacity computing applications. The power monitoring infrastructure of these machines is significantly enhanced over previous supercomputing generations at LANL, and all were qualified at the highest level of the Green500 benchmark. This paper discusses supercomputing at LANL and the Green500 benchmark, and offers notes on our experience meeting the Green500's reporting requirements.

  20. Qualifying for the Green500: Experience with the newest generation of supercomputers at LANL

    DOE PAGES

    Yilk, Todd

    2018-02-17

    The High Performance Computing Division of Los Alamos National Laboratory recently brought four new supercomputing platforms on line: Trinity, with separate partitions built around the Haswell and Knights Landing CPU architectures for capability computing, and Grizzly, Fire, and Ice for capacity computing applications. The power monitoring infrastructure of these machines is significantly enhanced over previous supercomputing generations at LANL, and all were qualified at the highest level of the Green500 benchmark. This paper discusses supercomputing at LANL and the Green500 benchmark, and offers notes on our experience meeting the Green500's reporting requirements.
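
    For context, the Green500 ranks machines by energy efficiency, i.e. sustained LINPACK performance divided by average power draw during the run; a back-of-envelope version of the metric (with invented numbers, not LANL's figures) is:

```python
# Hedged sketch of the Green500 efficiency metric. Both inputs are
# made-up placeholders, not measurements from any LANL system.
rmax_tflops = 8_100.0     # sustained HPL performance, TFLOP/s (illustrative)
avg_power_kw = 4_200.0    # average power during the run, kW (illustrative)

gflops_per_watt = (rmax_tflops * 1e3) / (avg_power_kw * 1e3)
print(f"efficiency: {gflops_per_watt:.2f} GFLOPS/W")   # -> 1.93 GFLOPS/W
```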

  1. Testing of SWMM Model’s LID Modules

    EPA Science Inventory

    EPA’s Storm Water Management Model (SWMM) is a computational code heavily relied upon by industry for the simulation of wastewater and stormwater infrastructure performance to design and build multi-billion-dollar, multi-decade infrastructure upgrades. Since the 1970’s, EPA a...

  2. Flexible services for the support of research.

    PubMed

    Turilli, Matteo; Wallom, David; Williams, Chris; Gough, Steve; Curran, Neal; Tarrant, Richard; Bretherton, Dan; Powell, Andy; Johnson, Matt; Harmer, Terry; Wright, Peter; Gordon, John

    2013-01-28

    Cloud computing has been increasingly adopted by users and providers to promote a flexible, scalable and tailored access to computing resources. Nonetheless, the consolidation of this paradigm has uncovered some of its limitations. Initially devised by corporations with direct control over large amounts of computational resources, cloud computing is now being endorsed by organizations with limited resources or with a more articulated, less direct control over these resources. The challenge for these organizations is to leverage the benefits of cloud computing while dealing with limited and often widely distributed computing resources. This study focuses on the adoption of cloud computing by higher education institutions and addresses two main issues: flexible and on-demand access to a large amount of storage resources, and scalability across a heterogeneous set of cloud infrastructures. The proposed solutions leverage a federated approach to cloud resources in which users access multiple and largely independent cloud infrastructures through a highly customizable broker layer. This approach allows for a uniform authentication and authorization infrastructure, a fine-grained policy specification and the aggregation of accounting and monitoring. Within a loosely coupled federation of cloud infrastructures, users can access vast amount of data without copying them across cloud infrastructures and can scale their resource provisions when the local cloud resources become insufficient.
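
    As an illustrative sketch of the broker-layer idea (all names below are hypothetical): the broker exposes a single entry point, applies a fine-grained per-infrastructure policy, and dispatches each request to a federated cloud that can satisfy it.

```python
# Hypothetical sketch of a federated cloud broker: one entry point,
# per-infrastructure policies, and dispatch to the first cloud whose
# policy admits the request. A real broker adds auth, accounting, quotas.
from dataclasses import dataclass

@dataclass
class Cloud:
    name: str
    free_storage_gb: int
    allowed_groups: set

class Broker:
    def __init__(self, clouds):
        self.clouds = clouds

    def place(self, group: str, needed_gb: int) -> str:
        # Policy check plus capacity check, in federation order.
        for c in self.clouds:
            if group in c.allowed_groups and c.free_storage_gb >= needed_gb:
                c.free_storage_gb -= needed_gb
                return c.name
        raise RuntimeError("no federated infrastructure satisfies the request")

broker = Broker([Cloud("campus", 500, {"physics"}),
                 Cloud("national", 10_000, {"physics", "bio"})])
print(broker.place("bio", 800))   # -> "national"
```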

  3. Accelerating epistasis analysis in human genetics with consumer graphics hardware.

    PubMed

    Sinnott-Armstrong, Nicholas A; Greene, Casey S; Cancare, Fabio; Moore, Jason H

    2009-07-24

    Human geneticists are now capable of measuring more than one million DNA sequence variations from across the human genome. The new challenge is to develop computationally feasible methods capable of analyzing these data for associations with common human disease, particularly in the context of epistasis. Epistasis describes the situation where multiple genes interact in a complex non-linear manner to determine an individual's disease risk and is thought to be ubiquitous for common diseases. Multifactor Dimensionality Reduction (MDR) is an algorithm capable of detecting epistasis. An exhaustive analysis with MDR is often computationally expensive, particularly for high order interactions. This challenge has previously been met with parallel computation and expensive hardware. The option we examine here exploits commodity hardware designed for computer graphics. In modern computers Graphics Processing Units (GPUs) have more memory bandwidth and computational capability than Central Processing Units (CPUs) and are well suited to this problem. Advances in the video game industry have led to an economy of scale creating a situation where these powerful components are readily available at very low cost. Here we implement and evaluate the performance of the MDR algorithm on GPUs. Of primary interest are the time required for an epistasis analysis and the price to performance ratio of available solutions. We found that using MDR on GPUs consistently increased performance per machine over both a feature rich Java software package and a C++ cluster implementation. The performance of a GPU workstation running a GPU implementation reduces computation time by a factor of 160 compared to an 8-core workstation running the Java implementation on CPUs. This GPU workstation performs similarly to 150 cores running an optimized C++ implementation on a Beowulf cluster. Furthermore this GPU system provides extremely cost effective performance while leaving the CPU available for other tasks. The GPU workstation containing three GPUs costs $2000 while obtaining similar performance on a Beowulf cluster requires 150 CPU cores which, including the added infrastructure and support cost of the cluster system, cost approximately $82,500. Graphics hardware based computing provides a cost effective means to perform genetic analysis of epistasis using MDR on large datasets without the infrastructure of a computing cluster.
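
    To make the MDR kernel concrete, here is a hedged CPU-side sketch on synthetic data (the paper's GPU implementation evaluates many SNP pairs of this form concurrently): genotype combinations for one SNP pair are binned, and each cell is labeled high- or low-risk by its case:control ratio.

```python
# Hedged sketch of the MDR kernel for a single SNP pair, on synthetic data.
# The GPU version parallelizes this evaluation across many pairs at once.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000
snp_a = rng.integers(0, 3, n)          # genotypes coded 0/1/2
snp_b = rng.integers(0, 3, n)
case = rng.integers(0, 2, n)           # 1 = case, 0 = control

cell = snp_a * 3 + snp_b               # 9 two-locus genotype combinations
cases = np.bincount(cell[case == 1], minlength=9)
ctrls = np.bincount(cell[case == 0], minlength=9)

threshold = case.sum() / (n - case.sum())   # overall case:control ratio
high_risk = cases > threshold * ctrls       # MDR's high/low-risk labeling
pred = high_risk[cell]                      # predicted status per sample
accuracy = (pred == case.astype(bool)).mean()
print(f"classification accuracy for this pair: {accuracy:.3f}")
```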

  4. RFA Guardian: Comprehensive Simulation of Radiofrequency Ablation Treatment of Liver Tumors.

    PubMed

    Voglreiter, Philip; Mariappan, Panchatcharam; Pollari, Mika; Flanagan, Ronan; Blanco Sequeiros, Roberto; Portugaller, Rupert Horst; Fütterer, Jurgen; Schmalstieg, Dieter; Kolesnik, Marina; Moche, Michael

    2018-01-15

    The RFA Guardian is a comprehensive application for high-performance patient-specific simulation of radiofrequency ablation of liver tumors. We address a wide range of usage scenarios. These include pre-interventional planning, sampling of the parameter space for uncertainty estimation, treatment evaluation and, in the worst case, failure analysis. The RFA Guardian is the first of its kind that exhibits sufficient performance for simulating treatment outcomes during the intervention. We achieve this by combining a large number of high-performance image processing, biomechanical simulation and visualization techniques into a generalized technical workflow. Further, we wrap the feature set into a single, integrated application, which exploits all available resources of standard consumer hardware, including massively parallel computing on graphics processing units. This allows us to predict or reproduce treatment outcomes on a single personal computer with high computational performance and high accuracy. The resulting low demand for infrastructure enables easy and cost-efficient integration into the clinical routine. We present a number of evaluation cases from the clinical practice where users performed the whole technical workflow from patient-specific modeling to final validation and highlight the opportunities arising from our fast, accurate prediction techniques.

  5. A bioinformatics knowledge discovery in text application for grid computing

    PubMed Central

    Castellano, Marcello; Mastronardi, Giuseppe; Bellotti, Roberto; Tarricone, Gianfranco

    2009-01-01

    Background A fundamental activity in biomedical research is Knowledge Discovery, which has the ability to search through large amounts of biomedical information such as documents and data. High-performance computational infrastructures, such as Grid technologies, are emerging as a possible infrastructure to tackle the intensive use of information and communication resources in life science. The goal of this work was to develop a software middleware solution that exploits knowledge discovery applications on scalable and distributed computing systems to achieve intensive use of ICT resources. Methods The development of a grid application for Knowledge Discovery in Text using a middleware-based methodology is presented. The system must be able to capture a user application model and process jobs, splitting them into many parallel jobs for distribution across the computational nodes. Finally, the system must be aware of the computational resources available and their status, and must be able to monitor the execution of the parallel jobs. These operational requirements led to the design of a middleware that is specialized using user application modules. It includes a graphical user interface for accessing a node search system, a load balancing system, and a transfer optimizer to reduce communication costs. Results A prototype of the middleware solution and its performance evaluation in terms of the speed-up factor are shown. It was written in Java on Globus Toolkit 4 to build the grid infrastructure based on GNU/Linux computer grid nodes. A test was carried out and the results are shown for the named entity recognition search of symptoms and pathologies. The search was applied to a collection of 5,000 scientific documents taken from PubMed. Conclusion In this paper we discuss the development of a grid application based on a middleware solution. It has been tested on a knowledge discovery in text process to extract new and useful information about symptoms and pathologies from a large collection of unstructured scientific documents. As an example, a Knowledge Discovery in Database computation was applied to the output produced by the KDT user module to extract new knowledge about symptom and pathology bio-entities. PMID:19534749

  6. A bioinformatics knowledge discovery in text application for grid computing.

    PubMed

    Castellano, Marcello; Mastronardi, Giuseppe; Bellotti, Roberto; Tarricone, Gianfranco

    2009-06-16

    A fundamental activity in biomedical research is Knowledge Discovery, which has the ability to search through large amounts of biomedical information such as documents and data. High-performance computational infrastructures, such as Grid technologies, are emerging as a possible infrastructure to tackle the intensive use of information and communication resources in life science. The goal of this work was to develop a software middleware solution that exploits knowledge discovery applications on scalable and distributed computing systems to achieve intensive use of ICT resources. The development of a grid application for Knowledge Discovery in Text using a middleware-based methodology is presented. The system must be able to capture a user application model and process jobs, splitting them into many parallel jobs for distribution across the computational nodes. Finally, the system must be aware of the computational resources available and their status, and must be able to monitor the execution of the parallel jobs. These operational requirements led to the design of a middleware that is specialized using user application modules. It includes a graphical user interface for accessing a node search system, a load balancing system, and a transfer optimizer to reduce communication costs. A prototype of the middleware solution and its performance evaluation in terms of the speed-up factor are shown. It was written in Java on Globus Toolkit 4 to build the grid infrastructure based on GNU/Linux computer grid nodes. A test was carried out and the results are shown for the named entity recognition search of symptoms and pathologies. The search was applied to a collection of 5,000 scientific documents taken from PubMed. In this paper we discuss the development of a grid application based on a middleware solution. It has been tested on a knowledge discovery in text process to extract new and useful information about symptoms and pathologies from a large collection of unstructured scientific documents. As an example, a Knowledge Discovery in Database computation was applied to the output produced by the KDT user module to extract new knowledge about symptom and pathology bio-entities.

  7. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lingerfelt, Eric J; Endeve, Eirik; Hui, Yawei

    Improvements in scientific instrumentation allow imaging at mesoscopic to atomic length scales and in many spectroscopic modes, and now--with the rise of multimodal acquisition systems and the associated processing capability--the era of multidimensional, informationally dense data sets has arrived. Technical issues in these combinatorial scientific fields are exacerbated by computational challenges best summarized as a necessity for drastic improvement in the capability to transfer, store, and analyze large volumes of data. The Bellerophon Environment for Analysis of Materials (BEAM) platform provides materials scientists the capability to directly leverage the integrated computational and analytical power of High Performance Computing (HPC) to perform scalable data analysis and simulation and manage uploaded data files via an intuitive, cross-platform client user interface. This framework delivers authenticated, "push-button" execution of complex user workflows that deploy data analysis algorithms and computational simulations utilizing compute-and-data cloud infrastructures and HPC environments like Titan at the Oak Ridge Leadership Computing Facility (OLCF).

  8. Editorial [Special issue on software defined networks and infrastructures, network function virtualisation, autonomous systems and network management

    DOE PAGES

    Biswas, Amitava; Liu, Chen; Monga, Inder; ...

    2016-01-01

    For the last few years, there has been tremendous growth in data traffic due to the high adoption rate of mobile devices and cloud computing. The Internet of things (IoT) will stimulate even further growth. This is increasing the scale and complexity of telecom/internet service provider (SP) and enterprise data centre (DC) compute and network infrastructures. As a result, managing these large network-compute converged infrastructures is becoming complex and cumbersome. To cope, network and DC operators are trying to automate network and system operations, administration and management (OAM) functions. OAM includes all non-functional mechanisms which keep the network running.

  9. Cloud Computing Boosts Business Intelligence of Telecommunication Industry

    NASA Astrophysics Data System (ADS)

    Xu, Meng; Gao, Dan; Deng, Chao; Luo, Zhiguo; Sun, Shaoling

    Business intelligence has become an attractive topic in today's data-intensive applications, especially in the telecommunication industry. Meanwhile, cloud computing, which provides supporting IT infrastructure with excellent scalability, large-scale storage, and high performance, has become an effective way to implement parallel data processing and data mining algorithms. BC-PDM (Big Cloud based Parallel Data Miner) is a new MapReduce-based parallel data mining platform developed by CMRI (China Mobile Research Institute) to meet the urgent requirements of business intelligence in the telecommunication industry. In this paper, the architecture, functionality and performance of BC-PDM are presented, together with an experimental evaluation and case studies of its applications. The evaluation results demonstrate both the usability and the cost-effectiveness of a cloud computing based business intelligence system in applications of the telecommunication industry.
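
    A toy sketch of the MapReduce pattern that BC-PDM builds on (illustrative only; BC-PDM itself runs on a Hadoop-style cluster): map emits key-value pairs, the framework groups them by key, and reduce aggregates each group.

```python
# Hedged sketch of the MapReduce pattern: here, counting calls per
# subscriber, a toy telecom-style aggregation. Names are illustrative.
from collections import defaultdict

records = [("alice", 3), ("bob", 1), ("alice", 2), ("carol", 5)]

def mapper(rec):
    subscriber, calls = rec
    yield subscriber, calls            # emit (key, value) pairs

def reducer(key, values):
    return key, sum(values)            # aggregate one key's group

# "Shuffle" phase: group mapped pairs by key (done by the framework
# in a real MapReduce system, across many machines).
groups = defaultdict(list)
for rec in records:
    for k, v in mapper(rec):
        groups[k].append(v)

print([reducer(k, vs) for k, vs in groups.items()])
# -> [('alice', 5), ('bob', 1), ('carol', 5)]
```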

  10. Big-BOE: Fusing Spanish Official Gazette with Big Data Technology.

    PubMed

    Basanta-Val, Pablo; Sánchez-Fernández, Luis

    2018-06-01

    The proliferation of new data sources stemming from the adoption of open-data schemes, in combination with increasing computing capacity, has given rise to a new type of analytics that processes Internet-of-things data with low-cost engines, speeding up data processing by means of parallel computing. In this context, the article presents an initiative, called BIG-Boletín Oficial del Estado (BOE), designed to process the Spanish official government gazette (BOE) with state-of-the-art processing engines, to reduce computation time and to offer additional speed-up for big data analysts. The goal of including a big data infrastructure is to be able to process different BOE documents in parallel with specific analytics, to search for several issues in different documents. The application infrastructure processing engine is described from an architectural perspective and in terms of performance, showing evidence of how this type of infrastructure improves the performance of different types of simple analytics as several machines cooperate.

  11. The performance of low-cost commercial cloud computing as an alternative in computational chemistry.

    PubMed

    Thackston, Russell; Fortenberry, Ryan C

    2015-05-05

    The growth of commercial cloud computing (CCC) as a viable means of computational infrastructure is largely unexplored for the purposes of quantum chemistry. In this work, the PSI4 suite of computational chemistry programs is installed on five different types of Amazon Web Services CCC platforms. The performance for a set of electronically excited state single-point energies is compared between these CCC platforms and typical, "in-house" physical machines. Further considerations are made for the number of cores or virtual CPUs (vCPUs, for the CCC platforms), but no considerations are made for full parallelization of the program (even though parallelization of the BLAS library is implemented), complete high-performance computing cluster utilization, or steal time. Even with this most pessimistic view of the computations, CCC resources are shown to be more cost effective for significant numbers of typical quantum chemistry computations. Large numbers of large computations are still best utilized by more traditional means, but smaller-scale research may be more effectively undertaken through CCC services. © 2015 Wiley Periodicals, Inc.
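
    The cost argument can be made concrete with back-of-envelope arithmetic. The sketch below compares an amortized in-house workstation against an on-demand instance; every price and throughput figure is a hypothetical placeholder, not an AWS quote or a value from the paper:

      # Hypothetical cost-per-computation comparison (all numbers invented).
      def cost_per_job(hourly_cost, jobs_per_hour):
          return hourly_cost / jobs_per_hour

      # In-house: $12,000 machine amortized over 3 years of continuous use.
      inhouse_hourly = 12000 / (3 * 365 * 24)
      # Cloud: assumed on-demand price per instance-hour.
      cloud_hourly = 0.25

      print(f"in-house: ${cost_per_job(inhouse_hourly, 2.0):.3f}/job")
      print(f"cloud:    ${cost_per_job(cloud_hourly, 1.5):.3f}/job")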

  12. The GÉANT network: addressing current and future needs of the HEP community

    NASA Astrophysics Data System (ADS)

    Capone, Vincenzo; Usman, Mian

    2015-12-01

    The GÉANT infrastructure is the backbone that serves the scientific communities in Europe for their data movement needs and their access to international research and education networks. Using its extensive fibre footprint and infrastructure in Europe, the GÉANT network delivers a portfolio of services aimed to best fit the specific needs of its users, including Authentication and Authorization Infrastructure, end-to-end performance monitoring, and advanced network services (dynamic circuits, L2-L3VPN, MD-VPN). This talk will outline the factors that help the GÉANT network respond to the needs of the High Energy Physics community, both in Europe and worldwide. The Pan-European network provides the connectivity between 40 European national research and education networks. In addition, GÉANT connects the European NRENs to the R&E networks in other world regions and reaches over 110 NRENs worldwide, making GÉANT the best-connected research and education network, with multiple intercontinental links to different continents, e.g. North and South America, Africa and Asia-Pacific. The computational needs of High Energy Physics have always played (and will keep playing) a leading role among the scientific user groups of the GÉANT network: the LHCONE overlay network has been built, in collaboration with the other big world RENs, specifically to address the particular needs of LHC data movement. Recently, as a result of a series of coordinated efforts, the LHCONE network has been expanded to the Asia-Pacific area, and is going to include some of the main regional R&E networks in the area. The LHC community is not the only one that is actively using a distributed computing model (hence the need for a high-performance network); new communities are arising, such as BELLE II. GÉANT is deeply involved with the BELLE II experiment, providing full support for its distributed computing model, along with a perfSONAR-based network monitoring system. GÉANT has also coordinated the setup of the network infrastructure to perform the BELLE II Trans-Atlantic Data Challenge, and has been active in helping the BELLE II community sort out their end-to-end performance issues. In this talk we will provide information about the current GÉANT network architecture and international connectivity, along with the upcoming upgrades and the planned and foreseeable improvements. We will also describe the implementation of the solutions provided to support the LHC and BELLE II experiments.

  13. How to keep the Grid full and working with ATLAS production and physics jobs

    NASA Astrophysics Data System (ADS)

    Pacheco Pagés, A.; Barreiro Megino, F. H.; Cameron, D.; Fassi, F.; Filipcic, A.; Di Girolamo, A.; González de la Hoz, S.; Glushkov, I.; Maeno, T.; Walker, R.; Yang, W.; ATLAS Collaboration

    2017-10-01

    The ATLAS production system provides the infrastructure to process millions of events collected during LHC Run 1 and the first two years of Run 2 using grid, cloud and high-performance computing resources. In this contribution we address the strategies and improvements that have been implemented in the production system for optimal performance and to achieve the highest efficiency of available resources from an operational perspective. We focus on the recent developments.

  14. Climate simulations and services on HPC, Cloud and Grid infrastructures

    NASA Astrophysics Data System (ADS)

    Cofino, Antonio S.; Blanco, Carlos; Minondo Tshuma, Antonio

    2017-04-01

    Cloud, Grid and High Performance Computing have changed the accessibility and availability of computing resources for Earth Science research communities, especially for the climate community. These paradigms are changing the way climate applications are executed. By using these technologies the number, variety and complexity of experiments and resources are increasing substantially. But, although computational capacity is increasing, the traditional applications and tools used by the community are not good enough to manage this large volume and variety of experiments and computing resources. In this contribution, we evaluate the challenges of running climate simulations and services on Grid, Cloud and HPC infrastructures and how to tackle them. The Grid and Cloud infrastructures provided by EGI's VOs (esr, earth.vo.ibergrid and fedcloud.egi.eu) will be evaluated, as well as HPC resources from the PRACE infrastructure and institutional clusters. To solve those challenges, solutions using the DRM4G framework will be shown. DRM4G provides a good framework to manage the large volume and variety of computing resources for climate experiments. This work has been supported by the Spanish National R&D Plan under projects WRF4G (CGL2011-28864), INSIGNIA (CGL2016-79210-R) and MULTI-SDM (CGL2015-66583-R); the IS-ENES2 project from the 7FP of the European Commission (grant agreement no. 312979); the European Regional Development Fund (ERDF); and the Programa de Personal Investigador en Formación Predoctoral from Universidad de Cantabria and Government of Cantabria.

  15. Exploring Infiniband Hardware Virtualization in OpenNebula towards Efficient High-Performance Computing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pais Pitta de Lacerda Ruivo, Tiago; Bernabeu Altayo, Gerard; Garzoglio, Gabriele

    2014-11-11

    It has been widely accepted that software virtualization has a big negative impact on high-performance computing (HPC) application performance. This work explores the potential use of Infiniband hardware virtualization in an OpenNebula cloud towards the efficient support of MPI-based workloads. We have implemented, deployed, and tested an Infiniband network on the FermiCloud private Infrastructure-as-a-Service (IaaS) cloud. To avoid software virtualization and minimize the virtualization overhead, we employed a technique called Single Root Input/Output Virtualization (SR-IOV). Our solution spanned modifications to the Linux hypervisor as well as the OpenNebula manager. We evaluated the performance of the hardware virtualization on up to 56 virtual machines connected by up to 8 DDR Infiniband network links, with micro-benchmarks (latency and bandwidth) as well as with an MPI-intensive application (the HPL Linpack benchmark).
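
    A minimal version of the latency/bandwidth micro-benchmarks mentioned above can be written with mpi4py; this ping-pong sketch is illustrative only and is not the benchmark suite used in the study. Run with, e.g., mpirun -np 2 python pingpong.py:

      # Two-rank ping-pong: measures round-trip time and effective bandwidth.
      import numpy as np
      from mpi4py import MPI

      comm = MPI.COMM_WORLD
      rank = comm.Get_rank()
      nbytes = 1 << 20                        # 1 MiB message
      buf = np.zeros(nbytes, dtype=np.uint8)
      reps = 100

      comm.Barrier()
      t0 = MPI.Wtime()
      for _ in range(reps):
          if rank == 0:
              comm.Send(buf, dest=1)
              comm.Recv(buf, source=1)
          elif rank == 1:
              comm.Recv(buf, source=0)
              comm.Send(buf, dest=0)
      t1 = MPI.Wtime()

      if rank == 0:
          rtt = (t1 - t0) / reps              # seconds per round trip
          print(f"round trip: {rtt * 1e6:.1f} us, "
                f"bandwidth: {2 * nbytes / rtt / 1e9:.2f} GB/s")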

  16. Leveraging the Power of High Performance Computing for Next Generation Sequencing Data Analysis: Tricks and Twists from a High Throughput Exome Workflow

    PubMed Central

    Wonczak, Stephan; Thiele, Holger; Nieroda, Lech; Jabbari, Kamel; Borowski, Stefan; Sinha, Vishal; Gunia, Wilfried; Lang, Ulrich; Achter, Viktor; Nürnberg, Peter

    2015-01-01

    Next generation sequencing (NGS) has been a great success and is now a standard method of research in the life sciences. With this technology, dozens of whole genomes or hundreds of exomes can be sequenced in a rather short time, producing huge amounts of data. Complex bioinformatics analyses are required to turn these data into scientific findings. In order to run these analyses fast, automated workflows implemented on high performance computers are state of the art. While providing sufficient compute power and storage to meet the NGS data challenge, high performance computing (HPC) systems require special care when utilized for high throughput processing. This is especially true if the HPC system is shared by different users. Here, stability, robustness and maintainability are as important for automated workflows as speed and throughput. To achieve all of these aims, dedicated solutions have to be developed. In this paper, we present the tricks and twists that we utilized in the implementation of our exome data processing workflow. It may serve as a guideline for other high throughput data analysis projects using a similar infrastructure. The code implementing our solutions is provided in the supporting information files. PMID:25942438
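
    One generic example of the kind of robustness measure described above (the authors' actual solutions are in their supporting files) is wrapping each pipeline step with bounded retries and logging, so that a transient failure on a shared cluster does not abort a long-running workflow. The command and paths below are placeholders:

      # Retry wrapper for a single workflow step on a shared HPC system.
      import logging
      import subprocess
      import time

      logging.basicConfig(level=logging.INFO)

      def run_with_retries(cmd, max_tries=3, wait_s=60):
          for attempt in range(1, max_tries + 1):
              result = subprocess.run(cmd, capture_output=True, text=True)
              if result.returncode == 0:
                  return result.stdout
              logging.warning("attempt %d/%d failed: %s", attempt, max_tries,
                              result.stderr.strip()[:200])
              time.sleep(wait_s)       # back off before retrying
          raise RuntimeError(f"step failed after {max_tries} attempts: {cmd}")

      # Placeholder step, e.g. an alignment command:
      # run_with_retries(["bwa", "mem", "ref.fa", "sample.fastq"])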

  17. International Symposium on Grids and Clouds (ISGC) 2014

    NASA Astrophysics Data System (ADS)

    The International Symposium on Grids and Clouds (ISGC) 2014 will be held at Academia Sinica in Taipei, Taiwan from 23-28 March 2014, with co-located events and workshops. The conference is hosted by the Academia Sinica Grid Computing Centre (ASGC). “Bringing the data scientist to global e-Infrastructures” is the theme of ISGC 2014. The last decade has seen phenomenal growth in the production of data in all forms by all research communities, producing a deluge of data from which information and knowledge need to be extracted. Key to this success will be the data scientist - educated to use advanced algorithms, applications and infrastructures - collaborating internationally to tackle society’s challenges. ISGC 2014 will bring together researchers working in all aspects of data science from different disciplines around the world to collaborate and educate themselves in the latest achievements and techniques being used to tackle the data deluge. In addition to the regular workshops, technical presentations and plenary keynotes, ISGC this year will focus on how to grow the data science community by considering the educational foundation needed for tomorrow’s data scientist. Topics of discussion include Physics (including HEP) and Engineering Applications, Biomedicine & Life Sciences Applications, Earth & Environmental Sciences & Biodiversity Applications, Humanities & Social Sciences Applications, Virtual Research Environments (including middleware, tools, services, workflows, etc.), Data Management, Big Data, Infrastructure & Operations Management, Infrastructure Clouds and Virtualisation, Interoperability, Business Models & Sustainability, Highly Distributed Computing Systems, and High Performance & Technical Computing (HPTC).

  18. LXtoo: an integrated live Linux distribution for the bioinformatics community

    PubMed Central

    2012-01-01

    Background Recent advances in high-throughput technologies have dramatically increased biological data generation. However, many research groups lack computing facilities and specialists. This is an obstacle that remains to be addressed. Here, we present a Linux distribution, LXtoo, to provide a flexible computing platform for bioinformatics analysis. Findings Unlike most existing live Linux distributions for bioinformatics, which limit their usage to sequence analysis and protein structure prediction, LXtoo incorporates a comprehensive collection of bioinformatics software, including data mining tools for microarray and proteomics, protein-protein interaction analysis, and computationally complex tasks like molecular dynamics. Moreover, most of the programs have been configured and optimized for high-performance computing. Conclusions LXtoo aims to provide a well-supported computing environment tailored for bioinformatics research, reducing duplication of effort in building computing infrastructure. LXtoo is distributed as a Live DVD and freely available at http://bioinformatics.jnu.edu.cn/LXtoo. PMID:22813356

  19. LXtoo: an integrated live Linux distribution for the bioinformatics community.

    PubMed

    Yu, Guangchuang; Wang, Li-Gen; Meng, Xiao-Hua; He, Qing-Yu

    2012-07-19

    Recent advances in high-throughput technologies have dramatically increased biological data generation. However, many research groups lack computing facilities and specialists. This is an obstacle that remains to be addressed. Here, we present a Linux distribution, LXtoo, to provide a flexible computing platform for bioinformatics analysis. Unlike most existing live Linux distributions for bioinformatics, which limit their usage to sequence analysis and protein structure prediction, LXtoo incorporates a comprehensive collection of bioinformatics software, including data mining tools for microarray and proteomics, protein-protein interaction analysis, and computationally complex tasks like molecular dynamics. Moreover, most of the programs have been configured and optimized for high-performance computing. LXtoo aims to provide a well-supported computing environment tailored for bioinformatics research, reducing duplication of effort in building computing infrastructure. LXtoo is distributed as a Live DVD and freely available at http://bioinformatics.jnu.edu.cn/LXtoo.

  20. 1001 Ways to run AutoDock Vina for virtual screening

    NASA Astrophysics Data System (ADS)

    Jaghoori, Mohammad Mahdi; Bleijlevens, Boris; Olabarriaga, Silvia D.

    2016-03-01

    Large-scale computing technologies have enabled high-throughput virtual screening involving thousands to millions of drug candidates. It is not trivial, however, for biochemical scientists to evaluate the technical alternatives and their implications for running such large experiments. Besides experience with the molecular docking tool itself, the scientist needs to learn how to run it on high-performance computing (HPC) infrastructures, and understand the impact of the choices made. Here, we review such considerations for a specific tool, AutoDock Vina, and use experimental data to illustrate the following points: (1) an additional level of parallelization increases virtual screening throughput on a multi-core machine; (2) capturing of the random seed is not enough (though necessary) for reproducibility on heterogeneous distributed computing systems; (3) the overall time spent on the screening of a ligand library can be improved by analysis of factors affecting execution time per ligand, including number of active torsions, heavy atoms and exhaustiveness. We also illustrate differences among four common HPC infrastructures: grid, Hadoop, small cluster and multi-core (virtual machine on the cloud). Our analysis shows that these platforms are suitable for screening experiments of different sizes. These considerations can guide scientists when choosing the best computing platform and set-up for their future large virtual screening experiments.
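
    Points (1) and (2) can be illustrated with a short Python driver that runs one single-core Vina process per ligand in parallel and records each random seed for later reproduction. The flags follow the AutoDock Vina command line; the file names are placeholders, and this is not the authors' experimental harness:

      # Parallel single-core Vina runs with explicit seed capture.
      import random
      import subprocess
      from concurrent.futures import ProcessPoolExecutor

      def dock(ligand):
          seed = random.randrange(1, 2**31)          # capture the seed ourselves
          cmd = ["vina", "--receptor", "receptor.pdbqt", "--ligand", ligand,
                 "--out", ligand + ".out.pdbqt", "--cpu", "1",
                 "--exhaustiveness", "8", "--seed", str(seed)]
          subprocess.run(cmd, check=True)
          return ligand, seed

      if __name__ == "__main__":
          ligands = ["lig001.pdbqt", "lig002.pdbqt"]  # placeholder library
          with ProcessPoolExecutor(max_workers=8) as ex:
              for ligand, seed in ex.map(dock, ligands):
                  print(ligand, "seed:", seed)        # log for reproducibility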

  1. 1001 Ways to run AutoDock Vina for virtual screening.

    PubMed

    Jaghoori, Mohammad Mahdi; Bleijlevens, Boris; Olabarriaga, Silvia D

    2016-03-01

    Large-scale computing technologies have enabled high-throughput virtual screening involving thousands to millions of drug candidates. It is not trivial, however, for biochemical scientists to evaluate the technical alternatives and their implications for running such large experiments. Besides experience with the molecular docking tool itself, the scientist needs to learn how to run it on high-performance computing (HPC) infrastructures, and understand the impact of the choices made. Here, we review such considerations for a specific tool, AutoDock Vina, and use experimental data to illustrate the following points: (1) an additional level of parallelization increases virtual screening throughput on a multi-core machine; (2) capturing of the random seed is not enough (though necessary) for reproducibility on heterogeneous distributed computing systems; (3) the overall time spent on the screening of a ligand library can be improved by analysis of factors affecting execution time per ligand, including number of active torsions, heavy atoms and exhaustiveness. We also illustrate differences among four common HPC infrastructures: grid, Hadoop, small cluster and multi-core (virtual machine on the cloud). Our analysis shows that these platforms are suitable for screening experiments of different sizes. These considerations can guide scientists when choosing the best computing platform and set-up for their future large virtual screening experiments.

  2. High-reliability computing for the smarter planet

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Quinn, Heather M; Graham, Paul; Manuzzato, Andrea

    2010-01-01

    The geometric rate of improvement of transistor size and integrated circuit performance, known as Moore's Law, has been an engine of growth for our economy, enabling new products and services, creating new value and wealth, increasing safety, and removing menial tasks from our daily lives. Affordable, highly integrated components have enabled both life-saving technologies and rich entertainment applications. Anti-lock brakes, insulin monitors, and GPS-enabled emergency response systems save lives. Cell phones, internet appliances, virtual worlds, realistic video games, and mp3 players enrich our lives and connect us together. Over the past 40 years of silicon scaling, the increasing capabilities of inexpensive computation have transformed our society through automation and ubiquitous communications. In this paper, we will present the concept of the smarter planet, how reliability failures affect current systems, and methods that can be used to increase the reliable adoption of new automation in the future. We will illustrate these issues using a number of different electronic devices in a couple of different scenarios. Recently IBM has been presenting the idea of a 'smarter planet.' In smarter planet documents, IBM discusses increased computer automation of roadways, banking, healthcare, and infrastructure, as automation could create more efficient systems. A necessary component of the smarter planet concept is to ensure that these new systems have very high reliability. Even extremely rare reliability problems can easily escalate to problematic scenarios when implemented at very large scales. For life-critical systems, such as automobiles, infrastructure, medical implantables, and avionic systems, unmitigated failures could be dangerous. As more automation moves into these types of critical systems, reliability failures will need to be managed. As computer automation continues to increase in our society, the need for greater radiation reliability is necessary. Already critical infrastructure is failing too frequently. In this paper, we will introduce the Cross-Layer Reliability concept for designing more reliable computer systems.

  3. Sampling Approaches for Multi-Domain Internet Performance Measurement Infrastructures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Calyam, Prasad

    2014-09-15

    The next-generation of high-performance networks being developed in DOE communities are critical for supporting current and emerging data-intensive science applications. The goal of this project is to investigate multi-domain network status sampling techniques and tools to measure/analyze performance, and thereby provide “network awareness” to end-users and network operators in DOE communities. We leverage the infrastructure and datasets available through perfSONAR, which is a multi-domain measurement framework that has been widely deployed in high-performance computing and networking communities; the DOE community is a core developer and the largest adopter of perfSONAR. Our investigations include development of semantic scheduling algorithms, measurement federation policies, and tools to sample multi-domain and multi-layer network status within perfSONAR deployments. We validate our algorithms and policies with end-to-end measurement analysis tools for various monitoring objectives such as network weather forecasting, anomaly detection, and fault-diagnosis. In addition, we develop a multi-domain architecture for an enterprise-specific perfSONAR deployment that can implement monitoring-objective based sampling and that adheres to any domain-specific measurement policies.
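
    The following toy sketch conveys the flavor of budgeted, objective-driven sampling: each cycle it selects the domain pairs whose last measurement is most stale, within a fixed probe budget. It is a generic illustration, not the project's semantic scheduling algorithms:

      # Most-stale-first selection of measurement pairs under a probe budget.
      import itertools
      import time

      domains = ["esnet", "internet2", "geant", "campus-a"]   # placeholders
      pairs = list(itertools.permutations(domains, 2))
      last_measured = {p: 0.0 for p in pairs}                 # 0.0 = never

      def next_batch(budget=3):
          # Pick the pairs measured longest ago to keep path status fresh.
          batch = sorted(pairs, key=lambda p: last_measured[p])[:budget]
          for p in batch:
              last_measured[p] = time.time()   # here one would launch a probe
          return batch

      print(next_batch())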

  4. Development of a telemedicine model for emerging countries: a case study on pediatric oncology in Brazil.

    PubMed

    Hira, A Y; Nebel de Mello, A; Faria, R A; Odone Filho, V; Lopes, R D; Zuffo, M K

    2006-01-01

    This article discusses a telemedicine model for emerging countries, through the description of ONCONET, a telemedicine initiative applied to pediatric oncology in Brazil. The ONCONET core technology is a Web-based system that offers health information and other services specialized in childhood cancer, such as electronic medical records and cooperative protocols for complex treatments. All Web-based services are supported by a high performance computing infrastructure based on clusters of commodity computers. The system was fully implemented using an open-source and free-software approach. Aspects of modeling, implementation and integration are covered. A model that is both technologically and economically viable was created through the research and development of in-house solutions adapted to the reality of emerging countries, with a focus on scalability both in the total number of patients and in the national infrastructure.

  5. Sustainable Cooperative Robotic Technologies for Human and Robotic Outpost Infrastructure Construction and Maintenance

    NASA Technical Reports Server (NTRS)

    Stroupe, Ashley W.; Okon, Avi; Robinson, Matthew; Huntsberger, Terry; Aghazarian, Hrand; Baumgartner, Eric

    2004-01-01

    Robotic Construction Crew (RCC) is a heterogeneous multi-robot system for autonomous acquisition, transport, and precision mating of components in construction tasks. RCC minimizes the use of resources that are constrained in a space environment, such as computation, power, communication, and sensing. A behavior-based architecture provides adaptability and robustness despite low computational requirements. RCC successfully performs several construction-related tasks in an emulated outdoor environment despite high levels of uncertainty in motions and sensing. Quantitative results are provided for formation keeping in component transport, precision instrument placement, and construction tasks.

  6. Extremely Scalable Spiking Neuronal Network Simulation Code: From Laptops to Exascale Computers.

    PubMed

    Jordan, Jakob; Ippen, Tammo; Helias, Moritz; Kitayama, Itaru; Sato, Mitsuhisa; Igarashi, Jun; Diesmann, Markus; Kunkel, Susanne

    2018-01-01

    State-of-the-art software tools for neuronal network simulations scale to the largest computing systems available today and enable investigations of large-scale networks of up to 10 % of the human cortex at a resolution of individual neurons and synapses. Due to an upper limit on the number of incoming connections of a single neuron, network connectivity becomes extremely sparse at this scale. To manage computational costs, simulation software ultimately targeting the brain scale needs to fully exploit this sparsity. Here we present a two-tier connection infrastructure and a framework for directed communication among compute nodes accounting for the sparsity of brain-scale networks. We demonstrate the feasibility of this approach by implementing the technology in the NEST simulation code and we investigate its performance in different scaling scenarios of typical network simulations. Our results show that the new data structures and communication scheme prepare the simulation kernel for post-petascale high-performance computing facilities without sacrificing performance in smaller systems.
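
    The directed-communication idea can be caricatured in a few lines: each process keeps, per local neuron, only the list of ranks that host its targets, and ships spikes just to those ranks instead of broadcasting. This plain-Python sketch is schematic and is not NEST's implementation:

      # Route spikes only to the ranks that actually host target neurons.
      from collections import defaultdict

      # local neuron id -> ranks hosting at least one of its targets (sparse)
      targets_by_neuron = {0: [2], 1: [1, 3], 2: []}

      def route_spikes(fired):
          outbox = defaultdict(list)
          for neuron in fired:
              for rank in targets_by_neuron[neuron]:
                  outbox[rank].append(neuron)   # send only where needed
          return dict(outbox)

      print(route_spikes([0, 1]))   # {2: [0], 1: [1], 3: [1]}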

  7. Extremely Scalable Spiking Neuronal Network Simulation Code: From Laptops to Exascale Computers

    PubMed Central

    Jordan, Jakob; Ippen, Tammo; Helias, Moritz; Kitayama, Itaru; Sato, Mitsuhisa; Igarashi, Jun; Diesmann, Markus; Kunkel, Susanne

    2018-01-01

    State-of-the-art software tools for neuronal network simulations scale to the largest computing systems available today and enable investigations of large-scale networks of up to 10 % of the human cortex at a resolution of individual neurons and synapses. Due to an upper limit on the number of incoming connections of a single neuron, network connectivity becomes extremely sparse at this scale. To manage computational costs, simulation software ultimately targeting the brain scale needs to fully exploit this sparsity. Here we present a two-tier connection infrastructure and a framework for directed communication among compute nodes accounting for the sparsity of brain-scale networks. We demonstrate the feasibility of this approach by implementing the technology in the NEST simulation code and we investigate its performance in different scaling scenarios of typical network simulations. Our results show that the new data structures and communication scheme prepare the simulation kernel for post-petascale high-performance computing facilities without sacrificing performance in smaller systems. PMID:29503613

  8. Computational biomedicine: a challenge for the twenty-first century.

    PubMed

    Coveney, Peter V; Shublaq, Nour W

    2012-01-01

    With the relentless increase of computer power and the widespread availability of digital patient-specific medical data, we are now entering an era when it is becoming possible to develop predictive models of human disease and pathology, which can be used to support and enhance clinical decision-making. The approach amounts to a grand challenge to computational science insofar as we need to be able to provide seamless yet secure access to large-scale heterogeneous personal healthcare data in a facile way, typically integrated into complex workflows, some parts of which may need to be run on high-performance computers, within clinical decision support software. In this paper, we review the state of the art in terms of case studies drawn from neurovascular pathologies and HIV/AIDS. These studies are representative of a large number of projects currently being performed within the Virtual Physiological Human initiative. They make demands of information technology at many scales, from the desktop to national and international infrastructures for data storage and processing, linked by high-performance networks.

  9. Dynamic VM Provisioning for TORQUE in a Cloud Environment

    NASA Astrophysics Data System (ADS)

    Zhang, S.; Boland, L.; Coddington, P.; Sevior, M.

    2014-06-01

    Cloud computing, delivered as Infrastructure-as-a-Service (IaaS), is attracting more interest from the commercial and educational sectors as a way to provide cost-effective computational infrastructure. It is an ideal platform for researchers who must share common resources but need to be able to scale up to massive computational requirements for specific periods of time. This paper presents the tools and techniques developed to allow the open source TORQUE distributed resource manager and Maui cluster scheduler to dynamically integrate OpenStack cloud resources into existing high throughput computing clusters.
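
    A hedged sketch of the general mechanism, booting an OpenStack instance whose boot script registers it as a TORQUE worker, might use the openstacksdk cloud layer as below. This is not the paper's actual tooling, and the cloud name, image, flavor, and network are placeholders:

      # Boot a cloud VM that joins a TORQUE cluster via its cloud-init script.
      import openstack

      USER_DATA = """#!/bin/bash
      # Point pbs_mom at the cluster head node and start it (illustrative).
      echo '$pbsserver head-node' > /var/spool/torque/mom_priv/config
      systemctl start pbs_mom
      """

      conn = openstack.connect(cloud="mycloud")          # named cloud config
      server = conn.create_server(
          "torque-worker-01",
          image="IMAGE_NAME_OR_ID",
          flavor="FLAVOR_NAME_OR_ID",
          network="NETWORK_NAME",
          userdata=USER_DATA,
          wait=True,                                     # block until ACTIVE
      )
      print(server.status)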

  10. Final Report Scalable Analysis Methods and In Situ Infrastructure for Extreme Scale Knowledge Discovery

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    O'Leary, Patrick

    The primary challenge motivating this project is the widening gap between the ability to compute information and to store it for subsequent analysis. This gap adversely impacts science code teams, who can perform analysis only on a small fraction of the data they calculate, resulting in the substantial likelihood of lost or missed science, when results are computed but not analyzed. Our approach is to perform as much analysis or visualization processing on data while it is still resident in memory, which is known as in situ processing. The idea of in situ processing was not new at the start of this effort in 2014, but efforts in that space were largely ad hoc, and there was no concerted effort within the research community that aimed to foster production-quality software tools suitable for use by Department of Energy (DOE) science projects. Our objective was to produce and enable the use of production-quality in situ methods and infrastructure, at scale, on DOE high-performance computing (HPC) facilities, though we expected to have an impact beyond DOE due to the widespread nature of the challenges, which affect virtually all large-scale computational science efforts. To achieve this objective, we engaged in software technology research and development (R&D), in close partnerships with DOE science code teams, to produce software technologies that were shown to run efficiently at scale on DOE HPC platforms.
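
    The in situ idea itself is simple to state in code: analyze each timestep while it is resident in memory and persist only the reduced result, instead of writing full fields to disk for post hoc analysis. This schematic stand-in is not the project's software:

      # In situ reduction: keep a few scalars per step, drop the full field.
      import numpy as np

      def simulate_step(t, shape=(512, 512)):
          # Stand-in for one timestep of a real solver.
          rng = np.random.default_rng(t)
          return rng.normal(size=shape)

      reduced = []
      for t in range(100):
          field = simulate_step(t)
          # Analyze while resident in memory; persist only the summary.
          reduced.append((t, float(field.mean()), float(field.max())))

      np.savetxt("stats.csv", reduced, delimiter=",",
                 header="step,mean,max", comments="")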

  11. ATLAS computing on Swiss Cloud SWITCHengines

    NASA Astrophysics Data System (ADS)

    Haug, S.; Sciacca, F. G.; ATLAS Collaboration

    2017-10-01

    Consolidation towards more computing at flat budgets, beyond what pure chip technology can offer, is a requirement for the full scientific exploitation of the future data from the Large Hadron Collider at CERN in Geneva. One consolidation measure is to exploit cloud infrastructures whenever they are financially competitive. We report on the technical solutions used and the performance achieved running simulation tasks for the ATLAS experiment on SWITCHengines. SWITCHengines is a new infrastructure-as-a-service offering to Swiss academia by the National Research and Education Network SWITCH. While the solutions and performance are general, financial considerations and policies, on which we also report, are country specific.

  12. ENES the European Network for Earth System modelling and its infrastructure projects IS-ENES

    NASA Astrophysics Data System (ADS)

    Guglielmo, Francesca; Joussaume, Sylvie; Parinet, Marie

    2016-04-01

    The scientific community working on climate modelling is organized within the European Network for Earth System modelling (ENES). In the past decade, several European university departments, research centres, meteorological services, computer centres, and industrial partners engaged in the creation of ENES, by signing a Memorandum of Understanding, with the purpose of working together and cooperating towards the further development of the network. As of 2015, the consortium counts 47 partners. The climate modelling community, and thus ENES, faces challenges which are both science-driven, i.e. analysing the full complexity of the Earth System to improve our understanding and prediction of climate change, and have multi-faceted societal implications, as a better representation of climate change on regional scales leads to improved understanding and prediction of impacts and to the development and provision of climate services. ENES, by promoting and endorsing projects and initiatives, helps develop and evaluate state-of-the-art climate and Earth system models, facilitates model intercomparison studies, encourages exchanges of software and model results, and fosters the use of high-performance computing facilities dedicated to high-resolution multi-model experiments. ENES brings together public and private partners, integrates countries underrepresented in climate modelling studies, and reaches out to different user communities, thus enhancing European expertise and competitiveness. Given this need for sophisticated models, world-class high-performance computers, and state-of-the-art software solutions to make efficient use of models, data and hardware, a key role is played by the constitution and maintenance of a solid infrastructure that develops and provides services to the different user communities. ENES has investigated these infrastructural needs and has received funding from the EU FP7 programme for the IS-ENES (InfraStructure for ENES) phase I and II projects. We present here the case study of an existing network of institutions brought together toward common goals by a non-binding agreement, ENES, and of its two IS-ENES projects. The latter will be discussed in their double role as a means to provide and/or maintain the actual infrastructure (hardware, software, skilled human resources, services) to achieve ENES scientific goals, fulfilling the aims set in a strategy document, but also to give the network a structured way of working and of interacting with the extended community. The genesis and evolution of the network, and the interaction between network and projects, will also be analysed in terms of long-term sustainability.

  13. Collaboratively Architecting a Scalable and Adaptable Petascale Infrastructure to Support Transdisciplinary Scientific Research for the Australian Earth and Environmental Sciences

    NASA Astrophysics Data System (ADS)

    Wyborn, L. A.; Evans, B. J. K.; Pugh, T.; Lescinsky, D. T.; Foster, C.; Uhlherr, A.

    2014-12-01

    The National Computational Infrastructure (NCI) at the Australian National University (ANU) is a partnership between CSIRO, ANU, Bureau of Meteorology (BoM) and Geoscience Australia. Recent investments in a 1.2 PFlop supercomputer (Raijin), ~20 PB of data storage using Lustre filesystems and a 3000-core high-performance cloud have created a hybrid platform for high-performance computing and data-intensive science to enable large-scale earth and climate systems modelling and analysis. There are > 3000 users actively logging in and > 600 projects on the NCI system. Efficiently scaling and adapting data and software systems to petascale infrastructures requires the collaborative development of an architecture that is designed, programmed and operated to enable users to interactively invoke different forms of in-situ computation over complex and large-scale data collections. NCI makes available major and long-tail data collections from both the government and research sectors based on six themes: 1) weather, climate and earth system science model simulations, 2) marine and earth observations, 3) geosciences, 4) terrestrial ecosystems, 5) water and hydrology and 6) astronomy, biological and social sciences. Collectively they span the lithosphere, crust, biosphere, hydrosphere, troposphere, and stratosphere. Collections are the operational form for data management and access. Similar data types from individual custodians are managed cohesively. Use of international standards for discovery and interoperability allows complex interactions within and between the collections. This design facilitates a transdisciplinary approach to research and enables a shift from small-scale, 'stove-piped' science efforts to large-scale, collaborative systems science. This new and complex infrastructure requires a move to shared, globally trusted software frameworks that can be maintained and updated. Workflow engines become essential and need to integrate provenance, versioning, traceability, repeatability and publication. There are also human resource challenges, as highly skilled HPC/HPD specialists, specialist programmers, and data scientists are required whose skills can support scaling to the new paradigm of effective and efficient data-intensive earth science analytics on petascale, and soon exascale, systems.

  14. The NCI High Performance Computing (HPC) and High Performance Data (HPD) Platform to Support the Analysis of Petascale Environmental Data Collections

    NASA Astrophysics Data System (ADS)

    Evans, B. J. K.; Pugh, T.; Wyborn, L. A.; Porter, D.; Allen, C.; Smillie, J.; Antony, J.; Trenham, C.; Evans, B. J.; Beckett, D.; Erwin, T.; King, E.; Hodge, J.; Woodcock, R.; Fraser, R.; Lescinsky, D. T.

    2014-12-01

    The National Computational Infrastructure (NCI) has co-located a priority set of national data assets within a HPC research platform. This powerful in-situ computational platform has been created to help serve and analyse the massive amounts of data across the spectrum of environmental collections - in particular the climate, observational data and geoscientific domains. This paper examines the infrastructure, innovation and opportunity for this significant research platform. NCI currently manages nationally significant data collections (10+ PB) categorised as 1) earth system sciences, climate and weather model data assets and products, 2) earth and marine observations and products, 3) geosciences, 4) terrestrial ecosystem, 5) water management and hydrology, and 6) astronomy, social science and biosciences. The data is largely sourced from the NCI partners (who include the custodians of many of the national scientific records), major research communities, and collaborating overseas organisations. By co-locating these large valuable data assets, new opportunities have arisen by harmonising the data collections, creating a powerful transdisciplinary research platform. The data is accessible within an integrated HPC-HPD environment: a 1.2 PFlop supercomputer (Raijin), an HPC-class 3000-core OpenStack cloud system and several highly connected large-scale, high-bandwidth Lustre filesystems. New scientific software, cloud-scale techniques, server-side visualisation and data services have been harnessed and integrated into the platform, so that analysis is performed seamlessly across the traditional boundaries of the underlying data domains. Characterisation of the techniques along with performance profiling ensures scalability of each software component, all of which can either be enhanced or replaced through future improvements. A Development-to-Operations (DevOps) framework has also been implemented to manage the scale of the software complexity alone. This ensures that software is both upgradable and maintainable, and can be readily reused with complexly integrated systems and become part of the growing global trusted community tools for cross-disciplinary research.

  15. High-throughput neuroimaging-genetics computational infrastructure

    PubMed Central

    Dinov, Ivo D.; Petrosyan, Petros; Liu, Zhizhong; Eggert, Paul; Hobel, Sam; Vespa, Paul; Woo Moon, Seok; Van Horn, John D.; Franco, Joseph; Toga, Arthur W.

    2014-01-01

    Many contemporary neuroscientific investigations face significant challenges in terms of data management, computational processing, data mining, and results interpretation. These four pillars define the core infrastructure necessary to plan, organize, orchestrate, validate, and disseminate novel scientific methods, computational resources, and translational healthcare findings. Data management includes protocols for data acquisition, archival, query, transfer, retrieval, and aggregation. Computational processing involves the necessary software, hardware, and networking infrastructure required to handle large amounts of heterogeneous neuroimaging, genetics, clinical, and phenotypic data and meta-data. Data mining refers to the process of automatically extracting data features, characteristics and associations, which are not readily visible by human exploration of the raw dataset. Results interpretation includes scientific visualization, community validation of findings, and reproducibility of findings. In this manuscript we describe the novel high-throughput neuroimaging-genetics computational infrastructure available at the Institute for Neuroimaging and Informatics (INI) and the Laboratory of Neuro Imaging (LONI) at the University of Southern California (USC). INI and LONI include ultra-high-field and standard-field MRI brain scanners along with an imaging-genetics database for storing the complete provenance of the raw and derived data and meta-data. In addition, the institute provides a large number of software tools for image and shape analysis, mathematical modeling, genomic sequence processing, and scientific visualization. A unique feature of this architecture is the Pipeline environment, which integrates data management, processing, transfer, and visualization. Through its client-server architecture, the Pipeline environment provides a graphical user interface for designing, executing, monitoring, validating, and disseminating complex protocols that utilize diverse suites of software tools and web-services. These pipeline workflows are represented as portable XML objects, which transfer the execution instructions and user specifications from the client user machine to remote pipeline servers for distributed computing. Using Alzheimer's and Parkinson's data, we provide several examples of translational applications using this infrastructure. PMID:24795619
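
    The portable-XML workflow representation can be sketched in a few lines; the element names below are invented for illustration and do not follow the actual Pipeline schema:

      # Build a toy two-step workflow descriptor as portable XML.
      import xml.etree.ElementTree as ET

      workflow = ET.Element("workflow", name="demo-imaging-genetics")
      step1 = ET.SubElement(workflow, "step", tool="skull_strip",
                            server="remote-hpc")
      ET.SubElement(step1, "input", path="subject01_T1.nii")
      step2 = ET.SubElement(workflow, "step", tool="segment",
                            server="remote-hpc")
      ET.SubElement(step2, "dependsOn", step="skull_strip")

      # The client would ship this file to a pipeline server, which schedules
      # the steps on distributed resources and streams status back to the GUI.
      ET.ElementTree(workflow).write("workflow.xml", xml_declaration=True)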

  16. HPCC and the National Information Infrastructure: an overview.

    PubMed Central

    Lindberg, D A

    1995-01-01

    The National Information Infrastructure (NII) or "information superhighway" is a high-priority federal initiative to combine communications networks, computers, databases, and consumer electronics to deliver information services to all U.S. citizens. The NII will be used to improve government and social services while cutting administrative costs. Operated by the private sector, the NII will rely on advanced technologies developed under the direction of the federal High Performance Computing and Communications (HPCC) Program. These include computing systems capable of performing trillions of operations (teraops) per second and networks capable of transmitting billions of bits (gigabits) per second. Among other activities, the HPCC Program supports the national supercomputer research centers, the federal portion of the Internet, and the development of interface software, such as Mosaic, that facilitates access to network information services. Health care has been identified as a critical demonstration area for HPCC technology and an important application area for the NII. As an HPCC participant, the National Library of Medicine (NLM) assists hospitals and medical centers to connect to the Internet through projects directed by the Regional Medical Libraries and through an Internet Connections Program cosponsored by the National Science Foundation. In addition to using the Internet to provide enhanced access to its own information services, NLM sponsors health-related applications of HPCC technology. Examples include the "Visible Human" project and recently awarded contracts for test-bed networks to share patient data and medical images, telemedicine projects to provide consultation and medical care to patients in rural areas, and advanced computer simulations of human anatomy for training in "virtual surgery." PMID:7703935

  17. Cloud Computing with iPlant Atmosphere.

    PubMed

    McKay, Sheldon J; Skidmore, Edwin J; LaRose, Christopher J; Mercer, Andre W; Noutsos, Christos

    2013-10-15

    Cloud Computing refers to distributed computing platforms that use virtualization software to provide easy access to physical computing infrastructure and data storage, typically administered through a Web interface. Cloud-based computing provides access to powerful servers, with specific software and virtual hardware configurations, while eliminating the initial capital cost of expensive computers and reducing the ongoing operating costs of system administration, maintenance contracts, power consumption, and cooling. This eliminates a significant barrier to entry into bioinformatics and high-performance computing for many researchers. This is especially true of free or modestly priced cloud computing services. The iPlant Collaborative offers a free cloud computing service, Atmosphere, which allows users to easily create and use instances on virtual servers preconfigured for their analytical needs. Atmosphere is a self-service, on-demand platform for scientific computing. This unit demonstrates how to set up, access and use cloud computing in Atmosphere. Copyright © 2013 John Wiley & Sons, Inc.

  18. VERA 3.6 Release Notes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Williamson, Richard L.; Kochunas, Brendan; Adams, Brian M.

    The Virtual Environment for Reactor Applications (VERA) components included in this distribution comprise selected computational tools and supporting infrastructure that solve neutronics, thermal-hydraulics, fuel performance, and coupled neutronics-thermal hydraulics problems. The infrastructure components provide a simplified common user input capability and provide for physics integration with data transfer and coupled-physics iterative solution algorithms.

  19. A Primer on High-Throughput Computing for Genomic Selection

    PubMed Central

    Wu, Xiao-Lin; Beissinger, Timothy M.; Bauck, Stewart; Woodward, Brent; Rosa, Guilherme J. M.; Weigel, Kent A.; Gatti, Natalia de Leon; Gianola, Daniel

    2011-01-01

    High-throughput computing (HTC) uses computer clusters to solve advanced computational problems, with the goal of accomplishing high throughput over relatively long periods of time. In genomic selection, for example, a set of markers covering the entire genome is used to train a model based on known data, and the resulting model is used to predict the genetic merit of selection candidates. Sophisticated models are very computationally demanding and, with several traits to be evaluated sequentially, computing time is long, and output is low. In this paper, we present scenarios and basic principles of how HTC can be used in genomic selection, implemented using various techniques from simple batch processing to pipelining in distributed computer clusters. Various scripting languages, such as shell scripting, Perl, and R, are also very useful to devise pipelines. By pipelining, we can reduce total computing time and consequently increase throughput. In comparison to the traditional data processing pipeline residing on central processors, performing general-purpose computation on a graphics processing unit provides a new-generation approach to massive parallel computing in genomic selection. While the concept of HTC may still be new to many researchers in animal breeding, plant breeding, and genetics, HTC infrastructures have already been built in many institutions, such as the University of Wisconsin–Madison, which can be leveraged for genomic selection, in terms of central processing unit capacity, network connectivity, storage availability, and middleware connectivity. Exploring existing HTC infrastructures as well as general-purpose computing environments will further expand our capability to meet the increasing computing demands posed by the unprecedented genomic data that we have today. We anticipate that HTC will impact genomic selection via better statistical models, faster solutions, and more competitive products (e.g., from design of marker panels to realized genetic gain). Eventually, HTC may change our view of data analysis as well as decision-making in the post-genomic era of selection programs in animals and plants, or in the study of complex diseases in humans. PMID:22303303
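
    The batch-processing pattern the primer describes, evaluating several traits as independent jobs on a worker pool so total wall time falls roughly with the number of workers, can be sketched in Python; the model fit here is a stub:

      # Evaluate traits as independent jobs on a small worker pool.
      import time
      from concurrent.futures import ProcessPoolExecutor

      def fit_trait_model(trait):
          time.sleep(1)        # stand-in for an expensive genomic model fit
          return trait, f"predicted breeding values for {trait}"

      if __name__ == "__main__":
          traits = ["milk_yield", "fertility", "longevity", "udder_health"]
          with ProcessPoolExecutor(max_workers=4) as ex:
              # Four 1-second "fits" complete in ~1 s of wall time, not ~4 s.
              for trait, result in ex.map(fit_trait_model, traits):
                  print(trait, "->", result)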

  20. Cloud Computing for Complex Performance Codes.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Appel, Gordon John; Hadgu, Teklu; Klein, Brandon Thorin

    This report describes the use of cloud computing services for running complex public domain performance assessment problems. The work consisted of two phases: Phase 1 was to demonstrate that complex codes, on several differently configured servers, could run and compute trivial small-scale problems in a commercial cloud infrastructure. Phase 2 focused on proving that non-trivial large-scale problems could be computed in the commercial cloud environment. The cloud computing effort was successfully applied using codes of interest to the geohydrology and nuclear waste disposal modeling community.

  1. Transformation of OODT CAS to Perform Larger Tasks

    NASA Technical Reports Server (NTRS)

    Mattmann, Chris; Freeborn, Dana; Crichton, Daniel; Hughes, John; Ramirez, Paul; Hardman, Sean; Woollard, David; Kelly, Sean

    2008-01-01

    A computer program denoted OODT CAS has been transformed to enable performance of larger tasks that involve greatly increased data volumes and increasingly intensive processing of data on heterogeneous, geographically dispersed computers. Prior to the transformation, OODT CAS (also alternatively denoted, simply, 'CAS') [wherein 'OODT' signifies 'Object-Oriented Data Technology' and 'CAS' signifies 'Catalog and Archive Service'] was a proven software component used to manage scientific data from spaceflight missions. In the transformation, CAS was split into two separate components representing its canonical capabilities: file management and workflow management. In addition, CAS was augmented by addition of a resource-management component. This third component enables CAS to manage heterogeneous computing by use of diverse resources, including high-performance clusters of computers, commodity computing hardware, and grid computing infrastructures. CAS is now more easily maintainable, evolvable, and reusable. These components can be used separately or, taking advantage of synergies, can be used together. Other elements of the transformation included addition of a separate Web presentation layer that supports distribution of data products via Really Simple Syndication (RSS) feeds, and provision for full Resource Description Framework (RDF) exports of metadata.

  2. Development of Measures to Assess Product Modularity and Reconfigurability

    DTIC Science & Technology

    2010-03-01

    mission needs. For example, a thermal blanket is the only "module" currently being used to control spacecraft temperature (i.e. no active cooling). If ... infrastructure, and thermal control. The spacecraft components include the autonomous flight software; the quantity of high-performance computing; power ... thermal requirements are satisfied using this thermal blanket, then there may not be a need for active cooling to improve the thermal range of the

  3. Self-service for software development projects and HPC activities

    NASA Astrophysics Data System (ADS)

    Husejko, M.; Høimyr, N.; Gonzalez, A.; Koloventzos, G.; Asbury, D.; Trzcinska, A.; Agtzidis, I.; Botrel, G.; Otto, J.

    2014-05-01

    This contribution describes how CERN has implemented several essential tools for agile software development processes, ranging from version control (Git) to issue tracking (Jira) and documentation (Wikis). Running such services in a large organisation like CERN requires many administrative actions both by users and service providers, such as creating software projects, managing access rights, users and groups, and performing tool-specific customisation. Dealing with these requests manually would be a time-consuming task. Another area of our CERN computing services that has required dedicated manual support has been clusters for specific user communities with special needs. Our aim is to move all our services to a layered approach, with server infrastructure running on the internal cloud computing infrastructure at CERN. This contribution illustrates how we plan to optimise the management of our services by means of an end-user facing platform acting as a portal into all the related services for software projects, inspired by popular portals for open-source development such as SourceForge, GitHub and others. Furthermore, the contribution will discuss recent activities with tests and evaluations of High Performance Computing (HPC) applications on different hardware and software stacks, and plans to offer a dynamically scalable HPC service at CERN, based on affordable hardware.

  4. Toward real-time Monte Carlo simulation using a commercial cloud computing infrastructure.

    PubMed

    Wang, Henry; Ma, Yunzhi; Pratx, Guillem; Xing, Lei

    2011-09-07

    Monte Carlo (MC) methods are the gold standard for modeling photon and electron transport in a heterogeneous medium; however, their computational cost prohibits their routine use in the clinic. Cloud computing, wherein computing resources are allocated on-demand from a third party, is a new approach for high performance computing and is implemented to perform ultra-fast MC calculation in radiation therapy. We deployed the EGS5 MC package in a commercial cloud environment. Launched from a single local computer with Internet access, a Python script allocates a remote virtual cluster. A handshaking protocol designates master and worker nodes. The EGS5 binaries and the simulation data are initially loaded onto the master node. The simulation is then distributed among independent worker nodes via the message passing interface, and the results aggregated on the local computer for display and data analysis. The described approach is evaluated for pencil beams and broad beams of high-energy electrons and photons. The output of cloud-based MC simulation is identical to that produced by single-threaded implementation. For 1 million electrons, a simulation that takes 2.58 h on a local computer can be executed in 3.3 min on the cloud with 100 nodes, a 47× speed-up. Simulation time scales inversely with the number of parallel nodes. The parallelization overhead is also negligible for large simulations. Cloud computing represents one of the most important recent advances in supercomputing technology and provides a promising platform for substantially improved MC simulation. In addition to the significant speed up, cloud computing builds a layer of abstraction for high performance parallel computing, which may change the way dose calculations are performed and radiation treatment plans are completed.
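
    The master/worker pattern described above amounts to splitting histories evenly across ranks and summing the partial results; the mpi4py sketch below uses a stub sampler in place of EGS5. Run with, e.g., mpirun -np 100 python mc.py:

      # Each rank simulates its share of histories; rank 0 gets the sum.
      import numpy as np
      from mpi4py import MPI

      comm = MPI.COMM_WORLD
      rank, size = comm.Get_rank(), comm.Get_size()
      total_histories = 1_000_000
      local_n = total_histories // size            # even split across ranks

      rng = np.random.default_rng(seed=rank)       # independent stream per rank
      samples = rng.normal(5.0, 1.0, local_n)      # stub for particle transport
      local_hist = np.histogram(samples, bins=50,
                                range=(0.0, 10.0))[0].astype(np.float64)

      global_hist = np.zeros_like(local_hist)
      comm.Reduce(local_hist, global_hist, op=MPI.SUM, root=0)
      if rank == 0:
          print("aggregated histogram, total counts:", int(global_hist.sum()))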

  5. Spatial data analytics on heterogeneous multi- and many-core parallel architectures using python

    USGS Publications Warehouse

    Laura, Jason R.; Rey, Sergio J.

    2017-01-01

    Parallel vector spatial analysis concerns the application of parallel computational methods to facilitate vector-based spatial analysis. The history of parallel computation in spatial analysis is reviewed, and this work is placed into the broader context of high-performance computing (HPC) and parallelization research. The rise of cyber infrastructure and its manifestation in spatial analysis as CyberGIScience is seen as a main driver of renewed interest in parallel computation in the spatial sciences. Key problems in spatial analysis that have been the focus of parallel computing are covered. Chief among these are spatial optimization problems, computational geometric problems including polygonization and spatial contiguity detection, the use of Monte Carlo Markov chain simulation in spatial statistics, and parallel implementations of spatial econometric methods. Future directions for research on parallelization in computational spatial analysis are outlined.
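
    A representative pattern from this literature is Monte Carlo (permutation) inference for a spatial statistic parallelized across cores; the statistic below is a toy stand-in, not an actual PySAL implementation:

      # Parallel permutation test: build the null distribution across cores.
      import numpy as np
      from multiprocessing import Pool

      values = np.random.default_rng(0).random(1000)

      def statistic(v):
          # Toy "spatial" statistic: correlation with a lagged copy.
          return float(np.corrcoef(v[:-1], v[1:])[0, 1])

      def one_permutation(seed):
          rng = np.random.default_rng(seed)
          return statistic(rng.permutation(values))

      if __name__ == "__main__":
          observed = statistic(values)
          with Pool() as pool:
              null = pool.map(one_permutation, range(999))
          p = (1 + sum(abs(s) >= abs(observed) for s in null)) / (1 + len(null))
          print(f"observed={observed:.4f}, pseudo p-value={p:.3f}")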

  6. Lowering the barriers to computational modeling of Earth's surface: coupling Jupyter Notebooks with Landlab, HydroShare, and CyberGIS for research and education.

    NASA Astrophysics Data System (ADS)

    Bandaragoda, C.; Castronova, A. M.; Phuong, J.; Istanbulluoglu, E.; Strauch, R. L.; Nudurupati, S. S.; Tarboton, D. G.; Wang, S. W.; Yin, D.; Barnhart, K. R.; Tucker, G. E.; Hutton, E.; Hobley, D. E. J.; Gasparini, N. M.; Adams, J. M.

    2017-12-01

    The ability to test hypotheses about hydrology, geomorphology and atmospheric processes is invaluable to research in the era of big data. Although community resources are available, there remain significant educational, logistical and time-investment barriers to their use. Knowledge infrastructure is an emerging intellectual framework for understanding how people create, share and distribute knowledge - an activity that has been dramatically transformed by Internet technologies. In addition to the technical and social components in a cyberinfrastructure system, knowledge infrastructure considers the educational, institutional, and open-source governance components required to advance knowledge. We are designing an infrastructure environment that lowers common barriers to reproducing modeling experiments for earth surface investigation. Landlab is an open-source modeling toolkit for building, coupling, and exploring two-dimensional numerical models. HydroShare is an online collaborative environment for sharing hydrologic data and models. CyberGIS-Jupyter is an innovative cyberGIS framework for achieving data-intensive, reproducible, and scalable geospatial analytics using the Jupyter Notebook, based on ROGER - the first cyberGIS supercomputer - so that models can be elastically reproduced through cloud computing approaches. Our team of geomorphologists, hydrologists, and computer geoscientists has created a new infrastructure environment that combines these three pieces of software to enable knowledge discovery. Through this novel integration, any user can interactively execute and explore their shared data and model resources. Landlab on HydroShare with CyberGIS-Jupyter supports the modeling continuum from fully developed modeling applications, through prototyping of new science tools and hands-on research demonstrations for training workshops, to classroom applications. Computational geospatial models based on big data and high-performance computing can now be more efficiently developed, improved, scaled, and seamlessly reproduced among multidisciplinary users, thereby expanding the active learning curriculum and research opportunities for students in earth surface modeling and informatics.
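
    As a flavor of the kind of model such a notebook might share, the short Landlab sketch below runs hillslope diffusion on a small raster grid (Landlab 2.x API; parameter values are purely illustrative):

      # Hillslope diffusion on a raster grid with Landlab's LinearDiffuser.
      import numpy as np
      from landlab import RasterModelGrid
      from landlab.components import LinearDiffuser

      grid = RasterModelGrid((25, 25), xy_spacing=10.0)
      z = grid.add_zeros("topographic__elevation", at="node")
      z += np.random.default_rng(42).random(z.size)   # initial roughness

      diffuser = LinearDiffuser(grid, linear_diffusivity=0.01)
      for _ in range(500):
          diffuser.run_one_step(dt=100.0)             # 100-year steps

      print("relief after 50 kyr:", float(z.max() - z.min()))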

  7. Infrastructure sensing.

    PubMed

    Soga, Kenichi; Schooling, Jennifer

    2016-08-06

    Design, construction, maintenance and upgrading of civil engineering infrastructure requires fresh thinking to minimize use of materials, energy and labour. This can only be achieved by understanding the performance of the infrastructure, both during its construction and throughout its design life, through innovative monitoring. Advances in sensor systems offer intriguing possibilities to radically alter methods of condition assessment and monitoring of infrastructure. In this paper, it is hypothesized that the future of infrastructure relies on smarter information; the rich information obtained from embedded sensors within infrastructure will act as a catalyst for new design, construction, operation and maintenance processes for integrated infrastructure systems linked directly with user behaviour patterns. Some examples of emerging sensor technologies for infrastructure sensing are given. They include distributed fibre-optics sensors, computer vision, wireless sensor networks, low-power micro-electromechanical systems, energy harvesting and citizens as sensors.

  8. Infrastructure sensing

    PubMed Central

    Soga, Kenichi; Schooling, Jennifer

    2016-01-01

    Design, construction, maintenance and upgrading of civil engineering infrastructure requires fresh thinking to minimize use of materials, energy and labour. This can only be achieved by understanding the performance of the infrastructure, both during its construction and throughout its design life, through innovative monitoring. Advances in sensor systems offer intriguing possibilities to radically alter methods of condition assessment and monitoring of infrastructure. In this paper, it is hypothesized that the future of infrastructure relies on smarter information; the rich information obtained from embedded sensors within infrastructure will act as a catalyst for new design, construction, operation and maintenance processes for integrated infrastructure systems linked directly with user behaviour patterns. Some examples of emerging sensor technologies for infrastructure sensing are given. They include distributed fibre-optics sensors, computer vision, wireless sensor networks, low-power micro-electromechanical systems, energy harvesting and citizens as sensors. PMID:27499845

  9. Computational Infrastructure for Engine Structural Performance Simulation

    NASA Technical Reports Server (NTRS)

    Chamis, Christos C.

    1997-01-01

    Select computer codes developed over the years to simulate specific aspects of engine structures are described. These codes include blade impact integrated multidisciplinary analysis and optimization, progressive structural fracture, quantification of uncertainties for structural reliability and risk, benefits estimation of new technology insertion and hierarchical simulation of engine structures made from metal matrix and ceramic matrix composites. Collectively these codes constitute a unique infrastructure readiness to credibly evaluate new and future engine structural concepts throughout the development cycle from initial concept, to design and fabrication, to service performance and maintenance and repairs, and to retirement for cause and even to possible recycling. Stated differently, they provide 'virtual' concurrent engineering for engine structures total-life-cycle-cost.

  10. High Performance Computing and Visualization Infrastructure for Simultaneous Parallel Computing and Parallel Visualization Research

    DTIC Science & Technology

    2016-11-09

    The available record text consists only of report-form fragments (DD882 subcontractor and personnel lists) and a hardware configuration excerpt: Broadcom 5720 QP 1Gb Network Daughter Card; (2) Intel Xeon E5-2680 v3 2.5 GHz, 30M Cache, 9.60 GT/s QPI, Turbo, HT, 12C/24T (120 W).

  11. First results from a combined analysis of CERN computing infrastructure metrics

    NASA Astrophysics Data System (ADS)

    Duellmann, Dirk; Nieke, Christian

    2017-10-01

    The IT Analysis Working Group (AWG) has been formed at CERN, across individual computing units and the experiments, to attempt a cross-cutting analysis of computing infrastructure and application metrics. In this presentation we describe the first results obtained using medium/long-term data (1 month to 1 year) correlating box-level metrics, job-level metrics from LSF and HTCondor, I/O metrics from the physics analysis disk pools (EOS), and networking and application-level metrics from the experiment dashboards. We cover in particular the measurement of hardware performance and the prediction of job duration, the latency sensitivity of different job types, and a search for bottlenecks with the production job mix in the current infrastructure. The presentation concludes with the proposal of a small set of metrics to simplify drawing conclusions, also in the more constrained environment of public cloud deployments.
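
    A minimal sketch of the kind of cross-metric correlation the AWG describes is given below, assuming the box-level and job-level metrics have already been exported to tabular files; the file and column names are hypothetical placeholders, not the AWG's actual data layout.

```python
# Minimal sketch: correlate host-level metrics with job wall time.
# "job_metrics.csv", "host_metrics.csv" and all column names are
# hypothetical placeholders.
import pandas as pd

jobs = pd.read_csv("job_metrics.csv")       # one row per completed job
hosts = pd.read_csv("host_metrics.csv")     # one row per worker node

df = jobs.merge(hosts, on="host_id")
# Pearson correlations between hardware characteristics and job duration.
print(df[["cpu_benchmark", "io_wait_frac", "wall_time_s"]].corr())

# Crude linear predictor of job duration from a host benchmark score.
fit = df[["cpu_benchmark", "wall_time_s"]].dropna()
slope = fit["cpu_benchmark"].cov(fit["wall_time_s"]) / fit["cpu_benchmark"].var()
print("wall-time seconds per benchmark unit:", slope)
```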

  12. Infrastructure Systems for Advanced Computing in E-science applications

    NASA Astrophysics Data System (ADS)

    Terzo, Olivier

    2013-04-01

    The e-science field has a growing need for computing infrastructure that is more dynamic and customizable, with an on-demand model of use that follows the exact request in terms of compute resources and storage capacity. Integrating grid and cloud infrastructure solutions allows services to be offered that adapt availability by scaling resources up and down. The main challenge for e-science domains is to implement scientific computing infrastructures that dynamically adapt to the demand for computing resources, with a strong emphasis on optimizing resource usage to reduce investment costs. Instrumentation, data volumes, algorithms, and analysis all add complexity for applications that require high processing power and storage for a limited time, and these needs often exceed the computational resources available to the majority of laboratories and research units in an organization. Very often it is necessary to adapt, or even rethink, tools and algorithms, and to consolidate existing applications through a phase of reverse engineering, in order to deploy them on cloud infrastructure. For example, in areas such as rainfall monitoring, meteorological analysis, hydrometeorology, climatology, bioinformatics, next-generation sequencing, computational electromagnetics, and radio occultation, the complexity of the analysis raises several issues, such as processing time, the scheduling of processing tasks, storage of results, and multi-user environments. For these reasons, it is necessary to rethink the way e-science applications are written so that they are already adapted to exploit the potential of cloud computing services through the IaaS, PaaS and SaaS layers. Another important focus is on creating and using hybrid infrastructures, typically a federation between private and public clouds: when all resources owned by the organization are in use, a federated cloud infrastructure makes it easy to add resources from the public cloud to follow the demand for computation and storage, and to release them when processing is finished. In this hybrid model, the scheduling approach is important for managing both cloud types. Thanks to this infrastructure model, resources are always available for additional IT capacity requests, usable on demand for a limited time without having to purchase additional servers.

  13. !CHAOS: A cloud of controls

    NASA Astrophysics Data System (ADS)

    Angius, S.; Bisegni, C.; Ciuffetti, P.; Di Pirro, G.; Foggetta, L. G.; Galletti, F.; Gargana, R.; Gioscio, E.; Maselli, D.; Mazzitelli, G.; Michelotti, A.; Orrù, R.; Pistoni, M.; Spagnoli, F.; Spigone, D.; Stecchi, A.; Tonto, T.; Tota, M. A.; Catani, L.; Di Giulio, C.; Salina, G.; Buzzi, P.; Checcucci, B.; Lubrano, P.; Piccini, M.; Fattibene, E.; Michelotto, M.; Cavallaro, S. R.; Diana, B. F.; Enrico, F.; Pulvirenti, S.

    2016-01-01

    The paper presents the !CHAOS open source project, which aims to develop a prototype of a national private Cloud Computing infrastructure devoted to accelerator control systems and large experiments of High Energy Physics (HEP). The !CHAOS project has been financed by MIUR (Italian Ministry of Research and Education) and aims to develop a new concept of control system and data acquisition framework by providing, with a high level of abstraction, all the services needed for controlling and managing a large scientific, or non-scientific, infrastructure. A beta version of the !CHAOS infrastructure will be released at the end of December 2015 and will run on private Cloud infrastructures based on OpenStack.

  14. Geocomputation over Hybrid Computer Architecture and Systems: Prior Works and On-going Initiatives at UARK

    NASA Astrophysics Data System (ADS)

    Shi, X.

    2015-12-01

    As NSF has indicated, "Theory and experimentation have for centuries been regarded as two fundamental pillars of science. It is now widely recognized that computational and data-enabled science forms a critical third pillar." Geocomputation is that third pillar of GIScience and the geosciences. With the exponential growth of geodata, the challenge of scalable, high-performance computing for big data analytics becomes urgent, because many research activities are constrained by software and tools that cannot complete the computation at all. Heterogeneous geodata integration and analytics further magnify the complexity and the operational time frame. Many large-scale geospatial problems may not be processable at all if the computer system does not have sufficient memory or computational power. Emerging computer architectures, such as Intel's Many Integrated Core (MIC) architecture and the Graphics Processing Unit (GPU), together with advanced computing technologies, provide promising solutions that employ massive parallelism and hardware resources to achieve scalability and high performance for data-intensive computing over large spatiotemporal and social media data. Exploring novel algorithms and deploying the solutions in massively parallel computing environments, to achieve scalable data processing and analytics over large-scale, complex, and heterogeneous geodata with consistent quality and high performance, has been the central theme of our research team in the Department of Geosciences at the University of Arkansas (UARK). New multi-core architectures combined with application accelerators hold the promise of achieving scalability and high performance by exploiting task- and data-level parallelism that is not supported by conventional computing systems. Such a parallel or distributed computing environment is particularly suitable for large-scale geocomputation over big data, as demonstrated by our prior work, although the potential of such advanced infrastructure remains underexplored in this domain. In this presentation, our prior and on-going initiatives are summarized to exemplify how we exploit multicore CPUs, GPUs, and MICs, as well as clusters of CPUs, GPUs and MICs, to accelerate geocomputation in different applications.
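
    As a small illustration of the accelerator-offloading pattern described above, the sketch below computes a mean nearest-neighbor distance (a common point-pattern statistic) on the GPU when CuPy is available, falling back to NumPy otherwise; the library choice, point counts and block size are assumptions for the sketch, not UARK's actual codes.

```python
# Minimal sketch: offload a geocomputation kernel to a GPU via CuPy,
# with a NumPy fallback so the sketch stays runnable on CPU-only hosts.
import numpy as np
try:
    import cupy as xp          # GPU path (assumption: CuPy is installed)
except ImportError:
    xp = np                    # CPU fallback

pts = xp.asarray(np.random.default_rng(0).random((10_000, 2)))
nn = xp.empty(pts.shape[0])
# Process 500 points per block to bound the (block x N x 2) memory use.
for i in range(0, pts.shape[0], 500):
    diff = pts[i:i + 500, None, :] - pts[None, :, :]
    d = xp.sqrt((diff ** 2).sum(axis=-1))
    rows = xp.arange(d.shape[0])
    d[rows, i + rows] = xp.inf          # mask each point's zero self-distance
    nn[i:i + 500] = d.min(axis=1)
print("mean nearest-neighbor distance:", float(nn.mean()))
```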

  15. Open source system OpenVPN in a function of Virtual Private Network

    NASA Astrophysics Data System (ADS)

    Skendzic, A.; Kovacic, B.

    2017-05-01

    Virtual Private Networks (VPNs) can establish a high security level in network communication. VPN technology enables secure networking over distributed or public network infrastructure, applying different security and management rules inside the network. A VPN can be set up over different communication channels, such as the Internet or separate ISP communication infrastructure, and creates a secure communication channel over a public network between two endpoints (computers). OpenVPN is an open source software product under the GNU General Public License (GPL) that can be used to establish VPN communication between two computers inside a business local network over public communication infrastructure. It uses dedicated security protocols and 256-bit encryption, and it is capable of traversing network address translators (NATs) and firewalls. It allows computers to authenticate each other using a pre-shared secret key, certificates, or a username and password. This work gives a review of VPN technology with particular emphasis on OpenVPN, and also discusses the comparative and financial benefits of using open source VPN software in a business environment.

  16. Computation Directorate Annual Report 2003

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Crawford, D L; McGraw, J R; Ashby, S F

    Big computers are icons: symbols of the culture, and of the larger computing infrastructure that exists at Lawrence Livermore. Through the collective effort of Laboratory personnel, they enable scientific discovery and engineering development on an unprecedented scale. For more than three decades, the Computation Directorate has supplied the big computers that enable the science necessary for Laboratory missions and programs. Livermore supercomputing is uniquely mission driven. The high-fidelity weapon simulation capabilities essential to the Stockpile Stewardship Program compel major advances in weapons codes and science, compute power, and computational infrastructure. Computation's activities align with this vital mission of the Department of Energy. Increasingly, non-weapons Laboratory programs also rely on computer simulation. World-class achievements have been accomplished by LLNL specialists working in multi-disciplinary research and development teams. In these teams, Computation personnel employ a wide array of skills, from desktop support expertise, to complex applications development, to advanced research. Computation's skilled professionals make the Directorate the success that it has become. These individuals know the importance of the work they do and the many ways it contributes to Laboratory missions. They make appropriate and timely decisions that move the entire organization forward. They make Computation a leader in helping LLNL achieve its programmatic milestones. I dedicate this inaugural Annual Report to the people of Computation in recognition of their continuing contributions. I am proud that we perform our work securely and safely. Despite increased cyber attacks on our computing infrastructure from the Internet, advanced cyber security practices ensure that our computing environment remains secure. Through Integrated Safety Management (ISM) and diligent oversight, we address safety issues promptly and aggressively. The safety of our employees, whether at work or at home, is a paramount concern. Even as the Directorate meets today's supercomputing requirements, we are preparing for the future. We are investigating open-source cluster technology, the basis of our highly successful Multiprogrammatic Capability Resource (MCR). Several breakthrough discoveries have resulted from MCR calculations coupled with theory and experiment, prompting Laboratory scientists to demand ever-greater capacity and capability. This demand is being met by a new 23-TF system, Thunder, with architecture modeled on MCR. In preparation for the "after-next" computer, we are researching technology even farther out on the horizon: cell-based computers. Assuming that the funding and the technology hold, we will acquire the cell-based machine BlueGene/L within the next 12 months.

  17. Data Provenance Hybridization Supporting Extreme-Scale Scientific WorkflowApplications

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Elsethagen, Todd O.; Stephan, Eric G.; Raju, Bibi

    As high performance computing (HPC) infrastructures continue to grow in capability and complexity, so do the applications that they serve. HPC and distributed-area computing (DAC) (e.g. grid and cloud) users are looking increasingly toward workflow solutions to orchestrate their complex application coupling and pre- and post-processing needs. To gain insight and a more quantitative understanding of a workflow's performance, our method includes not only the capture of traditional provenance information, but also the capture and integration of system environment metrics, helping to give context and explanation for a workflow's execution. In this paper, we describe IPPD's provenance management solution (ProvEn) and its hybrid data store combining both of these data provenance perspectives.

  18. Role of information systems in controlling costs: the electronic medical record (EMR) and the high-performance computing and communications (HPCC) efforts

    NASA Astrophysics Data System (ADS)

    Kun, Luis G.

    1994-12-01

    On October 18, 1991, the IEEE-USA produced an entity statement which endorsed the vital importance of the High Performance Computer and Communications Act of 1991 (HPCC) and called for the rapid implementation of all its elements. Efforts are now underway to develop a Computer Based Patient Record (CBPR), the National Information Infrastructure (NII) as part of the HPCC, and the so-called "Patient Card". Multiple legislative initiatives which address these and related information technology issues are pending in Congress. Clearly, a national information system will greatly affect the way health care delivery is provided to the United States public. Timely and reliable information represents a critical element in any initiative to reform the health care system, as well as to protect and improve the health of every person. Appropriately used, information technologies offer a vital means of improving the quality of patient care, increasing access to universal care and lowering overall costs within a national health care program. Health care reform legislation should reflect increased budgetary support and a legal mandate for the creation of a national health care information system by: (1) constructing a National Information Infrastructure; (2) building a Computer Based Patient Record System; (3) bringing the collective resources of our National Laboratories to bear in developing and implementing the NII and CBPR, as well as a security system with which to safeguard the privacy rights of patients and the physician-patient privilege; and (4) utilizing Government (e.g. DOD, DOE) capabilities (technology and human resources) to maximize resource utilization, create new jobs and accelerate technology transfer to address health care issues.

  19. Resilient and Robust High Performance Computing Platforms for Scientific Computing Integrity

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jin, Yier

    As technology advances, computer systems are subject to increasingly sophisticated cyber-attacks that compromise both their security and integrity. High performance computing platforms used in commercial and scientific applications involving sensitive, or even classified, data are frequently targeted by powerful adversaries. This situation is made worse by a lack of fundamental security solutions that both perform efficiently and are effective at preventing threats. Current security solutions fail to address the threat landscape and ensure the integrity of sensitive data. As challenges rise, both private and public sectors will require robust technologies to protect their computing infrastructure. The research outcomes from this project try to address all these challenges. For example, we present LAZARUS, a novel technique to harden kernel Address Space Layout Randomization (KASLR) against paging-based side-channel attacks. In particular, our scheme allows for fine-grained protection of the virtual memory mappings that implement the randomization. We demonstrate the effectiveness of our approach by hardening a recent Linux kernel with LAZARUS, mitigating all of the previously presented side-channel attacks on KASLR. Our extensive evaluation shows that LAZARUS incurs only 0.943% overhead for standard benchmarks, and is therefore highly practical. We also introduced HA2lloc, a hardware-assisted allocator that is capable of leveraging an extended memory management unit to detect memory errors in the heap. We also perform testing using HA2lloc in a simulation environment and find that the approach is capable of preventing common memory vulnerabilities.

  20. Helix Nebula: Enabling federation of existing data infrastructures and data services to an overarching cross-domain e-infrastructure

    NASA Astrophysics Data System (ADS)

    Lengert, Wolfgang; Farres, Jordi; Lanari, Riccardo; Casu, Francesco; Manunta, Michele; Lassalle-Balier, Gerard

    2014-05-01

    Helix Nebula has established a growing public-private partnership of more than 30 commercial cloud providers, SMEs, and publicly funded research organisations and e-infrastructures. The Helix Nebula strategy is to establish a federated cloud service across Europe. Three high-profile flagships, sponsored by CERN (high energy physics), EMBL (life sciences) and ESA/DLR/CNES/CNR (earth science), have been deployed and extensively tested within this federated environment. The commitments behind these initial flagships have created a critical mass that attracts suppliers and users to the initiative, to work together towards an "Information as a Service" market place. Significant progress has been achieved in implementing the four programmatic goals outlined in the strategic plan (Ref. 1): (1) establish a cloud computing infrastructure for the European Research Area (ERA), serving as a platform for innovation and evolution of the overall infrastructure; (2) identify and adopt suitable European-level policies for trust, security and privacy for the European cloud computing framework and infrastructure; (3) create a light-weight governance structure for the future European cloud computing infrastructure that involves all the stakeholders and can evolve over time as the infrastructure, services and user base grow; and (4) define a funding scheme involving the three stakeholder groups (service suppliers; users; EC and national funding agencies) in a public-private partnership model, to implement a cloud computing infrastructure that delivers a sustainable business environment adhering to European-level policies. Now, in 2014, a first version of this generic cross-domain e-infrastructure is ready to go into operation, building on a federation of European industry and contributors (data, tools, knowledge, ...). This presentation describes how Helix Nebula is being used in the domain of earth science, focusing on geohazards. The so-called "Supersite Exploitation Platform" (SSEP) provides scientists with an overarching federated e-infrastructure offering very fast access to (i) large volumes of data (EO/non-space data), (ii) computing resources (e.g. hybrid cloud/grid), (iii) processing software (e.g. toolboxes, RTMs, retrieval baselines, visualization routines), and (iv) general platform capabilities (e.g. user management and access control, accounting, information portal, collaborative tools, social networks, etc.). In this federation each data provider remains in full control of the implementation of its data policy. This presentation outlines the architecture (technical and services) supporting very heterogeneous science domains, as well as the procedures for newcomers to join the Helix Nebula Market Place. Ref. 1: http://cds.cern.ch/record/1374172/files/CERN-OPEN-2011-036.pdf

  1. WISDOM-II: screening against multiple targets implicated in malaria using computational grid infrastructures.

    PubMed

    Kasam, Vinod; Salzemann, Jean; Botha, Marli; Dacosta, Ana; Degliesposti, Gianluca; Isea, Raul; Kim, Doman; Maass, Astrid; Kenyon, Colin; Rastelli, Giulio; Hofmann-Apitius, Martin; Breton, Vincent

    2009-05-01

    Despite continuous efforts of the international community to reduce the impact of malaria on developing countries, no significant progress has been made in recent years, and the discovery of new drugs is more than ever needed. Of the many proteins involved in the metabolic activities of the Plasmodium parasite, some are promising targets for rational drug discovery. Recent years have witnessed the emergence of grids, highly distributed computing infrastructures particularly well suited to embarrassingly parallel computations like docking. In 2005, a first attempt at using grids for large-scale virtual screening focused on plasmepsins and ended in the identification of previously unknown scaffolds, which were confirmed in vitro to be active plasmepsin inhibitors. Following this success, a second deployment took place in the fall of 2006, focusing on one well-known target, dihydrofolate reductase (DHFR), and on a new promising one, glutathione-S-transferase. In silico drug design, especially vHTS, is a widely accepted technology in lead identification and lead optimization. This approach therefore builds upon the progress made in computational chemistry, to achieve more accurate in silico docking, and in information technology, to design and operate large-scale grid infrastructures. On the computational side, a sustained infrastructure has been developed: large-scale docking, different strategies for result analysis, on-the-fly storage of results in MySQL databases, molecular dynamics refinement, and MM-PBSA and MM-GBSA rescoring. The modeling results obtained are very promising, and based on them, in vitro experiments are underway for all the targets against which screening was performed. The current paper describes this rational drug discovery activity at large scale, in particular molecular docking using the FlexX software on computational grids, to find hits against three different targets (PfGST, PfDHFR, and PvDHFR, in wild-type and mutant forms) implicated in malaria. The grid-enabled virtual screening approach is proposed to produce focused compound libraries for other biological targets relevant to fighting the infectious diseases of the developing world.

  2. A Unified Data-Driven Approach for Programming In Situ Analysis and Visualization

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Aiken, Alex

    The placement and movement of data is becoming the key limiting factor on both the performance and the energy efficiency of high performance computations. As systems generate more data, it is becoming increasingly difficult to actually move that data elsewhere for post-processing, as the rate of improvement in supporting I/O infrastructure is not keeping pace. Together, these trends are creating a shift in how we think about exascale computations, from a viewpoint that focuses on FLOPS to one that focuses on data and data-centric operations as fundamental to the reasoning about, and optimization of, scientific workflows on extreme-scale architectures. The overarching goal of our effort was the study of a unified data-driven approach for programming applications and in situ analysis and visualization. Our work was to understand the interplay between data-centric programming model requirements at extreme scale and the overall impact of those requirements on the design, capabilities, flexibility, and implementation details for both applications and the supporting in situ infrastructure. In this context, we made many improvements to the Legion programming system (one of the leading data-centric models today) and demonstrated in situ analyses on real application codes using these improvements.

  3. A Security Monitoring Framework For Virtualization Based HEP Infrastructures

    NASA Astrophysics Data System (ADS)

    Gomez Ramirez, A.; Martinez Pedreira, M.; Grigoras, C.; Betev, L.; Lara, C.; Kebschull, U.; ALICE Collaboration

    2017-10-01

    High Energy Physics (HEP) distributed computing infrastructures require automatic tools to monitor, analyze and react to potential security incidents. These tools should collect and inspect data such as resource consumption, logs and sequences of system calls to detect anomalies that indicate the presence of a malicious agent. They should also be able to perform automated reactions to attacks without administrator intervention. We describe a novel framework that accomplishes these requirements, with a proof-of-concept implementation for the ALICE experiment at CERN. We show how we achieve a fully virtualized environment that improves security by isolating services and jobs without a significant performance impact. We also describe a dataset collected for Machine Learning based Intrusion Prevention and Detection Systems on Grid computing. This dataset is composed of resource consumption measurements (such as CPU, RAM and network traffic), log files from operating system services, and system call data collected from production jobs running in an ALICE Grid test site, together with a large set of malware samples collected from security research sites. Based on this dataset, we will proceed to develop Machine Learning algorithms able to detect malicious jobs.
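
    As a hedged sketch of the Machine Learning step the authors plan on top of this dataset, the example below trains an unsupervised anomaly detector on per-job resource-consumption vectors; the feature set, synthetic numbers and choice of isolation forests are illustrative assumptions, not the collaboration's actual models.

```python
# Minimal sketch: flag anomalous Grid jobs from resource-usage features.
# Features (CPU seconds, resident memory in MB, bytes sent) and all
# numbers are synthetic stand-ins for the collected dataset.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(1)
normal_jobs = rng.normal(loc=[3600.0, 2000.0, 1e6],
                         scale=[600.0, 300.0, 2e5],
                         size=(500, 3))
suspicious = np.array([[200.0, 150.0, 9e8]])   # little CPU, huge outbound traffic

model = IsolationForest(contamination=0.01, random_state=0).fit(normal_jobs)
print(model.predict(suspicious))               # -1 marks the job as anomalous
```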

  4. Tesla: An application for real-time data analysis in High Energy Physics

    NASA Astrophysics Data System (ADS)

    Aaij, R.; Amato, S.; Anderlini, L.; Benson, S.; Cattaneo, M.; Clemencic, M.; Couturier, B.; Frank, M.; Gligorov, V. V.; Head, T.; Jones, C.; Komarov, I.; Lupton, O.; Matev, R.; Raven, G.; Sciascia, B.; Skwarnicki, T.; Spradlin, P.; Stahl, S.; Storaci, B.; Vesterinen, M.

    2016-11-01

    Upgrades to the LHCb computing infrastructure in the first long shutdown of the LHC have allowed for high quality decay information to be calculated by the software trigger making a separate offline event reconstruction unnecessary. Furthermore, the storage space of the triggered candidate is an order of magnitude smaller than the entire raw event that would otherwise need to be persisted. Tesla is an application designed to process the information calculated by the trigger, with the resulting output used to directly perform physics measurements.

  5. Long-range interactions and parallel scalability in molecular simulations

    NASA Astrophysics Data System (ADS)

    Patra, Michael; Hyvönen, Marja T.; Falck, Emma; Sabouri-Ghomi, Mohsen; Vattulainen, Ilpo; Karttunen, Mikko

    2007-01-01

    Typical biomolecular systems such as cellular membranes, DNA, and protein complexes are highly charged. Thus, efficient and accurate treatment of electrostatic interactions is of great importance in computational modeling of such systems. We have employed the GROMACS simulation package to perform extensive benchmarking of commonly used electrostatic schemes on a range of computer architectures (Pentium-4, IBM Power 4, and Apple/IBM G5), for single-processor and parallel performance up to 8 nodes. We have also tested scalability on four different networks: InfiniBand, Gigabit Ethernet, Fast Ethernet, and nearly uniform memory architecture, in which communication between CPUs is possible by directly reading from or writing to other CPUs' local memory. It turns out that the particle-mesh Ewald method (PME) performs surprisingly well and offers competitive performance unless parallel runs on PC hardware with older network infrastructure are needed. Lipid bilayers of 128, 512 and 2048 lipid molecules were used as the test systems, representing typical cases encountered in biomolecular simulations. Our results enable an accurate prediction of computational speed on most current computing systems, both for serial and parallel runs. These results should be helpful in, for example, choosing the most suitable configuration for a small departmental computer cluster.

  6. Secure Enclaves: An Isolation-centric Approach for Creating Secure High Performance Computing Environments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Aderholdt, Ferrol; Caldwell, Blake A.; Hicks, Susan Elaine

    High performance computing environments are often used for a wide variety of workloads, ranging from simulation, data transformation and analysis, to complex workflows, to name just a few. These systems may process data at various security levels, but in so doing are often enclaved at the highest security posture. This approach places significant restrictions on the users of the system, even when processing data at a lower security level, and exposes data at higher levels of confidentiality to a much broader population than otherwise necessary. The traditional approach of isolation, while effective in establishing security enclaves, poses significant challenges for the use of shared infrastructure in HPC environments. This report details the current state of the art in virtualization, reconfigurable network enclaving via Software Defined Networking (SDN), and storage architectures and bridging techniques for creating secure enclaves in HPC environments.

  7. SOA-based digital library services and composition in biomedical applications.

    PubMed

    Zhao, Xia; Liu, Enjie; Clapworthy, Gordon J; Viceconti, Marco; Testi, Debora

    2012-06-01

    Carefully collected, high-quality data are crucial in biomedical visualization, and it is important that the user community has ready access both to this data and to the high-performance computing resources needed by the complex computational algorithms that will process it. Biological researchers generally require data, tools and algorithms from multiple providers to achieve their goals. This paper illustrates our response to the problems that result from this. The Living Human Digital Library (LHDL) project presented in this paper has taken advantage of Web Services to build a biomedical digital library infrastructure that allows clinicians and researchers not only to preserve, trace and share data resources, but also to collaborate at the data-processing level. Copyright © 2010 Elsevier Ireland Ltd. All rights reserved.

  8. A modular (almost) automatic set-up for elastic multi-tenants cloud (micro)infrastructures

    NASA Astrophysics Data System (ADS)

    Amoroso, A.; Astorino, F.; Bagnasco, S.; Balashov, N. A.; Bianchi, F.; Destefanis, M.; Lusso, S.; Maggiora, M.; Pellegrino, J.; Yan, L.; Yan, T.; Zhang, X.; Zhao, X.

    2017-10-01

    An auto-installing tool on a USB drive allows quick and easy automatic deployment of OpenNebula-based cloud infrastructures remotely managed by a central VMDIRAC instance. A single team, in the main site of an HEP collaboration or elsewhere, can manage and run a relatively large network of federated (micro-)cloud infrastructures, making highly dynamic and elastic use of computing resources. Exploiting such an approach can lead to modular systems of cloud-bursting infrastructures that address complex real-life scenarios.

  9. High-performance parallel approaches for three-dimensional light detection and ranging point clouds gridding

    NASA Astrophysics Data System (ADS)

    Rizki, Permata Nur Miftahur; Lee, Heezin; Lee, Minsu; Oh, Sangyoon

    2017-01-01

    With the rapid advance of remote sensing technology, the amount of three-dimensional point-cloud data has increased extraordinarily, requiring faster processing in the construction of digital elevation models. There have been several attempts to accelerate the computation using parallel methods; however, little attention has been given to investigating different approaches for selecting the parallel programming model best suited to a given computing environment. We present our findings and insights from implementing three popular high-performance parallel approaches (message passing interface, MapReduce, and GPGPU) for time-demanding but accurate kriging interpolation. The performance of the approaches is compared by varying the size of the grid and the input data. In our empirical experiment, we demonstrate significant acceleration by all three approaches compared to a C-implemented sequential-processing method. In addition, we discuss the pros and cons of each method in terms of usability, complexity, infrastructure, and platform limitations, to give readers a better understanding of utilizing those parallel approaches for gridding purposes.
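
    To make the grid-parallel decomposition concrete, the sketch below fills one horizontal band of the output raster per worker process; inverse-distance weighting stands in for the paper's kriging, and the band split and data sizes are assumptions of the sketch.

```python
# Minimal sketch: band-parallel gridding of scattered elevation points.
# IDW replaces kriging for brevity; each worker interpolates one band.
from concurrent.futures import ProcessPoolExecutor
import numpy as np

def idw_band(args):
    pts, vals, xs, ys_band, power = args
    out = np.empty((len(ys_band), len(xs)))
    for i, y in enumerate(ys_band):
        for j, x in enumerate(xs):
            d2 = (pts[:, 0] - x) ** 2 + (pts[:, 1] - y) ** 2
            w = 1.0 / np.maximum(d2, 1e-12) ** (power / 2)
            out[i, j] = w @ vals / w.sum()
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts = rng.random((2000, 2)) * 1000.0                          # point x, y (m)
    vals = np.sin(pts[:, 0] / 100.0) + rng.normal(0, 0.05, 2000)  # elevations
    xs = np.linspace(0.0, 1000.0, 100)
    bands = np.array_split(np.linspace(0.0, 1000.0, 100), 4)      # 4 workers
    with ProcessPoolExecutor(max_workers=4) as pool:
        dem = np.vstack(list(pool.map(
            idw_band, [(pts, vals, xs, band, 2.0) for band in bands])))
    print("gridded DEM shape:", dem.shape)                        # (100, 100)
```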

  10. SCinet Architecture: Featured at the International Conference for High Performance Computing,Networking, Storage and Analysis 2016

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lyonnais, Marc; Smith, Matt; Mace, Kate P.

    SCinet is the purpose-built network that operates during the International Conference for High Performance Computing, Networking, Storage and Analysis (Super Computing, or SC). Created each year for the conference, SCinet brings to life a high-capacity network that supports applications and experiments that are a hallmark of the SC conference. The network links the convention center to research and commercial networks around the world. This resource serves as a platform for exhibitors to demonstrate the advanced computing resources of their home institutions and elsewhere by supporting a wide variety of applications. Volunteers from academia, government and industry work together to design and deliver the SCinet infrastructure. Industry vendors and carriers donate millions of dollars in equipment and services needed to build and support the local and wide area networks. Planning begins more than a year in advance of each SC conference and culminates in a high-intensity installation in the days leading up to the conference. The SCinet architecture for SC16 illustrates a dramatic increase in participation from the vendor community, particularly those that focus on network equipment. Software-Defined Networking (SDN) and Data Center Networking (DCN) are present in nearly all aspects of the design.

  11. GraphMeta: Managing HPC Rich Metadata in Graphs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dai, Dong; Chen, Yong; Carns, Philip

    High-performance computing (HPC) systems face increasingly critical metadata management challenges, especially in the approaching exascale era. These challenges arise not only from exploding metadata volumes, but also from increasingly diverse metadata, which contains data provenance and arbitrary user-defined attributes in addition to traditional POSIX metadata. This 'rich' metadata is becoming critical to supporting advanced data management functionality such as data auditing and validation. In our prior work, we identified a graph-based model as a promising solution for uniformly managing HPC rich metadata, due to its flexibility and generality. At the same time, however, graph-based HPC rich metadata management also introduces significant challenges to the underlying infrastructure. In this study, we first identify the challenges the underlying infrastructure must meet to support scalable, high-performance rich metadata management. Based on that, we introduce GraphMeta, a graph-based engine designed for this use case. It achieves performance scalability by introducing a new graph partitioning algorithm and a write-optimal storage engine. We evaluate GraphMeta under both synthetic and real HPC metadata workloads, compare it with other approaches, and demonstrate its advantages in terms of efficiency and usability for rich metadata management in HPC systems.

  12. High Temporal Resolution Mapping of Seismic Noise Sources Using Heterogeneous Supercomputers

    NASA Astrophysics Data System (ADS)

    Paitz, P.; Gokhberg, A.; Ermert, L. A.; Fichtner, A.

    2017-12-01

    The time- and space-dependent distribution of seismic noise sources is becoming a key ingredient of modern real-time monitoring of various geo-systems like earthquake fault zones, volcanoes, geothermal and hydrocarbon reservoirs. We present results of an ongoing research project conducted in collaboration with the Swiss National Supercomputing Centre (CSCS). The project aims at building a service providing seismic noise source maps for Central Europe with high temporal resolution. We use source imaging methods based on the cross-correlation of seismic noise records from all seismic stations available in the region of interest. The service is hosted on the CSCS computing infrastructure; all computationally intensive processing is performed on the massively parallel heterogeneous supercomputer "Piz Daint". The solution architecture is based on the Application-as-a-Service concept to provide the interested researchers worldwide with regular access to the noise source maps. The solution architecture includes the following sub-systems: (1) data acquisition responsible for collecting, on a periodic basis, raw seismic records from the European seismic networks, (2) high-performance noise source mapping application responsible for the generation of source maps using cross-correlation of seismic records, (3) back-end infrastructure for the coordination of various tasks and computations, (4) front-end Web interface providing the service to the end-users and (5) data repository. The noise source mapping itself rests on the measurement of logarithmic amplitude ratios in suitably pre-processed noise correlations, and the use of simplified sensitivity kernels. During the implementation we addressed various challenges, in particular, selection of data sources and transfer protocols, automation and monitoring of daily data downloads, ensuring the required data processing performance, design of a general service-oriented architecture for coordination of various sub-systems, and engineering an appropriate data storage solution. The present pilot version of the service implements noise source maps for Switzerland. Extension of the solution to Central Europe is planned for the next project phase.
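
    The core operation behind such source maps is the cross-correlation of pre-processed noise records from station pairs. The sketch below illustrates that step on synthetic traces, including the one-bit normalization often used to suppress transient signals; the sampling rate, delay and normalization choice are assumptions of the sketch, not details of the CSCS service.

```python
# Minimal sketch: cross-correlate one hour of synthetic noise from two
# stations to recover their inter-station travel-time lag.
import numpy as np
from scipy.signal import fftconvolve

fs = 25.0                                   # sampling rate (Hz), assumed
t = np.arange(0.0, 3600.0, 1.0 / fs)        # one hour of samples
rng = np.random.default_rng(0)
source = rng.normal(size=t.size)            # common noise wavefield
sta_a = source + 0.1 * rng.normal(size=t.size)
sta_b = np.roll(source, 250) + 0.1 * rng.normal(size=t.size)  # 10 s later

# One-bit normalization suppresses earthquakes and instrument glitches.
a, b = np.sign(sta_a), np.sign(sta_b)
xcorr = fftconvolve(a, b[::-1], mode="full")
lag_s = (np.argmax(xcorr) - (len(b) - 1)) / fs
print(f"estimated lag: {lag_s:.2f} s")      # about -10 s for this geometry
```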

  13. Cloud BioLinux: pre-configured and on-demand bioinformatics computing for the genomics community.

    PubMed

    Krampis, Konstantinos; Booth, Tim; Chapman, Brad; Tiwari, Bela; Bicak, Mesude; Field, Dawn; Nelson, Karen E

    2012-03-19

    A steep drop in the cost of next-generation sequencing during recent years has made the technology affordable to the majority of researchers, but downstream bioinformatic analysis still poses a resource bottleneck for smaller laboratories and institutes that do not have access to substantial computational resources. Sequencing instruments are typically bundled with only the minimal processing and storage capacity required for data capture during sequencing runs. Given the scale of sequence datasets, scientific value cannot be obtained from acquiring a sequencer unless it is accompanied by an equal investment in informatics infrastructure. Cloud BioLinux is a publicly accessible Virtual Machine (VM) that enables scientists to quickly provision on-demand infrastructures for high-performance bioinformatics computing using cloud platforms. Users have instant access to a range of pre-configured command line and graphical software applications, including a full-featured desktop interface, documentation and over 135 bioinformatics packages for applications including sequence alignment, clustering, assembly, display, editing, and phylogeny. Each tool's functionality is fully described in the documentation directly accessible from the graphical interface of the VM. Besides the Amazon EC2 cloud, we have started instances of Cloud BioLinux on a private Eucalyptus cloud installed at the J. Craig Venter Institute, and demonstrated access to the bioinformatic tools interface through a remote connection to EC2 instances from a local desktop computer. Documentation for using Cloud BioLinux on EC2 is available from our project website, while a Eucalyptus cloud image and VirtualBox Appliance is also publicly available for download and use by researchers with access to private clouds. Cloud BioLinux provides a platform for developing bioinformatics infrastructures on the cloud. An automated and configurable process builds Virtual Machines, allowing the development of highly customized versions from a shared code base. This shared community toolkit enables application specific analysis platforms on the cloud by minimizing the effort required to prepare and maintain them.
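
    A present-day sketch of the provisioning step, using the boto3 API rather than the tooling contemporary with the paper; the AMI id, key-pair name and instance type are hypothetical placeholders.

```python
# Minimal sketch: launch an on-demand analysis VM (e.g. a Cloud BioLinux
# image) on Amazon EC2. All identifiers below are placeholders.
import boto3

ec2 = boto3.resource("ec2", region_name="us-east-1")
instances = ec2.create_instances(
    ImageId="ami-00000000",       # placeholder Cloud BioLinux AMI id
    InstanceType="m5.2xlarge",    # sized for alignment/assembly workloads
    KeyName="my-keypair",         # existing SSH key pair (placeholder)
    MinCount=1,
    MaxCount=1,
)
vm = instances[0]
vm.wait_until_running()           # block until the VM is up
vm.reload()                       # refresh to pick up the public DNS name
print("connect to the VM at:", vm.public_dns_name)
```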

  14. Cloud BioLinux: pre-configured and on-demand bioinformatics computing for the genomics community

    PubMed Central

    2012-01-01

    Background A steep drop in the cost of next-generation sequencing during recent years has made the technology affordable to the majority of researchers, but downstream bioinformatic analysis still poses a resource bottleneck for smaller laboratories and institutes that do not have access to substantial computational resources. Sequencing instruments are typically bundled with only the minimal processing and storage capacity required for data capture during sequencing runs. Given the scale of sequence datasets, scientific value cannot be obtained from acquiring a sequencer unless it is accompanied by an equal investment in informatics infrastructure. Results Cloud BioLinux is a publicly accessible Virtual Machine (VM) that enables scientists to quickly provision on-demand infrastructures for high-performance bioinformatics computing using cloud platforms. Users have instant access to a range of pre-configured command line and graphical software applications, including a full-featured desktop interface, documentation and over 135 bioinformatics packages for applications including sequence alignment, clustering, assembly, display, editing, and phylogeny. Each tool's functionality is fully described in the documentation directly accessible from the graphical interface of the VM. Besides the Amazon EC2 cloud, we have started instances of Cloud BioLinux on a private Eucalyptus cloud installed at the J. Craig Venter Institute, and demonstrated access to the bioinformatic tools interface through a remote connection to EC2 instances from a local desktop computer. Documentation for using Cloud BioLinux on EC2 is available from our project website, while a Eucalyptus cloud image and VirtualBox Appliance is also publicly available for download and use by researchers with access to private clouds. Conclusions Cloud BioLinux provides a platform for developing bioinformatics infrastructures on the cloud. An automated and configurable process builds Virtual Machines, allowing the development of highly customized versions from a shared code base. This shared community toolkit enables application specific analysis platforms on the cloud by minimizing the effort required to prepare and maintain them. PMID:22429538

  15. Integration of Russian Tier-1 Grid Center with High Performance Computers at NRC-KI for LHC experiments and beyond HENP

    NASA Astrophysics Data System (ADS)

    Belyaev, A.; Berezhnaya, A.; Betev, L.; Buncic, P.; De, K.; Drizhuk, D.; Klimentov, A.; Lazin, Y.; Lyalin, I.; Mashinistov, R.; Novikov, A.; Oleynik, D.; Polyakov, A.; Poyda, A.; Ryabinkin, E.; Teslyuk, A.; Tkachenko, I.; Yasnopolskiy, L.

    2015-12-01

    The LHC experiments are preparing for the precision measurements and further discoveries that will be made possible by higher LHC energies from April 2015 (LHC Run 2). The need for simulation, data processing and analysis would overwhelm the expected capacity of the grid infrastructure computing facilities deployed by the Worldwide LHC Computing Grid (WLCG). To meet this challenge, the integration of opportunistic resources into the LHC computing model is highly important. The Tier-1 facility at the Kurchatov Institute (NRC-KI) in Moscow is part of WLCG and will process, simulate and store up to 10% of the total data obtained from the ALICE, ATLAS and LHCb experiments. In addition, the Kurchatov Institute has supercomputers with a peak performance of 0.12 PFLOPS. Delegating even a fraction of these supercomputing resources to LHC computing will notably increase the total capacity. In 2014, development began on a portal combining the Tier-1 and a supercomputer at the Kurchatov Institute to provide common interfaces and storage. The portal will be used not only for HENP experiments, but also by other data- and compute-intensive sciences, such as biology (genome sequencing analysis) and astrophysics (cosmic ray analysis, antimatter and dark matter searches, etc.).

  16. Toward real-time Monte Carlo simulation using a commercial cloud computing infrastructure

    NASA Astrophysics Data System (ADS)

    Wang, Henry; Ma, Yunzhi; Pratx, Guillem; Xing, Lei

    2011-09-01

    Monte Carlo (MC) methods are the gold standard for modeling photon and electron transport in a heterogeneous medium; however, their computational cost prohibits their routine use in the clinic. Cloud computing, wherein computing resources are allocated on-demand from a third party, is a new approach for high performance computing and is implemented to perform ultra-fast MC calculation in radiation therapy. We deployed the EGS5 MC package in a commercial cloud environment. Launched from a single local computer with Internet access, a Python script allocates a remote virtual cluster. A handshaking protocol designates master and worker nodes. The EGS5 binaries and the simulation data are initially loaded onto the master node. The simulation is then distributed among independent worker nodes via the message passing interface, and the results aggregated on the local computer for display and data analysis. The described approach is evaluated for pencil beams and broad beams of high-energy electrons and photons. The output of cloud-based MC simulation is identical to that produced by single-threaded implementation. For 1 million electrons, a simulation that takes 2.58 h on a local computer can be executed in 3.3 min on the cloud with 100 nodes, a 47× speed-up. Simulation time scales inversely with the number of parallel nodes. The parallelization overhead is also negligible for large simulations. Cloud computing represents one of the most important recent advances in supercomputing technology and provides a promising platform for substantially improved MC simulation. In addition to the significant speed up, cloud computing builds a layer of abstraction for high performance parallel computing, which may change the way dose calculations are performed and radiation treatment plans are completed. This work was presented in part at the 2010 Annual Meeting of the American Association of Physicists in Medicine (AAPM), Philadelphia, PA.
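
    The master/worker split described above maps naturally onto a scatter/reduce pattern. The sketch below shows that pattern with mpi4py, with a toy exponential path-length sampler standing in for EGS5; the history count, attenuation coefficient and per-rank seeding are assumptions of the sketch.

```python
# Minimal sketch of the paper's distribution pattern: each rank simulates
# its share of Monte Carlo histories and tallies are reduced on rank 0.
# Run with e.g.: mpiexec -n 100 python mc_cloud.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

TOTAL_HISTORIES = 1_000_000
local_n = TOTAL_HISTORIES // size + (1 if rank < TOTAL_HISTORIES % size else 0)

rng = np.random.default_rng(seed=rank)        # independent stream per rank
mu = 0.2                                      # toy attenuation coefficient (1/cm)
local_sum = rng.exponential(1.0 / mu, local_n).sum()

total = comm.reduce(local_sum, op=MPI.SUM, root=0)
if rank == 0:
    print(f"mean free path ~ {total / TOTAL_HISTORIES:.3f} cm on {size} ranks")
```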

  17. Grid computing technology for hydrological applications

    NASA Astrophysics Data System (ADS)

    Lecca, G.; Petitdidier, M.; Hluchy, L.; Ivanovic, M.; Kussul, N.; Ray, N.; Thieron, V.

    2011-06-01

    Advances in e-Infrastructure promise to revolutionize sensing systems and the way in which data are collected and assimilated, and complex water systems are simulated and visualized. According to the EU Infrastructure 2010 work-programme, data and compute infrastructures and their underlying technologies, whether oriented toward scientific challenges or complex problem solving in engineering, are expected to converge into so-called knowledge infrastructures, leading to more effective research, education and innovation in the next decade and beyond. Grid technology is recognized as a fundamental component of e-Infrastructures. Nevertheless, this emerging paradigm highlights several topics, including data management, algorithm optimization, security, performance (speed, throughput, bandwidth, etc.), and scientific cooperation and collaboration issues, that require further examination to fully exploit it and to better inform future research policies. The paper illustrates the results of six different surface and subsurface hydrology applications that have been deployed on the Grid. All the applications aim to address strong requirements from civil society at large, relating to natural and anthropogenic risks. Grid technology has been successfully tested to improve flood prediction, groundwater resources management and Black Sea hydrological surveying, by providing large computing resources. It is also shown that Grid technology facilitates e-cooperation among partners by means of services for authentication and authorization, seamless access to distributed data sources, data protection and access rights, and standardization.

  18. Federated data storage and management infrastructure

    NASA Astrophysics Data System (ADS)

    Zarochentsev, A.; Kiryanov, A.; Klimentov, A.; Krasnopevtsev, D.; Hristov, P.

    2016-10-01

    The Large Hadron Collider (LHC), operating at the international CERN Laboratory in Geneva, Switzerland, is leading Big Data driven scientific explorations. Experiments at the LHC explore the fundamental nature of matter and the basic forces that shape our universe. Computing models for the High Luminosity LHC era anticipate a growth of storage needs by at least an order of magnitude; this will require new approaches to data storage organization and data handling. In our project we address the fundamental problem of designing an architecture that integrates distributed heterogeneous disk resources for LHC experiments and other data-intensive science applications, and that provides access to data from heterogeneous computing facilities. We have prototyped a federated storage for the Russian T1 and T2 centers located in Moscow, St. Petersburg and Gatchina, as well as a Russian/CERN federation. We have conducted extensive tests of the underlying network infrastructure and storage endpoints, with synthetic performance measurement tools as well as with HENP-specific workloads, including ones running on supercomputing platforms, cloud computing and the Grid for the ALICE and ATLAS experiments. We will present our current accomplishments with running LHC data analysis remotely and locally, demonstrating our ability to efficiently use federated data storage experiment-wide within national academic facilities for High Energy and Nuclear Physics, as well as for other data-intensive science applications such as bioinformatics.

  19. Software Infrastructure for Computer-aided Drug Discovery and Development, a Practical Example with Guidelines.

    PubMed

    Moretti, Loris; Sartori, Luca

    2016-09-01

    In the field of Computer-Aided Drug Discovery and Development (CADDD), a proper software infrastructure is essential for everyday investigations. The creation of such an environment should be carefully planned and implemented with certain features in order to be productive and efficient. Here we describe a solution that integrates standard computational services into a functional unit that empowers modelling applications for drug discovery. This system allows users with various levels of expertise to run in silico experiments automatically, without the burden of formatting files for different software, managing the actual computation, keeping track of activities, or graphically rendering the structural outcomes. To showcase the potential of this approach, the performance of five different docking programs on an HIV-1 protease test set is presented. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  20. Development of a High Precision Displacement Measurement System by Fusing a Low Cost RTK-GPS Sensor and a Force Feedback Accelerometer for Infrastructure Monitoring.

    PubMed

    Koo, Gunhee; Kim, Kiyoung; Chung, Jun Yeon; Choi, Jaemook; Kwon, Nam-Yeol; Kang, Doo-Young; Sohn, Hoon

    2017-11-28

    A displacement measurement system fusing a low-cost real-time kinematic global positioning system (RTK-GPS) receiver and a force feedback accelerometer is proposed for infrastructure monitoring. The proposed system is composed of a sensor module, a base module and a computation module. The sensor module consists of an RTK-GPS rover and a force feedback accelerometer, and is installed on a target structure like conventional RTK-GPS sensors. The base module is placed on rigid ground away from the target structure, similar to conventional RTK-GPS bases, and transmits observation messages to the sensor module. The initial acceleration, velocity and displacement responses measured by the sensor module are then transmitted to the computation module located at a central monitoring facility. Finally, high-precision, high-sampling-rate displacement, velocity and acceleration are estimated by fusing the acceleration from the accelerometer, the velocity from the GPS rover, and the displacement from RTK-GPS. Note that the proposed system can measure 3-axis acceleration, velocity and displacement in real time. In terms of displacement, it can estimate dynamic and pseudo-static displacement with a root-mean-square error of 2 mm and a sampling rate of up to 100 Hz. The performance of the proposed system is validated under sinusoidal, random and steady-state vibrations. Field tests were performed on the Yeongjong Grand Bridge and Yi Sun-sin Bridge in Korea, and the Xihoumen Bridge in China, to compare the performance of the proposed system with a commercial RTK-GPS sensor and other data fusion techniques.
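
    A minimal sketch of the fusion idea, assuming a linear Kalman filter with state [displacement, velocity], driven by 100 Hz acceleration and corrected by slower, noisier displacement fixes; all rates and noise levels are illustrative, not the calibrated values of the published system.

```python
# Minimal sketch: fuse 100 Hz acceleration with 10 Hz displacement fixes
# using a Kalman filter. Noise levels and the test motion are synthetic.
import numpy as np

dt = 0.01                                    # accelerometer period (100 Hz)
F = np.array([[1.0, dt], [0.0, 1.0]])        # state transition for [disp, vel]
B = np.array([0.5 * dt**2, dt])              # acceleration input model
H = np.array([[1.0, 0.0]])                   # GPS observes displacement only
Q = np.diag([1e-8, 1e-6])                    # process noise (assumed)
R = np.array([[4e-6]])                       # GPS variance (~2 mm std)

x, P = np.zeros(2), np.eye(2)
rng = np.random.default_rng(0)
for k in range(1000):                        # 10 s of 1 Hz, 1 cm amplitude motion
    t = k * dt
    acc = -(2 * np.pi) ** 2 * 0.01 * np.sin(2 * np.pi * t) + rng.normal(0, 0.02)
    x = F @ x + B * acc                      # predict with accelerometer sample
    P = F @ P @ F.T + Q
    if k % 10 == 0:                          # a displacement fix every 0.1 s
        gps = 0.01 * np.sin(2 * np.pi * t) + rng.normal(0, 0.002)
        y = gps - H @ x                      # innovation
        K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
        x = x + (K @ y).ravel()
        P = (np.eye(2) - K @ H) @ P

print(f"displacement estimate at t=10 s: {x[0]*1000:.2f} mm")
```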

  1. Integration of Cloud resources in the LHCb Distributed Computing

    NASA Astrophysics Data System (ADS)

    Úbeda García, Mario; Méndez Muñoz, Víctor; Stagni, Federico; Cabarrou, Baptiste; Rauschmayr, Nathalie; Charpentier, Philippe; Closier, Joel

    2014-06-01

    This contribution describes how Cloud resources have been integrated in the LHCb Distributed Computing. LHCb is using its specific DIRAC extension (LHCbDirac) as an interware for its Distributed Computing, which so far has seamlessly integrated Grid resources and computer clusters. The cloud extension of DIRAC (VMDIRAC) allows the integration of Cloud computing infrastructures: it interacts with multiple types of infrastructures in commercial and institutional clouds through multiple interfaces (Amazon EC2, OpenNebula, OpenStack and CloudStack), and it instantiates, monitors and manages Virtual Machines running on this aggregation of Cloud resources. Moreover, specifications for institutional Cloud resources proposed by the Worldwide LHC Computing Grid (WLCG), mainly by the High Energy Physics Unix Information Exchange (HEPiX) group, have been taken into account. Several initiatives and computing resource providers in the eScience environment had already deployed IaaS in production during 2013. Keeping this in mind, the pros and cons of a cloud-based infrastructure have been studied in contrast with the current setup. As a result, this work addresses four different use cases which represent a major improvement on several levels of our infrastructure. We describe the solution implemented by LHCb for the contextualisation of the VMs, based on the idea of a Cloud Site. We report on the operational experience of using in production several institutional Cloud resources that are thus becoming an integral part of the LHCb Distributed Computing resources. Furthermore, we describe the gradual migration of our Service Infrastructure towards a fully distributed architecture following the Service as a Service (SaaS) model.
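
    As a hedged sketch of what a cloud extension such as VMDIRAC does against an EC2-compatible interface, the snippet below instantiates a contextualized virtual machine and polls its state with boto3; the AMI id, instance type and user-data script are hypothetical placeholders.

    ```python
    # Instantiate and monitor a VM through an EC2-style API (sketch).
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    resp = ec2.run_instances(
        ImageId="ami-0123456789abcdef0",   # hypothetical worker image
        InstanceType="m5.large",
        MinCount=1,
        MaxCount=1,
        # Hypothetical contextualization: start a pilot agent at boot.
        UserData="#!/bin/bash\n/opt/pilot/start.sh\n",
    )
    instance_id = resp["Instances"][0]["InstanceId"]

    # Monitoring step: poll the instance state, as an interware would.
    desc = ec2.describe_instances(InstanceIds=[instance_id])
    print(desc["Reservations"][0]["Instances"][0]["State"]["Name"])
    ```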

  2. Integrating multiple scientific computing needs via a Private Cloud infrastructure

    NASA Astrophysics Data System (ADS)

    Bagnasco, S.; Berzano, D.; Brunetti, R.; Lusso, S.; Vallero, S.

    2014-06-01

    In a typical scientific computing centre, diverse applications coexist and share a single physical infrastructure. An underlying Private Cloud facility eases the management and maintenance of heterogeneous use cases such as multipurpose or application-specific batch farms, Grid sites catering to different communities, parallel interactive data analysis facilities and others. It allows resources to be allocated to any application dynamically and efficiently, and virtual machines to be tailored to the applications' requirements. Furthermore, the maintenance of large deployments of complex and rapidly evolving middleware and application software is eased by the use of virtual images and contextualization techniques; for example, rolling updates can be performed easily and with minimal downtime. In this contribution we describe the Private Cloud infrastructure at the INFN-Torino Computer Centre, which hosts a full-fledged WLCG Tier-2 site, a dynamically expandable PROOF-based Interactive Analysis Facility for the ALICE experiment at the CERN LHC, and several smaller scientific computing applications. The Private Cloud building blocks include the OpenNebula software stack, the GlusterFS filesystem (used in two different configurations for worker- and service-class hypervisors) and the OpenWRT Linux distribution (used for network virtualization). A future integration into a federated higher-level infrastructure is made possible by exposing commonly used APIs like EC2 and by using mainstream contextualization tools like CloudInit.

  3. UNH Data Cooperative: A Cyber Infrastructure for Earth System Studies

    NASA Astrophysics Data System (ADS)

    Braswell, B. H.; Fekete, B. M.; Prusevich, A.; Gliden, S.; Magill, A.; Vorosmarty, C. J.

    2007-12-01

    Earth system scientists and managers have a continuously growing demand for a wide array of earth observations derived from various data sources including (a) modern satellite retrievals, (b) "in-situ" records, (c) various simulation outputs, and (d) assimilated data products combining model results with observational records. The sheer quantity of data and formatting inconsistencies make it difficult for users to take full advantage of this important information resource, so the system could benefit from a thorough retooling of current data processing procedures and infrastructure. Emerging technologies like OPeNDAP and OGC map services, open standard data formats (NetCDF, HDF) and data cataloging systems (NASA-Echo, Global Change Master Directory, etc.) are providing the basis for a new approach to data management and processing, in which web services are increasingly designed to serve computer-to-computer communications without human interaction and complex analysis can be carried out over distributed computer resources interconnected via cyber infrastructure. The UNH Earth System Data Collaborative is designed to utilize the aforementioned emerging web technologies to offer new means of access to earth system data. While the UNH Data Collaborative serves a wide array of data ranging from weather station data (Climate Portal) to ocean buoy records and ship tracks (Portsmouth Harbor Initiative) to land cover characteristics, the underlying data architecture shares common components for data mining and data dissemination via web services. Perhaps the most unique element of the UNH Data Cooperative's IT infrastructure is its prototype modeling environment for regional ecosystem surveillance over the Northeast corridor, which allows the integration of complex earth system model components with the Cooperative's data services. While the complexity of the IT infrastructure needed to perform complex computations is continuously increasing, scientists are often forced to spend a considerable amount of time solving basic data management and preprocessing tasks and dealing with low-level computational design problems such as parallelization of model codes. Our modeling infrastructure is designed to take care of the bulk of the common tasks found in complex earth system models, such as I/O handling, computational domain and time management, and parallel execution of modeling tasks. It allows scientists to focus on the numerical implementation of the physical processes on single computational objects (typically grid cells) while the framework takes care of preprocessing input data, establishing data exchange between computational objects and executing the science code. In our presentation, we will discuss the key concepts of our modeling infrastructure. We will demonstrate the integration of our modeling framework with data services offered by the UNH Earth System Data Collaborative via web interfaces. We will lay out the road map to turn our prototype modeling environment into a truly community-oriented framework for a wide range of earth system scientists and environmental managers.
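
    The computer-to-computer access pattern described above can be illustrated with a hedged OPeNDAP sketch: a remote NetCDF dataset is opened by URL and only the requested slice crosses the network. The server URL and variable name are hypothetical, and a netCDF4 build with DAP support is assumed.

    ```python
    # Subset a remote dataset over OPeNDAP without downloading it (sketch).
    from netCDF4 import Dataset

    URL = "http://data.example.edu/opendap/air_temperature.nc"  # hypothetical

    ds = Dataset(URL)                      # opens the remote dataset over DAP
    tas = ds.variables["air_temperature"]  # lazy handle; no data moved yet
    subset = tas[0, 10:20, 10:20]          # only this slice is transferred
    print(float(subset.mean()))
    ds.close()
    ```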

  4. Transforming Our Cities: High-Performance Green Infrastructure (WERF Report INFR1R11)

    EPA Science Inventory

    The objective of this project is to demonstrate that the highly distributed real-time control (DRTC) technologies for green infrastructure being developed by the research team can play a critical role in transforming our nation’s urban infrastructure. These technologies include a...

  5. Here and now: the intersection of computational science, quantum-mechanical simulations, and materials science

    NASA Astrophysics Data System (ADS)

    Marzari, Nicola

    The last 30 years have seen the steady and exhilarating development of powerful quantum-simulation engines for extended systems, dedicated to the solution of the Kohn-Sham equations of density-functional theory, often augmented by density-functional perturbation theory, many-body perturbation theory, time-dependent density-functional theory, dynamical mean-field theory, and quantum Monte Carlo. Their implementation on massively parallel architectures, now leveraging also GPUs and accelerators, has started a massive effort in the prediction from first principles of many or of complex materials properties, leading the way to the exascale through the combination of HPC (high-performance computing) and HTC (high-throughput computing). Challenges and opportunities abound: complementing hardware and software investments and design; developing the materials' informatics infrastructure needed to encode knowledge into complex protocols and workflows of calculations; managing and curating data; resisting the complacency that we have already reached the predictive accuracy needed for materials design, or a robust level of verification of the different quantum engines. In this talk I will provide an overview of these challenges, with the ultimate prize being the computational understanding, prediction, and design of properties and performance for novel or complex materials and devices.

  6. Characterizing Crowd Participation and Productivity of Foldit Through Web Scraping

    DTIC Science & Technology

    2016-03-01

    Fragments indexed from the report include its acronym list — BOINC (Berkeley Open Infrastructure for Network Computing), CDF (Cumulative Distribution Function), CPU (Central Processing Unit), CSSG (Crowdsourced Serious Game) — and text noting that many computers working at once can create a capacity similar to that of a supercomputer, citing Anderson [6], principal investigator for the Berkeley Open Infrastructure for Network Computing, a software-based distributed computing platform that grew out of a project searching for extraterrestrial life.

  7. Complete distributed computing environment for a HEP experiment: experience with ARC-connected infrastructure for ATLAS

    NASA Astrophysics Data System (ADS)

    Read, A.; Taga, A.; O-Saada, F.; Pajchel, K.; Samset, B. H.; Cameron, D.

    2008-07-01

    Computing and storage resources connected by the NorduGrid ARC middleware in the Nordic countries, Switzerland and Slovenia are part of the ATLAS computing Grid. This infrastructure is being commissioned with the ongoing ATLAS Monte Carlo simulation production in preparation for the commencement of data taking in 2008. The unique non-intrusive architecture of ARC, its straightforward interplay with the ATLAS Production System via the Dulcinea executor, and its performance during the commissioning exercise are described. ARC support for flexible and powerful end-user analysis within the GANGA distributed analysis framework is also shown. Whereas the storage solution for this Grid was earlier based on a large, distributed collection of GridFTP servers, the ATLAS computing design includes a structured SRM-based system with a limited number of storage endpoints. The characteristics, integration and performance of the old and new storage solutions are presented. Although the hardware resources in this Grid are quite modest, it has provided more than double the agreed contribution to ATLAS production, with an efficiency above 95% during long periods of stable operation.

  8. A prototype Infrastructure for Cloud-based distributed services in High Availability over WAN

    NASA Astrophysics Data System (ADS)

    Bulfon, C.; Carlino, G.; De Salvo, A.; Doria, A.; Graziosi, C.; Pardi, S.; Sanchez, A.; Carboni, M.; Bolletta, P.; Puccio, L.; Capone, V.; Merola, L.

    2015-12-01

    In this work we present the architectural and performance studies concerning a prototype of a distributed Tier-2 infrastructure for HEP, instantiated between the two Italian sites of INFN-Roma1 and INFN-Napoli. The network infrastructure is based on a Layer-2 geographical link, provided by the Italian NREN (GARR), directly connecting the two remote LANs of the named sites. By exploiting the possibilities offered by new distributed file systems, a shared storage area with synchronous copy has been set up. The computing infrastructure, based on an OpenStack facility, uses a set of distributed hypervisors installed at both sites. The main parameter to be taken into account when managing two remote sites within a single framework is the effect of latency, due to the distance and the end-to-end service overhead. In order to understand the capabilities and limits of our setup, the impact of latency has been investigated by means of a set of stress tests, including data I/O throughput, metadata access performance evaluation and network occupancy, during the life cycle of a Virtual Machine. A set of resilience tests has also been performed, in order to verify the stability of the system in the event of hardware or software faults. The results of this work show that the reliability and robustness of the chosen architecture are effective enough to build a production system and to provide common services. This prototype can also be extended to multiple sites with small changes to the network topology, thus creating a National Network of Cloud-based distributed services in high availability over WAN.

  9. Extraction of urban vegetation with Pleiades multiangular images

    NASA Astrophysics Data System (ADS)

    Lefebvre, Antoine; Nabucet, Jean; Corpetti, Thomas; Courty, Nicolas; Hubert-Moy, Laurence

    2016-10-01

    Vegetation is essential in urban environments since it provides significant services in terms of health, heat, property value, ecology ... As part of the European Union Biodiversity Strategy Plan for 2020, the protection and development of green infrastructures is being strengthened in urban areas. In order to evaluate and monitor the quality of these green infrastructures, this article investigates the contribution of Pléiades multi-angular images to extracting and characterizing low and high urban vegetation. From such images one can extract both spectral and elevation information. Our method is composed of three main steps: (1) computation of a normalized Digital Surface Model from the multi-angular images; (2) extraction of spectral and contextual features; (3) classification of vegetation classes (tree and grass) with a random forest classifier. Results obtained for the city of Rennes, France, show the ability of multi-angular images to extract a DEM in urban areas despite building heights. They also highlight the importance of elevation information and its complementarity with contextual information for extracting urban vegetation.
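
    Step (3) of the method can be pictured with the hedged scikit-learn sketch below: a random forest separating tree, grass and other pixels from stacked spectral and nDSM features. The feature layout, file names and labels are hypothetical placeholders for the paper's training data.

    ```python
    # Random forest classification of urban vegetation pixels (sketch).
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    # X rows: [blue, green, red, nir, ndvi, ndsm_height, texture] per pixel.
    X_train = np.load("features_train.npy")   # hypothetical files
    y_train = np.load("labels_train.npy")     # 0 = other, 1 = grass, 2 = tree

    clf = RandomForestClassifier(n_estimators=200, n_jobs=-1, random_state=0)
    clf.fit(X_train, y_train)

    X_scene = np.load("features_scene.npy")
    labels = clf.predict(X_scene)              # per-pixel classes, reshape to map
    ```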

  10. Accessible high performance computing solutions for near real-time image processing for time critical applications

    NASA Astrophysics Data System (ADS)

    Bielski, Conrad; Lemoine, Guido; Syryczynski, Jacek

    2009-09-01

    High Performance Computing (HPC) hardware solutions such as grid computing and General-Purpose computing on a Graphics Processing Unit (GPGPU) are now accessible to users with general computing needs. Grid computing infrastructures in the form of computing clusters or blades are becoming commonplace, and GPGPU solutions that leverage the processing power of the video card are quickly being integrated into personal workstations. Our interest in these HPC technologies stems from the need to produce near real-time maps from a combination of pre- and post-event satellite imagery in support of post-disaster management. Faster processing provides a twofold gain in this situation: (1) critical information can be provided sooner and (2) more elaborate automated processing can be performed before the critical information is provided. In our particular case, we test the use of the PANTEX index, which is based on the analysis of image textural measures extracted using anisotropic, rotation-invariant GLCM statistics. The use of this index, applied in a moving window, has been shown to successfully identify built-up areas in remotely sensed imagery. Built-up index image masks are important input to the structuring of damage assessment interpretation because they help optimise the workload. The performance of the PANTEX workflow is compared on two different HPC hardware architectures: (1) a blade server with 4 blades, each having dual quad-core CPUs, and (2) a CUDA-enabled GPU workstation. The reference platform is a dual-CPU, quad-core workstation, and the total computing time of the PANTEX workflow is measured. Furthermore, as part of a qualitative evaluation, the differences in setting up and configuring the various hardware solutions and the related software coding effort are presented.
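
    A hedged sketch of the textural measure behind PANTEX: GLCM contrast computed over several directions in a window, with the minimum taken across directions to suppress anisotropy. The exact PANTEX definition uses a specific set of displacement vectors, so treat this as an approximation.

    ```python
    # GLCM contrast over several directions, minimum taken (PANTEX-like sketch).
    import numpy as np
    from skimage.feature import graycomatrix, graycoprops

    def pantex_like(window):
        """Minimum directional GLCM contrast of an 8-bit image window."""
        angles = [0, np.pi / 4, np.pi / 2, 3 * np.pi / 4]
        glcm = graycomatrix(window, distances=[1], angles=angles,
                            levels=256, symmetric=True, normed=True)
        return graycoprops(glcm, "contrast").min()  # min over directions

    window = (np.random.rand(64, 64) * 255).astype(np.uint8)
    print(pantex_like(window))  # applied per moving window in the real workflow
    ```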

  11. New frontiers in design synthesis

    NASA Technical Reports Server (NTRS)

    Goldin, D. S.; Venneri, S. L.; Noor, A. K.

    1999-01-01

    The Intelligent Synthesis Environment (ISE), which is one of the major strategic technologies under development at NASA centers and the University of Virginia, is described. One of the major objectives of ISE is to significantly enhance the rapid creation of innovative affordable products and missions. ISE uses a synergistic combination of leading-edge technologies, including high performance computing, high capacity communications and networking, human-centered computing, knowledge-based engineering, computational intelligence, virtual product development, and product information management. The environment will link scientists, design teams, manufacturers, suppliers, and consultants who participate in the mission synthesis as well as in the creation and operation of the aerospace system. It will radically advance the process by which complex science missions are synthesized and high-tech engineering systems are designed, manufactured and operated. The five major components critical to ISE are human-centered computing, infrastructure for distributed collaboration, rapid synthesis and simulation tools, life cycle integration and validation, and cultural change in both the engineering and science creative process. The five components and their subelements are described. Related U.S. government programs are outlined and the future impact of ISE on engineering research and education is discussed.

  12. LEMON - LHC Era Monitoring for Large-Scale Infrastructures

    NASA Astrophysics Data System (ADS)

    Marian, Babik; Ivan, Fedorko; Nicholas, Hook; Hector, Lansdale Thomas; Daniel, Lenkes; Miroslav, Siket; Denis, Waldron

    2011-12-01

    At the present time computer centres are facing a massive rise in virtualization and cloud computing, as these solutions bring advantages to service providers and consolidate computer centre resources. As a result, however, monitoring complexity is increasing. Computer centre management requires not only monitoring of servers, network equipment and associated software but also collection of additional environment and facilities data (e.g. temperature, power consumption, cooling efficiency, etc.) to maintain a good overview of infrastructure performance. The LHC Era Monitoring (Lemon) system addresses these requirements for a very large scale infrastructure. The Lemon agent, which collects data on every client and forwards the samples to the central measurement repository, provides a flexible interface that allows rapid development of new sensors. The system can also report on behalf of remote devices such as switches and power supplies. Online and historical data can be visualized via a web-based interface or retrieved via command-line tools. The Lemon Alarm System component can be used to notify the operator about error situations. In this article, an overview of Lemon monitoring is provided together with a description of the CERN LEMON production instance. No direct comparison is made with other monitoring tools.
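
    The agent/sensor pattern described above can be pictured with the hedged sketch below: sensor callables produce metric samples, and the agent forwards them to a central repository. The repository URL and sample schema are hypothetical; this is not Lemon's actual sensor interface.

    ```python
    # Generic monitoring agent: collect sensor samples, forward to a repository.
    import os
    import time
    import requests

    REPOSITORY = "http://measurements.example.org/samples"  # hypothetical

    SENSORS = {
        # A new sensor is added by registering another callable here.
        "loadavg_1min": lambda: os.getloadavg()[0],
    }

    def collect_and_forward():
        samples = [
            {"metric": name, "value": read(), "host": os.uname().nodename,
             "timestamp": time.time()}
            for name, read in SENSORS.items()
        ]
        requests.post(REPOSITORY, json=samples, timeout=10)

    while True:
        collect_and_forward()
        time.sleep(60)  # one sample batch per minute
    ```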

  13. @neurIST: infrastructure for advanced disease management through integration of heterogeneous data, computing, and complex processing services.

    PubMed

    Benkner, Siegfried; Arbona, Antonio; Berti, Guntram; Chiarini, Alessandro; Dunlop, Robert; Engelbrecht, Gerhard; Frangi, Alejandro F; Friedrich, Christoph M; Hanser, Susanne; Hasselmeyer, Peer; Hose, Rod D; Iavindrasana, Jimison; Köhler, Martin; Iacono, Luigi Lo; Lonsdale, Guy; Meyer, Rodolphe; Moore, Bob; Rajasekaran, Hariharan; Summers, Paul E; Wöhrer, Alexander; Wood, Steven

    2010-11-01

    The increasing volume of data describing human disease processes and the growing complexity of understanding, managing, and sharing such data present a huge challenge for clinicians and medical researchers. This paper presents the @neurIST system, which provides an infrastructure for biomedical research while aiding clinical care, by bringing together heterogeneous data and complex processing and computing services. Although @neurIST targets the investigation and treatment of cerebral aneurysms, the system's architecture is generic enough that it could be adapted to the treatment of other diseases. Innovations in @neurIST include confining the patient data pertaining to aneurysms inside a single environment that offers clinicians the tools to analyze and interpret patient data and make use of knowledge-based guidance in planning their treatment. Medical researchers gain access to a critical mass of aneurysm-related data due to the system's ability to federate distributed information sources. A semantically mediated grid infrastructure ensures that both clinicians and researchers are able to seamlessly access and work on data distributed across multiple sites in a secure way, in addition to providing computing resources on demand for performing computationally intensive simulations for treatment planning and research.

  14. The Next Generation of Lab and Classroom Computing - The Silver Lining

    DTIC Science & Technology

    2016-12-01

    A virtual desktop infrastructure (VDI) solution, as well as the computing solutions at three universities, was selected as the basis for comparison. Indexed keywords: infrastructure, VDI, hardware cost, software cost, manpower, availability, cloud computing, private cloud, bring your own device, BYOD, thin client.

  15. Optimization of tomographic reconstruction workflows on geographically distributed resources

    DOE PAGES

    Bicer, Tekin; Gursoy, Doga; Kettimuthu, Rajkumar; ...

    2016-01-01

    New technological advancements in synchrotron light sources enable data acquisitions at unprecedented levels. This emergent trend affects not only the size of the generated data but also the need for larger computational resources. Although beamline scientists and users have access to local computational resources, these are typically limited and can result in extended execution times. Applications that are based on iterative processing, as in tomographic reconstruction methods, require high-performance compute clusters for timely analysis of data. Here, time-sensitive analysis and processing of Advanced Photon Source data on geographically distributed resources are focused on. Two main challenges are considered: (i) modeling of the performance of tomographic reconstruction workflows and (ii) transparent execution of these workflows on distributed resources. For the former, three main stages are considered: (i) data transfer between storage and computational resources, (ii) wait/queue time of reconstruction jobs at compute resources, and (iii) computation of reconstruction tasks. These performance models allow evaluation and estimation of the execution time of any given iterative tomographic reconstruction workflow that runs on geographically distributed resources. For the latter challenge, a workflow management system is built, which can automate the execution of workflows and minimize the user interaction with the underlying infrastructure. The system utilizes Globus to perform secure and efficient data transfer operations. The proposed models and the workflow management system are evaluated by using three high-performance computing and two storage resources, all of which are geographically distributed. Workflows were created with different computational requirements using two compute-intensive tomographic reconstruction algorithms. Experimental evaluation shows that the proposed models and system can be used for selecting the optimum resources, which in turn can provide up to 3.13× speedup (on the experimented resources). Furthermore, the error rates of the models range between 2.1 and 23.3% (considering workflow execution times), where the accuracy of the model estimations increases with higher computational demands in reconstruction tasks.
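
    The three-stage model described above lends itself to a hedged back-of-the-envelope sketch: total time is estimated as transfer plus queue wait plus computation, and the resource minimizing the estimate is selected. All resource parameters below are hypothetical placeholders.

    ```python
    # Three-stage execution-time model and resource selection (sketch).
    from dataclasses import dataclass

    @dataclass
    class Resource:
        name: str
        bandwidth_mbps: float  # achievable transfer rate from beamline storage
        queue_wait_s: float    # expected wait for a job slot
        tasks_per_s: float     # reconstruction tasks completed per second

    def estimated_time(r: Resource, data_mb: float, n_tasks: int) -> float:
        """Transfer + queue + compute, in seconds."""
        return (data_mb * 8 / r.bandwidth_mbps
                + r.queue_wait_s
                + n_tasks / r.tasks_per_s)

    resources = [
        Resource("cluster_a", bandwidth_mbps=800.0, queue_wait_s=120.0, tasks_per_s=2.0),
        Resource("cluster_b", bandwidth_mbps=200.0, queue_wait_s=10.0, tasks_per_s=5.0),
    ]
    best = min(resources, key=lambda r: estimated_time(r, data_mb=50_000, n_tasks=720))
    print(best.name)
    ```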

  16. Optimization of tomographic reconstruction workflows on geographically distributed resources

    PubMed Central

    Bicer, Tekin; Gürsoy, Doǧa; Kettimuthu, Rajkumar; De Carlo, Francesco; Foster, Ian T.

    2016-01-01

    New technological advancements in synchrotron light sources enable data acquisitions at unprecedented levels. This emergent trend affects not only the size of the generated data but also the need for larger computational resources. Although beamline scientists and users have access to local computational resources, these are typically limited and can result in extended execution times. Applications that are based on iterative processing, as in tomographic reconstruction methods, require high-performance compute clusters for timely analysis of data. Here, time-sensitive analysis and processing of Advanced Photon Source data on geographically distributed resources are focused on. Two main challenges are considered: (i) modeling of the performance of tomographic reconstruction workflows and (ii) transparent execution of these workflows on distributed resources. For the former, three main stages are considered: (i) data transfer between storage and computational resources, (ii) wait/queue time of reconstruction jobs at compute resources, and (iii) computation of reconstruction tasks. These performance models allow evaluation and estimation of the execution time of any given iterative tomographic reconstruction workflow that runs on geographically distributed resources. For the latter challenge, a workflow management system is built, which can automate the execution of workflows and minimize the user interaction with the underlying infrastructure. The system utilizes Globus to perform secure and efficient data transfer operations. The proposed models and the workflow management system are evaluated by using three high-performance computing and two storage resources, all of which are geographically distributed. Workflows were created with different computational requirements using two compute-intensive tomographic reconstruction algorithms. Experimental evaluation shows that the proposed models and system can be used for selecting the optimum resources, which in turn can provide up to 3.13× speedup (on the experimented resources). Moreover, the error rates of the models range between 2.1 and 23.3% (considering workflow execution times), where the accuracy of the model estimations increases with higher computational demands in reconstruction tasks. PMID:27359149

  17. Optimization of tomographic reconstruction workflows on geographically distributed resources

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bicer, Tekin; Gursoy, Doga; Kettimuthu, Rajkumar

    New technological advancements in synchrotron light sources enable data acquisitions at unprecedented levels. This emergent trend affects not only the size of the generated data but also the need for larger computational resources. Although beamline scientists and users have access to local computational resources, these are typically limited and can result in extended execution times. Applications that are based on iterative processing, as in tomographic reconstruction methods, require high-performance compute clusters for timely analysis of data. Here, time-sensitive analysis and processing of Advanced Photon Source data on geographically distributed resources are focused on. Two main challenges are considered: (i) modeling of the performance of tomographic reconstruction workflows and (ii) transparent execution of these workflows on distributed resources. For the former, three main stages are considered: (i) data transfer between storage and computational resources, (ii) wait/queue time of reconstruction jobs at compute resources, and (iii) computation of reconstruction tasks. These performance models allow evaluation and estimation of the execution time of any given iterative tomographic reconstruction workflow that runs on geographically distributed resources. For the latter challenge, a workflow management system is built, which can automate the execution of workflows and minimize the user interaction with the underlying infrastructure. The system utilizes Globus to perform secure and efficient data transfer operations. The proposed models and the workflow management system are evaluated by using three high-performance computing and two storage resources, all of which are geographically distributed. Workflows were created with different computational requirements using two compute-intensive tomographic reconstruction algorithms. Experimental evaluation shows that the proposed models and system can be used for selecting the optimum resources, which in turn can provide up to 3.13× speedup (on the experimented resources). Furthermore, the error rates of the models range between 2.1 and 23.3% (considering workflow execution times), where the accuracy of the model estimations increases with higher computational demands in reconstruction tasks.

  18. A Cyber-ITS Framework for Massive Traffic Data Analysis Using Cyber Infrastructure

    PubMed Central

    Fontaine, Michael D.

    2013-01-01

    Traffic data is commonly collected from widely deployed sensors in urban areas. This brings up a new research topic, data-driven intelligent transportation systems (ITSs), which aim to integrate heterogeneous traffic data from different kinds of sensors and apply it to ITS applications. Taking into consideration the significant increase in the amount of traffic data and the complexity of data analysis, this research focuses mainly on the challenge of solving data-intensive and computation-intensive problems. As a solution to these problems, this paper proposes a Cyber-ITS framework to perform data analysis on Cyber Infrastructure (CI), which is by nature a parallel-computing hardware and software system, in the context of ITS. The techniques of the framework include data representation, domain decomposition, resource allocation, and parallel processing. All these techniques are based on data-driven and application-oriented models and are organized in a component-and-workflow-based model in order to achieve technical interoperability and data reusability. A case study of the Cyber-ITS framework is then presented, based on a traffic state estimation application that fuses massive Sydney Coordinated Adaptive Traffic System (SCATS) data and GPS data. The results show that the Cyber-ITS-based implementation can achieve a high accuracy rate of traffic state estimation and provide a significant computational speedup for the data fusion through parallel computing. PMID:23766690
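
    The domain-decomposition and parallel-processing techniques named above can be pictured with the hedged sketch below: records are partitioned spatially and a traffic-state estimate is computed per partition in parallel. The partitioning rule and estimator are hypothetical stand-ins for the framework's models.

    ```python
    # Domain decomposition + parallel traffic-state estimation (sketch).
    from multiprocessing import Pool

    def estimate_state(partition):
        """Fuse SCATS and GPS records for one spatial partition (stub)."""
        scats, gps = partition
        speeds = [r["speed"] for r in scats + gps]
        return sum(speeds) / len(speeds) if speeds else None

    def decompose(scats, gps, n_parts):
        """Split records into spatial partitions (stub: hash on link id)."""
        parts = [([], []) for _ in range(n_parts)]
        for r in scats:
            parts[r["link"] % n_parts][0].append(r)
        for r in gps:
            parts[r["link"] % n_parts][1].append(r)
        return parts

    if __name__ == "__main__":
        scats = [{"link": i, "speed": 40 + i % 7} for i in range(1000)]
        gps = [{"link": i, "speed": 38 + i % 5} for i in range(1000)]
        with Pool(4) as pool:
            states = pool.map(estimate_state, decompose(scats, gps, 8))
        print(states)
    ```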

  19. A Cyber-ITS framework for massive traffic data analysis using cyber infrastructure.

    PubMed

    Xia, Yingjie; Hu, Jia; Fontaine, Michael D

    2013-01-01

    Traffic data is commonly collected from widely deployed sensors in urban areas. This brings up a new research topic, data-driven intelligent transportation systems (ITSs), which aim to integrate heterogeneous traffic data from different kinds of sensors and apply it to ITS applications. Taking into consideration the significant increase in the amount of traffic data and the complexity of data analysis, this research focuses mainly on the challenge of solving data-intensive and computation-intensive problems. As a solution to these problems, this paper proposes a Cyber-ITS framework to perform data analysis on Cyber Infrastructure (CI), which is by nature a parallel-computing hardware and software system, in the context of ITS. The techniques of the framework include data representation, domain decomposition, resource allocation, and parallel processing. All these techniques are based on data-driven and application-oriented models and are organized in a component-and-workflow-based model in order to achieve technical interoperability and data reusability. A case study of the Cyber-ITS framework is then presented, based on a traffic state estimation application that fuses massive Sydney Coordinated Adaptive Traffic System (SCATS) data and GPS data. The results show that the Cyber-ITS-based implementation can achieve a high accuracy rate of traffic state estimation and provide a significant computational speedup for the data fusion through parallel computing.

  20. A high-speed DAQ framework for future high-level trigger and event building clusters

    NASA Astrophysics Data System (ADS)

    Caselle, M.; Ardila Perez, L. E.; Balzer, M.; Dritschler, T.; Kopmann, A.; Mohr, H.; Rota, L.; Vogelgesang, M.; Weber, M.

    2017-03-01

    Modern data acquisition and trigger systems require a throughput of several GB/s and latencies on the order of microseconds. To satisfy such requirements, a heterogeneous readout system based on FPGA readout cards and GPU-based computing nodes coupled by InfiniBand has been developed. The incoming data from the back-end electronics are delivered directly into the internal memory of the GPUs through dedicated peer-to-peer PCIe communication. High-performance DMA engines have been developed for direct communication between FPGAs and GPUs using "DirectGMA" (AMD) and "GPUDirect" (NVIDIA) technologies. The proposed infrastructure is a candidate for future generations of event building clusters, high-level trigger filter farms and low-level trigger systems. In this paper the heterogeneous FPGA-GPU architecture is presented and its performance discussed.

  1. The dependence of educational infrastructure on clinical infrastructure.

    PubMed Central

    Cimino, C.

    1998-01-01

    The Albert Einstein College of Medicine needed to assess the growth of its infrastructure for educational computing as a first step in determining whether student needs were being met. Computing infrastructure includes space, equipment, software, and computing services. The infrastructure was assessed by reviewing purchasing and support logs for the six-year period from 1992 to 1998, including equipment, software, and e-mail accounts provided to students and to faculty for educational purposes. Student space has grown at a constant rate (an average 14% increase each year). Student equipment on campus has grown by a constant amount each year (an average of 8.3 computers per year). Student infrastructure off campus and educational support of faculty have not kept pace; they have either declined or remained level over the six-year period. The availability of electronic mail clearly demonstrates this, with accounts being used by 99% of students, 78% of Basic Science Course Leaders, 38% of Clerkship Directors, 18% of Clerkship Site Directors, and 8% of Clinical Elective Directors. The collection of this initial descriptive infrastructure data has revealed problems that may generalize to other medical schools. The discrepancy between the infrastructure available to students and faculty on campus and that available off campus creates a setting where students perceive a paradoxical decline in support for computer use as they progress through medical school. While clinical infrastructure may be growing, it is at the expense of educational infrastructure at affiliate hospitals. PMID:9929262

  2. Advanced Computational Methods for Optimization of Non-Periodic Inspection Intervals for Aging Infrastructure

    DTIC Science & Technology

    2017-01-05

    Report AFRL-AFOSR-JP-TR-2017-0002, "Advanced Computational Methods for Optimization of Non-Periodic Inspection Intervals for Aging Infrastructure" (principal investigator Manabu..., grant number FA2386..., distribution unlimited: public release). The indexed abstract begins: "This report for the project titled 'Advanced Computational Methods for Optimization of

  3. Bioinformatics clouds for big data manipulation.

    PubMed

    Dai, Lin; Gao, Xin; Guo, Yan; Xiao, Jingfa; Zhang, Zhang

    2012-11-28

    As advances in the life sciences and information technology exert a profound influence on bioinformatics through its interdisciplinary nature, bioinformatics is experiencing a new leap forward from in-house computing infrastructure to utility-supplied cloud computing delivered over the Internet, in order to handle the vast quantities of biological data generated by high-throughput experimental technologies. Albeit relatively new, cloud computing promises to address big data storage and analysis issues in the bioinformatics field. Here we review extant cloud-based services in bioinformatics, classify them into Data as a Service (DaaS), Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS), and present our perspectives on the adoption of cloud computing in bioinformatics. This article was reviewed by Frank Eisenhaber, Igor Zhulin, and Sandor Pongor.

  4. Investigation into Cloud Computing for More Robust Automated Bulk Image Geoprocessing

    NASA Technical Reports Server (NTRS)

    Brown, Richard B.; Smoot, James C.; Underwood, Lauren; Armstrong, C. Duane

    2012-01-01

    Geospatial resource assessments frequently require timely geospatial data processing that involves large multivariate remote sensing data sets. For disasters in particular, response requires rapid access to large data volumes, substantial storage space and high-performance processing capability. The processing and distribution of this data into usable information products requires a processing pipeline that can efficiently manage the required storage, computing utilities, and data handling requirements. In recent years, with the availability of cloud computing technology, cloud processing platforms have made available a powerful new computing infrastructure resource that can meet this need. To assess the utility of this resource, this project investigates cloud computing platforms for bulk, automated geoprocessing capabilities with respect to data handling and application development requirements. This presentation covers work conducted by the Applied Sciences Program Office at NASA Stennis Space Center. A prototypical set of image manipulation and transformation processes incorporating sample Unmanned Airborne System data was developed to create value-added products and tested for implementation on the "cloud". This project outlines the steps involved in creating and testing open-source process code on a local prototype platform, and then transitioning this code, with its associated environment requirements, into an analogous but memory- and processor-enhanced cloud platform. A data processing cloud was used to store both standard digital camera panchromatic and multi-band image data, which were subsequently subjected to standard image processing functions such as NDVI (Normalized Difference Vegetation Index), NDMI (Normalized Difference Moisture Index), band stacking, reprojection, and other similar data processes. Cloud infrastructure service providers were evaluated by taking these locally tested processing functions and applying them to a given cloud-enabled infrastructure in order to assess and compare environment setup options and enabled technologies. This project reviews findings observed when cloud platforms were evaluated for bulk geoprocessing capabilities based on data handling and application development requirements.
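
    Two of the index computations named above reduce to normalized band differences: NDVI = (NIR − Red)/(NIR + Red) and NDMI = (NIR − SWIR)/(NIR + SWIR). A minimal numpy sketch follows, with the band arrays assumed to be already read from the stacked image.

    ```python
    # Normalized-difference indices on band arrays (sketch).
    import numpy as np

    def normalized_difference(a, b):
        """(a - b) / (a + b), returning 0 where the denominator vanishes."""
        denom = a + b
        return np.divide(a - b, denom, out=np.zeros_like(denom), where=denom != 0)

    nir = np.array([[0.5, 0.6], [0.4, 0.7]])
    red = np.array([[0.1, 0.2], [0.3, 0.1]])
    swir = np.array([[0.3, 0.3], [0.2, 0.4]])

    ndvi = normalized_difference(nir, red)   # vegetation greenness
    ndmi = normalized_difference(nir, swir)  # vegetation moisture
    print(ndvi, ndmi, sep="\n")
    ```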

  5. Sector and Sphere: the design and implementation of a high-performance data cloud

    PubMed Central

    Gu, Yunhong; Grossman, Robert L.

    2009-01-01

    Cloud computing has demonstrated that processing very large datasets over commodity clusters can be done simply, given the right programming model and infrastructure. In this paper, we describe the design and implementation of the Sector storage cloud and the Sphere compute cloud. In contrast to existing storage and compute clouds, Sector can manage data not only within a data centre but also across geographically distributed data centres. Similarly, the Sphere compute cloud supports user-defined functions (UDFs) over data both within and across data centres. As a special case, MapReduce-style programming can be implemented in Sphere by using a Map UDF followed by a Reduce UDF. We describe experimental studies comparing Sector/Sphere and Hadoop using the Terasort benchmark; in these studies, Sector is approximately twice as fast as Hadoop. Sector/Sphere is open source. PMID:19451100
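
    The Map-UDF-then-Reduce-UDF special case can be pictured with the hedged sketch below, written as generic Python rather than Sphere's actual UDF API: a map UDF emits key/value pairs, values are grouped by key, and a reduce UDF folds each group.

    ```python
    # MapReduce expressed as a Map UDF followed by a Reduce UDF (sketch).
    from collections import defaultdict

    def run_map_reduce(records, map_udf, reduce_udf):
        groups = defaultdict(list)
        for rec in records:
            for key, value in map_udf(rec):   # map UDF emits (key, value) pairs
                groups[key].append(value)
        return {k: reduce_udf(k, vs) for k, vs in groups.items()}

    # Word count, the classic example.
    docs = ["sector stores data", "sphere processes data"]
    counts = run_map_reduce(
        docs,
        map_udf=lambda doc: [(w, 1) for w in doc.split()],
        reduce_udf=lambda word, ones: sum(ones),
    )
    print(counts)  # e.g. {'sector': 1, 'stores': 1, 'data': 2, ...}
    ```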

  6. Intelligent systems technology infrastructure for integrated systems

    NASA Technical Reports Server (NTRS)

    Lum, Henry, Jr.

    1991-01-01

    Significant advances have occurred during the last decade in intelligent systems technologies (a.k.a. knowledge-based systems, KBS), including research, feasibility demonstrations, and technology implementations in operational environments. Evaluation and simulation data obtained to date in real-time operational environments suggest that cost-effective utilization of intelligent systems technologies can be realized for Automated Rendezvous and Capture applications. The successful implementation of these technologies involves a complex system infrastructure integrating the requirements of transportation, vehicle checkout and health management, and communication systems without compromising system reliability and performance. The resources that must be invoked to accomplish these tasks include remote ground operations and control, built-in system fault management and control, and intelligent robotics. To ensure long-term evolution and integration of new validated technologies over the lifetime of the vehicle, system interfaces must also be addressed and integrated into the overall system interface requirements. An approach for defining and evaluating the system infrastructures, including the testbed currently being used to support the ongoing evaluations for the evolutionary Space Station Freedom Data Management System, is presented and discussed. Intelligent system technologies discussed include artificial intelligence (real-time replanning and scheduling), high-performance computational elements (parallel processors, photonic processors, and neural networks), real-time fault management and control, and system software development tools for rapid prototyping capabilities.

  7. Controlling Infrastructure Costs: Right-Sizing the Mission Control Facility

    NASA Technical Reports Server (NTRS)

    Martin, Keith; Sen-Roy, Michael; Heiman, Jennifer

    2009-01-01

    Johnson Space Center's Mission Control Center is a space vehicle, space program agnostic facility. The current operational design is essentially identical to the original facility architecture that was developed and deployed in the mid-90's. In an effort to streamline the support costs of this mission-critical facility, the Mission Operations Division (MOD) of Johnson Space Center (JSC) has sponsored an exploratory project to evaluate and inject current state-of-the-practice Information Technology (IT) tools, processes and technology into legacy operations. The IT industry has been trending towards a data-centric computer infrastructure for the past several years. Organizations facing challenges with facility operations costs are turning to creative solutions combining hardware consolidation, virtualization and remote access to meet and exceed performance, security, and availability requirements. The Operations Technology Facility (OTF) organization at the Johnson Space Center has been chartered to build and evaluate a parallel Mission Control infrastructure, replacing the existing thick-client distributed computing model and network architecture with a data center model that uses virtualization to provide the MCC Infrastructure as a Service. The OTF will design a replacement architecture for the Mission Control Facility, leveraging hardware consolidation through the use of blade servers, increasing utilization rates for compute platforms through virtualization, and expanding connectivity options through the deployment of secure remote access. The architecture demonstrates the maturity of the technologies generally available in industry today and the ability to successfully abstract the tightly coupled relationship between thick-client software and legacy hardware into a hardware-agnostic "Infrastructure as a Service" capability that can scale to meet future requirements of new space programs and spacecraft. This paper discusses the benefits and difficulties that a migration to cloud-based computing philosophies has uncovered when compared to the legacy Mission Control Center architecture. The team consists of system and software engineers with extensive experience with the MCC infrastructure and the software currently used to support the International Space Station (ISS) and Space Shuttle Program (SSP).

  8. Digital Rocks Portal: Preservation, Sharing, Remote Visualization and Automated Analysis of Imaged Datasets

    NASA Astrophysics Data System (ADS)

    Prodanovic, M.; Esteva, M.; Ketcham, R. A.; Hanlon, M.; Pettengill, M.; Ranganath, A.; Venkatesh, A.

    2016-12-01

    Due to advances in imaging modalities such as X-ray microtomography and scanning electron microscopy, 2D and 3D imaged datasets of rock microstructure on nanometer to centimeter length scales allow investigation of nonlinear flow and mechanical phenomena using numerical approaches. This in turn produces various upscaled parameters required by subsurface flow and deformation simulators. However, a single research group typically specializes in one imaging modality and/or related modeling on a single length scale, and the lack of data-sharing infrastructure makes it difficult to integrate different length scales. We developed a sustainable, open and easy-to-use repository called the Digital Rocks Portal (http://www.digitalrocksportal.org) that (1) organizes images and related experimental measurements of different porous materials, and (2) improves access to them for a wider community of geosciences or engineering researchers not necessarily trained in computer science or data analysis. Our objective is to enable scientific inquiry and engineering decisions founded on a data-driven basis. We show how the data loaded into the portal can be documented, referenced in publications via digital object identifiers, visualized, and linked to other repositories. We then show preliminary results on integrating a remote parallel visualization and flow simulation workflow with the pore structures currently stored in the repository. We finally discuss the issues of collecting correct metadata, data discoverability and repository sustainability. This is the first repository for this particular kind of data, but it is part of the wider ecosystem of geoscience data and model cyber-infrastructure called EarthCube (http://earthcube.org/), sponsored by the National Science Foundation. For data sustainability and continuous access, the portal is implemented within the reliable, 24/7 maintained High Performance Computing Infrastructure supported by the Texas Advanced Computing Center (TACC) at the University of Texas at Austin. Long-term storage is provided through the University of Texas System Research Cyber-infrastructure initiative.

  9. Computational Environments and Analysis methods available on the NCI High Performance Computing (HPC) and High Performance Data (HPD) Platform

    NASA Astrophysics Data System (ADS)

    Evans, B. J. K.; Foster, C.; Minchin, S. A.; Pugh, T.; Lewis, A.; Wyborn, L. A.; Evans, B. J.; Uhlherr, A.

    2014-12-01

    The National Computational Infrastructure (NCI) has established a powerful in-situ computational environment to enable both high-performance computing and data-intensive science across a wide spectrum of national environmental data collections - in particular climate, observational data and geoscientific assets. This paper examines (1) the computational environments that support the modelling and data processing pipelines, (2) the analysis environments and methods that support data analysis, and (3) the progress in harmonising the underlying data collections for future transdisciplinary research enabling accurate climate projections. NCI makes available more than 10 PB of major data collections from both the government and research sectors based on six themes: (1) weather, climate, and earth system science model simulations, (2) marine and earth observations, (3) geosciences, (4) terrestrial ecosystems, (5) water and hydrology, and (6) astronomy, social and biosciences. Collectively they span the lithosphere, crust, biosphere, hydrosphere, troposphere, and stratosphere. The data is largely sourced from NCI's partners (which include the custodians of many of the national scientific records), major research communities, and collaborating overseas organisations. The data is accessible within an integrated HPC-HPD environment - a 1.2 PFlop supercomputer (Raijin), an HPC-class, 3000-core OpenStack cloud system and several highly connected large-scale, high-bandwidth Lustre filesystems. This computational environment supports a catalogue of integrated, reusable software and workflows for earth system and ecosystem modelling, weather research, and satellite and other observed data processing and analysis. To enable transdisciplinary research on this scale, data needs to be harmonised so that researchers can readily apply techniques and software across the corpus of available data and not be constrained to work within artificial disciplinary boundaries. Future challenges will involve the further integration and analysis of this data across the social sciences to address impacts across the societal domain, including timely analysis to more accurately predict and forecast future climate and environmental states.

  10. Development of capability for microtopography-resolving simulations of hydrologic processes in permafrost affected regions

    NASA Astrophysics Data System (ADS)

    Painter, S.; Moulton, J. D.; Berndt, M.; Coon, E.; Garimella, R.; Lewis, K. C.; Manzini, G.; Mishra, P.; Travis, B. J.; Wilson, C. J.

    2012-12-01

    The frozen soils of the Arctic and subarctic regions contain vast amounts of stored organic carbon. This carbon is vulnerable to release to the atmosphere as temperatures warm and permafrost degrades. Understanding the response of the subsurface and surface hydrologic system to degrading permafrost is key to understanding the rate, timing, and chemical form of potential carbon releases to the atmosphere. Simulating the hydrologic system in degrading permafrost regions is challenging because of the potential for topographic evolution and associated drainage network reorganization as permafrost thaws and massive ground ice melts. The critical process models required for simulating hydrology include subsurface thermal hydrology of freezing/thawing soils, thermal processes within ice wedges, mechanical deformation processes, overland flow, and surface energy balances including snow dynamics. A new simulation tool, the Arctic Terrestrial Simulator (ATS), is being developed to simulate these coupled processes. The computational infrastructure must accommodate fully unstructured grids that track evolving topography, allow accurate solutions on distorted grids, provide robust and efficient solutions on highly parallel computer architectures, and enable flexibility in the strategies for coupling among the various processes. The ATS is based on Amanzi (Moulton et al. 2012), an object-oriented multi-process simulator written in C++ that provides much of the necessary computational infrastructure. Status and plans for the ATS including major hydrologic process models and validation strategies will be presented. Highly parallel simulations of overland flow using high-resolution digital elevation maps of polygonal patterned ground landscapes demonstrate the feasibility of the approach. Simulations coupling three-phase subsurface thermal hydrology with a simple thaw-induced subsidence model illustrate the strong feedbacks among the processes. D. Moulton, M. Berndt, M. Day, J. Meza, et al., High-Level Design of Amanzi, the Multi-Process High Performance Computing Simulator, Technical Report ASCEM-HPC-2011-03-1, DOE Environmental Management, 2012.

  11. Building analytical platform with Big Data solutions for log files of PanDA infrastructure

    NASA Astrophysics Data System (ADS)

    Alekseev, A. A.; Barreiro Megino, F. G.; Klimentov, A. A.; Korchuganova, T. A.; Maendo, T.; Padolski, S. V.

    2018-05-01

    The paper describes the implementation of a high-performance system for the processing and analysis of log files for the PanDA infrastructure of the ATLAS experiment at the Large Hadron Collider (LHC), which is responsible for the workload management of the order of 2M daily jobs across the Worldwide LHC Computing Grid. The solution is based on the ELK technology stack, which includes several components: Filebeat, Logstash, ElasticSearch (ES), and Kibana. Filebeat is used to collect data from logs, Logstash processes the data and exports it to ElasticSearch, ES provides centralized data storage, and the accumulated data can be viewed using Kibana. These components were integrated with the PanDA infrastructure and replaced previous log processing systems for increased scalability and usability. The authors describe all the components and their configuration tuning for the current tasks, report the scale of the actual system, and give several real-life examples of how this centralized log processing and storage service is used, to showcase the advantages for daily operations.
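
    The tail of this pipeline can be pictured with a hedged sketch that uses the official Python client in place of Filebeat/Logstash: a parsed log event is indexed into ElasticSearch and queried back Kibana-style. The index names, fields and cluster URL are hypothetical placeholders, and the query= keyword assumes a recent client version.

    ```python
    # Index a parsed log event into Elasticsearch and query it back (sketch).
    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")  # hypothetical cluster

    event = {
        "timestamp": "2018-05-01T12:00:00Z",
        "level": "ERROR",
        "component": "panda-server",
        "message": "job 1234567 failed: lost heartbeat",
    }
    es.index(index="panda-logs-2018.05.01", document=event)

    # A Kibana-style question expressed as a query: recent ERROR events.
    resp = es.search(index="panda-logs-*", query={"match": {"level": "ERROR"}})
    print(resp["hits"]["total"])
    ```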

  12. The diverse use of clouds by CMS

    DOE PAGES

    Andronis, Anastasios; Bauer, Daniela; Chaze, Olivier; ...

    2015-12-23

    The resources CMS is using are increasingly being offered as clouds. In Run 2 of the LHC the majority of CMS CERN resources, both in Meyrin and at the Wigner Computing Centre, will be presented as cloud resources on which CMS will have to build its own infrastructure. This infrastructure will need to run all of the CMS workflows, including Tier 0, production and user analysis. In addition, the CMS High Level Trigger will provide a compute resource comparable in scale to the total offered by the CMS Tier 1 sites when it is not running as part of the trigger system. During these periods a cloud infrastructure will be overlaid on this resource, making it accessible for general CMS use. Furthermore, CMS is starting to utilise cloud resources offered by individual institutes and is gaining experience to facilitate the use of opportunistically available cloud resources. Finally, we present a snapshot of this infrastructure and its operation at the time of the CHEP2015 conference.

  13. A Development of Lightweight Grid Interface

    NASA Astrophysics Data System (ADS)

    Iwai, G.; Kawai, Y.; Sasaki, T.; Watase, Y.

    2011-12-01

    In order to support the rapid development of Grid/Cloud-aware applications, we have developed an API that abstracts distributed computing infrastructures, based on SAGA (A Simple API for Grid Applications). SAGA, which is standardized in the OGF (Open Grid Forum), defines API specifications for access to distributed computing infrastructures such as Grid, Cloud and local computing resources. The Universal Grid API (UGAPI), which is a set of command line interfaces (CLIs) and APIs, aims to offer a simpler API that combines several SAGA interfaces with richer functionality. The UGAPI CLIs offer the typical functionality required by end users for job management and file access across different distributed computing infrastructures as well as local computing resources. We have also built a web interface for particle therapy simulation and demonstrated a large-scale calculation using the different infrastructures at the same time. In this paper, we present how the web interface based on UGAPI and SAGA achieves more efficient utilization of computing resources over the different infrastructures, with technical details and practical experiences.
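
    The SAGA job model that UGAPI builds on can be pictured with the hedged sketch below, using the Python radical-saga bindings as a stand-in (an assumption; UGAPI itself layers CLIs and richer functionality on top of SAGA). Swapping the service URL retargets the same job description at a different infrastructure.

    ```python
    # Submit a job through the SAGA abstraction (radical-saga sketch).
    import radical.saga as rs

    jd = rs.job.Description()
    jd.executable = "/bin/echo"
    jd.arguments = ["hello from SAGA"]
    jd.output = "job.out"

    # "fork://localhost" runs locally; a Grid or cloud adaptor URL would
    # target a remote infrastructure through the same API.
    js = rs.job.Service("fork://localhost")
    job = js.create_job(jd)
    job.run()
    job.wait()
    print(job.state, job.exit_code)
    ```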

  14. The main challenges that remain in applying high-throughput sequencing to clinical diagnostics.

    PubMed

    Loeffelholz, Michael; Fofanov, Yuriy

    2015-01-01

    Over the last 10 years, the quality, price and availability of high-throughput sequencing instruments have improved to the point that this technology may be close to becoming a routine tool in the diagnostic microbiology laboratory. Two groups of challenges, however, have to be resolved in order to move this powerful research technology into routine use in the clinical microbiology laboratory. The computational/bioinformatics challenges include data storage cost and privacy concerns, requiring analysis to be performed without access to cloud storage or expensive computational infrastructure. The logistical challenges include interpretation of complex results and acceptance and understanding of the advantages and limitations of this technology by the medical community. This article focuses on the approaches to address these challenges, such as file formats, algorithms, data collection, reporting and good laboratory practices.

  15. A case study for cloud based high throughput analysis of NGS data using the globus genomics system

    DOE PAGES

    Bhuvaneshwar, Krithika; Sulakhe, Dinanath; Gauba, Robinder; ...

    2015-01-01

    Next generation sequencing (NGS) technologies produce massive amounts of data requiring a powerful computational infrastructure, high quality bioinformatics software, and skilled personnel to operate the tools. We present a case study of a practical solution to this data management and analysis challenge that simplifies terabyte scale data handling and provides advanced tools for NGS data analysis. These capabilities are implemented using the “Globus Genomics” system, which is an enhanced Galaxy workflow system made available as a service that offers users the capability to process and transfer data easily, reliably and quickly to address end-to-end NGS analysis requirements. The Globus Genomics system is built on Amazon's cloud computing infrastructure. The system takes advantage of elastic scaling of compute resources to run multiple workflows in parallel and it also helps meet the scale-out analysis needs of modern translational genomics research.
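
    Since Globus Genomics is an enhanced Galaxy service, a workflow can in principle be driven through Galaxy's REST API, for example with the BioBlend client as sketched below; the URL, API key, workflow name, and dataset id are placeholders, not details from the paper.

        from bioblend.galaxy import GalaxyInstance

        # Placeholder service URL and API key.
        gi = GalaxyInstance(url="https://genomics.example.org", key="MY_API_KEY")

        # Create a history for results, then invoke a named workflow on a
        # previously uploaded dataset (both names/ids are hypothetical).
        history = gi.histories.create_history(name="ngs-analysis")
        workflow = gi.workflows.get_workflows(name="exome-pipeline")[0]
        inputs = {"0": {"src": "hda", "id": "uploaded-dataset-id"}}
        gi.workflows.invoke_workflow(workflow["id"], inputs=inputs,
                                     history_id=history["id"])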

  16. A case study for cloud based high throughput analysis of NGS data using the globus genomics system

    PubMed Central

    Bhuvaneshwar, Krithika; Sulakhe, Dinanath; Gauba, Robinder; Rodriguez, Alex; Madduri, Ravi; Dave, Utpal; Lacinski, Lukasz; Foster, Ian; Gusev, Yuriy; Madhavan, Subha

    2014-01-01

    Next generation sequencing (NGS) technologies produce massive amounts of data requiring a powerful computational infrastructure, high quality bioinformatics software, and skilled personnel to operate the tools. We present a case study of a practical solution to this data management and analysis challenge that simplifies terabyte scale data handling and provides advanced tools for NGS data analysis. These capabilities are implemented using the “Globus Genomics” system, which is an enhanced Galaxy workflow system made available as a service that offers users the capability to process and transfer data easily, reliably and quickly to address end-to-end NGS analysis requirements. The Globus Genomics system is built on Amazon's cloud computing infrastructure. The system takes advantage of elastic scaling of compute resources to run multiple workflows in parallel and it also helps meet the scale-out analysis needs of modern translational genomics research. PMID:26925205

  17. Probability Distributome: A Web Computational Infrastructure for Exploring the Properties, Interrelations, and Applications of Probability Distributions.

    PubMed

    Dinov, Ivo D; Siegrist, Kyle; Pearl, Dennis K; Kalinin, Alexandr; Christou, Nicolas

    2016-06-01

    Probability distributions are useful for modeling, simulation, analysis, and inference on varieties of natural processes and physical phenomena. There are uncountably many probability distributions. However, a few dozen families of distributions are commonly defined and are frequently used in practice for problem solving, experimental applications, and theoretical studies. In this paper, we present a new computational and graphical infrastructure, the Distributome, which facilitates the discovery, exploration and application of diverse spectra of probability distributions. The extensible Distributome infrastructure provides interfaces for (human and machine) traversal, search, and navigation of all common probability distributions. It also enables distribution modeling, applications, investigation of inter-distribution relations, as well as their analytical representations and computational utilization. The entire Distributome framework is designed and implemented as an open-source, community-built, and Internet-accessible infrastructure. It is portable, extensible and compatible with HTML5 and Web2.0 standards (http://Distributome.org). We demonstrate two types of applications of the probability Distributome resources: computational research and science education. The Distributome tools may be employed to address five complementary computational modeling applications (simulation, data-analysis and inference, model-fitting, examination of the analytical, mathematical and computational properties of specific probability distributions, and exploration of the inter-distributional relations). Many high school and college science, technology, engineering and mathematics (STEM) courses may be enriched by the use of modern pedagogical approaches and technology-enhanced methods. The Distributome resources provide enhancements for blended STEM education by improving student motivation, augmenting the classical curriculum with interactive webapps, and overhauling the learning assessment protocols.
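
    The kind of model-fitting exploration the Distributome supports can be illustrated in a few lines of SciPy: fit several candidate families to a sample and compare goodness of fit. This is a generic sketch, not Distributome code.

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(42)
        data = rng.gamma(shape=2.0, scale=3.0, size=1000)  # synthetic sample

        # Fit candidate distribution families and rank them by the
        # Kolmogorov-Smirnov distance between sample and fitted model.
        for dist in (stats.gamma, stats.lognorm, stats.weibull_min):
            params = dist.fit(data)
            ks = stats.kstest(data, dist.name, args=params).statistic
            print(f"{dist.name}: KS distance = {ks:.4f}")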

  18. Probability Distributome: A Web Computational Infrastructure for Exploring the Properties, Interrelations, and Applications of Probability Distributions

    PubMed Central

    Dinov, Ivo D.; Siegrist, Kyle; Pearl, Dennis K.; Kalinin, Alexandr; Christou, Nicolas

    2015-01-01

    Probability distributions are useful for modeling, simulation, analysis, and inference on varieties of natural processes and physical phenomena. There are uncountably many probability distributions. However, a few dozen families of distributions are commonly defined and are frequently used in practice for problem solving, experimental applications, and theoretical studies. In this paper, we present a new computational and graphical infrastructure, the Distributome, which facilitates the discovery, exploration and application of diverse spectra of probability distributions. The extensible Distributome infrastructure provides interfaces for (human and machine) traversal, search, and navigation of all common probability distributions. It also enables distribution modeling, applications, investigation of inter-distribution relations, as well as their analytical representations and computational utilization. The entire Distributome framework is designed and implemented as an open-source, community-built, and Internet-accessible infrastructure. It is portable, extensible and compatible with HTML5 and Web2.0 standards (http://Distributome.org). We demonstrate two types of applications of the probability Distributome resources: computational research and science education. The Distributome tools may be employed to address five complementary computational modeling applications (simulation, data-analysis and inference, model-fitting, examination of the analytical, mathematical and computational properties of specific probability distributions, and exploration of the inter-distributional relations). Many high school and college science, technology, engineering and mathematics (STEM) courses may be enriched by the use of modern pedagogical approaches and technology-enhanced methods. The Distributome resources provide enhancements for blended STEM education by improving student motivation, augmenting the classical curriculum with interactive webapps, and overhauling the learning assessment protocols. PMID:27158191

  19. GOES-R GS Product Generation Infrastructure Operations

    NASA Astrophysics Data System (ADS)

    Blanton, M.; Gundy, J.

    2012-12-01

    The GOES-R Ground System (GS) will produce a much larger set of products with higher data density than previous GOES systems. This requires considerably greater compute and memory resources to achieve the necessary latency and availability for these products. Over time, new algorithms could be added and existing ones removed or updated, but the GOES-R GS cannot go down during this time. To meet these GOES-R GS processing needs, the Harris Corporation will implement a Product Generation (PG) infrastructure that is scalable, extensible, modular and reliable. The primary part of the PG infrastructure is the Service Based Architecture (SBA), which includes the Distributed Data Fabric (DDF). The SBA is the middleware that encapsulates and manages the science algorithms that generate products. The SBA is divided into three parts: the Executive, which manages and configures the algorithm as a service; the Dispatcher, which provides data to the algorithm; and the Strategy, which determines when the algorithm can execute with the available data. The SBA is a distributed architecture, with services connected to each other over a compute grid, and is highly scalable. This plug-and-play architecture allows algorithms to be added, removed, or updated without affecting any other services or software currently running and producing data. Because algorithms require product data from other algorithms, scalable and reliable messaging is necessary. The SBA uses the DDF to provide this data communication layer between algorithms. The DDF provides an abstract interface over a distributed and persistent multi-layered storage system (memory-based caching above disk-based storage) and an event system that lets algorithm services know when data is available and get the data they need to begin processing when they need it. Together, the SBA and the DDF provide a flexible, high performance architecture that can meet the needs of product processing now and as they grow in the future.
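
    The Executive/Dispatcher/Strategy split can be pictured with a short sketch: a strategy object tracks which inputs have arrived, and the service runs its algorithm only when the strategy says it is ready. All names below are illustrative, not the Harris implementation.

        from dataclasses import dataclass, field
        from typing import Callable, Dict, Set

        @dataclass
        class Strategy:
            required: Set[str]                       # inputs the algorithm needs
            arrived: Set[str] = field(default_factory=set)

            def on_data(self, name: str) -> bool:
                self.arrived.add(name)
                return self.required <= self.arrived  # ready when all present

        @dataclass
        class Service:
            algorithm: Callable[[Dict[str, object]], object]
            strategy: Strategy
            cache: Dict[str, object] = field(default_factory=dict)

            def dispatch(self, name: str, payload: object):
                """Dispatcher: buffer incoming data, fire when strategy is ready."""
                self.cache[name] = payload
                if self.strategy.on_data(name):
                    return self.algorithm(self.cache)  # Executive runs the code
                return None

        svc = Service(algorithm=lambda d: sum(d.values()),
                      strategy=Strategy(required={"band1", "band2"}))
        svc.dispatch("band1", 1.0)
        print(svc.dispatch("band2", 2.0))  # runs once all inputs have arrived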

  20. Analysis of metabolomics datasets with high-performance computing and metabolite atlases

    DOE PAGES

    Yao, Yushu; Sun, Terence; Wang, Tony; ...

    2015-07-20

    Even with the widespread use of liquid chromatography mass spectrometry (LC/MS) based metabolomics, there are still a number of challenges facing this promising technique. Many, diverse experimental workflows exist; yet there is a lack of infrastructure and systems for tracking and sharing of information. Here, we describe the Metabolite Atlas framework and interface that provides highly efficient, web-based access to raw mass spectrometry data, in concert with assertions about the chemicals detected, to help address some of these challenges. This integration, by design, enables experimentalists to explore their raw data and to specify and refine feature annotations so that they can be leveraged for future experiments. Queries of the data through the web using SciDB, a parallelized database for high performance computing, make this process fast. Furthermore, by using scripting containers, such as IPython or Jupyter, to analyze the data, scientists can utilize a wide variety of freely available graphing, statistics, and information management resources. In addition, the interfaces facilitate integration with systems biology tools to ultimately link metabolomics data with biological models.
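
    The core operation, checking detected LC/MS features against asserted atlas entries, reduces to windowed matching on m/z and retention time. The sketch below uses pandas with assumed tolerances; the actual Metabolite Atlas API and its SciDB query layer are not reproduced here.

        import pandas as pd

        # Detected features and atlas assertions (toy values).
        features = pd.DataFrame({"mz": [180.0634, 255.2330], "rt": [312.0, 845.5]})
        atlas = pd.DataFrame({"name": ["glucose", "palmitate"],
                              "mz": [180.0634, 255.2329], "rt": [310.0, 850.0]})

        MZ_PPM, RT_TOL = 10.0, 15.0  # assumed matching tolerances

        for _, f in features.iterrows():
            mz_match = (atlas["mz"] - f["mz"]).abs() / f["mz"] * 1e6 <= MZ_PPM
            rt_match = (atlas["rt"] - f["rt"]).abs() <= RT_TOL
            hits = atlas[mz_match & rt_match]
            print(f["mz"], "->", list(hits["name"]))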

  1. e-Infrastructures for Astronomy: An Integrated View

    NASA Astrophysics Data System (ADS)

    Pasian, F.; Longo, G.

    2010-12-01

    As for other disciplines, the capability of performing “Big Science” in astrophysics requires the availability of large facilities. In the field of ICT, computational resources (e.g. HPC) are important, but are far from being enough for the community: as a matter of fact, the whole set of e-infrastructures (network, computing nodes, data repositories, applications) need to work in an interoperable way. This implies the development of common (or at least compatible) user interfaces to computing resources, transparent access to observations and numerical simulations through the Virtual Observatory, integrated data processing pipelines, data mining and semantic web applications. Achieving this interoperability goal is a must to build a real “Knowledge Infrastructure” in the astrophysical domain. Also, the emergence of new professional profiles (e.g. the “astro-informatician”) is necessary to allow defining and implementing properly this conceptual schema.

  2. High temporal resolution mapping of seismic noise sources using heterogeneous supercomputers

    NASA Astrophysics Data System (ADS)

    Gokhberg, Alexey; Ermert, Laura; Paitz, Patrick; Fichtner, Andreas

    2017-04-01

    Time- and space-dependent distribution of seismic noise sources is becoming a key ingredient of modern real-time monitoring of various geo-systems. Significant interest in seismic noise source maps with high temporal resolution (days) is expected to come from a number of domains, including natural resources exploration, analysis of active earthquake fault zones and volcanoes, as well as geothermal and hydrocarbon reservoir monitoring. Currently, knowledge of noise sources is insufficient for high-resolution subsurface monitoring applications. Near-real-time seismic data, as well as advanced imaging methods to constrain seismic noise sources, have recently become available. These methods are based on the massive cross-correlation of seismic noise records from all available seismic stations in the region of interest and are therefore very computationally intensive. Heterogeneous massively parallel supercomputing systems introduced in recent years combine conventional multi-core CPUs with GPU accelerators and provide an opportunity for a manifold increase in computing performance. Therefore, these systems represent an efficient platform for implementation of a noise source mapping solution. We present the first results of an ongoing research project conducted in collaboration with the Swiss National Supercomputing Centre (CSCS). The project aims at building a service that provides seismic noise source maps for Central Europe with high temporal resolution (days to a few weeks depending on frequency and data availability). The service is hosted on the CSCS computing infrastructure; all computationally intensive processing is performed on the massively parallel heterogeneous supercomputer "Piz Daint". The solution architecture is based on the Application-as-a-Service concept in order to provide interested external researchers regular access to the noise source maps. It includes the following sub-systems: (1) data acquisition, responsible for collecting, on a periodic basis, raw seismic records from the European seismic networks; (2) a high-performance noise source mapping application responsible for generation of source maps using cross-correlation of seismic records; (3) back-end infrastructure for the coordination of various tasks and computations; (4) a front-end Web interface providing the service to the end users; and (5) a data repository. The noise mapping application is composed of four principal modules: (1) pre-processing of raw data, (2) massive cross-correlation, (3) post-processing of correlation data based on computation of a logarithmic energy ratio, and (4) generation of source maps from post-processed data. Implementation of the solution posed various challenges, in particular, selection of data sources and transfer protocols, automation and monitoring of daily data downloads, ensuring the required data processing performance, design of a general service-oriented architecture for coordination of various sub-systems, and engineering an appropriate data storage solution. The present pilot version of the service implements noise source maps for Switzerland. Extension of the solution to Central Europe is planned for the next project phase.
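
    The computational core of such mapping, massive cross-correlation of station pairs, is simple to state; the sketch below shows a normalized cross-correlation of two synthetic noise records with SciPy. The production system naturally adds preprocessing, GPU acceleration, and the energy-ratio post-processing described above.

        import numpy as np
        from scipy.signal import correlate

        def noise_cross_correlation(a: np.ndarray, b: np.ndarray) -> np.ndarray:
            """Normalized cross-correlation of two demeaned noise records."""
            a = (a - a.mean()) / (a.std() * len(a))
            b = (b - b.mean()) / b.std()
            return correlate(a, b, mode="full")

        rng = np.random.default_rng(0)
        rec1 = rng.standard_normal(86400)   # one day of 1 Hz samples
        rec2 = np.roll(rec1, 5)             # second station, delayed copy
        cc = noise_cross_correlation(rec1, rec2)
        lag = cc.argmax() - (len(rec1) - 1)  # sample lag of maximum coherence
        print("estimated lag:", lag)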

  3. SWMM LID Module Validation Study

    EPA Science Inventory

    EPA’s Storm Water Management Model (SWMM) is a computational code heavily relied upon by industry for the simulation of wastewater and stormwater infrastructure performance. Many municipalities are relying on SWMM results to design multi-billion-dollar, multi-decade infrastructu...

  4. Heterogeneous access and processing of EO-Data on a Cloud based Infrastructure delivering operational Products

    NASA Astrophysics Data System (ADS)

    Niggemann, F.; Appel, F.; Bach, H.; de la Mar, J.; Schirpke, B.; Dutting, K.; Rucker, G.; Leimbach, D.

    2015-04-01

    To address the challenges of effective data handling faced by Small and Medium Sized Enterprises (SMEs), a cloud-based infrastructure for accessing and processing of Earth Observation (EO) data has been developed within the project APPS4GMES (www.apps4gmes.de). To provide homogeneous multi-mission data access, an Input Data Portal (IDP) has been implemented on this infrastructure. The IDP consists of an Open Geospatial Consortium (OGC) conformant catalogue, a consolidation module for format conversion and an OGC-conformant ordering framework. Metadata from various EO sources, following different standards, is harvested, transferred to the OGC-conformant Earth Observation Product standard and inserted into the catalogue by a Metadata Harvester. The IDP can be accessed for search and ordering of the harvested datasets by the services implemented on the cloud infrastructure. Different land-surface services have been realised by the project partners using the implemented IDP and cloud infrastructure. Results of these are customer-ready products, as well as pre-products (e.g. atmospherically corrected EO data) serving as a basis for other services. Within the IDP, automated access to ESA's Sentinel-1 Scientific Data Hub has been implemented; searching and downloading of the SAR data can be performed in an automated way. With the Sentinel-1 Toolbox and in-house software, processing of the datasets for further use, for example for Vista's snow monitoring, which delivers input for the flood forecast services, can also be performed in an automated way. For performance tests of the cloud environment, a sophisticated model-based atmospheric correction and pre-classification service has been implemented. Tests included automated, synchronised processing of one entire Landsat 8 (LS-8) coverage of Germany and performance comparisons with standard desktop systems. Results of these tests, showing a performance improvement by a factor of six, proved the high flexibility and computing power of the cloud environment. To make full use of the cloud capabilities, a possibility for automated upscaling of the hardware resources has been implemented. Together with the IDP infrastructure, fast and automated processing of various satellite sources to deliver market-ready products can be realised; thus growing customer needs and numbers can be satisfied without loss of accuracy and quality.
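
    Automated search and download of Sentinel-1 products, as implemented in the IDP, follows the pattern sketched below; the sketch uses the open-source sentinelsat client with placeholder credentials rather than the project's own harvester code.

        from datetime import date
        from sentinelsat import SentinelAPI

        # Placeholder credentials; the URL is the public Copernicus API hub.
        api = SentinelAPI("user", "password",
                          "https://apihub.copernicus.eu/apihub")

        # Query one week of Sentinel-1 GRD products and download them all.
        products = api.query(date=(date(2015, 1, 1), date(2015, 1, 8)),
                             platformname="Sentinel-1",
                             producttype="GRD")
        api.download_all(products)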

  5. The Gain of Resource Delegation in Distributed Computing Environments

    NASA Astrophysics Data System (ADS)

    Fölling, Alexander; Grimme, Christian; Lepping, Joachim; Papaspyrou, Alexander

    In this paper, we address job scheduling in Distributed Computing Infrastructures, that is, loosely coupled networks of autonomously acting High Performance Computing systems. In contrast to the common approach of mutual workload exchange, we consider the more intuitive operator's viewpoint of load-dependent resource reconfiguration. In case of a site's over-utilization, the scheduling system is able to lease resources from other sites to keep up service quality for its local user community. Conversely, the granting of idle resources can increase utilization in times of low local workload and thus ensure higher efficiency. The evaluation considers real workload data and is done with respect to common service quality indicators. For two simple resource exchange policies and three basic setups we show the possible gain of this approach and analyze the dynamics in workload-adaptive reconfiguration behavior.
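
    The operator's viewpoint described here boils down to a load-dependent reconfiguration rule; a toy version is sketched below. Thresholds and names are illustrative assumptions, not the policies evaluated in the paper.

        def reconfigure(local_load: float, local_idle: int, remote_idle: int,
                        lease_above: float = 0.9, grant_below: float = 0.4) -> str:
            """Decide whether a site should lease, grant, or hold resources."""
            if local_load > lease_above and remote_idle > 0:
                return "lease"   # borrow capacity to protect local service quality
            if local_load < grant_below and local_idle > 0:
                return "grant"   # offer idle nodes to over-utilized partner sites
            return "hold"

        print(reconfigure(local_load=0.95, local_idle=0, remote_idle=12))  # lease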

  6. Unmet needs for analyzing biological big data: A survey of 704 NSF principal investigators

    PubMed Central

    2017-01-01

    In a 2016 survey of 704 National Science Foundation (NSF) Biological Sciences Directorate principal investigators (BIO PIs), nearly 90% indicated they are currently or will soon be analyzing large data sets. BIO PIs considered a range of computational needs important to their work, including high performance computing (HPC), bioinformatics support, multistep workflows, updated analysis software, and the ability to store, share, and publish data. Previous studies in the United States and Canada emphasized infrastructure needs. However, BIO PIs said the most pressing unmet needs are training in data integration, data management, and scaling analyses for HPC—acknowledging that data science skills will be required to build a deeper understanding of life. This portends a growing data knowledge gap in biology and challenges institutions and funding agencies to redouble their support for computational training in biology. PMID:29049281

  7. Unmet needs for analyzing biological big data: A survey of 704 NSF principal investigators.

    PubMed

    Barone, Lindsay; Williams, Jason; Micklos, David

    2017-10-01

    In a 2016 survey of 704 National Science Foundation (NSF) Biological Sciences Directorate principal investigators (BIO PIs), nearly 90% indicated they are currently or will soon be analyzing large data sets. BIO PIs considered a range of computational needs important to their work, including high performance computing (HPC), bioinformatics support, multistep workflows, updated analysis software, and the ability to store, share, and publish data. Previous studies in the United States and Canada emphasized infrastructure needs. However, BIO PIs said the most pressing unmet needs are training in data integration, data management, and scaling analyses for HPC-acknowledging that data science skills will be required to build a deeper understanding of life. This portends a growing data knowledge gap in biology and challenges institutions and funding agencies to redouble their support for computational training in biology.

  8. CloVR: a virtual machine for automated and portable sequence analysis from the desktop using cloud computing.

    PubMed

    Angiuoli, Samuel V; Matalka, Malcolm; Gussman, Aaron; Galens, Kevin; Vangala, Mahesh; Riley, David R; Arze, Cesar; White, James R; White, Owen; Fricke, W Florian

    2011-08-30

    Next-generation sequencing technologies have decentralized sequence acquisition, increasing the demand for new bioinformatics tools that are easy to use, portable across multiple platforms, and scalable for high-throughput applications. Cloud computing platforms provide on-demand access to computing infrastructure over the Internet and can be used in combination with custom-built virtual machines to distribute pre-packaged, pre-configured software. We describe the Cloud Virtual Resource, CloVR, a new desktop application for push-button automated sequence analysis that can utilize cloud computing resources. CloVR is implemented as a single portable virtual machine (VM) that provides several automated analysis pipelines for microbial genomics, including 16S, whole genome and metagenome sequence analysis. The CloVR VM runs on a personal computer, utilizes local computer resources and requires minimal installation, addressing key challenges in deploying bioinformatics workflows. In addition, CloVR supports use of remote cloud computing resources to improve performance for large-scale sequence processing. In a case study, we demonstrate the use of CloVR to automatically process next-generation sequencing data on multiple cloud computing platforms. The CloVR VM and associated architecture lowers the barrier of entry for utilizing complex analysis protocols on both local single- and multi-core computers and cloud systems for high-throughput data processing.

  9. High Performance Computing for Modeling Wind Farms and Their Impact

    NASA Astrophysics Data System (ADS)

    Mavriplis, D.; Naughton, J. W.; Stoellinger, M. K.

    2016-12-01

    As energy generated by wind penetrates further into our electrical system, modeling of power production, power distribution, and the economic impact of wind-generated electricity is growing in importance. The models used for this work can range in fidelity from simple codes that run on a single computer to those that require high performance computing capabilities. Over the past several years, high fidelity models have been developed and deployed on the NCAR-Wyoming Supercomputing Center's Yellowstone machine. One of the primary modeling efforts focuses on developing the capability to compute the behavior of a wind farm in complex terrain under realistic atmospheric conditions. Fully modeling this system requires simulations ranging from continental flows down to the flow over a wind turbine blade, including the blade boundary layer, spanning fully 10 orders of magnitude in scale. To accomplish this, the simulations are broken up by scale, with information from the larger scales being passed to the smaller-scale models. In the code being developed, four scale levels are included: the continental weather scale, the local atmospheric flow in complex terrain, the wind plant scale, and the turbine scale. The current state of the models in the latter three scales will be discussed. These simulations are based on a high-order accurate dynamic overset and adaptive mesh approach, which runs at large scale on the NWSC Yellowstone machine. A second effort on modeling the economic impact of new wind development, as well as improvements in wind plant performance and enhancements to the transmission infrastructure, will also be discussed.

  10. The Path to Convergence: Design, Coordination and Social Issues in the Implementation of a Middleware Data Broker.

    NASA Astrophysics Data System (ADS)

    Slota, S.; Khalsa, S. J. S.

    2015-12-01

    Infrastructures are the result of systems, networks, and inter-networks that accrete, overlay and segment one another over time. As a result, working infrastructures represent a broad heterogeneity of elements - data types, computational resources, material substrates (computing hardware, physical infrastructure, labs, physical information resources, etc.) as well as organizational and social functions which result in divergent outputs and goals. Cyber infrastructure's engineering often defaults to a separation of the social from the technical that results in the engineering succeeding in limited ways, or the exposure of unanticipated points of failure within the system. Studying the development of middleware intended to mediate interactions among systems within an earth systems science infrastructure exposes organizational, technical and standards-focused negotiations endemic to a fundamental trait of infrastructure: its characteristic invisibility in use. Intended to perform a core function within the EarthCube cyberinfrastructure, the development, governance and maintenance of an automated brokering system is a microcosm of large-scale infrastructural efforts. Points of potential system failure, regardless of the extent to which they are more social or more technical in nature, can be considered in terms of the reverse salient: a point of social and material configuration that momentarily lags behind the progress of an emerging or maturing infrastructure. The implementation of the BCube data broker has exposed reverse salients in regards to the overall EarthCube infrastructure (and the role of middleware brokering) in the form of organizational factors such as infrastructural alignment, maintenance and resilience; differing and incompatible practices of data discovery and evaluation among users and stakeholders; and a preponderance of local variations in the implementation of standards and authentication in data access. These issues are characterized by their role in increasing tension or friction among components that are on the path to convergence and may help to predict otherwise-occluded endogenous points of failure or non-adoption in the infrastructure.

  11. Tempest: Tools for Addressing the Needs of Next-Generation Climate Models

    NASA Astrophysics Data System (ADS)

    Ullrich, P. A.; Guerra, J. E.; Pinheiro, M. C.; Fong, J.

    2015-12-01

    Tempest is a comprehensive simulation-to-science infrastructure that tackles the needs of next-generation, high-resolution, data intensive climate modeling activities. This project incorporates three key components: TempestDynamics, a global modeling framework for experimental numerical methods and high-performance computing; TempestRemap, a toolset for arbitrary-order conservative and consistent remapping between unstructured grids; and TempestExtremes, a suite of detection and characterization tools for identifying weather extremes in large climate datasets. In this presentation, the latest advances with the implementation of this framework will be discussed, and a number of projects now utilizing these tools will be featured.

  12. Advanced Optical Burst Switched Network Concepts

    NASA Astrophysics Data System (ADS)

    Nejabati, Reza; Aracil, Javier; Castoldi, Piero; de Leenheer, Marc; Simeonidou, Dimitra; Valcarenghi, Luca; Zervas, Georgios; Wu, Jian

    In recent years, as the bandwidth and the speed of networks have increased significantly, a new generation of network-based applications using the concept of distributed computing and collaborative services is emerging (e.g., Grid computing applications). The use of the available fiber and DWDM infrastructure for these applications is a logical choice offering huge amounts of cheap bandwidth and ensuring global reach of computing resources [230]. Currently, there is a great deal of interest in deploying optical circuit (wavelength) switched network infrastructure for distributed computing applications that require long-lived wavelength paths and address the specific needs of a small number of well-known users. Typical users are particle physicists who, due to their international collaborations and experiments, generate enormous amounts of data (Petabytes per year). These users require a network infrastructure that can support processing and analysis of large datasets through globally distributed computing resources [230]. However, providing wavelength-granularity bandwidth services is not an efficient and scalable solution for applications and services that address a wider base of user communities with different traffic profiles and connectivity requirements. Examples of such applications may be: scientific collaboration on a smaller scale (e.g., bioinformatics, environmental research), distributed virtual laboratories (e.g., remote instrumentation), e-health, national security and defense, personalized learning environments and digital libraries, and evolving broadband user services (e.g., high resolution home video editing, real-time rendering, high definition interactive TV). As a specific example, in e-health services and in particular mammography applications, due to the size and quantity of images produced by remote mammography, stringent network requirements are necessary. Initial calculations have shown that for 100 patients to be screened remotely, the network would have to securely transport 1.2 GB of data every 30 s [230]. From the above it is clear that these types of applications need a new network infrastructure and transport technology that makes large amounts of bandwidth at subwavelength granularity, storage, computation, and visualization resources potentially available to a wide user base for specified time durations. As these types of collaborative and network-based applications evolve, addressing a wide range and large number of users, it is infeasible to build dedicated networks for each application type or category. Consequently, there should be an adaptive network infrastructure able to support all application types, each with their own access, network, and resource usage patterns. This infrastructure should offer flexible and intelligent network elements and control mechanisms able to deploy new applications quickly and efficiently.

  13. Cloud Computing in Support of Applied Learning: A Baseline Study of Infrastructure Design at Southern Polytechnic State University

    ERIC Educational Resources Information Center

    Conn, Samuel S.; Reichgelt, Han

    2013-01-01

    Cloud computing represents an architecture and paradigm of computing designed to deliver infrastructure, platforms, and software as constructible computing resources on demand to networked users. As campuses are challenged to better accommodate academic needs for applications and computing environments, cloud computing can provide an accommodating…

  14. X-ray-induced acoustic computed tomography of concrete infrastructure

    NASA Astrophysics Data System (ADS)

    Tang, Shanshan; Ramseyer, Chris; Samant, Pratik; Xiang, Liangzhong

    2018-02-01

    X-ray-induced Acoustic Computed Tomography (XACT) takes advantage of both X-ray absorption contrast and high ultrasonic resolution in a single imaging modality by making use of the thermoacoustic effect. In XACT, X-ray absorption by defects and other structures in concrete creates thermally induced pressure jumps that launch ultrasonic waves, which are then received by acoustic detectors to form images. In this research, XACT imaging was used to non-destructively test and identify defects in concrete. For concrete structures, we conclude that XACT imaging allows multiscale imaging at depths ranging from centimeters to meters, with spatial resolutions from sub-millimeter to centimeters. XACT imaging also holds promise for single-sided testing of concrete infrastructure and provides an optimal solution for nondestructive inspection of existing bridges, pavement, nuclear power plants, and other concrete infrastructure.

  15. Evaluation of a grid based molecular dynamics approach for polypeptide simulations.

    PubMed

    Merelli, Ivan; Morra, Giulia; Milanesi, Luciano

    2007-09-01

    Molecular dynamics is very important for biomedical research because it makes possible simulation of the behavior of a biological macromolecule in silico. However, molecular dynamics is computationally rather expensive: the simulation of some nanoseconds of dynamics for a large macromolecule such as a protein takes a very long time, due to the high number of operations needed to solve Newton's equations for a system of thousands of atoms. In order to obtain biologically significant data, it is desirable to use high-performance computation resources to perform these simulations. Recently, a distributed computing approach based on replacing a single long simulation with many independent short trajectories has been introduced, which in many cases provides valuable results. This study concerns the development of an infrastructure to run molecular dynamics simulations on a grid platform in a distributed way. The implemented software allows the parallel submission of different simulations that are individually short but together provide important biological information. Moreover, each simulation is divided into a chain of jobs to avoid data loss in case of system failure and to contain the size of each data transfer from the grid. The results confirm that the distributed approach on grid computing is particularly suitable for molecular dynamics simulations thanks to its high scalability.
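
    The chaining idea, replacing one long job with restartable segments so a grid failure loses at most one segment, can be sketched as below; run_md_segment is a hypothetical stand-in for a grid job wrapping a real MD engine.

        def run_md_segment(start: str, length_ns: float, out: str) -> str:
            """Placeholder for one short grid job running an MD engine."""
            print(f"simulating {length_ns} ns from {start} -> {out}")
            return out  # checkpoint/restart file for the next segment

        def run_chain(n_segments: int, seg_ns: float) -> None:
            checkpoint = "initial_structure.chk"   # hypothetical starting state
            for i in range(n_segments):
                checkpoint = run_md_segment(start=checkpoint, length_ns=seg_ns,
                                            out=f"segment_{i:03d}.chk")

        run_chain(n_segments=4, seg_ns=0.5)   # four chained 0.5 ns segments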

  16. EMHP: an accurate automated hole masking algorithm for single-particle cryo-EM image processing.

    PubMed

    Berndsen, Zachary; Bowman, Charles; Jang, Haerin; Ward, Andrew B

    2017-12-01

    The Electron Microscopy Hole Punch (EMHP) is a streamlined suite of tools for quick assessment, sorting and hole masking of electron micrographs. With recent advances in single-particle electron cryo-microscopy (cryo-EM) data processing allowing for the rapid determination of protein structures using a smaller computational footprint, we saw the need for a fast and simple tool for data pre-processing that could run independent of existing high-performance computing (HPC) infrastructures. EMHP provides a data preprocessing platform in a small package that requires minimal Python dependencies to function. Availability: https://www.bitbucket.org/chazbot/emhp (Apache 2.0 License). Contact: bowman@scripps.edu. Supplementary data are available at Bioinformatics online.
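
    Hole masking ultimately means flagging anomalously bright regions of a micrograph; a deliberately simple thresholding stand-in is sketched below. EMHP's published algorithm is more sophisticated; this only illustrates the shape of the operation.

        import numpy as np

        def hole_mask(micrograph: np.ndarray, sigma_cut: float = 2.0) -> np.ndarray:
            """Boolean mask of bright (hole) pixels via a global intensity cut."""
            mu, sd = micrograph.mean(), micrograph.std()
            return micrograph > mu + sigma_cut * sd

        img = np.random.default_rng(1).normal(100.0, 5.0, size=(512, 512))
        img[200:260, 300:360] += 40.0          # synthetic bright hole region
        print(hole_mask(img).sum(), "pixels masked")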

  17. Bioinformatics clouds for big data manipulation

    PubMed Central

    2012-01-01

    As advances in life sciences and information technology bring profound influences on bioinformatics due to its interdisciplinary nature, bioinformatics is experiencing a new leap-forward from in-house computing infrastructure into utility-supplied cloud computing delivered over the Internet, in order to handle the vast quantities of biological data generated by high-throughput experimental technologies. Albeit relatively new, cloud computing promises to address big data storage and analysis issues in the bioinformatics field. Here we review extant cloud-based services in bioinformatics, classify them into Data as a Service (DaaS), Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS), and present our perspectives on the adoption of cloud computing in bioinformatics. Reviewers: This article was reviewed by Frank Eisenhaber, Igor Zhulin, and Sandor Pongor. PMID:23190475

  18. Phenomenology tools on cloud infrastructures using OpenStack

    NASA Astrophysics Data System (ADS)

    Campos, I.; Fernández-del-Castillo, E.; Heinemeyer, S.; Lopez-Garcia, A.; Pahlen, F.; Borges, G.

    2013-04-01

    We present a new environment for computations in particle physics phenomenology employing recent developments in cloud computing. On this environment users can create and manage "virtual" machines on which the phenomenology codes/tools can be deployed easily in an automated way. We analyze the performance of this environment based on "virtual" machines versus the utilization of physical hardware. In this way we provide a qualitative result for the influence of the host operating system on the performance of a representative set of applications for phenomenology calculations.

  19. Detecting Abnormal Machine Characteristics in Cloud Infrastructures

    NASA Technical Reports Server (NTRS)

    Bhaduri, Kanishka; Das, Kamalika; Matthews, Bryan L.

    2011-01-01

    In the cloud computing environment resources are accessed as services rather than as a product. Monitoring this system for performance is crucial because of the typical pay-per-use packages bought by users for their jobs. With the huge number of machines currently in the cloud system, it is often extremely difficult for system administrators to keep track of all machines using distributed monitoring programs such as Ganglia, which lacks system health assessment and summarization capabilities. To overcome this problem, we propose a technique for automated anomaly detection using machine performance data in the cloud. Our algorithm is entirely distributed and runs locally on each computing machine on the cloud in order to rank the machines by their anomalous behavior for given jobs. There is no need to centralize any of the performance data for the analysis, and at the end of the analysis our algorithm generates error reports, thereby allowing the system administrators to take corrective actions. Experiments performed on real data sets collected for different jobs validate the fact that our algorithm has a low overhead for tracking anomalous machines in a cloud infrastructure.
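
    The key property, that each machine scores itself locally and only scalar scores are compared, can be illustrated with a robust z-score as below. The paper's actual distributed algorithm differs; this is only a shape sketch.

        import numpy as np

        def anomaly_score(history: np.ndarray) -> float:
            """Score the newest sample against this machine's own history."""
            median = np.median(history)
            mad = np.median(np.abs(history - median)) or 1e-9
            return float(abs(history[-1] - median) / mad)

        # Each node reports one number; ranking requires no raw-data pooling.
        rng = np.random.default_rng(7)
        scores = {f"node-{i}": anomaly_score(rng.normal(1.0, 0.1, 500))
                  for i in range(3)}
        print(sorted(scores, key=scores.get, reverse=True))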

  20. A Big Data Platform for Storing, Accessing, Mining and Learning Geospatial Data

    NASA Astrophysics Data System (ADS)

    Yang, C. P.; Bambacus, M.; Duffy, D.; Little, M. M.

    2017-12-01

    Big Data is becoming the norm in geoscience domains. A platform that is capable of efficiently managing, accessing, analyzing, mining, and learning from big data for new information and knowledge is desired. This paper introduces our latest effort to develop such a platform, based on our past years of experience with cloud and high performance computing, analyzing big data, comparing big data containers, and mining big geospatial data for new information. The platform includes four layers: a) the bottom layer is a computing infrastructure with proper network, computer, and storage systems; b) the 2nd layer is a cloud computing layer based on virtualization to provide on-demand computing services for upper layers; c) the 3rd layer consists of big data containers customized for dealing with different types of data and functionalities; d) the 4th layer is a big data presentation layer that supports the efficient management, access, analysis, mining and learning of big geospatial data.

  1. Integration of the Chinese HPC Grid in ATLAS Distributed Computing

    NASA Astrophysics Data System (ADS)

    Filipčič, A.; ATLAS Collaboration

    2017-10-01

    Fifteen Chinese High-Performance Computing sites, many of them on the TOP500 list of most powerful supercomputers, are integrated into a common infrastructure providing coherent user access through a RESTful interface called SCEAPI. These resources have been integrated into the ATLAS Grid production system using a bridge between ATLAS and SCEAPI which translates the authorization and job submission protocols between the two environments. The ARC Computing Element (ARC-CE) forms the bridge, using an extended batch system interface to allow job submission to SCEAPI. The ARC-CE was set up at the Institute for High Energy Physics, Beijing, in order to be as close as possible to the SCEAPI front-end interface at the Computing Network Information Center, also in Beijing. This paper describes the technical details of the integration between ARC-CE and SCEAPI and presents results so far with two supercomputer centers, Tianhe-IA and ERA. These two centers have been the pilots for ATLAS Monte Carlo Simulation in SCEAPI and have been providing CPU power since fall 2015.
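
    SCEAPI's actual routes and authorization scheme are not spelled out in the abstract, so the sketch below only illustrates the general shape of a RESTful job-submission bridge using the requests library; every URL and field name is hypothetical.

        import requests

        BASE = "https://sceapi.example.cn/v1"   # hypothetical endpoint

        # Authenticate, then submit a job description as JSON (both hypothetical).
        token = requests.post(f"{BASE}/auth",
                              json={"user": "atlas", "key": "secret"}).json()["token"]
        job = {"app": "atlas-mc-simulation", "cores": 128,
               "input_urls": ["gsiftp://storage.example.org/input.tar"]}
        resp = requests.post(f"{BASE}/jobs", json=job,
                             headers={"Authorization": f"Bearer {token}"})
        print(resp.json().get("job_id"))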

  2. Digital Rocks Portal: a sustainable platform for imaged dataset sharing, translation and automated analysis

    NASA Astrophysics Data System (ADS)

    Prodanovic, M.; Esteva, M.; Hanlon, M.; Nanda, G.; Agarwal, P.

    2015-12-01

    Recent advances in imaging have provided a wealth of 3D datasets that reveal pore space microstructure (nm to cm length scale) and allow investigation of nonlinear flow and mechanical phenomena from first principles using numerical approaches. This framework has popularly been called "digital rock physics". Researchers, however, have trouble storing and sharing the datasets, both due to their size and due to the lack of standardized image types and associated metadata for volumetric datasets. This impedes scientific cross-validation of the numerical approaches that characterize large-scale porous media properties, as well as development of the multiscale approaches required for correct upscaling. A single research group typically specializes in an imaging modality and/or related modeling on a single length scale, and the lack of data-sharing infrastructure makes it difficult to integrate different length scales. We developed a sustainable, open and easy-to-use repository called the Digital Rocks Portal, that (1) organizes images and related experimental measurements of different porous materials, and (2) improves access to them for a wider community of geosciences or engineering researchers not necessarily trained in computer science or data analysis. Once widely accepted, the repository will jumpstart productivity and enable scientific inquiry and engineering decisions founded on a data-driven basis. This is the first repository of its kind. We show initial results on incorporating essential software tools and pipelines that make it easier for researchers to store and reuse data, and for educators to quickly visualize and illustrate concepts to a wide audience. For data sustainability and continuous access, the portal is implemented within the reliable, 24/7 maintained High Performance Computing Infrastructure supported by the Texas Advanced Computing Center (TACC) at the University of Texas at Austin. Long-term storage is provided through the University of Texas System Research Cyber-infrastructure initiative.

  3. Digital Rocks Portal: a Sustainable Platform for Data Management, Analysis and Remote Visualization of Volumetric Images of Porous Media

    NASA Astrophysics Data System (ADS)

    Prodanovic, M.; Esteva, M.; Ketcham, R. A.

    2017-12-01

    Nanometer to centimeter-scale imaging such as (focused ion beam) scanning electron microscopy, magnetic resonance imaging and X-ray (micro)tomography has since the 1990s introduced 2D and 3D datasets of rock microstructure that allow investigation of nonlinear flow and mechanical phenomena on length scales that are otherwise impervious to laboratory measurements. The numerical approaches that use such images produce various upscaled parameters required by subsurface flow and deformation simulators. All of this has revolutionized our knowledge about grain-scale phenomena. However, a lack of data-sharing infrastructure among research groups makes it difficult to integrate different length scales. We have developed a sustainable, open and easy-to-use repository called the Digital Rocks Portal (https://www.digitalrocksportal.org) that (1) organizes images and related experimental measurements of different porous materials, and (2) improves access to them for a wider community of engineering or geosciences researchers not necessarily trained in computer science or data analysis. Digital Rocks Portal (NSF EarthCube Grant 1541008) is the first repository for imaged porous microstructure data. It is implemented within the reliable, 24/7 maintained High Performance Computing Infrastructure supported by the Texas Advanced Computing Center (University of Texas at Austin). Long-term storage is provided through the University of Texas System Research Cyber-infrastructure initiative. We show how the data can be documented, referenced in publications via digital object identifiers, visualized, searched for and linked to other repositories. We show the recently implemented integration of remote parallel visualization, bulk upload for large datasets, as well as a preliminary flow simulation workflow with the pore structures currently stored in the repository. We discuss the issues of collecting correct metadata, data discoverability and repository sustainability.

  4. Explicit Content Caching at Mobile Edge Networks with Cross-Layer Sensing

    PubMed Central

    Chen, Lingyu; Su, Youxing; Luo, Wenbin; Hong, Xuemin; Shi, Jianghong

    2018-01-01

    The deployment density and computational power of small base stations (BSs) are expected to increase significantly in the next generation mobile communication networks. These BSs form the mobile edge network, which is a pervasive and distributed infrastructure that can empower a variety of edge/fog computing applications. This paper proposes a novel edge-computing application called explicit caching, which stores selective contents at BSs and exposes such contents to local users for interactive browsing and download. We formulate the explicit caching problem as a joint content recommendation, caching, and delivery problem, which aims to maximize the expected user quality-of-experience (QoE) with varying degrees of cross-layer sensing capability. Optimal and effective heuristic algorithms are presented to solve the problem. The theoretical performance bounds of the explicit caching system are derived in simplified scenarios. The impacts of cache storage space, BS backhaul capacity, cross-layer information, and user mobility on the system performance are simulated and discussed in realistic scenarios. Results suggest that, compared with conventional implicit caching schemes, explicit caching can better exploit the mobile edge network infrastructure for personalized content dissemination. PMID:29565313
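
    A simple way to see the optimization problem is a knapsack-style selection: pick contents maximizing expected QoE per unit of storage until the cache is full. The greedy sketch below is a toy heuristic, not the optimal or heuristic algorithms derived in the paper.

        def select_contents(items, capacity):
            """items: (name, size_mb, expected_qoe) tuples; greedy by density."""
            chosen, used = [], 0
            for name, size, qoe in sorted(items, key=lambda it: it[2] / it[1],
                                          reverse=True):
                if used + size <= capacity:
                    chosen.append(name)
                    used += size
            return chosen

        catalog = [("news-clip", 50, 9.0), ("movie", 700, 30.0),
                   ("podcast", 80, 10.0)]
        print(select_contents(catalog, capacity=800))  # cache 800 MB of content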

  5. Explicit Content Caching at Mobile Edge Networks with Cross-Layer Sensing.

    PubMed

    Chen, Lingyu; Su, Youxing; Luo, Wenbin; Hong, Xuemin; Shi, Jianghong

    2018-03-22

    The deployment density and computational power of small base stations (BSs) are expected to increase significantly in the next generation mobile communication networks. These BSs form the mobile edge network, which is a pervasive and distributed infrastructure that can empower a variety of edge/fog computing applications. This paper proposes a novel edge-computing application called explicit caching, which stores selective contents at BSs and exposes such contents to local users for interactive browsing and download. We formulate the explicit caching problem as a joint content recommendation, caching, and delivery problem, which aims to maximize the expected user quality-of-experience (QoE) with varying degrees of cross-layer sensing capability. Optimal and effective heuristic algorithms are presented to solve the problem. The theoretical performance bounds of the explicit caching system are derived in simplified scenarios. The impacts of cache storage space, BS backhaul capacity, cross-layer information, and user mobility on the system performance are simulated and discussed in realistic scenarios. Results suggest that, compared with conventional implicit caching schemes, explicit caching can better exploit the mobile edge network infrastructure for personalized content dissemination.

  6. GreenView and GreenLand Applications Development on SEE-GRID Infrastructure

    NASA Astrophysics Data System (ADS)

    Mihon, Danut; Bacu, Victor; Gorgan, Dorian; Mészáros, Róbert; Gelybó, Györgyi; Stefanut, Teodor

    2010-05-01

    The GreenView and GreenLand applications [1] have been developed through the SEE-GRID-SCI (SEE-GRID eInfrastructure for regional eScience) FP7 project co-funded by the European Commission [2]. The development of environment applications is a challenge for Grid technologies and software development methodologies. This presentation exemplifies the development of the GreenView and GreenLand applications over the SEE-GRID infrastructure by the Grid Application Development Methodology [3]. Today's environmental applications are used in various domains of Earth Science such as meteorology, ground and atmospheric pollution, ground metal detection and weather prediction. These applications run on satellite images (e.g., Landsat, MERIS, MODIS) and the accuracy of the output results depends mostly on the quality of these images. The main challenge for such environmental applications is the computational and storage power needed to process such a large data volume (some images are almost 1 GB in size). Most applications requiring high computation resources have therefore approached migration onto the Grid infrastructure, which offers the needed computing power by running the atomic application components on different Grid nodes in sequential or parallel mode. The middleware used between the Grid infrastructure and client applications is ESIP (Environment Oriented Satellite Image Processing Platform), which is based on the gProcess platform [4]. In its current form, gProcess is used for launching new processes on the Grid nodes, but also for monitoring the execution status of these processes. This presentation highlights two case studies of Grid-based environmental applications, GreenView and GreenLand [5]. GreenView is used in correlation with MODIS (Moderate Resolution Imaging Spectroradiometer) satellite images and meteorological datasets in order to produce pseudo-colored temperature and vegetation maps for different geographical CEE (Central Eastern Europe) regions. GreenLand, on the other hand, is used for generating maps for different vegetation indexes (e.g. NDVI, EVI, SAVI, GEMI) based on Landsat satellite images (a generic NDVI sketch follows the references below). Both applications use interpolation and random value generation algorithms, but also specific formulas for computing vegetation index values. The GreenView and GreenLand applications have been experimented with over the SEE-GRID infrastructure and the performance evaluation is reported in [6]. Improvement of the execution time (through better parallelization of jobs), extension of the geographical areas to other parts of the Earth, and new user interaction techniques on spatial data and large sets of satellite images are the goals of future work. References [1] GreenView application on Wiki, http://wiki.egee-see.org/index.php/GreenView [2] SEE-GRID-SCI Project, http://www.see-grid-sci.eu/ [3] Gorgan D., Stefanut T., Bâcu V., Mihon D., Grid based Environment Application Development Methodology, SCICOM, 7th International Conference on "Large-Scale Scientific Computations", 4-8 June, 2009, Sozopol, Bulgaria (to be published by Springer) (2009). [4] Gorgan D., Bacu V., Stefanut T., Rodila D., Mihon D., Grid based Satellite Image Processing Platform for Earth Observation Applications Development. IDAACS'2009 - IEEE Fifth International Workshop on "Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications", 21-23 September, Cosenza, Italy, IEEE Computer Press, 247-252 (2009).
[5] Mihon D., Bacu V., Stefanut T., Gorgan D., "Grid Based Environment Application Development - GreenView Application". ICCP2009 - IEEE 5th International Conference on Intelligent Computer Communication and Processing, 27 Aug, 2009 Cluj-Napoca. Published by IEEE Computer Press, pp. 275-282 (2009). [6] Danut Mihon, Victor Bacu, Dorian Gorgan, Róbert Mészáros, Györgyi Gelybó, Teodor Stefanut, Practical Considerations on the GreenView Application Development and Execution over SEE-GRID. SEE-GRID-SCI User Forum, 9-10 Dec 2009, Bogazici University, Istanbul, Turkey, ISBN: 978-975-403-510-0, pp. 167-175 (2009).
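
    Of the vegetation indexes GreenLand computes, NDVI is the simplest: a per-pixel band ratio. The sketch below shows the standard formula on toy reflectance arrays; it is generic NumPy, not GreenLand code.

        import numpy as np

        def ndvi(nir: np.ndarray, red: np.ndarray) -> np.ndarray:
            """NDVI = (NIR - Red) / (NIR + Red), guarded against zero division."""
            return (nir - red) / np.clip(nir + red, 1e-6, None)

        nir = np.array([[0.50, 0.60], [0.40, 0.70]])   # toy NIR reflectance
        red = np.array([[0.10, 0.20], [0.30, 0.10]])   # toy red reflectance
        print(ndvi(nir, red))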

  7. About Distributed Simulation-based Optimization of Forming Processes using a Grid Architecture

    NASA Astrophysics Data System (ADS)

    Grauer, Manfred; Barth, Thomas

    2004-06-01

    The steadily increasing complexity of products and their manufacturing processes, combined with a shorter "time-to-market", leads to more and more use of simulation and optimization software systems for product design. Finding a "good" design of a product implies the solution of computationally expensive optimization problems based on the results of simulation. Due to the computational load caused by the solution of these problems, the requirements on the Information & Telecommunication (IT) infrastructure of an enterprise or research facility are shifting from stand-alone resources towards the integration of software and hardware resources in a distributed environment for high-performance computing. Resources can comprise software systems, hardware systems, or communication networks. An appropriate IT infrastructure must provide the means to integrate all these resources and enable their use even across a network to cope with requirements from geographically distributed scenarios, e.g. in computational engineering and/or collaborative engineering. Integrating experts' knowledge into the optimization process is inevitable in order to reduce the complexity caused by the number of design variables and the high dimensionality of the design space. Hence, utilization of knowledge-based systems must be supported by providing data management facilities as a basis for knowledge extraction from product data. In this paper, the focus is put on a distributed problem solving environment (PSE) capable of providing access to a variety of necessary resources and services. A distributed approach integrating simulation and optimization on a network of workstations and cluster systems is presented. For geometry generation the CAD system CATIA is used, coupled with the FEM simulation system INDEED for the simulation of sheet-metal forming processes and the problem solving environment OpTiX for distributed optimization.
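
    The pattern at the heart of such a PSE, farming out expensive simulation-based objective evaluations to distributed workers, looks like the sketch below, with a cheap analytic stand-in for the forming simulation.

        from concurrent.futures import ProcessPoolExecutor

        def simulate(design):
            """Stand-in for an expensive FEM forming simulation of one design."""
            x, y = design
            return (x - 1.0) ** 2 + (y + 2.0) ** 2

        if __name__ == "__main__":
            candidates = [(0.0, 0.0), (1.0, -2.0), (2.0, 1.0)]
            # Evaluate all candidate designs in parallel worker processes.
            with ProcessPoolExecutor() as pool:
                costs = list(pool.map(simulate, candidates))
            cost, best = min(zip(costs, candidates))
            print("best design:", best, "cost:", cost)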

  8. Facilitating NASA Earth Science Data Processing Using Nebula Cloud Computing

    NASA Astrophysics Data System (ADS)

    Chen, A.; Pham, L.; Kempler, S.; Theobald, M.; Esfandiari, A.; Campino, J.; Vollmer, B.; Lynnes, C.

    2011-12-01

    Cloud Computing technology has been used to offer high-performance and low-cost computing and storage resources for both scientific problems and business services. Several cloud computing services have been implemented in the commercial arena, e.g. Amazon's EC2 & S3, Microsoft's Azure, and Google App Engine. There are also research and application programs being launched in academia and government to utilize Cloud Computing. NASA launched the Nebula Cloud Computing platform in 2008, an Infrastructure as a Service (IaaS) offering that delivers on-demand distributed virtual computers. Nebula users can receive required computing resources as a fully outsourced service. The NASA Goddard Earth Science Data and Information Service Center (GES DISC) migrated several of its applications to the Nebula as a proof of concept, including: a) the Simple, Scalable, Script-based Science Processor for Measurements (S4PM) for processing scientific data; b) the Atmospheric Infrared Sounder (AIRS) data processing workflow for processing AIRS raw data; and c) the GES-DISC Interactive Online Visualization ANd aNalysis Infrastructure (GIOVANNI) for online access to, analysis of, and visualization of Earth science data. This work aims to evaluate the practicability and adaptability of the Nebula. The initial work focused on the AIRS data processing workflow, which consists of a series of algorithms used to process raw AIRS level 0 data and output AIRS level 2 geophysical retrievals. Migrating the entire workflow to the Nebula platform is challenging but practicable. After installing several supporting libraries and the processing code itself, the workflow is able to process AIRS data in a similar fashion to its current (non-cloud) configuration. We compared the performance of processing 2 days of AIRS level 0 data through level 2 using a Nebula virtual computer and a local Linux computer. The results show that Nebula has significantly better performance than the local machine. Much of the difference was due to newer equipment in the Nebula than in the legacy computer, which suggests a potential economic advantage beyond elastic capacity, i.e., access to up-to-date hardware versus legacy hardware that must be maintained past its prime to amortize its cost. In addition to a trade study of the advantages and challenges of porting complex processing to the cloud, a tutorial was developed to enable further progress in utilizing the Nebula for Earth Science applications and to better understand the potential of Cloud Computing for data- and computing-intensive Earth Science research. In particular, highly bursty computing such as that experienced in the user-demand-driven Giovanni system may become more tractable in a Cloud environment. Our future work will continue to focus on migrating more GES DISC applications/instances, e.g. Giovanni instances, to the Nebula platform and bringing mature migrated applications into operation on the Nebula.

  9. Sustaining a Community Computing Infrastructure for Online Teacher Professional Development: A Case Study of Designing Tapped In

    NASA Astrophysics Data System (ADS)

    Farooq, Umer; Schank, Patricia; Harris, Alexandra; Fusco, Judith; Schlager, Mark

    Community computing has recently grown to become a major research area in human-computer interaction. One of the objectives of community computing is to support computer-supported cooperative work among distributed collaborators working toward shared professional goals in online communities of practice. A core issue in designing and developing community computing infrastructures — the underlying sociotechnical layer that supports communitarian activities — is sustainability. Many community computing initiatives fail because the underlying infrastructure does not meet end user requirements; the community is unable to maintain a critical mass of users consistently over time; it generates insufficient social capital to support significant contributions by members of the community; or, as typically happens with funded initiatives, financial and human capital resources become unavailable to further maintain the infrastructure. On the basis of more than 9 years of design experience with Tapped In, an online community of practice for education professionals, we present a case study that discusses four design interventions that have sustained the Tapped In infrastructure and its community to date. These interventions represent broader design strategies for developing online environments for professional communities of practice.

  10. Integration of a neuroimaging processing pipeline into a pan-Canadian computing grid

    NASA Astrophysics Data System (ADS)

    Lavoie-Courchesne, S.; Rioux, P.; Chouinard-Decorte, F.; Sherif, T.; Rousseau, M.-E.; Das, S.; Adalat, R.; Doyon, J.; Craddock, C.; Margulies, D.; Chu, C.; Lyttelton, O.; Evans, A. C.; Bellec, P.

    2012-02-01

    The ethos of the neuroimaging field is quickly moving towards the open sharing of resources, including both imaging databases and processing tools. As a neuroimaging database represents a large volume of datasets and as neuroimaging processing pipelines are composed of heterogeneous, computationally intensive tools, such open sharing raises specific computational challenges. This motivates the design of novel dedicated computing infrastructures. This paper describes an interface between PSOM, a code-oriented pipeline development framework, and CBRAIN, a web-oriented platform for grid computing. This interface was used to integrate a PSOM-compliant pipeline for preprocessing of structural and functional magnetic resonance imaging into CBRAIN. We further tested the capacity of our infrastructure to handle a real large-scale project. A neuroimaging database including close to 1000 subjects was preprocessed using our interface and publicly released to help the participants of the ADHD-200 international competition. This successful experiment demonstrated that our integrated grid-computing platform is a powerful solution for high-throughput pipeline analysis in the field of neuroimaging.
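
    PSOM-style pipelines are essentially dependency graphs of jobs. The following minimal Python sketch is an assumption-based illustration of that core idea, not PSOM's actual Octave/MATLAB interface: each job declares its dependencies and runs once all upstream jobs are done.

        # A pipeline is a dict: job name -> (list of dependencies, callable).
        pipeline = {
            "motion_correct": ([], lambda: print("motion correction")),
            "normalize":      (["motion_correct"], lambda: print("spatial normalization")),
            "smooth":         (["normalize"], lambda: print("smoothing")),
            "qc_report":      (["motion_correct", "smooth"], lambda: print("QC report")),
        }

        def run(pipeline):
            done = set()
            while len(done) < len(pipeline):
                ready = [name for name, (deps, _) in pipeline.items()
                         if name not in done and all(d in done for d in deps)]
                if not ready:
                    raise RuntimeError("cyclic dependency")
                for name in ready:      # jobs in 'ready' could run in parallel on a grid
                    pipeline[name][1]()
                    done.add(name)

        run(pipeline)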

  11. DICOMGrid: a middleware to integrate PACS and EELA-2 grid infrastructure

    NASA Astrophysics Data System (ADS)

    Moreno, Ramon A.; de Sá Rebelo, Marina; Gutierrez, Marco A.

    2010-03-01

    Medical images provide a wealth of information for physicians, but the huge amount of data produced by medical imaging equipment in a modern health institution is not yet explored to its full potential. Nowadays medical images are used in hospitals mostly as part of routine activities, while their intrinsic value for research is underestimated. Medical images can be used for the development of new visualization techniques, new algorithms for patient care and new image processing techniques. These research areas usually require the use of huge volumes of data to obtain significant results, along with enormous computing capabilities. Such qualities are characteristic of grid computing systems such as the EELA-2 infrastructure. Grid technologies allow large-scale data sharing in a safe, integrated environment and offer high computing capacity. In this paper we describe DicomGrid, a middleware to store and retrieve properly anonymized medical images that researchers can use to test new processing techniques, exploiting the computational power offered by grid technology. A prototype of DicomGrid is under evaluation; it permits the submission of jobs into the EELA-2 grid infrastructure while offering a simple interface that requires minimal understanding of grid operation.
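
    Proper anonymization, as required before images leave the PACS for a shared grid, can be illustrated with the pydicom library. The sketch below blanks a few direct identifiers and strips private tags; the file paths and the exact set of tags are illustrative assumptions, and the paper's DicomGrid middleware does not necessarily use pydicom.

        import pydicom

        def anonymize(path_in, path_out):
            """Blank direct identifiers before an image leaves the PACS for the grid."""
            ds = pydicom.dcmread(path_in)
            for tag in ("PatientName", "PatientID", "PatientBirthDate"):
                if tag in ds:
                    setattr(ds, tag, "")
            ds.remove_private_tags()        # drop vendor-specific private elements
            ds.save_as(path_out)

        # Hypothetical paths; a real deployment would iterate over a study's files.
        anonymize("study/slice001.dcm", "anon/slice001.dcm")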

  12. Configuration Management and Infrastructure Monitoring Using CFEngine and Icinga for Real-time Heterogeneous Data Taking Environment

    NASA Astrophysics Data System (ADS)

    Poat, M. D.; Lauret, J.; Betts, W.

    2015-12-01

    The STAR online computing environment is an intensive, ever-growing system used for real-time data collection and analysis. Composed of heterogeneous and sometimes custom-tuned groups of machines, the computing infrastructure was previously managed by manual configuration and inconsistently monitored by a combination of tools. This situation led to configuration inconsistency and an overload of repetitive tasks, along with lackluster communication between personnel and machines. Globally securing this heterogeneous cyberinfrastructure was tedious at best, so an agile, policy-driven system ensuring consistency was pursued. Three configuration management tools, Chef, Puppet, and CFEngine, were compared in reliability, versatility and performance, along with a comparison of the infrastructure monitoring tools Nagios and Icinga. STAR selected the CFEngine configuration management tool and the Icinga infrastructure monitoring system, leading to a versatile and sustainable solution. By leveraging these two tools, STAR can now swiftly upgrade and modify the environment to its needs with ease, as well as promptly react to cyber-security requests. By creating a sustainable long-term monitoring solution, the detection of failures was reduced from days to minutes, allowing rapid action before issues become dire problems that could cost precious experimental data or uptime.
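
    Icinga executes small check programs that report service state through their exit code (0 = OK, 1 = WARNING, 2 = CRITICAL). The following hedged sketch of a disk-usage check follows that plugin convention; the monitored path and thresholds are invented and are not taken from the STAR setup.

        #!/usr/bin/env python
        import shutil
        import sys

        WARN, CRIT = 80.0, 90.0        # percent-used thresholds (illustrative)
        MOUNT = "/"                    # a real deployment would point at the data volume

        usage = shutil.disk_usage(MOUNT)
        pct = 100.0 * (usage.total - usage.free) / usage.total
        msg = "disk %s %.1f%% used" % (MOUNT, pct)

        # Icinga/Nagios plugin convention: exit 0=OK, 1=WARNING, 2=CRITICAL.
        if pct >= CRIT:
            print("CRITICAL - " + msg)
            sys.exit(2)
        elif pct >= WARN:
            print("WARNING - " + msg)
            sys.exit(1)
        print("OK - " + msg)
        sys.exit(0)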

  13. The ATLAS Simulation Infrastructure

    DOE PAGES

    Aad, G.; Abbott, B.; Abdallah, J.; ...

    2010-09-25

    The simulation software for the ATLAS Experiment at the Large Hadron Collider is being used for large-scale production of events on the LHC Computing Grid. This simulation requires many components, from the generators that simulate particle collisions, through packages simulating the response of the various detectors and triggers. All of these components come together under the ATLAS simulation infrastructure. In this paper, that infrastructure is discussed, including that supporting the detector description, interfacing the event generation, and combining the GEANT4 simulation of the response of the individual detectors. Also described are the tools allowing the software validation, performance testing, and the validation of the simulated output against known physics processes.

  14. Nbody Simulations and Weak Gravitational Lensing using new HPC-Grid resources: the PI2S2 project

    NASA Astrophysics Data System (ADS)

    Becciani, U.; Antonuccio-Delogu, V.; Costa, A.; Comparato, M.

    2008-08-01

    We present the main project of the new grid infrastructure and the research activities that have already started in Sicily and will be completed by next year. The PI2S2 project of the COMETA consortium is funded by the Italian Ministry of University and Research and will be completed in 2009. Funds come from the European Union Structural Funds for Objective 1 regions. The project, together with a similar project called Trinacria GRID Virtual Laboratory (Trigrid VL), aims to create in Sicily a computational grid for e-science and e-commerce applications, with the main goal of increasing the technological innovation of local enterprises and their competitiveness on the global market. The PI2S2 project aims to build and develop an e-Infrastructure in Sicily, based on the grid paradigm, mainly for research activity using the grid environment and High Performance Computing systems. As an example, we present the first results of a new grid version of FLY, a tree N-body code developed by the INAF Astrophysical Observatory of Catania and already published in the CPC Program Library, which will be used in the weak gravitational lensing field.
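
    For readers unfamiliar with N-body codes, the sketch below computes gravitational accelerations by direct summation with NumPy. It is a didactic stand-in only: FLY itself uses a tree algorithm precisely to avoid this O(N²) cost, and the softening length, units, and particle data here are arbitrary.

        import numpy as np

        G = 1.0                                   # gravitational constant in code units

        def accelerations(pos, mass, eps=1e-2):
            """Direct-summation gravitational accelerations (FLY itself uses a tree)."""
            diff = pos[None, :, :] - pos[:, None, :]          # pairwise separations
            dist3 = (np.sum(diff**2, axis=-1) + eps**2) ** 1.5
            np.fill_diagonal(dist3, np.inf)                   # no self-interaction
            return G * np.sum(mass[None, :, None] * diff / dist3[:, :, None], axis=1)

        pos = np.random.rand(4, 3)                # 4 toy particles in 3-D
        mass = np.ones(4)
        print(accelerations(pos, mass))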

  15. Energy Systems Integration Facility (ESIF): Golden, CO - Energy Integration

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sheppy, Michael; VanGeet, Otto; Pless, Shanti

    2015-03-01

    At NREL's Energy Systems Integration Facility (ESIF) in Golden, Colo., scientists and engineers work to overcome challenges related to how the nation generates, delivers and uses energy by modernizing the interplay between energy sources, infrastructure, and data. Test facilities include a megawatt-scale ac electric grid, photovoltaic simulators and a load bank. Additionally, a high performance computing data center (HPCDC) is dedicated to advancing renewable energy and energy efficient technologies. A key design strategy is to use waste heat from the HPCDC to heat parts of the building. The ESIF boasts an annual EUI of 168.3 kBtu/ft². This article describes the building's procurement, design, and first year of performance.

  16. Identifying the impact of G-quadruplexes on Affymetrix 3' arrays using cloud computing.

    PubMed

    Memon, Farhat N; Owen, Anne M; Sanchez-Graillet, Olivia; Upton, Graham J G; Harrison, Andrew P

    2010-01-15

    A tetramer quadruplex structure is formed by four parallel strands of DNA/RNA containing runs of guanine. These quadruplexes are able to form because guanine can Hoogsteen hydrogen bond to other guanines, and a tetrad of guanines can form a stable arrangement. Recently we have discovered that probes on Affymetrix GeneChips that contain runs of guanine do not measure gene expression reliably. We associate this finding with the likelihood that quadruplexes are forming on the surface of GeneChips. In order to cope with the rapidly expanding size of GeneChip array datasets in the public domain, we are exploring the use of cloud computing to replicate our experiments on 3' arrays to look at the effect of the location of G-spots (runs of guanines). Cloud computing is a recently introduced high-performance solution that takes advantage of the computational infrastructure of large organisations such as Amazon and Google. We expect that cloud computing will become widely adopted because it enables bioinformaticians to avoid capital expenditure on expensive computing resources and to pay a cloud computing provider only for what is used. Moreover, as well as financial efficiency, cloud computing is an ecologically friendly technology; it enables efficient data sharing and we expect it to be faster for development purposes. Here we propose the advantageous use of cloud computing to perform a large data-mining analysis of public domain 3' arrays.
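
    Screening probe sequences for G-spots is a simple pattern-matching task; a minimal sketch follows, flagging runs of four or more guanines. The probe string and the minimum run length of four are illustrative assumptions, not the paper's exact criterion.

        import re

        def g_spots(probe, min_run=4):
            """Locate runs of >= min_run guanines, the motif linked to quadruplexes."""
            return [(m.start(), m.group()) for m in re.finditer("G{%d,}" % min_run, probe)]

        # 25-mer probe sequence (illustrative, not a real Affymetrix probe).
        print(g_spots("ACGTGGGGTACGGGGGTTCAGGTAC"))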

  17. Methodologies and systems for heterogeneous concurrent computing

    NASA Technical Reports Server (NTRS)

    Sunderam, V. S.

    1994-01-01

    Heterogeneous concurrent computing is gaining increasing acceptance as an alternative or complementary paradigm to multiprocessor-based parallel processing as well as to conventional supercomputing. While algorithmic and programming aspects of heterogeneous concurrent computing are similar to their parallel processing counterparts, system issues, partitioning and scheduling, and performance aspects are significantly different. In this paper, we discuss critical design and implementation issues in heterogeneous concurrent computing, and describe techniques for enhancing its effectiveness. In particular, we highlight the system-level infrastructures that are required, aspects of parallel algorithm development that most affect performance, system capabilities and limitations, and tools and methodologies for effective computing in heterogeneous networked environments. We also present recent developments and experiences in the context of the PVM system and comment on ongoing and future work.

  18. Heterogeneous concurrent computing with exportable services

    NASA Technical Reports Server (NTRS)

    Sunderam, Vaidy

    1995-01-01

    Heterogeneous concurrent computing, based on the traditional process-oriented model, is approaching its functionality and performance limits. An alternative paradigm, based on the concept of services, supporting data-driven computation, and built on a lightweight process infrastructure, is proposed to enhance the functional capabilities and the operational efficiency of heterogeneous network-based concurrent computing. TPVM is an experimental prototype system supporting exportable services, thread-based computation, and remote memory operations that is built as an extension of and an enhancement to the PVM concurrent computing system. TPVM offers a significantly different computing paradigm for network-based computing, while maintaining a close resemblance to the conventional PVM model in the interest of compatibility and ease of transition. Preliminary experience has demonstrated that the TPVM framework presents a natural yet powerful concurrent programming interface, while being capable of delivering performance improvements of up to thirty percent.

  19. OSiRIS: a distributed Ceph deployment using software defined networking for multi-institutional research

    NASA Astrophysics Data System (ADS)

    McKee, Shawn; Kissel, Ezra; Meekhof, Benjeman; Swany, Martin; Miller, Charles; Gregorowicz, Michael

    2017-10-01

    We report on the first year of the OSiRIS project (NSF Award #1541335; UM, IU, MSU and WSU), which is targeting the creation of a distributed Ceph storage infrastructure coupled with software-defined networking to provide high-performance access for well-connected locations on any participating campus. The project's goal is to provide a single scalable, distributed storage infrastructure that allows researchers at each campus to read, write, manage and share data directly from their own computing locations. The NSF CC*DNI DIBBS program, which funded OSiRIS, is seeking solutions to the challenges of multi-institutional collaborations involving large amounts of data, and we are exploring the creative use of Ceph and networking to address those challenges. While OSiRIS will eventually be serving a broad range of science domains, its first adopter will be the LHC ATLAS detector project via the ATLAS Great Lakes Tier-2 (AGLT2), jointly located at the University of Michigan and Michigan State University. Part of our presentation will cover how ATLAS is using the OSiRIS infrastructure and our experiences integrating our first user community. The presentation will also review the motivations for and goals of the project, the technical details of the OSiRIS infrastructure, the challenges in providing such an infrastructure, and the technical choices made to address those challenges. We will conclude with our plans for the remaining 4 years of the project and our vision for what we hope to deliver by the project's end.
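
    For a sense of what reading and writing data directly against a Ceph object store looks like, the sketch below uses the python-rados bindings. The configuration path, pool name, and object name are hypothetical, a reachable Ceph cluster is required, and OSiRIS's actual access layers may differ.

        import rados

        cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")   # assumed config location
        cluster.connect()
        ioctx = cluster.open_ioctx("research-pool")             # hypothetical pool name
        try:
            ioctx.write_full("dataset-001", b"example payload") # store an object
            print(ioctx.read("dataset-001"))                    # read it back
        finally:
            ioctx.close()
            cluster.shutdown()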

  20. Tracking the NGS revolution: managing life science research on shared high-performance computing clusters.

    PubMed

    Dahlö, Martin; Scofield, Douglas G; Schaal, Wesley; Spjuth, Ola

    2018-05-01

    Next-generation sequencing (NGS) has transformed the life sciences, and many research groups are newly dependent upon computer clusters to store and analyze large datasets. This creates challenges for e-infrastructures accustomed to hosting computationally mature research in other sciences. Using data gathered from our own clusters at UPPMAX computing center at Uppsala University, Sweden, where core hour usage of ∼800 NGS and ∼200 non-NGS projects is now similar, we compare and contrast the growth, administrative burden, and cluster usage of NGS projects with projects from other sciences. The number of NGS projects has grown rapidly since 2010, with growth driven by entry of new research groups. Storage used by NGS projects has grown more rapidly since 2013 and is now limited by disk capacity. NGS users submit nearly twice as many support tickets per user, and 11 more tools are installed each month for NGS projects than for non-NGS projects. We developed usage and efficiency metrics and show that computing jobs for NGS projects use more RAM than non-NGS projects, are more variable in core usage, and rarely span multiple nodes. NGS jobs use booked resources less efficiently for a variety of reasons. Active monitoring can improve this somewhat. Hosting NGS projects imposes a large administrative burden at UPPMAX due to large numbers of inexperienced users and diverse and rapidly evolving research areas. We provide a set of recommendations for e-infrastructures that host NGS research projects. We provide anonymized versions of our storage, job, and efficiency databases.
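
    One plausible efficiency metric of the kind described, the fraction of booked core-hours actually consumed, can be sketched in a few lines of Python. The job records and field names below are invented for illustration and are not the paper's exact definition or UPPMAX data.

        def booking_efficiency(jobs):
            """Fraction of booked core-hours actually consumed by a project's jobs."""
            used = sum(j["cores_used_avg"] * j["hours"] for j in jobs)
            booked = sum(j["cores_booked"] * j["hours"] for j in jobs)
            return used / booked if booked else 0.0

        jobs = [  # illustrative accounting records
            {"cores_booked": 16, "cores_used_avg": 4.0, "hours": 10},
            {"cores_booked": 8,  "cores_used_avg": 7.5, "hours": 2},
        ]
        print("efficiency: %.2f" % booking_efficiency(jobs))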

  1. Tracking the NGS revolution: managing life science research on shared high-performance computing clusters

    PubMed Central

    2018-01-01

    Background: Next-generation sequencing (NGS) has transformed the life sciences, and many research groups are newly dependent upon computer clusters to store and analyze large datasets. This creates challenges for e-infrastructures accustomed to hosting computationally mature research in other sciences. Using data gathered from our own clusters at UPPMAX computing center at Uppsala University, Sweden, where core hour usage of ∼800 NGS and ∼200 non-NGS projects is now similar, we compare and contrast the growth, administrative burden, and cluster usage of NGS projects with projects from other sciences. Results: The number of NGS projects has grown rapidly since 2010, with growth driven by entry of new research groups. Storage used by NGS projects has grown more rapidly since 2013 and is now limited by disk capacity. NGS users submit nearly twice as many support tickets per user, and 11 more tools are installed each month for NGS projects than for non-NGS projects. We developed usage and efficiency metrics and show that computing jobs for NGS projects use more RAM than non-NGS projects, are more variable in core usage, and rarely span multiple nodes. NGS jobs use booked resources less efficiently for a variety of reasons. Active monitoring can improve this somewhat. Conclusions: Hosting NGS projects imposes a large administrative burden at UPPMAX due to large numbers of inexperienced users and diverse and rapidly evolving research areas. We provide a set of recommendations for e-infrastructures that host NGS research projects. We provide anonymized versions of our storage, job, and efficiency databases. PMID:29659792

  2. Managing a tier-2 computer centre with a private cloud infrastructure

    NASA Astrophysics Data System (ADS)

    Bagnasco, Stefano; Berzano, Dario; Brunetti, Riccardo; Lusso, Stefano; Vallero, Sara

    2014-06-01

    In a typical scientific computing centre, several applications coexist and share a single physical infrastructure. An underlying Private Cloud infrastructure eases the management and maintenance of such heterogeneous applications (such as multipurpose or application-specific batch farms, Grid sites, interactive data analysis facilities and others), allowing dynamic allocation of resources to any application. Furthermore, the maintenance of large deployments of complex and rapidly evolving middleware and application software is eased by the use of virtual images and contextualization techniques. Such infrastructures are being deployed in some large centres (see e.g. the CERN Agile Infrastructure project), but with several open-source tools reaching maturity this is becoming viable also for smaller sites. In this contribution we describe the Private Cloud infrastructure at the INFN-Torino Computer Centre, which hosts a full-fledged WLCG Tier-2 centre, an Interactive Analysis Facility for the ALICE experiment at the CERN LHC and several smaller scientific computing applications. The private cloud building blocks include the OpenNebula software stack, the GlusterFS filesystem and the OpenWRT Linux distribution (used for network virtualization); a future integration into a federated higher-level infrastructure is made possible by exposing commonly used APIs like EC2 and OCCI.

  3. GISpark: A Geospatial Distributed Computing Platform for Spatiotemporal Big Data

    NASA Astrophysics Data System (ADS)

    Wang, S.; Zhong, E.; Wang, E.; Zhong, Y.; Cai, W.; Li, S.; Gao, S.

    2016-12-01

    Geospatial data are growing exponentially because of the proliferation of cost-effective and ubiquitous positioning technologies such as global remote-sensing satellites and location-based devices. Analyzing large amounts of geospatial data can provide great value for both industrial and scientific applications. The data- and compute-intensive characteristics inherent in geospatial big data increasingly pose great challenges to technologies for data storage, computing and analysis. Such challenges require a scalable and efficient architecture that can store, query, analyze, and visualize large-scale spatiotemporal data. Therefore, we developed GISpark, a geospatial distributed computing platform for processing large-scale vector, raster and stream data. GISpark is constructed on the latest virtualized computing infrastructures and distributed computing architecture. OpenStack and Docker are used to build a multi-user cloud computing infrastructure hosting GISpark. Virtual storage systems such as HDFS, Ceph and MongoDB are combined and adopted for spatiotemporal data storage management. A Spark-based algorithm framework is developed for efficient parallel computing. Within this framework, SuperMap GIScript and various open-source GIS libraries can be integrated into GISpark. GISpark can also be integrated with scientific computing environments (e.g., Anaconda), interactive computing web applications (e.g., Jupyter notebook), and machine learning tools (e.g., TensorFlow/Orange). The associated geospatial facilities of GISpark, in conjunction with the scientific computing environment, exploratory spatial data analysis tools, and temporal data management and analysis systems, make up a powerful geospatial computing tool. GISpark not only provides spatiotemporal big data processing capacity in the geospatial field, but also provides a spatiotemporal computational model and advanced geospatial visualization tools for other domains involving spatial properties. We tested the performance of the platform with a taxi trajectory analysis. Results suggest that GISpark achieves excellent runtime performance in spatiotemporal big data applications.
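
    Since GISpark builds on Spark, a taxi-trajectory aggregation can be sketched with plain PySpark: snap each pick-up point to a grid cell and count points per cell. The coordinates and the 0.01-degree cell size are invented, a working Spark installation is assumed, and this uses the stock PySpark API rather than GISpark's own GIScript integration.

        from pyspark import SparkContext

        sc = SparkContext(appName="taxi-grid-counts")

        # (longitude, latitude) pick-up points; illustrative, not a real trajectory set.
        points = sc.parallelize([(116.40, 39.90), (116.41, 39.91), (121.47, 31.23)])

        def cell(p):
            """Snap a point to a 0.01-degree grid cell."""
            lon, lat = p
            return (round(lon, 2), round(lat, 2))

        counts = points.map(lambda p: (cell(p), 1)).reduceByKey(lambda a, b: a + b)
        print(counts.collect())
        sc.stop()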

  4. Foundational Tools for Petascale Computing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Miller, Barton

    2014-05-19

    The Paradyn project has a history of developing algorithms, techniques, and software that push the cutting edge of tool technology for high-end computing systems. Under this funding, we are working on a three-year agenda to make substantial new advances in support of new and emerging Petascale systems. The overall goal for this work is to address the steady increase in complexity of these petascale systems. Our work covers two key areas: (1) The analysis, instrumentation and control of binary programs. Work in this area falls under the general framework of the Dyninst API tool kits. (2) Infrastructure for building tools and applications at extreme scale. Work in this area falls under the general framework of the MRNet scalability framework. Note that work done under this funding is closely related to work done under a contemporaneous grant, “High-Performance Energy Applications and Systems”, SC0004061/FG02-10ER25972, UW PRJ36WV.

  5. Power monitoring and control for large scale projects: SKA, a case study

    NASA Astrophysics Data System (ADS)

    Barbosa, Domingos; Barraca, João. Paulo; Maia, Dalmiro; Carvalho, Bruno; Vieira, Jorge; Swart, Paul; Le Roux, Gerhard; Natarajan, Swaminathan; van Ardenne, Arnold; Seca, Luis

    2016-07-01

    Large sensor-based science infrastructures for radio astronomy like the SKA will be among the most data-intensive projects in the world, facing extremely demanding computation, storage, management and, above all, power requirements. The geographically wide distribution of the SKA and its associated processing requirements, in the form of tailored High Performance Computing (HPC) facilities, require a greener approach to the Information and Communications Technologies (ICT) adopted for the data processing, to enable operational compliance with potentially strict power budgets. Reducing electricity costs and improving system-level power monitoring, generation, and management are paramount to avoiding future inefficiencies and higher costs and to enabling fulfillment of the key science cases. Here we outline major characteristics and innovation approaches to address power efficiency and long-term power sustainability for radio astronomy projects, focusing on Green ICT for science and smart power monitoring and control.
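
    A common starting point for the kind of power monitoring discussed is Power Usage Effectiveness (PUE), the ratio of total facility energy to IT-equipment energy. The sketch below with invented monthly figures is illustrative only and is not an SKA measurement.

        def pue(total_facility_kwh, it_equipment_kwh):
            """Power Usage Effectiveness: total facility energy over IT energy."""
            return total_facility_kwh / it_equipment_kwh

        # Illustrative monthly figures for an HPC facility (not SKA data).
        print("PUE = %.2f" % pue(total_facility_kwh=1_300_000, it_equipment_kwh=1_000_000))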

  6. Commissioning the CERN IT Agile Infrastructure with experiment workloads

    NASA Astrophysics Data System (ADS)

    Medrano Llamas, Ramón; Harald Barreiro Megino, Fernando; Kucharczyk, Katarzyna; Kamil Denis, Marek; Cinquilli, Mattia

    2014-06-01

    In order to ease the management of their infrastructure, most of the WLCG sites are adopting cloud-based strategies. In the case of CERN, the Tier 0 of the WLCG is completely restructuring the resource and configuration management of its computing center under the codename Agile Infrastructure. Its goal is to manage 15,000 Virtual Machines by means of an OpenStack middleware in order to unify all the resources in CERN's two datacenters: the one in Meyrin and the new one in Wigner, Hungary. During the commissioning of this infrastructure, CERN IT is offering an attractive amount of computing resources to the experiments (800 cores for ATLAS and CMS) through a private cloud interface. ATLAS and CMS have joined forces to exploit them by running stress tests and simulation workloads since November 2012. This work will describe the experience of the first deployments of the current experiment workloads on the CERN private cloud testbed. The paper is organized as follows: the first section will explain the integration of the experiment workload management systems (WMS) with the cloud resources. The second section will revisit the performance and stress testing performed with HammerCloud in order to evaluate and compare suitability for the experiment workloads. The third section will go deeper into dynamic provisioning techniques, such as the use of the cloud APIs directly by the WMS. The paper finishes with a review of the conclusions and the challenges ahead.

  7. Energy Systems Integration Laboratory | Energy Systems Integration Facility

    Science.gov Websites

    The Energy Systems Integration Laboratory offers high-pressure hydrogen systems capabilities: its systems test hub includes a Class 1, Division 2 space for performing tests of high-pressure hydrogen infrastructure. Key infrastructure: robotic arm; high-pressure hydrogen; natural gas supply; standalone SCADA.

  8. Evaluating Commercial and Private Cloud Services for Facility-Scale Geodetic Data Access, Analysis, and Services

    NASA Astrophysics Data System (ADS)

    Meertens, C. M.; Boler, F. M.; Ertz, D. J.; Mencin, D.; Phillips, D.; Baker, S.

    2017-12-01

    UNAVCO, in its role as an NSF facility for geodetic infrastructure and data, has succeeded for over two decades using on-premises infrastructure, and while the promise of cloud-based infrastructure is well established, significant questions about the suitability of such infrastructure for facility-scale services remain. Primarily through the GeoSciCloud award from NSF EarthCube, UNAVCO is investigating the costs, advantages, and disadvantages of providing its geodetic data and services in the cloud versus on UNAVCO's on-premises infrastructure. (IRIS is a collaborator on the project and is performing its own suite of investigations.) In contrast to the 2-3 year time scale of the research cycle, the time scale of operation and planning for NSF facilities is a minimum of five years, and for some services extends to a decade or more. Planning for on-premises infrastructure is deliberate, and migrations typically take months to years to fully implement. Migrations to a cloud environment can only go forward with similarly deliberate planning and understanding of all costs and benefits. The EarthCube GeoSciCloud project is intended to address the uncertainties of facility-level operations in the cloud. Investigations are being performed in a commercial cloud environment (Amazon AWS) during the first year of the project and in a private cloud environment (the NSF XSEDE resource at the Texas Advanced Computing Center) during the second year. These investigations are expected to illuminate the potential as well as the limitations of running facility-scale production services in the cloud. The work includes running cloud-based services parallel and equivalent to on-premises services, including: data serving via FTP from a large data store, operation of a metadata database, production-scale processing of multiple months of geodetic data, web-services delivery of quality-checked data and products, large-scale compute services for event post-processing, and serving real-time data from a network of 700-plus GPS stations. The evaluation is based on a suite of metrics that we have developed to elucidate the effectiveness of cloud-based services in price, performance, and management. Services are currently running in AWS and evaluation is underway.

  9. Marginal adaptation and CAD-CAM technology: A systematic review of restorative material and fabrication techniques.

    PubMed

    Papadiochou, Sofia; Pissiotis, Argirios L

    2018-04-01

    The comparative assessment of computer-aided design and computer-aided manufacturing (CAD-CAM) technology and other fabrication techniques pertaining to marginal adaptation should be documented. Limited evidence exists on the effect of restorative material on the performance of a CAD-CAM system relative to marginal adaptation. The purpose of this systematic review was to investigate whether the marginal adaptation of CAD-CAM single crowns, fixed dental prostheses, and implant-retained fixed dental prostheses or their infrastructures differs from that obtained by other fabrication techniques using a similar restorative material, and whether it depends on the type of restorative material. An electronic search of English-language literature published between January 1, 2000, and June 30, 2016, was conducted in the Medline/PubMed database. Of the 55 included comparative studies, 28 compared CAD-CAM technology with conventional fabrication techniques, 12 contrasted CAD-CAM technology and copy milling, 4 compared CAD-CAM milling with direct metal laser sintering (DMLS), and 22 investigated the performance of a CAD-CAM system regarding marginal adaptation in restorations/infrastructures produced with different restorative materials. Most of the CAD-CAM restorations/infrastructures were within the clinically acceptable marginal discrepancy (MD) range. The performance of a CAD-CAM system relative to marginal adaptation is influenced by the restorative material. Compared with CAD-CAM, most of the heat-pressed lithium disilicate crowns displayed equal or smaller MD values. Slip-casting crowns exhibited similar or better marginal accuracy than those fabricated with CAD-CAM. Cobalt-chromium and titanium implant infrastructures produced using a CAD-CAM system elicited smaller MD values than zirconia. The majority of cobalt-chromium restorations/infrastructures produced by DMLS displayed better marginal accuracy than those fabricated with the casting technique. Compared with copy milling, the majority of zirconia restorations/infrastructures produced by CAD-CAM milling exhibited better marginal adaptation. No clear conclusions can be drawn about the superiority of CAD-CAM milling over the casting technique and DMLS regarding marginal adaptation.

  10. Information Power Grid: Distributed High-Performance Computing and Large-Scale Data Management for Science and Engineering

    NASA Technical Reports Server (NTRS)

    Johnston, William E.; Gannon, Dennis; Nitzberg, Bill; Feiereisen, William (Technical Monitor)

    2000-01-01

    The term "Grid" refers to distributed, high performance computing and data handling infrastructure that incorporates geographically and organizationally dispersed, heterogeneous resources that are persistent and supported. The vision for NASA's Information Power Grid - a computing and data Grid - is that it will provide significant new capabilities to scientists and engineers by facilitating routine construction of information-based problem solving environments/frameworks that will knit together widely distributed computing, data, instrument, and human resources into just-in-time systems that can address complex and large-scale computing and data analysis problems. IPG development and deployment is addressing requirements obtained by analyzing a number of different application areas, in particular from the NASA Aero-Space Technology Enterprise. This analysis has focused primarily on two types of users. The first is the scientist/design engineer, whose primary interest is problem solving (e.g., determining wing aerodynamic characteristics in many different operating environments) and whose primary interface to IPG will be through various sorts of problem solving frameworks. The second type of user is the tool designer: the computational scientist who converts physics and mathematics into code that can simulate the physical world. These are the two primary users of IPG, and they have rather different requirements. This paper describes the current state of IPG (the operational testbed), the set of capabilities being put into place for the operational prototype IPG, as well as some of the longer-term R&D tasks.

  11. WPS mediation: An approach to process geospatial data on different computing backends

    NASA Astrophysics Data System (ADS)

    Giuliani, Gregory; Nativi, Stefano; Lehmann, Anthony; Ray, Nicolas

    2012-10-01

    The OGC Web Processing Service (WPS) specification allows generating information by processing distributed geospatial data made available through Spatial Data Infrastructures (SDIs). However, current SDIs have limited analytical capacities, and various problems emerge when trying to use them in data- and computing-intensive domains such as the environmental sciences. These problems are usually not, or only partially, solvable using single computing resources. Therefore, the Geographic Information (GI) community is trying to benefit from the superior storage and computing capabilities offered by distributed computing (e.g., Grids, Clouds) methods and technologies. Currently, there is no commonly agreed approach to grid-enable WPS. No implementation allows one to seamlessly execute a geoprocessing calculation following user requirements on different computing backends, ranging from a stand-alone GIS server up to computer clusters and large Grid infrastructures. Considering this issue, this paper presents a proof of concept by mediating different geospatial and Grid software packages, and by proposing an extension of the WPS specification through two optional parameters. The applicability of this approach is demonstrated using a Normalized Difference Vegetation Index (NDVI) mediated WPS process, highlighting benefits and issues that need further investigation to improve performance.
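
    From the client side, invoking such a WPS process can be sketched with the OWSLib library. The endpoint URL, process identifier, and input names below are hypothetical, a live WPS server is required, and the two optional parameters proposed in the paper are not part of this stock request.

        from owslib.wps import WebProcessingService, monitorExecution

        # Hypothetical endpoint and process identifier; a real server is required.
        wps = WebProcessingService("http://example.org/wps", verbose=False)
        execution = wps.execute(
            "ndvi",                                    # assumed process id
            inputs=[("red", "red_band.tif"), ("nir", "nir_band.tif")],
        )
        monitorExecution(execution)                    # poll until the job completes
        print(execution.status)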

  12. The SCEC Community Modeling Environment(SCEC/CME): A Collaboratory for Seismic Hazard Analysis

    NASA Astrophysics Data System (ADS)

    Maechling, P. J.; Jordan, T. H.; Minster, J. B.; Moore, R.; Kesselman, C.

    2005-12-01

    The SCEC Community Modeling Environment (SCEC/CME) Project is an NSF-supported Geosciences/IT partnership that is actively developing an advanced information infrastructure for system-level earthquake science in Southern California. This partnership includes SCEC, USC's Information Sciences Institute (ISI), the San Diego Supercomputer Center (SDSC), the Incorporated Institutions for Research in Seismology (IRIS), and the U.S. Geological Survey. The goal of the SCEC/CME is to develop seismological applications and information technology (IT) infrastructure to support the development of Seismic Hazard Analysis (SHA) programs and other geophysical simulations. The SHA application programs developed on the Project include a Probabilistic Seismic Hazard Analysis system called OpenSHA. OpenSHA computational elements that are currently available include a collection of attenuation relationships and several Earthquake Rupture Forecasts (ERFs). Geophysicists in the collaboration have also developed Anelastic Wave Models (AWMs) using both finite-difference and finite-element approaches. Earthquake simulations using these codes have been run for a variety of earthquake sources. Rupture Dynamic Model (RDM) codes have also been developed that simulate friction-based fault slip. The SCEC/CME collaboration has also developed IT software and hardware infrastructure to support the development, execution, and analysis of these SHA programs. To support computationally expensive simulations, we have constructed a grid-based scientific workflow system. Using the SCEC grid, project collaborators can submit computations from the SCEC/CME servers to High Performance Computers at USC and TeraGrid High Performance Computing Centers. Data generated and archived by the SCEC/CME is stored in a digital library system, the Storage Resource Broker (SRB). This system provides a robust and secure system for maintaining the association between the data sets and their metadata. To provide an easy-to-use system for constructing SHA computations, a browser-based workflow assembly web portal has been developed. Users can compose complex SHA calculations, specifying SCEC/CME data sets as inputs to calculations, and calling SCEC/CME computational programs to process the data and the output. Knowledge-based software tools have been implemented that utilize ontological descriptions of SHA software and data to validate workflows created with this pathway assembly tool. Data visualization software developed by the collaboration supports analysis and validation of data sets. Several programs have been developed to visualize SCEC/CME data, including GMT-based map-making software for PSHA codes, 4D wavefield propagation visualization software based on OpenGL, and 3D Geowall-based visualization of earthquakes, faults, and seismic wave propagation. The SCEC/CME Project also helps to sponsor the SCEC UseIT Intern Program, which provides research opportunities in both Geosciences and Information Technology to undergraduate students in a variety of fields. The UseIT group has developed a 3D data visualization tool, called SCEC-VDO, as a part of this undergraduate research program.

  13. Towards a Scalable and Adaptive Application Support Platform for Large-Scale Distributed E-Sciences in High-Performance Network Environments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wu, Chase Qishi; Zhu, Michelle Mengxia

    The advent of large-scale collaborative scientific applications has demonstrated the potential for broad scientific communities to pool globally distributed resources to produce unprecedented data acquisition, movement, and analysis. System resources including supercomputers, data repositories, computing facilities, network infrastructures, storage systems, and display devices have been increasingly deployed at national laboratories and academic institutes. These resources are typically shared by large communities of users over the Internet or dedicated networks and hence exhibit an inherent dynamic nature in their availability, accessibility, capacity, and stability. Scientific applications using either experimental facilities or computation-based simulations with various physical, chemical, climatic, and biological models feature diverse scientific workflows, as simple as linear pipelines or as complex as directed acyclic graphs, which must be executed and supported over wide-area networks with massively distributed resources. Application users oftentimes need to manually configure their computing tasks over networks in an ad hoc manner, significantly limiting the productivity of scientists and constraining the utilization of resources. The success of these large-scale distributed applications requires a highly adaptive and massively scalable workflow platform that provides automated and optimized computing and networking services. This project designs and develops a generic Scientific Workflow Automation and Management Platform (SWAMP), which contains a web-based user interface specially tailored for a target application, a set of user libraries, and several easy-to-use computing and networking toolkits for application scientists to conveniently assemble, execute, monitor, and control complex computing workflows in heterogeneous high-performance network environments. SWAMP will enable the automation and management of the entire process of scientific workflows with the convenience of a few mouse clicks while hiding the implementation and technical details from end users. In particular, we will consider two types of applications with distinct performance requirements: data-centric and service-centric applications. For data-centric applications, the main workflow task involves large-volume data generation, cataloguing, storage, and movement, typically from supercomputers or experimental facilities to a team of geographically distributed users; for service-centric applications, the main focus of the workflow is on data archiving, preprocessing, filtering, synthesis, visualization, and other application-specific analysis. We will conduct a comprehensive comparison of existing workflow systems and choose the best-suited one, with open-source code, a flexible system structure, and a large user base, as the starting point for our development. Based on the chosen system, we will develop and integrate new components including a black-box design of computing modules, performance monitoring and prediction, and workflow optimization and reconfiguration, which are missing from existing workflow systems. A modular design separating specification, execution, and monitoring aspects will be adopted to establish a common generic infrastructure suited to a wide spectrum of science applications.
We will further design and develop efficient workflow mapping and scheduling algorithms to optimize workflow performance in terms of minimum end-to-end delay, maximum frame rate, and highest reliability. We will develop and demonstrate the SWAMP system in a local environment, on the grid network, and on the 100 Gbps Advanced Network Initiative (ANI) testbed. The demonstration will target scientific applications in climate modeling and high energy physics, and the functions to be demonstrated include workflow deployment, execution, steering, and reconfiguration. Throughout the project period, we will work closely with the science communities in the fields of climate modeling and high energy physics, including the Spallation Neutron Source (SNS) and Large Hadron Collider (LHC) projects, to mature the system for production use.
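
    A minimal flavor of the workflow mapping and scheduling problem: greedily assign each ready task to the resource that finishes it earliest. The task graph, costs, and node speeds below are invented, and SWAMP's actual algorithms are more sophisticated than this list-scheduling sketch.

        # Tasks: name -> (dependencies, cost in abstract work units).
        tasks = {
            "acquire":   ([], 4.0),
            "filter":    (["acquire"], 6.0),
            "simulate":  (["acquire"], 12.0),
            "visualize": (["filter", "simulate"], 3.0),
        }
        node_speed = {"cluster": 4.0, "workstation": 1.0}   # work units per second

        def schedule(tasks, node_speed):
            free_at = {n: 0.0 for n in node_speed}          # when each node is next free
            finish = {}
            while len(finish) < len(tasks):
                ready = [t for t, (deps, _) in tasks.items()
                         if t not in finish and all(d in finish for d in deps)]
                for t in sorted(ready, key=lambda t: -tasks[t][1]):   # big tasks first
                    deps, cost = tasks[t]
                    dep_done = max((finish[d] for d in deps), default=0.0)
                    # Greedy choice: the node giving the earliest finish time.
                    node = min(node_speed,
                               key=lambda n: max(free_at[n], dep_done) + cost / node_speed[n])
                    start = max(free_at[node], dep_done)
                    finish[t] = start + cost / node_speed[node]
                    free_at[node] = finish[t]
                    print("%-9s -> %-11s finishes at %5.2f s" % (t, node, finish[t]))
            return max(finish.values())

        print("makespan: %.2f s" % schedule(tasks, node_speed))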

  14. Real-time performance monitoring and management system

    DOEpatents

    Budhraja, Vikram S [Los Angeles, CA; Dyer, James D [La Mirada, CA; Martinez Morales, Carlos A [Upland, CA

    2007-06-19

    A real-time performance monitoring system for monitoring an electric power grid. The electric power grid has a plurality of grid portions, each grid portion corresponding to one of a plurality of control areas. The real-time performance monitoring system includes a monitor computer for monitoring at least one of reliability metrics, generation metrics, transmission metrics, suppliers metrics, grid infrastructure security metrics, and markets metrics for the electric power grid. The data for the metrics being monitored by the monitor computer are stored in a database, and a visualization of the metrics is displayed on at least one display computer having a monitor. The at least one display computer in one said control area enables an operator to monitor the grid portion corresponding to a different said control area.

  15. Physicists Get INSPIREd: INSPIRE Project and Grid Applications

    NASA Astrophysics Data System (ADS)

    Klem, Jukka; Iwaszkiewicz, Jan

    2011-12-01

    INSPIRE is the new high-energy physics scientific information system developed by CERN, DESY, Fermilab and SLAC. INSPIRE combines the curated and trusted contents of the SPIRES database with Invenio digital library technology. INSPIRE contains the entire HEP literature, with about one million records, and in addition to becoming the reference HEP scientific information platform, it aims to provide new kinds of data mining services and metrics to assess the impact of articles and authors. Grid and cloud computing provide new opportunities to offer better services in areas that require large CPU and storage resources, including document Optical Character Recognition (OCR) processing, full-text indexing of articles, and improved metrics. D4Science-II is a European project that develops and operates an e-Infrastructure supporting Virtual Research Environments (VREs). It develops an enabling technology (gCube) which implements a mechanism for facilitating the interoperation of its e-Infrastructure with other autonomously running data e-Infrastructures. As a result, this creates the core of an e-Infrastructure ecosystem. INSPIRE is one of the e-Infrastructures participating in the D4Science-II project. In the context of the D4Science-II project, the INSPIRE e-Infrastructure makes available some of its resources and services to other members of the resulting ecosystem. Moreover, it benefits from the ecosystem via a dedicated Virtual Organization giving access to an array of resources, ranging from computing and storage resources of grid infrastructures to data and services.

  16. Integrated Computational Materials Engineering for Magnesium in Automotive Body Applications

    NASA Astrophysics Data System (ADS)

    Allison, John E.; Liu, Baicheng; Boyle, Kevin P.; Hector, Lou; McCune, Robert

    This paper provides an overview and progress report for an international collaborative project which aims to develop an ICME infrastructure for magnesium for use in automotive body applications. Quantitative processing-microstructure-property relationships are being developed for extruded Mg alloys, sheet-formed Mg alloys and high-pressure die-cast Mg alloys. These relationships are captured in computational models which are then linked with manufacturing process simulation and used to provide constitutive models for component performance analysis. The long-term goal is to capture this information in efficient computational models and in a web-centered knowledge base. The work is being conducted at leading universities, national labs and industrial research facilities in the US, China and Canada. This project is sponsored by the U.S. Department of Energy, the U.S. Automotive Materials Partnership (USAMP), the Chinese Ministry of Science and Technology (MOST) and Natural Resources Canada (NRCan).

  17. A pervasive parallel framework for visualization: final report for FWP 10-014707

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Moreland, Kenneth D.

    2014-01-01

    We are on the threshold of a transformative change in the basic architecture of high-performance computing. The use of accelerator processors, characterized by large core counts, shared but asymmetrical memory, and heavy thread loading, is quickly becoming the norm in high performance computing. These accelerators represent significant challenges in updating our existing base of software. An intrinsic problem with this transition is a fundamental programming shift from message-passing processes to much finer-grained thread scheduling with memory sharing. Another problem is the lack of stability in accelerator implementation; processor and compiler technology is currently changing rapidly. This report documents the results of our three-year ASCR project to address these challenges. Our project includes the development of the Dax toolkit, which contains the beginnings of new algorithms for a new generation of computers and the underlying infrastructure to rapidly prototype and build further algorithms as necessary.

  18. Towards agile large-scale predictive modelling in drug discovery with flow-based programming design principles.

    PubMed

    Lampa, Samuel; Alvarsson, Jonathan; Spjuth, Ola

    2016-01-01

    Predictive modelling in drug discovery is challenging to automate, as it often contains multiple analysis steps and might involve cross-validation and parameter tuning that create complex dependencies between tasks. With large-scale data or when using computationally demanding modelling methods, e-infrastructures such as high-performance or cloud computing are required, adding to the existing challenges of fault-tolerant automation. Workflow management systems can aid in many of these challenges, but the currently available systems lack the functionality needed to enable agile and flexible predictive modelling. We here present an approach inspired by elements of the flow-based programming paradigm, implemented as an extension of the Luigi system, which we name SciLuigi. We also discuss the experiences from using the approach when modelling a large set of biochemical interactions using a shared computer cluster.
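
    SciLuigi extends the Luigi workflow system; the sketch below shows plain Luigi's task/dependency idiom (not SciLuigi's flow-based API), with a hypothetical two-step featurize-then-train pipeline in which the second task declares the first as a requirement.

        import luigi

        class Featurize(luigi.Task):
            def output(self):
                return luigi.LocalTarget("features.txt")
            def run(self):
                with self.output().open("w") as f:
                    f.write("molecule_features\n")   # stand-in for real descriptors

        class TrainModel(luigi.Task):
            def requires(self):
                return Featurize()                   # dependency: features must exist
            def output(self):
                return luigi.LocalTarget("model.txt")
            def run(self):
                with self.input().open() as fin, self.output().open("w") as fout:
                    fout.write("model trained on: " + fin.read())

        if __name__ == "__main__":
            luigi.build([TrainModel()], local_scheduler=True)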

  19. People at risk - nexus critical infrastructure and society

    NASA Astrophysics Data System (ADS)

    Heiser, Micha; Thaler, Thomas; Fuchs, Sven

    2016-04-01

    Strategic infrastructure networks include the highly complex and interconnected systems that are so vital to a city or state that any sudden disruption can result in debilitating impacts on human life, the economy and society as a whole. Recently, various studies have applied complex network-based models to study the performance and vulnerability of infrastructure systems under various types of attacks and hazards; a major part of them, particularly after the 9/11 incident, is related to terrorist attacks. Here, vulnerability is generally defined as the performance drop of an infrastructure system under a given disruptive event. The performance can be measured by different metrics, which correspond to various levels of resilience. In this paper, we address the vulnerability and exposure of critical infrastructure in the Eastern Alps. The Federal State of Tyrol is an international transport route and an essential component of north-south transport connectivity in Europe. Any interruption of the transport flow leads to incommensurable consequences in terms of indirect losses, since the system does not feature redundant elements at comparable economic efficiency. Natural hazard processes such as floods, debris flows, rock falls and avalanches endanger this infrastructure, as shown by the large flood events of 2005 and 2012 or the rock falls of 2014, which had strong impacts on critical infrastructure such as the disruption of railway lines (in 2005 and 2012) and of highways and motorways (in 2014). The aim of this paper is to present how critical infrastructures, communities and societies are vulnerable and can be made resilient against natural hazard risks and the related cascading effects on different compartments (industrial, infrastructural, societal, institutional, cultural, etc.), which are dominated by the type of hazard (avalanches, torrential flooding, debris flows, rock falls). Specific themes will be addressed in various case studies to allow cross-learning and cross-comparison of, for example, rural and urban areas and different scales. Correspondingly, scale-specific resilience indicators and metrics will be developed to tailor methods to specific needs according to the scale of assessment (micro/local and macro/regional) and to the type of infrastructure. The traditional indicators normally used in structural analysis are not sufficient to understand how events happening on the networks can have cascading consequences. Moreover, effects have multidimensional (technical, economic, organizational and human), multiscale (micro and macro) and temporal characteristics (short- to long-term incidence). These considerations lead to two activities: 1) computation of classic structural-analysis indicators on the case studies in order to obtain an identity of the transport infrastructure; and 2) development of a set of new measures of resilience. To mitigate natural hazard risk, a large number of protection measures of different types have been constructed, following inhomogeneous reliability standards. The focus of this case study will be on resilience issues and decision making in the context of a large-scale sectoral approach focused on the transport infrastructure network.

  20. Genomic cloud computing: legal and ethical points to consider

    PubMed Central

    Dove, Edward S; Joly, Yann; Tassé, Anne-Marie; Burton, Paul; Chisholm, Rex; Fortier, Isabel; Goodwin, Pat; Harris, Jennifer; Hveem, Kristian; Kaye, Jane; Kent, Alistair; Knoppers, Bartha Maria; Lindpaintner, Klaus; Little, Julian; Riegman, Peter; Ripatti, Samuli; Stolk, Ronald; Bobrow, Martin; Cambon-Thomsen, Anne; Dressler, Lynn; Joly, Yann; Kato, Kazuto; Knoppers, Bartha Maria; Rodriguez, Laura Lyman; McPherson, Treasa; Nicolás, Pilar; Ouellette, Francis; Romeo-Casabona, Carlos; Sarin, Rajiv; Wallace, Susan; Wiesner, Georgia; Wilson, Julia; Zeps, Nikolajs; Simkevitz, Howard; De Rienzo, Assunta; Knoppers, Bartha M

    2015-01-01

    The biggest challenge in twenty-first century data-intensive genomic science is developing vast computer infrastructure and advanced software tools to perform comprehensive analyses of genomic data sets for biomedical research and clinical practice. Researchers are increasingly turning to cloud computing both as a solution to integrate data from genomics, systems biology and biomedical data mining and as an approach to analyze data to solve biomedical problems. Although cloud computing provides several benefits such as lower costs and greater efficiency, it also raises legal and ethical issues. In this article, we discuss three key 'points to consider' (data control; data security, confidentiality and transfer; and accountability) based on a preliminary review of several publicly available cloud service providers' Terms of Service. These 'points to consider' should be borne in mind by genomic research organizations when negotiating legal arrangements to store genomic data on a large commercial cloud service provider's servers. Diligent genomic cloud computing means leveraging security standards and evaluation processes as a means to protect data and entails many of the same good practices that researchers should always consider in securing their local infrastructure. PMID:25248396

  1. Genomic cloud computing: legal and ethical points to consider.

    PubMed

    Dove, Edward S; Joly, Yann; Tassé, Anne-Marie; Knoppers, Bartha M

    2015-10-01

    The biggest challenge in twenty-first century data-intensive genomic science is developing vast computer infrastructure and advanced software tools to perform comprehensive analyses of genomic data sets for biomedical research and clinical practice. Researchers are increasingly turning to cloud computing both as a solution to integrate data from genomics, systems biology and biomedical data mining and as an approach to analyze data to solve biomedical problems. Although cloud computing provides several benefits such as lower costs and greater efficiency, it also raises legal and ethical issues. In this article, we discuss three key 'points to consider' (data control; data security, confidentiality and transfer; and accountability) based on a preliminary review of several publicly available cloud service providers' Terms of Service. These 'points to consider' should be borne in mind by genomic research organizations when negotiating legal arrangements to store genomic data on a large commercial cloud service provider's servers. Diligent genomic cloud computing means leveraging security standards and evaluation processes as a means to protect data and entails many of the same good practices that researchers should always consider in securing their local infrastructure.

  2. Computational Materials Science and Chemistry: Accelerating Discovery and Innovation through Simulation-Based Engineering and Science

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Crabtree, George; Glotzer, Sharon; McCurdy, Bill

    This report is based on an SC Workshop on Computational Materials Science and Chemistry for Innovation, held July 26-27, 2010, to assess the potential of state-of-the-art computer simulations to accelerate understanding and discovery in materials science and chemistry, with a focus on potential impacts in energy technologies and innovation. The urgent demand for new energy technologies has greatly exceeded the capabilities of today's materials and chemical processes. To convert sunlight to fuel, efficiently store energy, or enable a new generation of energy production and utilization technologies requires the development of new materials and processes of unprecedented functionality and performance. New materials and processes are critical pacing elements for progress in advanced energy systems and virtually all industrial technologies. Over the past two decades, the United States has developed and deployed the world's most powerful collection of tools for the synthesis, processing, characterization, and simulation and modeling of materials and chemical systems at the nanoscale, dimensions of a few atoms to a few hundred atoms across. These tools, which include world-leading x-ray and neutron sources, nanoscale science facilities, and high-performance computers, provide an unprecedented view of the atomic-scale structure and dynamics of materials and the molecular-scale basis of chemical processes. For the first time in history, we are able to synthesize, characterize, and model materials and chemical behavior at the length scale where this behavior is controlled. This ability is transformational for the discovery process and, as a result, confers a significant competitive advantage. Perhaps the most spectacular increase in capability has been demonstrated in high-performance computing. Over the past decade, computational power has increased by a factor of a million due to advances in hardware and software. This rate of improvement, which shows no sign of abating, has enabled the development of computer simulations and models of unprecedented fidelity. We are at the threshold of a new era in which the integrated synthesis, characterization, and modeling of complex materials and chemical processes will transform our ability to understand and design new materials and chemistries with predictive power. In turn, this predictive capability will transform technological innovation by accelerating the development and deployment of new materials and processes in products and manufacturing. Harnessing the potential of computational science and engineering for the discovery and development of materials and chemical processes is essential to maintaining leadership in these foundational fields that underpin energy technologies and industrial competitiveness. Capitalizing on the opportunities presented by simulation-based engineering and science in materials and chemistry will require an integration of experimental capabilities with theoretical and computational modeling; the development of a robust and sustainable infrastructure to support the development and deployment of advanced computational models; and the assembly of a community of scientists and engineers to implement this integration and infrastructure. This community must extend to industry, where incorporating predictive materials science and chemistry into design tools can accelerate the product development cycle and drive economic competitiveness.
The confluence of new theories, new materials synthesis capabilities, and new computer platforms has created an unprecedented opportunity to implement a "materials-by-design" paradigm with wide-ranging benefits in technological innovation and scientific discovery. The Workshop on Computational Materials Science and Chemistry for Innovation was convened in Bethesda, Maryland, on July 26-27, 2010. Sponsored by the Department of Energy (DOE) Offices of Advanced Scientific Computing Research and Basic Energy Sciences, the workshop brought together 160 experts in materials science, chemistry, and computational science representing more than 65 universities, laboratories, and industries, and four agencies. The workshop examined seven foundational challenge areas in materials science and chemistry: materials for extreme conditions, self-assembly, light harvesting, chemical reactions, designer fluids, thin films and interfaces, and electronic structure. Each of these challenge areas is critical to the development of advanced energy systems, and each can be accelerated by the integrated application of predictive capability with theory and experiment. The workshop concluded that emerging capabilities in predictive modeling and simulation have the potential to revolutionize the development of new materials and chemical processes. Coupled with world-leading materials characterization and nanoscale science facilities, this predictive capability provides the foundation for an innovation ecosystem that can accelerate the discovery, development, and deployment of new technologies, including advanced energy systems. Delivering on the promise of this innovation ecosystem requires the following: (1) integration of synthesis, processing, characterization, theory, and simulation and modeling; many of the newly established Energy Frontier Research Centers and Energy Hubs are exploiting this integration; (2) achieving and strengthening predictive capability in the seven foundational challenge areas described in this report, which is critical to the development of advanced energy technologies; (3) development of validated computational approaches that span vast differences in time and length scales, a fundamental computational challenge that crosscuts all of the foundational challenge areas; similarly challenging is the coupling of analytical data from the multiple instruments and techniques required to link these length and time scales; (4) experimental validation and quantification of uncertainty in simulation and modeling, which becomes increasingly challenging as simulations grow more complex; (5) a robust and sustainable computational infrastructure, including software and applications; for modeling and simulation, software equals infrastructure, and software that translates huge arrays of experimental data into useful scientific understanding is itself critical infrastructure, so an integrated approach for managing this infrastructure is essential; and (6) efficient transfer and incorporation of simulation-based engineering and science in industry, including strategies for bridging the gap between research and industrial applications and for widespread industry adoption of integrated computational materials engineering.

  3. Flexible Description and Adaptive Processing of Earth Observation Data through the BigEarth Platform

    NASA Astrophysics Data System (ADS)

    Gorgan, Dorian; Bacu, Victor; Stefanut, Teodor; Nandra, Cosmin; Mihon, Danut

    2016-04-01

    Earth Observation data repositories, which periodically grow by several terabytes, are becoming a critical issue for organizations. Managing the storage capacity of such big datasets, access policies, data protection, searching, and complex processing entails high costs and demands efficient solutions that balance the cost and value of the data. Data create value only when they are used, and data protection has to be oriented toward allowing innovation, which sometimes depends on creative people achieving unexpected, valuable results in a flexible and adaptive manner. Users need to describe and experiment with different complex algorithms through analytics in order to valorize the data. Analytics uses descriptive and predictive models to gain valuable knowledge and information from data analysis. Possible solutions for the advanced processing of big Earth Observation data are offered by HPC platforms such as the cloud. As platforms become more complex and heterogeneous, developing applications gets even harder, and the efficient mapping of these applications to a suitable and optimal platform, working on huge distributed data repositories, is challenging and complex as well, even when using specialized software services. From the user's point of view, an optimal environment gives acceptable execution times, offers a high level of usability by hiding the complexity of the computing infrastructure, and supports open accessibility and control of application entities and functionality. The BigEarth platform [1] supports the entire flow, from the flexible description of processing through basic operators to adaptive execution over cloud infrastructure [2]. The basic modules of the pipeline, such as the KEOPS [3] set of basic operators, the WorDeL language [4], the Planner for sequential and parallel processing, and the Executor based on virtual machines, are detailed as the main components of the BigEarth platform [5]. The presentation exemplifies the development of some Earth Observation oriented applications based on the flexible description of processing and on adaptive, portable execution over cloud infrastructure. Main references for further information: [1] BigEarth project, http://cgis.utcluj.ro/projects/bigearth [2] Gorgan, D., "Flexible and Adaptive Processing of Earth Observation Data over High Performance Computation Architectures", International Conference and Exhibition Satellite 2015, August 17-19, Houston, Texas, USA. [3] Mihon, D., Bacu, V., Colceriu, V., Gorgan, D., "Modeling of Earth Observation Use Cases through the KEOPS System", Proceedings of the Intelligent Computer Communication and Processing (ICCP), IEEE-Press, pp. 455-460, (2015). [4] Nandra, C., Gorgan, D., "Workflow Description Language for Defining Big Earth Data Processing Tasks", Proceedings of the Intelligent Computer Communication and Processing (ICCP), IEEE-Press, pp. 461-468, (2015). [5] Bacu, V., Stefan, T., Gorgan, D., "Adaptive Processing of Earth Observation Data on Cloud Infrastructures Based on Workflow Description", Proceedings of the Intelligent Computer Communication and Processing (ICCP), IEEE-Press, pp. 444-454, (2015).

  4. Analysis of CERN computing infrastructure and monitoring data

    NASA Astrophysics Data System (ADS)

    Nieke, C.; Lassnig, M.; Menichetti, L.; Motesnitsalis, E.; Duellmann, D.

    2015-12-01

    Optimizing a computing infrastructure on the scale of the LHC requires a quantitative understanding of a complex network of many different resources and services. For this purpose the CERN IT department and the LHC experiments collect a large multitude of logs and performance probes, which are already successfully used for short-term analysis (e.g. operational dashboards) within each group. The IT analytics working group has been created with the goal of bringing together data sources from different services and different abstraction levels and of implementing a suitable infrastructure for mid- to long-term statistical analysis. It further provides a forum for joint optimization across single-service boundaries and for the exchange of analysis methods and tools. To simplify access to the collected data, we implemented an automated repository for cleaned and aggregated data sources based on the Hadoop ecosystem. This contribution describes some of the challenges encountered, such as dealing with heterogeneous data formats and selecting an efficient storage format for MapReduce and external access, and describes the repository user interface. Using this infrastructure we were able to quantitatively analyze the relationship between the CPU/wall fraction, the latency/throughput constraints of network and disk, and the effective job throughput. In this contribution we first describe the design of the shared analysis infrastructure and then present a summary of the first analysis results from the combined data sources.
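
    As a hedged illustration of the kind of aggregation such a Hadoop-based repository enables, the following PySpark sketch computes a daily per-site CPU/wall fraction from cleaned job-monitoring records. The paths and field names (site, cpu_time, wall_time, end_time) are hypothetical, not the actual CERN schema.

        from pyspark.sql import SparkSession
        from pyspark.sql import functions as F

        spark = SparkSession.builder.appName("job-log-aggregation").getOrCreate()

        # Hypothetical cleaned job records, one JSON object per line.
        jobs = spark.read.json("hdfs:///analytics/jobs/2015/*.json")

        # Daily CPU/wall fraction and completed-job count per site.
        daily = (jobs
                 .withColumn("day", F.to_date("end_time"))
                 .groupBy("site", "day")
                 .agg(F.avg(F.col("cpu_time") / F.col("wall_time")).alias("cpu_wall"),
                      F.count("*").alias("jobs_finished")))

        # Parquet is a typical columnar choice for MapReduce and external access.
        daily.write.mode("overwrite").parquet("hdfs:///analytics/aggregated/daily_jobs")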

  5. Control System Applicable Use Assessment of the Secure Computing Corporation - Secure Firewall (Sidewinder)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hadley, Mark D.; Clements, Samuel L.

    2009-01-01

    Battelle’s National Security & Defense objective is “applying unmatched expertise and unique facilities to deliver homeland security solutions. From detection and protection against weapons of mass destruction to emergency preparedness/response and protection of critical infrastructure, we are working with industry and government to integrate policy, operational, technological, and logistical parameters that will secure a safe future”. In an ongoing effort to meet this mission, engagements with industry that are intended to improve the operational and technical attributes of commercial solutions related to national security initiatives are necessary. This ensures that capabilities for protecting critical infrastructure assets are considered by commercial entities in their development, design, and deployment lifecycles, thus addressing the alignment of identified deficiencies and improvements needed to support national cyber security initiatives. The Secure Firewall (Sidewinder) appliance by Secure Computing was assessed for applicable use in critical infrastructure control system environments, such as electric power, nuclear and other facilities containing critical systems that require augmented protection from cyber threats. The testing was performed in the Pacific Northwest National Laboratory’s (PNNL) Electric Infrastructure Operations Center (EIOC). The Secure Firewall was tested in a network configuration that emulates a typical control center network and then evaluated. A number of observations and recommendations are included in this report relating to features currently included in the Secure Firewall that support critical infrastructure security needs.

  6. Tools For Evaluating The Benefits Of Green Infrastructure For Urban Water Management: Informational Brief (WERF Report INFR5SG09b)

    EPA Science Inventory

    This report identifies the practical challenges for evaluating the benefits of green infrastructure. It also discusses a more systematic approach to integrate cost-effective, high-performance urban water infrastructure practices with other environmental, social, and economic goa...

  7. Primer to Design Safe School Projects in Case of Terrorist Attacks and School Shootings. Buildings and Infrastructure Protection Series. FEMA-428/BIPS-07/January 2012. Edition 2

    ERIC Educational Resources Information Center

    Chipley, Michael; Lyon, Wesley; Smilowitz, Robert; Williams, Pax; Arnold, Christopher; Blewett, William; Hazen, Lee; Krimgold, Fred

    2012-01-01

    This publication, part of the new Building and Infrastructure Protection Series (BIPS) published by the Department of Homeland Security (DHS) Science and Technology Directorate (S&T) Infrastructure Protection and Disaster Management Division (IDD), serves to advance high performance and integrated design for buildings and infrastructure. This…

  8. Changing from computing grid to knowledge grid in life-science grid.

    PubMed

    Talukdar, Veera; Konar, Amit; Datta, Ayan; Choudhury, Anamika Roy

    2009-09-01

    Grid computing has great potential to become a standard cyber infrastructure for the life sciences, which often require high-performance computing and large data handling that exceed the computing capacity of a single institution. Grid computing applies the resources of many computers in a network to a single problem at the same time. It is useful for scientific problems that require a great number of computer processing cycles or access to large amounts of data. As biologists, we are constantly discovering millions of genes and genome features, which are assembled in libraries and distributed on computers around the world. This means that new, innovative methods must be developed that exploit the resources available for extensive calculations - for example, grid computing. This survey reviews the latest grid technologies from the viewpoints of the computing grid, data grid and knowledge grid. Computing grid technologies have matured enough to solve high-throughput real-world life science problems. Data grid technologies are strong candidates for realizing a "resourceome" for bioinformatics. Knowledge grids should be designed not only for sharing explicit knowledge on computers but also for community formation, to share tacit knowledge among a community. By extending the concept of the grid from computing grid to knowledge grid, it is possible to use a grid not only as sharable computing resources, but also as a time and place in which people work together, create knowledge, and share knowledge and experiences in a community.

  9. CloVR: A virtual machine for automated and portable sequence analysis from the desktop using cloud computing

    PubMed Central

    2011-01-01

    Background Next-generation sequencing technologies have decentralized sequence acquisition, increasing the demand for new bioinformatics tools that are easy to use, portable across multiple platforms, and scalable for high-throughput applications. Cloud computing platforms provide on-demand access to computing infrastructure over the Internet and can be used in combination with custom-built virtual machines to distribute pre-packaged, pre-configured software. Results We describe the Cloud Virtual Resource, CloVR, a new desktop application for push-button automated sequence analysis that can utilize cloud computing resources. CloVR is implemented as a single portable virtual machine (VM) that provides several automated analysis pipelines for microbial genomics, including 16S, whole-genome and metagenome sequence analysis. The CloVR VM runs on a personal computer, utilizes local computer resources and requires minimal installation, addressing key challenges in deploying bioinformatics workflows. In addition, CloVR supports the use of remote cloud computing resources to improve performance for large-scale sequence processing. In a case study, we demonstrate the use of CloVR to automatically process next-generation sequencing data on multiple cloud computing platforms. Conclusion The CloVR VM and associated architecture lower the barrier of entry for utilizing complex analysis protocols on both local single- and multi-core computers and cloud systems for high-throughput data processing. PMID:21878105

  10. The GMOS cyber(e)-infrastructure: advanced services for supporting science and policy.

    PubMed

    Cinnirella, S; D'Amore, F; Bencardino, M; Sprovieri, F; Pirrone, N

    2014-03-01

    The need for coordinated, systematized and catalogued databases on mercury in the environment is of paramount importance, as improved information can help in assessing the effectiveness of measures established to phase out and ban mercury. Long-term monitoring sites have been established in a number of regions and countries for the measurement of mercury in ambient air and wet deposition. Long-term measurements of mercury concentration in biota have also produced a huge amount of information, but such initiatives are far from constituting a global, systematic and interoperable approach. To address these weaknesses, the on-going Global Mercury Observation System (GMOS) project ( www.gmos.eu ) established a coordinated global observation system for mercury and also retrieved historical data ( www.gmos.eu/sdi ). To manage such a large amount of information, a technological infrastructure was planned. This high-performance back-end resource, associated with sophisticated client applications, enables data storage, computing services, telecommunications networks and all services necessary to support the activity. This paper reports the architecture definition of the GMOS Cyber(e)-Infrastructure and the services developed to support science and policy, including the United Nations Environment Programme. It finally describes new possibilities in data analysis and data management through client applications.

  11. An adaptive process-based cloud infrastructure for space situational awareness applications

    NASA Astrophysics Data System (ADS)

    Liu, Bingwei; Chen, Yu; Shen, Dan; Chen, Genshe; Pham, Khanh; Blasch, Erik; Rubin, Bruce

    2014-06-01

    Space situational awareness (SSA) and defense space control capabilities are top priorities for groups that own or operate man-made spacecraft. Also, with the growing amount of space debris, there is an increased demand for contextual understanding that necessitates the capability to collect and process a vast amount of sensor data. Cloud computing, which features scalable and flexible storage and computing services, has been recognized as an ideal candidate to meet the large-data contextual challenges of SSA. Cloud computing consists of physical service providers and middleware virtual machines, together with infrastructure, platform, and software as a service (IaaS, PaaS, SaaS) models. However, the typical virtual machine (VM) abstraction is on a per-operating-system basis, which is too low-level and limits the flexibility of a mission application architecture. In response to this technical challenge, a novel adaptive process-based cloud infrastructure for SSA applications is proposed in this paper. In addition, the design rationale is detailed and a prototype is examined. The SSA Cloud (SSAC) conceptual capability will potentially support space situation monitoring and tracking, object identification, and threat assessment. Lastly, the benefits of more granular and flexible cloud computing resource allocation are illustrated for data processing and implementation considerations within a representative SSA system environment. We show that container-based virtualization performs better than hypervisor-based virtualization technology in an SSA scenario.

  12. Implementing Computer-Aided Instruction in Distance Education: An Infrastructure. RR/89-06.

    ERIC Educational Resources Information Center

    Kotze, Paula

    The infrastructure required for the implementation of computer aided instruction is described with particular reference to the distance education environment at the University of South Africa. A review of the state of the art of online distance education in the United States and Europe is followed by an outline of the proposed infrastructure for…

  13. Multimedia courseware in an open-systems environment: a DoD strategy

    NASA Astrophysics Data System (ADS)

    Welsch, Lawrence A.

    1991-03-01

    The federal government is about to invest billions of dollars to develop multimedia training materials for delivery on computer-based interactive training systems. Acquisition of a variety of computers and peripheral devices hosting various operating systems and suites of authoring system software will be necessary to facilitate the development of this courseware. There is no single source that will satisfy all needs. Although high-performance, low-cost interactive training hardware is available, the products have proprietary software interfaces. Because the interfaces are proprietary, expensive reprogramming is usually required to adapt such software products to other platforms. This costly reprogramming could be eliminated by adopting standard software interfaces. DoD's Portable Courseware Project (PORTCO) is typical of projects worldwide that require standard software interfaces. This paper articulates the strategy whereby PORTCO leverages the open systems movement and the new realities of information technology. These realities encompass changes in the pace at which new technology becomes available, changes in organizational goals and philosophy, new roles of vendors and users, changes in the procurement process, and acceleration toward open system environments. The PORTCO strategy is applicable to all projects and systems that require open systems to achieve mission objectives. The federal goal is to facilitate the creation of an environment in which high-quality portable courseware is available as commercial off-the-shelf products and is competitively supplied by a variety of vendors. In order to achieve this goal, a system architecture incorporating standards to meet the users' needs must be established. The Request for Architecture (RFA) developed cooperatively by DoD and the National Institute of Standards and Technology (NIST) will generate the PORTCO systems architecture. This architecture must freely integrate the courseware and authoring software from the lower levels of machine architecture and systems service implementation. In addition, the systems architecture will establish how the application-specific technologies relate to other technologies. Further, a computer-based interactive training applications profile must be developed. This profile, along with the systems architecture derived as a result of the RFA, provides the basis for identifying the needed standards. NIST will then accelerate the development of these standards using, but not restricted to, existing standards activities within established standards forums. The federal multimedia courseware effort has adopted the Interactive Multimedia Association (IMA) Recommended Practices for Interactive Video Portability as the baseline for the migration of computer-based interactive training systems to an open systems environment based upon international standards. The PORTCO strategy includes an evolutionary migration to a standards-based Open System Environment (OSE). An important aspect of this migration strategy is to move to open systems via stepwise evolution rather than via quantum leaps. Another area of concern is that of infrastructure issues, such as maintaining and supporting the technologies required for computer-based interactive training. The federal multimedia initiative will use the RFA-based architecture to differentiate between those technologies that can be maintained and supported by existing infrastructure mechanisms and those that require new mechanisms. Existing infrastructure mechanisms will be used, and where infrastructure mechanisms do not exist, the approach will be to place high priority on establishing the appropriate mechanisms. Establishing an infrastructure mechanism is a nontrivial task requiring sustained investment of resources.

  14. SSME Investment in Turbomachinery Inducer Impeller Design Tools and Methodology

    NASA Technical Reports Server (NTRS)

    Zoladz, Thomas; Mitchell, William; Lunde, Kevin

    2010-01-01

    Within the rocket engine industry, SSME turbomachines are the de facto standards of success with regard to meeting aggressive performance requirements under challenging operational environments. Over the Shuttle era, SSME has invested heavily in our national inducer-impeller design infrastructure. While both low- and high-pressure turbopump failure/anomaly resolution efforts spurred some of these investments, the SSME program was a major benefactor of key areas of turbomachinery inducer-impeller research outside of flight manifest pressures. Over the past several decades, key turbopump internal environments have been interrogated via highly instrumented hot-fire and cold-flow testing. Likewise, SSME has sponsored the advancement of time-accurate and cavitating inducer-impeller computational fluid dynamics (CFD) tools. Together, these investments have led to a better understanding of the complex internal flow fields within aggressive, high-performing inducers and impellers. New design tools and methodologies have evolved that intend to provide confident blade designs which strike an appropriate balance between performance and self-induced load management.

  15. Tempest: GPU-CPU computing for high-throughput database spectral matching.

    PubMed

    Milloy, Jeffrey A; Faherty, Brendan K; Gerber, Scott A

    2012-07-06

    Modern mass spectrometers are now capable of producing hundreds of thousands of tandem (MS/MS) spectra per experiment, making the translation of these fragmentation spectra into peptide matches a common bottleneck in proteomics research. When coupled with experimental designs that enrich for post-translational modifications such as phosphorylation and/or include isotopically labeled amino acids for quantification, shotgun sequencing places additional burdens on this computational infrastructure. To address this issue, we have developed a new database searching program that utilizes the massively parallel compute capabilities of a graphics processing unit (GPU) to produce peptide spectral matches in a very high-throughput fashion. Our program, named Tempest, combines efficient database digestion and MS/MS spectral indexing on a CPU with fast similarity scoring on a GPU. In our implementation, the entire similarity score, including the generation of full theoretical peptide candidate fragmentation spectra and their comparison to experimental spectra, is conducted on the GPU. Although Tempest uses the classical SEQUEST XCorr score as a primary metric for evaluating the similarity of spectra collected at unit resolution, we have developed a new "Accelerated Score" for MS/MS spectra collected at high resolution that is based on a computationally inexpensive dot product but exhibits scoring accuracy similar to that of the classical XCorr. In our experience, Tempest provides compute-cluster-level performance on an affordable desktop computer.
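
    The "Accelerated Score" is described above only as a computationally inexpensive dot product. A minimal NumPy sketch of dot-product matching between binned spectra follows; the bin width, normalization, and peak lists are illustrative assumptions of this example, and Tempest itself performs the comparison on the GPU.

        import numpy as np

        def bin_spectrum(mz, intensity, bin_width=0.02, mz_max=2000.0):
            """Histogram a peak list onto a fixed m/z grid and L2-normalize it."""
            vec = np.zeros(int(mz_max / bin_width))
            idx = (np.asarray(mz) / bin_width).astype(int)
            np.add.at(vec, idx, intensity)   # accumulate peaks sharing a bin
            norm = np.linalg.norm(vec)
            return vec / norm if norm > 0 else vec

        def dot_score(experimental, theoretical):
            """Dot-product similarity between two binned, normalized spectra."""
            return float(np.dot(experimental, theoretical))

        obs = bin_spectrum([175.119, 263.088], [1.0, 0.6])    # observed peaks
        theo = bin_spectrum([175.119, 263.090], [1.0, 1.0])   # theoretical peaks
        print(dot_score(obs, theo))   # close to 1 for well-matched spectra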

  16. Data Center Consolidation: A Step towards Infrastructure Clouds

    NASA Astrophysics Data System (ADS)

    Winter, Markus

    Application service providers face enormous challenges and rising costs in managing and operating a growing number of heterogeneous system and computing landscapes. Limitations of traditional computing environments force IT decision-makers to reorganize computing resources within the data center, as continuous growth leads to an inefficient utilization of the underlying hardware infrastructure. This paper discusses a way for infrastructure providers to improve data center operations based on the findings of a case study on resource utilization of very large business applications and presents an outlook beyond server consolidation endeavors, transforming corporate data centers into compute clouds.

  17. Network and computing infrastructure for scientific applications in Georgia

    NASA Astrophysics Data System (ADS)

    Kvatadze, R.; Modebadze, Z.

    2016-09-01

    The status of the network and computing infrastructure and the services available to the research and education community of Georgia is presented. The Research and Educational Networking Association - GRENA - provides the following network services: Internet connectivity, network services, cyber security, technical support, etc. Computing resources used by the research teams are located at GRENA and at major state universities. The GE-01-GRENA site is included in the European Grid Infrastructure. The paper also contains information about the programs of the Learning Center and the research and development projects in which GRENA is participating.

  18. Performance comparison of heuristic algorithms for task scheduling in IaaS cloud computing environment.

    PubMed

    Madni, Syed Hamid Hussain; Abd Latiff, Muhammad Shafie; Abdullahi, Mohammed; Abdulhamid, Shafi'i Muhammad; Usman, Mohammed Joda

    2017-01-01

    Cloud computing infrastructure is suitable for meeting the computational needs of large tasks. Optimal scheduling of tasks in a cloud computing environment has been proven to be an NP-complete problem, hence the need for heuristic methods. Several heuristic algorithms have been developed and used to address this problem, but choosing the appropriate algorithm for a task assignment problem of a particular nature is difficult, since the methods were developed under different assumptions. Therefore, six rule-based heuristic algorithms are implemented and used to schedule autonomous tasks in homogeneous and heterogeneous environments, with the aim of comparing their performance in terms of cost, degree of imbalance, makespan and throughput. First Come First Serve (FCFS), Minimum Completion Time (MCT), Minimum Execution Time (MET), Max-min, Min-min and Sufferage are the heuristic algorithms considered for the performance comparison and analysis of task scheduling in cloud computing.
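
    To make one of the compared heuristics concrete, the sketch below implements Min-min over an expected-time-to-compute (ETC) matrix: at each step, the unscheduled task with the smallest achievable completion time is assigned to the machine that achieves it. The ETC values here are made up for illustration.

        def min_min(etc):
            """Min-min scheduling.

            etc[t][m] = expected execution time of task t on machine m.
            Returns a list mapping each task index to a machine index.
            """
            n_tasks, n_machines = len(etc), len(etc[0])
            ready = [0.0] * n_machines        # machine ready times
            assignment = [None] * n_tasks
            unscheduled = set(range(n_tasks))
            while unscheduled:
                # Smallest completion time over all (task, machine) pairs.
                ct, t, m = min((ready[m] + etc[t][m], t, m)
                               for t in unscheduled for m in range(n_machines))
                assignment[t] = m
                ready[m] = ct
                unscheduled.remove(t)
            return assignment

        etc = [[4.0, 6.0], [3.0, 5.0], [8.0, 2.0]]   # made-up ETC matrix
        print(min_min(etc))   # prints [0, 0, 1]

    Max-min differs only in selecting, among the per-task minimum completion times, the task with the largest one first, which tends to balance long tasks across machines.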

  19. Performance comparison of heuristic algorithms for task scheduling in IaaS cloud computing environment

    PubMed Central

    Madni, Syed Hamid Hussain; Abd Latiff, Muhammad Shafie; Abdullahi, Mohammed; Usman, Mohammed Joda

    2017-01-01

    Cloud computing infrastructure is suitable for meeting the computational needs of large tasks. Optimal scheduling of tasks in a cloud computing environment has been proven to be an NP-complete problem, hence the need for heuristic methods. Several heuristic algorithms have been developed and used to address this problem, but choosing the appropriate algorithm for a task assignment problem of a particular nature is difficult, since the methods were developed under different assumptions. Therefore, six rule-based heuristic algorithms are implemented and used to schedule autonomous tasks in homogeneous and heterogeneous environments, with the aim of comparing their performance in terms of cost, degree of imbalance, makespan and throughput. First Come First Serve (FCFS), Minimum Completion Time (MCT), Minimum Execution Time (MET), Max-min, Min-min and Sufferage are the heuristic algorithms considered for the performance comparison and analysis of task scheduling in cloud computing. PMID:28467505

  20. Real-time contaminant sensing and control in civil infrastructure systems

    NASA Astrophysics Data System (ADS)

    Rimer, Sara; Katopodes, Nikolaos

    2014-11-01

    A laboratory-scale prototype has been designed and implemented to test the feasibility of real-time contaminant sensing and control in civil infrastructure systems. A blower wind tunnel is the basis of the prototype design, with propylene glycol smoke as the "contaminant." A camera sensor and compressed-air vacuum nozzle system is set up at the test section of the prototype to visually sense and then control the contaminant; a real-time controller is programmed to read in data from the camera sensor and administer pressure to the regulators controlling the compressed air operating the vacuum nozzles. A computational fluid dynamics model is being integrated with this prototype to inform the correct pressure to supply to the regulators in order to optimally control the contaminant's removal from the prototype. The performance of the prototype has been evaluated against the computational fluid dynamics model and is discussed in this presentation. Furthermore, the initial performance of the sensor-control system implemented in the test section of the prototype is discussed. NSF-CMMI 0856438.

  1. It Takes A 'Village of Partnerships' To Raise A 'Big Data Facility' In A 'Big Data World'.

    NASA Astrophysics Data System (ADS)

    Evans, B. J. K.; Wyborn, L. A.

    2015-12-01

    The National Computational Infrastructure (NCI) at the Australian National University (ANU) has collocated a priority set of national and international data assets that span a wide range of domains, from climate, oceans, geophysics, environment, astronomy and bioinformatics to the social sciences. The data are located on a 10 PB High Performance Data (HPD) Node that is integrated with a High Performance Computing (HPC) facility to enable a new style of data-intensive in-situ analysis. Investigators can log in and access the data collections directly; access is also provided via modern standards-based web services. The NCI integrated HPD/HPC facility is supported by a 'village' of partnerships. NCI itself operates as a formal partnership between the ANU and three major national scientific agencies: CSIRO, the Bureau of Meteorology (BoM) and Geoscience Australia (GA). These same agencies are also the custodians of many of the national data collections hosted at NCI and, in partnership with other collaborating national and overseas organisations, have agreed to work together to develop a shared data environment and to use standards that enable interoperability between the collections, rather than isolating the collections as separate entities that each agency runs independently. To effectively analyse these complex and large-volume data sets, NCI has entered into a series of national and international partnerships to provide world-class digital analytical environments in which computational analyses can be conducted and shared. The ability of government and research to work in partnership at the NCI has been well established over the last decade, mainly with BoM, CSIRO and GA. New industry linkages are now being encouraged by revised government agendas, and these promise to foster a new series of partnerships that will increase the uptake of this major government-funded infrastructure and further collaboration and innovation.

  2. A new algorithm for grid-based hydrologic analysis by incorporating stormwater infrastructure

    NASA Astrophysics Data System (ADS)

    Choi, Yosoon; Yi, Huiuk; Park, Hyeong-Dong

    2011-08-01

    We developed a new algorithm, the Adaptive Stormwater Infrastructure (ASI) algorithm, to incorporate ancillary data sets related to stormwater infrastructure into the grid-based hydrologic analysis. The algorithm simultaneously considers the effects of the surface stormwater collector network (e.g., diversions, roadside ditches, and canals) and underground stormwater conveyance systems (e.g., waterway tunnels, collector pipes, and culverts). The surface drainage flows controlled by the surface runoff collector network are superimposed onto the flow directions derived from a DEM. After examining the connections between inlets and outfalls in the underground stormwater conveyance system, the flow accumulation and delineation of watersheds are calculated based on recursive computations. Application of the algorithm to the Sangdong tailings dam in Korea revealed superior performance to that of a conventional D8 single-flow algorithm in terms of providing reasonable hydrologic information on watersheds with stormwater infrastructure.
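
    For context, the conventional D8 single-flow algorithm that the ASI approach builds on assigns each DEM cell one flow direction, toward its steepest downslope neighbor. A minimal NumPy sketch of that baseline (without the stormwater-infrastructure superposition that is this paper's contribution) follows; the toy DEM is made up.

        import numpy as np

        # The 8 neighbor offsets of the D8 scheme; diagonals are sqrt(2) away.
        OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                   (0, 1), (1, -1), (1, 0), (1, 1)]

        def d8_directions(dem):
            """Index (0-7) of the steepest downslope neighbor, or -1 for pits."""
            rows, cols = dem.shape
            direction = np.full(dem.shape, -1, dtype=int)
            for r in range(rows):
                for c in range(cols):
                    best_slope = 0.0
                    for k, (dr, dc) in enumerate(OFFSETS):
                        rr, cc = r + dr, c + dc
                        if 0 <= rr < rows and 0 <= cc < cols:
                            slope = (dem[r, c] - dem[rr, cc]) / np.hypot(dr, dc)
                            if slope > best_slope:
                                best_slope, direction[r, c] = slope, k
            return direction

        dem = np.array([[5.0, 4.0, 3.0],
                        [4.0, 3.0, 2.0],
                        [3.0, 2.0, 1.0]])
        print(d8_directions(dem))

    The ASI algorithm, as described above, then overrides these DEM-derived directions wherever a surface collector or an inlet-to-outfall conveyance link dictates a different flow path.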

  3. Robust, Optimal Water Infrastructure Planning Under Deep Uncertainty Using Metamodels

    NASA Astrophysics Data System (ADS)

    Maier, H. R.; Beh, E. H. Y.; Zheng, F.; Dandy, G. C.; Kapelan, Z.

    2015-12-01

    Optimal long-term planning plays an important role in many water infrastructure problems. However, this task is complicated by deep uncertainty about future conditions, such as the impact of population dynamics and climate change. One way to deal with this uncertainty is by means of robustness, which aims to ensure that water infrastructure performs adequately under a range of plausible future conditions. However, as robustness calculations require computationally expensive system models to be run for a large number of scenarios, it is generally computationally intractable to include robustness as an objective in the development of optimal long-term infrastructure plans. In order to overcome this shortcoming, an approach is developed that uses metamodels instead of computationally expensive simulation models in robustness calculations. The approach is demonstrated for the optimal sequencing of water supply augmentation options for the southern portion of the water supply for Adelaide, South Australia. A 100-year planning horizon is subdivided into ten equal decision stages for the purpose of sequencing various water supply augmentation options, including desalination, stormwater harvesting and household rainwater tanks. The objectives include the minimization of average present value of supply augmentation costs, the minimization of average present value of greenhouse gas emissions and the maximization of supply robustness. The uncertain variables are rainfall, per capita water consumption and population. Decision variables are the implementation stages of the different water supply augmentation options. Artificial neural networks are used as metamodels to enable all objectives to be calculated in a computationally efficient manner at each of the decision stages. The results illustrate the importance of identifying optimal staged solutions to ensure robustness and sustainability of water supply into an uncertain long-term future.
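
    A minimal sketch of the metamodelling idea follows, with a made-up one-dimensional 'simulator' standing in for the expensive water-supply model and an arbitrary performance threshold: the surrogate is trained on a few hundred simulator runs and then evaluated over many scenarios at negligible cost. It illustrates the concept only, not the authors' actual model, objectives or scenario set.

        import numpy as np
        from sklearn.neural_network import MLPRegressor

        rng = np.random.default_rng(0)

        def expensive_simulator(x):
            """Stand-in for a costly system model (e.g. supply shortfall)."""
            return np.sin(3 * x) + 0.5 * x

        # Train the metamodel on a modest number of simulator runs...
        X_train = rng.uniform(0, 2, size=(300, 1))
        y_train = expensive_simulator(X_train).ravel()
        meta = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000,
                            random_state=0).fit(X_train, y_train)

        # ...then estimate robustness over many scenarios with the cheap surrogate.
        scenarios = rng.uniform(0, 2, size=(100_000, 1))
        ok = meta.predict(scenarios) < 1.5   # hypothetical performance threshold
        print("fraction of scenarios with adequate performance:", ok.mean())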

  4. The iPlant Collaborative: Cyberinfrastructure for Plant Biology.

    PubMed

    Goff, Stephen A; Vaughn, Matthew; McKay, Sheldon; Lyons, Eric; Stapleton, Ann E; Gessler, Damian; Matasci, Naim; Wang, Liya; Hanlon, Matthew; Lenards, Andrew; Muir, Andy; Merchant, Nirav; Lowry, Sonya; Mock, Stephen; Helmke, Matthew; Kubach, Adam; Narro, Martha; Hopkins, Nicole; Micklos, David; Hilgert, Uwe; Gonzales, Michael; Jordan, Chris; Skidmore, Edwin; Dooley, Rion; Cazes, John; McLay, Robert; Lu, Zhenyuan; Pasternak, Shiran; Koesterke, Lars; Piel, William H; Grene, Ruth; Noutsos, Christos; Gendler, Karla; Feng, Xin; Tang, Chunlao; Lent, Monica; Kim, Seung-Jin; Kvilekval, Kristian; Manjunath, B S; Tannen, Val; Stamatakis, Alexandros; Sanderson, Michael; Welch, Stephen M; Cranston, Karen A; Soltis, Pamela; Soltis, Doug; O'Meara, Brian; Ane, Cecile; Brutnell, Tom; Kleibenstein, Daniel J; White, Jeffery W; Leebens-Mack, James; Donoghue, Michael J; Spalding, Edgar P; Vision, Todd J; Myers, Christopher R; Lowenthal, David; Enquist, Brian J; Boyle, Brad; Akoglu, Ali; Andrews, Greg; Ram, Sudha; Ware, Doreen; Stein, Lincoln; Stanzione, Dan

    2011-01-01

    The iPlant Collaborative (iPlant) is a United States National Science Foundation (NSF) funded project that aims to create an innovative, comprehensive, and foundational cyberinfrastructure in support of plant biology research (PSCIC, 2006). iPlant is developing cyberinfrastructure that uniquely enables scientists throughout the diverse fields that comprise plant biology to address Grand Challenges in new ways, to stimulate and facilitate cross-disciplinary research, to promote biology and computer science research interactions, and to train the next generation of scientists on the use of cyberinfrastructure in research and education. Meeting humanity's projected demands for agricultural and forest products and the expectation that natural ecosystems be managed sustainably will require synergies from the application of information technologies. The iPlant cyberinfrastructure design is based on an unprecedented period of research community input, and leverages developments in high-performance computing, data storage, and cyberinfrastructure for the physical sciences. iPlant is an open-source project with application programming interfaces that allow the community to extend the infrastructure to meet its needs. iPlant is sponsoring community-driven workshops addressing specific scientific questions via analysis tool integration and hypothesis testing. These workshops teach researchers how to add bioinformatics tools and/or datasets into the iPlant cyberinfrastructure enabling plant scientists to perform complex analyses on large datasets without the need to master the command-line or high-performance computational services.

  5. The iPlant Collaborative: Cyberinfrastructure for Plant Biology

    PubMed Central

    Goff, Stephen A.; Vaughn, Matthew; McKay, Sheldon; Lyons, Eric; Stapleton, Ann E.; Gessler, Damian; Matasci, Naim; Wang, Liya; Hanlon, Matthew; Lenards, Andrew; Muir, Andy; Merchant, Nirav; Lowry, Sonya; Mock, Stephen; Helmke, Matthew; Kubach, Adam; Narro, Martha; Hopkins, Nicole; Micklos, David; Hilgert, Uwe; Gonzales, Michael; Jordan, Chris; Skidmore, Edwin; Dooley, Rion; Cazes, John; McLay, Robert; Lu, Zhenyuan; Pasternak, Shiran; Koesterke, Lars; Piel, William H.; Grene, Ruth; Noutsos, Christos; Gendler, Karla; Feng, Xin; Tang, Chunlao; Lent, Monica; Kim, Seung-Jin; Kvilekval, Kristian; Manjunath, B. S.; Tannen, Val; Stamatakis, Alexandros; Sanderson, Michael; Welch, Stephen M.; Cranston, Karen A.; Soltis, Pamela; Soltis, Doug; O'Meara, Brian; Ane, Cecile; Brutnell, Tom; Kleibenstein, Daniel J.; White, Jeffery W.; Leebens-Mack, James; Donoghue, Michael J.; Spalding, Edgar P.; Vision, Todd J.; Myers, Christopher R.; Lowenthal, David; Enquist, Brian J.; Boyle, Brad; Akoglu, Ali; Andrews, Greg; Ram, Sudha; Ware, Doreen; Stein, Lincoln; Stanzione, Dan

    2011-01-01

    The iPlant Collaborative (iPlant) is a United States National Science Foundation (NSF) funded project that aims to create an innovative, comprehensive, and foundational cyberinfrastructure in support of plant biology research (PSCIC, 2006). iPlant is developing cyberinfrastructure that uniquely enables scientists throughout the diverse fields that comprise plant biology to address Grand Challenges in new ways, to stimulate and facilitate cross-disciplinary research, to promote biology and computer science research interactions, and to train the next generation of scientists on the use of cyberinfrastructure in research and education. Meeting humanity's projected demands for agricultural and forest products and the expectation that natural ecosystems be managed sustainably will require synergies from the application of information technologies. The iPlant cyberinfrastructure design is based on an unprecedented period of research community input, and leverages developments in high-performance computing, data storage, and cyberinfrastructure for the physical sciences. iPlant is an open-source project with application programming interfaces that allow the community to extend the infrastructure to meet its needs. iPlant is sponsoring community-driven workshops addressing specific scientific questions via analysis tool integration and hypothesis testing. These workshops teach researchers how to add bioinformatics tools and/or datasets into the iPlant cyberinfrastructure enabling plant scientists to perform complex analyses on large datasets without the need to master the command-line or high-performance computational services. PMID:22645531

  6. Reproducible Large-Scale Neuroimaging Studies with the OpenMOLE Workflow Management System.

    PubMed

    Passerat-Palmbach, Jonathan; Reuillon, Romain; Leclaire, Mathieu; Makropoulos, Antonios; Robinson, Emma C; Parisot, Sarah; Rueckert, Daniel

    2017-01-01

    OpenMOLE is a scientific workflow engine with a strong emphasis on workload distribution. Workflows are designed using a high level Domain Specific Language (DSL) built on top of Scala. It exposes natural parallelism constructs to easily delegate the workload resulting from a workflow to a wide range of distributed computing environments. OpenMOLE hides the complexity of designing complex experiments thanks to its DSL. Users can embed their own applications and scale their pipelines from a small prototype running on their desktop computer to a large-scale study harnessing distributed computing infrastructures, simply by changing a single line in the pipeline definition. The construction of the pipeline itself is decoupled from the execution context. The high-level DSL abstracts the underlying execution environment, contrary to classic shell-script based pipelines. These two aspects allow pipelines to be shared and studies to be replicated across different computing environments. Workflows can be run as traditional batch pipelines or coupled with OpenMOLE's advanced exploration methods in order to study the behavior of an application, or perform automatic parameter tuning. In this work, we briefly present the strong assets of OpenMOLE and detail recent improvements targeting re-executability of workflows across various Linux platforms. We have tightly coupled OpenMOLE with CARE, a standalone containerization solution that allows re-executing on a Linux host any application that has been packaged on another Linux host previously. The solution is evaluated against a Python-based pipeline involving packages such as scikit-learn as well as binary dependencies. All were packaged and re-executed successfully on various HPC environments, with identical numerical results (here prediction scores) obtained on each environment. Our results show that the pair formed by OpenMOLE and CARE is a reliable solution to generate reproducible results and re-executable pipelines. A demonstration of the flexibility of our solution showcases three neuroimaging pipelines harnessing distributed computing environments as heterogeneous as local clusters or the European Grid Infrastructure (EGI).

  7. Reproducible Large-Scale Neuroimaging Studies with the OpenMOLE Workflow Management System

    PubMed Central

    Passerat-Palmbach, Jonathan; Reuillon, Romain; Leclaire, Mathieu; Makropoulos, Antonios; Robinson, Emma C.; Parisot, Sarah; Rueckert, Daniel

    2017-01-01

    OpenMOLE is a scientific workflow engine with a strong emphasis on workload distribution. Workflows are designed using a high level Domain Specific Language (DSL) built on top of Scala. It exposes natural parallelism constructs to easily delegate the workload resulting from a workflow to a wide range of distributed computing environments. OpenMOLE hides the complexity of designing complex experiments thanks to its DSL. Users can embed their own applications and scale their pipelines from a small prototype running on their desktop computer to a large-scale study harnessing distributed computing infrastructures, simply by changing a single line in the pipeline definition. The construction of the pipeline itself is decoupled from the execution context. The high-level DSL abstracts the underlying execution environment, contrary to classic shell-script based pipelines. These two aspects allow pipelines to be shared and studies to be replicated across different computing environments. Workflows can be run as traditional batch pipelines or coupled with OpenMOLE's advanced exploration methods in order to study the behavior of an application, or perform automatic parameter tuning. In this work, we briefly present the strong assets of OpenMOLE and detail recent improvements targeting re-executability of workflows across various Linux platforms. We have tightly coupled OpenMOLE with CARE, a standalone containerization solution that allows re-executing on a Linux host any application that has been packaged on another Linux host previously. The solution is evaluated against a Python-based pipeline involving packages such as scikit-learn as well as binary dependencies. All were packaged and re-executed successfully on various HPC environments, with identical numerical results (here prediction scores) obtained on each environment. Our results show that the pair formed by OpenMOLE and CARE is a reliable solution to generate reproducible results and re-executable pipelines. A demonstration of the flexibility of our solution showcases three neuroimaging pipelines harnessing distributed computing environments as heterogeneous as local clusters or the European Grid Infrastructure (EGI). PMID:28381997

  8. DIRAC universal pilots

    NASA Astrophysics Data System (ADS)

    Stagni, F.; McNab, A.; Luzzi, C.; Krzemien, W.; Consortium, DIRAC

    2017-10-01

    In the last few years, new types of computing models, such as IaaS (Infrastructure as a Service) and IaaC (Infrastructure as a Client), have gained popularity. New resources may come as part of pledged resources, while others are opportunistic. Most, but not all, of these new infrastructures are based on virtualization techniques. In addition, some of them present opportunities for multi-processor computing slots to the users. Virtual Organizations are therefore facing heterogeneity of the available resources, and the use of an interware software layer like DIRAC to provide a transparent, uniform interface has become essential. The transparent access to the underlying resources is realized by implementing the pilot model. DIRAC’s newest generation of generic pilots (the so-called Pilots 2.0) are the “pilots for all the skies”, and were successfully released in production more than a year ago. They use a plugin mechanism that makes them easily adaptable. Pilots 2.0 have been used for fetching and running jobs on every type of resource, be it a Worker Node (WN) behind a CREAM/ARC/HTCondor/DIRAC Computing Element, a Virtual Machine running on IaaC infrastructures like Vac or BOINC, an IaaS cloud resource managed by Vcycle, an LHCb High Level Trigger farm node, or any other type of opportunistic computing resource. Make a machine a “Pilot Machine”, and all diversity between resources disappears. This contribution describes how pilots are made suitable for different resources, and the recent steps taken towards a fully unified framework, including monitoring. The case of multi-processor computing slots, on either real or virtual machines, with the whole node or a partition of it, is also discussed.

  9. Estuarine wetland evolution including sea-level rise and infrastructure effects.

    NASA Astrophysics Data System (ADS)

    Rodriguez, Jose Fernando; Trivisonno, Franco; Rojas, Steven Sandi; Riccardi, Gerardo; Stenta, Hernan; Saco, Patricia Mabel

    2015-04-01

    Estuarine wetlands are an extremely valuable resource in terms of biotic diversity, flood attenuation, storm surge protection, groundwater recharge, filtering of surface flows and carbon sequestration. On a large scale the survival of these systems depends on the slope of the land and a balance between the rates of accretion and sea-level rise, but local man-made flow disturbances can have comparable effects. Climate change predictions for most of Australia include an accelerated sea level rise, which may challenge the survival of estuarine wetlands. Furthermore, coastal infrastructure poses an additional constraint on the adaptive capacity of these ecosystems. Numerical models are increasingly being used to assess wetland dynamics and to help manage some of these situations. We present results of a wetland evolution model that is based on computed values of hydroperiod and tidal range that drive vegetation preference. Our first application simulates the long term evolution of an Australian wetland heavily constricted by infrastructure that is undergoing the effects of predicted accelerated sea level rise. The wetland presents a vegetation zonation sequence of mudflats - mangrove - saltmarsh from the seaward margin and up the topographic gradient, but is also affected by compartmentalization due to internal road embankments and culverts that effectively attenuate tidal input to the upstream compartments. For this reason, the evolution model includes a 2D hydrodynamic module that is able to handle man-made flow controls and spatially varying roughness. It continually simulates tidal inputs into the wetland and computes annual values of hydroperiod and tidal range to update the vegetation distribution based on the preference of the different vegetation types for particular hydrodynamic conditions. It also computes soil accretion rates and updates roughness coefficient values according to the evolving vegetation types. To explore in more detail the magnitude of flow attenuation due to roughness and its effects on the computation of tidal range and hydroperiod, we performed numerical experiments simulating floodplain flow on the side of a tidal creek using different roughness values. Even though the values of roughness that produce appreciable changes in hydroperiod and tidal range are relatively high, they are within the range expected for some of the wetland vegetation. Both applications of the model show that flow attenuation can play a major role in wetland hydrodynamics and that its effects must be considered when predicting wetland evolution under climate change scenarios, particularly in situations where existing infrastructure affects the flow.
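
    For readers unfamiliar with the two hydrodynamic drivers named above, the short sketch below computes an annual hydroperiod and tidal range for one wetland cell from a synthetic hourly water-level series. The definitions used are common ones, not necessarily the authors', and all numbers are illustrative.

      # Hedged sketch: hydroperiod and tidal range for a single wetland cell.
      import numpy as np

      hours_per_year = 365 * 24
      t = np.arange(hours_per_year)
      # Synthetic semidiurnal tide (metres); 12.42 h is the M2 period.
      eta = 0.4 * np.sin(2 * np.pi * t / 12.42)
      z = 0.1  # ground elevation of the wetland cell (m)

      depth = eta - z
      hydroperiod = np.mean(depth > 0.0)    # fraction of the year inundated
      tidal_range = eta.max() - eta.min()   # annual tidal range (m)

      print(f"hydroperiod = {hydroperiod:.2f}, tidal range = {tidal_range:.2f} m")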

  10. A new implementation of full resolution SBAS-DInSAR processing chain for the effective monitoring of structures and infrastructures

    NASA Astrophysics Data System (ADS)

    Bonano, Manuela; Buonanno, Sabatino; Ojha, Chandrakanta; Berardino, Paolo; Lanari, Riccardo; Zeni, Giovanni; Manunta, Michele

    2017-04-01

    The advanced DInSAR technique referred to as the Small BAseline Subset (SBAS) algorithm has already largely demonstrated its effectiveness for carrying out multi-scale and multi-platform surface deformation analyses relevant to both natural and man-made hazards. Thanks to its capability to generate displacement maps and long-term deformation time series at both regional (low resolution analysis) and local (full resolution analysis) spatial scales, it makes it possible to gain further insight into the spatial and temporal patterns of localized displacements relevant to single buildings and infrastructures over extended urban areas, with a key role in supporting risk mitigation and preservation activities. The extensive application of the multi-scale SBAS-DInSAR approach in many scientific contexts has gone hand in hand with the development of new SAR satellite missions, characterized by different frequency bands, spatial resolutions, revisit times and ground coverage. This has led to the generation of huge DInSAR data stacks that must be efficiently handled, processed and archived, with a strong impact on both the data storage and the computational requirements needed for generating the full resolution SBAS-DInSAR results. Accordingly, innovative and effective solutions for the automatic processing of massive SAR data archives and for the operational management of the derived SBAS-DInSAR products need to be designed and implemented, exploiting the high efficiency (in terms of portability, scalability and computing performance) of new ICT methodologies. In this work, we present a novel parallel implementation of the full resolution SBAS-DInSAR processing chain, aimed at investigating localized displacements affecting single buildings and infrastructures over very large urban areas, relying on parallelization strategies with different levels of granularity. Image-level granularity is applied in most steps of the SBAS-DInSAR processing chain and exploits multiprocessor systems with distributed memory. Moreover, in some computationally very heavy processing steps, Graphics Processing Units (GPUs) are exploited for processing blocks that work on a pixel-by-pixel basis, requiring strong modifications of some key parts of the sequential full resolution SBAS-DInSAR processing chain. GPU processing is implemented by efficiently exploiting parallel processing architectures (such as CUDA) to increase computing performance, in terms of optimization of the available GPU memory as well as reduction of Input/Output operations on the GPU and of the whole processing time for specific blocks with respect to the corresponding sequential implementation, which is particularly critical in the presence of huge DInSAR datasets. Moreover, to efficiently handle the massive amount of DInSAR measurements provided by the new generation of SAR constellations (CSK and Sentinel-1), we adopt a re-design strategy aimed at the robust assimilation of the full resolution SBAS-DInSAR results into the web-based GeoNode platform of the Spatial Data Infrastructure, thus allowing the efficient management, analysis and integration of the interferometric results with different data sources.

  11. A characterization of workflow management systems for extreme-scale applications

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ferreira da Silva, Rafael; Filgueira, Rosa; Pietri, Ilia

    The automation of the execution of computational tasks is at the heart of improving scientific productivity. Over recent years, scientific workflows have been established as an important abstraction that captures data processing and computation of large and complex scientific applications. By allowing scientists to model and express entire data processing steps and their dependencies, workflow management systems relieve scientists from the details of an application and manage its execution on a computational infrastructure. As the resource requirements of today’s computational and data science applications that process vast amounts of data keep increasing, there is a compelling case for a new generation of advances in high-performance computing, commonly termed extreme-scale computing, which will bring forth multiple challenges for the design of workflow applications and management systems. This paper presents a novel characterization of workflow management systems using features commonly associated with extreme-scale computing applications. We classify 15 popular workflow management systems in terms of workflow execution models, heterogeneous computing environments, and data access methods. Finally, the paper also surveys workflow applications and identifies gaps for future research on the road to extreme-scale workflows and management systems.

  12. Computational Science in Armenia (Invited Talk)

    NASA Astrophysics Data System (ADS)

    Marandjian, H.; Shoukourian, Yu.

    This survey is devoted to the development of informatics and computer science in Armenia. The results in theoretical computer science (algebraic models, solutions to systems of general-form recursive equations, methods of coding theory, pattern recognition and image processing) constitute the theoretical basis for developing problem-solving-oriented environments. Examples include a synthesizer of optimized distributed recursive programs, software tools for cluster-oriented implementations of two-dimensional cellular automata, and a grid-aware web interface with advanced service trading for linear algebra calculations. In the direction of solving scientific problems that require high-performance computing resources, completed projects include physics (parallel computing of complex quantum systems), astrophysics (the Armenian virtual laboratory), biology (a molecular dynamics study of the human red blood cell membrane), and meteorology (implementing and evaluating the Weather Research and Forecasting model for the territory of Armenia). The overview also notes that the Institute for Informatics and Automation Problems of the National Academy of Sciences of Armenia has established a scientific and educational infrastructure uniting the computing clusters of scientific and educational institutions of the country and providing the scientific community with access to local and international computational resources, which is a strong support for computational science in Armenia.

  13. A characterization of workflow management systems for extreme-scale applications

    DOE PAGES

    Ferreira da Silva, Rafael; Filgueira, Rosa; Pietri, Ilia; ...

    2017-02-16

    The automation of the execution of computational tasks is at the heart of improving scientific productivity. Over recent years, scientific workflows have been established as an important abstraction that captures data processing and computation of large and complex scientific applications. By allowing scientists to model and express entire data processing steps and their dependencies, workflow management systems relieve scientists from the details of an application and manage its execution on a computational infrastructure. As the resource requirements of today’s computational and data science applications that process vast amounts of data keep increasing, there is a compelling case for a new generation of advances in high-performance computing, commonly termed extreme-scale computing, which will bring forth multiple challenges for the design of workflow applications and management systems. This paper presents a novel characterization of workflow management systems using features commonly associated with extreme-scale computing applications. We classify 15 popular workflow management systems in terms of workflow execution models, heterogeneous computing environments, and data access methods. Finally, the paper also surveys workflow applications and identifies gaps for future research on the road to extreme-scale workflows and management systems.

  14. Understanding the Performance and Potential of Cloud Computing for Scientific Applications

    DOE PAGES

    Sadooghi, Iman; Martin, Jesus Hernandez; Li, Tonglin; ...

    2015-02-19

    Commercial clouds bring a great opportunity to the scientific computing area. Scientific applications usually require significant resources; however, not all scientists have access to sufficiently high-end computing systems, many of which can be found in the Top500 list. Cloud Computing has gained the attention of scientists as a competitive resource to run HPC applications at a potentially lower cost. But as a different kind of infrastructure, it is unclear whether clouds are capable of running scientific applications with reasonable performance per money spent. This work studies the performance of public clouds and places this performance in the context of price. We evaluate the raw performance of different services of the AWS cloud in terms of the basic resources, such as compute, memory, network and I/O. We also evaluate the performance of scientific applications running in the cloud. This paper aims to assess the ability of the cloud to perform well, as well as to evaluate the cost of the cloud for running scientific applications. We developed a full set of metrics and conducted a comprehensive performance evaluation over the Amazon cloud. We evaluated EC2, S3, EBS and DynamoDB among the many Amazon AWS services. We evaluated the memory sub-system performance with CacheBench, the network performance with iperf, processor and network performance with the HPL benchmark application, and shared storage with NFS and PVFS in addition to S3. We also evaluated a real scientific computing application through the Swift parallel scripting system at scale. Armed with both detailed benchmarks to gauge expected performance and a detailed monetary cost analysis, we expect this paper to be a recipe cookbook for scientists to help them decide where to deploy and run their scientific applications, whether on public clouds, private clouds, or hybrid clouds.
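
    As a toy illustration of the performance-per-money question the study addresses, the sketch below computes a GFLOPS-per-dollar figure for two hypothetical instance types; the names and numbers are placeholders, not measurements from the paper.

      # Illustrative performance-per-dollar metric; all values invented.
      instances = {
          # name: (HPL GFLOPS, USD per hour) -- hypothetical numbers
          "type_a": (120.0, 0.50),
          "type_b": (400.0, 2.00),
      }

      for name, (gflops, price) in instances.items():
          print(f"{name}: {gflops / price:.1f} GFLOPS per dollar-hour")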

  15. Understanding the Performance and Potential of Cloud Computing for Scientific Applications

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sadooghi, Iman; Martin, Jesus Hernandez; Li, Tonglin

    Commercial clouds bring a great opportunity to the scientific computing area. Scientific applications usually require significant resources; however, not all scientists have access to sufficiently high-end computing systems, many of which can be found in the Top500 list. Cloud Computing has gained the attention of scientists as a competitive resource to run HPC applications at a potentially lower cost. But as a different kind of infrastructure, it is unclear whether clouds are capable of running scientific applications with reasonable performance per money spent. This work studies the performance of public clouds and places this performance in the context of price. We evaluate the raw performance of different services of the AWS cloud in terms of the basic resources, such as compute, memory, network and I/O. We also evaluate the performance of scientific applications running in the cloud. This paper aims to assess the ability of the cloud to perform well, as well as to evaluate the cost of the cloud for running scientific applications. We developed a full set of metrics and conducted a comprehensive performance evaluation over the Amazon cloud. We evaluated EC2, S3, EBS and DynamoDB among the many Amazon AWS services. We evaluated the memory sub-system performance with CacheBench, the network performance with iperf, processor and network performance with the HPL benchmark application, and shared storage with NFS and PVFS in addition to S3. We also evaluated a real scientific computing application through the Swift parallel scripting system at scale. Armed with both detailed benchmarks to gauge expected performance and a detailed monetary cost analysis, we expect this paper to be a recipe cookbook for scientists to help them decide where to deploy and run their scientific applications, whether on public clouds, private clouds, or hybrid clouds.

  16. Cloudbursting - Solving the 3-body problem

    NASA Astrophysics Data System (ADS)

    Chang, G.; Heistand, S.; Vakhnin, A.; Huang, T.; Zimdars, P.; Hua, H.; Hood, R.; Koenig, J.; Mehrotra, P.; Little, M. M.; Law, E.

    2014-12-01

    Many science projects in the future will be accomplished through collaboration among two or more NASA centers along with, potentially, external scientists. Science teams will be composed of more geographically dispersed individuals and groups. However, the current computing environment does not make this easy and seamless. By being able to share computing resources among members of a multi-center team working on a science/engineering project, limited pre-competition funds could be applied more efficiently and technical work could be conducted more effectively, with less time spent moving data or waiting for computing resources to free up. Based on work from a NASA CIO IT Labs task, this presentation will highlight our prototype work in assessing the feasibility of, and identifying the obstacles, both technical and managerial, to performing "Cloudbursting" among private clouds located at three different centers. We will demonstrate the use of private cloud computing infrastructure at the Jet Propulsion Laboratory, Langley Research Center, and Ames Research Center to provide elastic computation to each other to perform parallel Earth Science data imaging. We leverage elastic load balancing and auto-scaling features at each data center so that each location can independently define how many resources to allocate to a particular job that was "bursted" from another data center, and demonstrate that compute capacity scales up and down with the job. We will also discuss future work in the area, which could include the use of cloud infrastructure from different cloud framework providers as well as other cloud service providers.
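
    A toy decision rule gives the flavor of bursting work to partner centres when the local queue exceeds local capacity. The thresholds and site names below are hypothetical; the prototype's real mechanism used elastic load balancing and auto-scaling at each data centre.

      # Hypothetical burst-placement sketch; not the prototype's actual logic.
      def place_jobs(queued: int, local_free: int, partners: dict) -> dict:
          plan = {"local": min(queued, local_free)}
          overflow = queued - plan["local"]
          for site, free in partners.items():
              if overflow <= 0:
                  break
              plan[site] = min(overflow, free)   # burst overflow to partner
              overflow -= plan[site]
          return plan

      print(place_jobs(queued=120, local_free=40,
                       partners={"jpl": 50, "larc": 30, "arc": 60}))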

  17. Towards a Multi-Mission, Airborne Science Data System Environment

    NASA Astrophysics Data System (ADS)

    Crichton, D. J.; Hardman, S.; Law, E.; Freeborn, D.; Kay-Im, E.; Lau, G.; Oswald, J.

    2011-12-01

    NASA earth science instruments are increasingly relying on airborne missions. However, traditionally, there has been limited common infrastructure support available to principal investigators in the area of science data systems. As a result, each investigator has been required to develop their own computing infrastructure for the science data system. Typically there is little software reuse, and many projects lack sufficient resources to provide a robust infrastructure to capture, process, distribute and archive the observations acquired from airborne flights. At NASA's Jet Propulsion Laboratory (JPL), we have been developing a multi-mission data system infrastructure for airborne instruments called the Airborne Cloud Computing Environment (ACCE). ACCE encompasses the end-to-end lifecycle covering planning, provisioning of data system capabilities, and support for scientific analysis in order to improve the quality, cost effectiveness, and capabilities that enable new scientific discovery and research in earth observation. This includes improving data system interoperability across each instrument. A principal characteristic is being able to provide an agile infrastructure that is architected to allow for a variety of configurations, from locally installed compute and storage services to provisioning those services via the "cloud" from cloud computing vendors such as Amazon.com. Investigators often have different needs that require a flexible configuration. The data system infrastructure is built on Apache's Object Oriented Data Technology (OODT) suite of components, which has been used for a number of spaceborne missions and provides a rich set of open source software components and services for constructing science processing and data management systems. In 2010, a partnership was formed between the ACCE team and the Carbon in Arctic Reservoirs Vulnerability Experiment (CARVE) mission to support its data processing and data management needs. A principal goal is to provide support for the Fourier Transform Spectrometer (FTS) instrument, which will produce over 700,000 soundings over the life of the three-year mission. The cost to purchase and operate a cluster-based system in order to generate Level 2 Full Physics products from this data was prohibitive. Through an evaluation of cloud computing solutions, Amazon's Elastic Compute Cloud (EC2) was selected for the CARVE deployment. As the ACCE infrastructure is developed and extended to form an infrastructure for airborne missions, the experience of working with CARVE has provided a number of lessons learned and has proven to be important in reinforcing the unique aspects of airborne missions and the importance of the ACCE infrastructure in developing a cost effective, flexible multi-mission capability that leverages emerging capabilities in cloud computing, workflow management, and distributed computing.

  18. Soil Monitor: an open source web application for real-time soil sealing monitoring and assessment

    NASA Astrophysics Data System (ADS)

    Langella, Giuliano; Basile, Angelo; Giannecchini, Simone; Iamarino, Michela; Munafò, Michele; Terribile, Fabio

    2016-04-01

    Soil sealing is one of the most important causes of land degradation and desertification. In Europe, soil covered by impermeable materials has increased by about 80% from the Second World War to the present, while the population has grown by only one third. There is increasing concern at high political levels about the need to attenuate imperviousness itself and its effects on soil functions. The European Commission promulgated a roadmap (COM(2011) 571) targeting zero net land take by 2050. Furthermore, the European Commission also published a report in 2011 providing best practices and guidelines for limiting soil sealing and imperviousness. In this scenario, we developed an open-source-based Soil Sealing Geospatial Cyber Infrastructure (SS-GCI) named "Soil Monitor". This tool merges a webGIS with parallel geospatial computation in a fast and dynamic fashion in order to provide real-time assessments of soil sealing at high spatial resolution (20 meters and below) over the whole of Italy. Common open source webGIS packages, such as GeoServer and MapStore, are used to implement both the data management and visualization infrastructures. The high-speed geospatial computation is ensured by GPU parallelism using the CUDA (Compute Unified Device Architecture) framework by NVIDIA®. This kind of parallelism required writing from scratch all the code needed for the geospatial computation behind the soil sealing toolbox. The combination of GPU computing with webGIS infrastructures is relatively novel and required particular attention at the Java-CUDA programming interface. As a result, Soil Monitor is fast: it can perform highly time-consuming calculations (for instance, querying an entire Italian administrative region as the area of interest) in less than one minute. The web application runs in a web browser and requires no installation before use. Potentially everybody can use it, but the main targets are stakeholders dealing with sealing, such as policy makers, land owners and asphalt/cement companies. Soil Monitor can be used to improve spatial planning, thereby limiting the progression of disordered soil sealing, which causes both the direct loss of soils due to imperviousness and the indirect loss caused by fragmentation of soils (which has various negative effects on the durability of soil functions, such as habitat corridors). Further, a future version of Soil Monitor could estimate the best location for a new building or help compensate for soil losses through offsetting actions in other areas. The presented SS-GCI dealing with soil sealing, if suitably scaled, would aid the implementation of best practices for limiting soil sealing or mitigating its effects on soil functions.
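
    The core query behind such a service can be pictured as computing the sealed fraction of an area of interest on a 20 m raster. The numpy sketch below is a minimal stand-in for what the real service does on the GPU via CUDA; the masks and values are synthetic.

      # Minimal sketch of a soil-sealing query over an area of interest (AOI).
      import numpy as np

      rng = np.random.default_rng(0)
      sealed = rng.random((1000, 1000)) < 0.08   # boolean sealing mask (synthetic)
      aoi = np.zeros_like(sealed, dtype=bool)
      aoi[200:800, 300:900] = True               # administrative-region mask

      pixel_area_m2 = 20 * 20                    # 20 m resolution
      sealed_fraction = sealed[aoi].mean()
      sealed_km2 = sealed[aoi].sum() * pixel_area_m2 / 1e6
      print(f"sealed: {sealed_fraction:.1%} of AOI, {sealed_km2:.1f} km^2")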

  19. Fiber optic sensors for infrastructure applications

    DOT National Transportation Integrated Search

    1998-02-01

    Fiber optic sensor technology offers the possibility of implementing "nervous systems" for infrastructure elements that allow high performance, cost effective health and damage assessment systems to be achieved. This is possible, largely due to syner...

  20. GenomeVIP: a cloud platform for genomic variant discovery and interpretation

    PubMed Central

    Mashl, R. Jay; Scott, Adam D.; Huang, Kuan-lin; Wyczalkowski, Matthew A.; Yoon, Christopher J.; Niu, Beifang; DeNardo, Erin; Yellapantula, Venkata D.; Handsaker, Robert E.; Chen, Ken; Koboldt, Daniel C.; Ye, Kai; Fenyö, David; Raphael, Benjamin J.; Wendl, Michael C.; Ding, Li

    2017-01-01

    Identifying genomic variants is a fundamental first step toward the understanding of the role of inherited and acquired variation in disease. The accelerating growth in the corpus of sequencing data that underpins such analysis is making the data-download bottleneck more evident, placing substantial burdens on the research community to keep pace. As a result, the search for alternative approaches to the traditional “download and analyze” paradigm on local computing resources has led to a rapidly growing demand for cloud-computing solutions for genomics analysis. Here, we introduce the Genome Variant Investigation Platform (GenomeVIP), an open-source framework for performing genomic variant discovery and annotation using cloud- or local high-performance computing infrastructure. GenomeVIP orchestrates the analysis of whole-genome and exome sequence data using a set of robust and popular task-specific tools, including VarScan, GATK, Pindel, BreakDancer, Strelka, and Genome STRiP, through a web interface. GenomeVIP has been used for genomic analysis in large-data projects such as the TCGA PanCanAtlas and in other projects, such as the ICGC Pilots, CPTAC, ICGC-TCGA DREAM Challenges, and the 1000 Genomes SV Project. Here, we demonstrate GenomeVIP's ability to provide high-confidence annotated somatic, germline, and de novo variants of potential biological significance using publicly available data sets. PMID:28522612

  1. Current Capabilities at SNL for the Integration of Small Modular Reactors onto Smart Microgrids Using Sandia's Smart Microgrid Technology High Performance Computing and Advanced Manufacturing.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rodriguez, Salvador B.

    Smart grids are a crucial component for enabling the nation’s future energy needs, as part of a modernization effort led by the Department of Energy. Smart grids and smart microgrids are being considered in niche applications, and as part of a comprehensive energy strategy to help manage the nation’s growing energy demands, for critical infrastructures, military installations, small rural communities, and large populations with limited water supplies. As part of a far-reaching strategic initiative, Sandia National Laboratories (SNL) presents herein a unique, three-pronged approach to integrate small modular reactors (SMRs) into microgrids, with the goal of providing economically-competitive, reliable, and secure energy to meet the nation’s needs. SNL’s triad methodology involves an innovative blend of smart microgrid technology, high performance computing (HPC), and advanced manufacturing (AM). In this report, Sandia’s current capabilities in those areas are summarized, as well as paths forward that will enable DOE to achieve its energy goals. In the area of smart grid/microgrid technology, Sandia’s current computational capabilities can model the entire grid, including temporal aspects and cyber security issues. Our tools include system development, integration, testing and evaluation, monitoring, and sustainment.

  2. Three-dimensional integration of nanotechnologies for computing and data storage on a single chip

    NASA Astrophysics Data System (ADS)

    Shulaker, Max M.; Hills, Gage; Park, Rebecca S.; Howe, Roger T.; Saraswat, Krishna; Wong, H.-S. Philip; Mitra, Subhasish

    2017-07-01

    The computing demands of future data-intensive applications will greatly exceed the capabilities of current electronics, and are unlikely to be met by isolated improvements in transistors, data storage technologies or integrated circuit architectures alone. Instead, transformative nanosystems, which use new nanotechnologies to simultaneously realize improved devices and new integrated circuit architectures, are required. Here we present a prototype of such a transformative nanosystem. It consists of more than one million resistive random-access memory cells and more than two million carbon-nanotube field-effect transistors—promising new nanotechnologies for use in energy-efficient digital logic circuits and for dense data storage—fabricated on vertically stacked layers in a single chip. Unlike conventional integrated circuit architectures, the layered fabrication realizes a three-dimensional integrated circuit architecture with fine-grained and dense vertical connectivity between layers of computing, data storage, and input and output (in this instance, sensing). As a result, our nanosystem can capture massive amounts of data every second, store it directly on-chip, perform in situ processing of the captured data, and produce ‘highly processed’ information. As a working prototype, our nanosystem senses and classifies ambient gases. Furthermore, because the layers are fabricated on top of silicon logic circuitry, our nanosystem is compatible with existing infrastructure for silicon-based technologies. Such complex nano-electronic systems will be essential for future high-performance and highly energy-efficient electronic systems.

  3. Three-dimensional integration of nanotechnologies for computing and data storage on a single chip.

    PubMed

    Shulaker, Max M; Hills, Gage; Park, Rebecca S; Howe, Roger T; Saraswat, Krishna; Wong, H-S Philip; Mitra, Subhasish

    2017-07-05

    The computing demands of future data-intensive applications will greatly exceed the capabilities of current electronics, and are unlikely to be met by isolated improvements in transistors, data storage technologies or integrated circuit architectures alone. Instead, transformative nanosystems, which use new nanotechnologies to simultaneously realize improved devices and new integrated circuit architectures, are required. Here we present a prototype of such a transformative nanosystem. It consists of more than one million resistive random-access memory cells and more than two million carbon-nanotube field-effect transistors—promising new nanotechnologies for use in energy-efficient digital logic circuits and for dense data storage—fabricated on vertically stacked layers in a single chip. Unlike conventional integrated circuit architectures, the layered fabrication realizes a three-dimensional integrated circuit architecture with fine-grained and dense vertical connectivity between layers of computing, data storage, and input and output (in this instance, sensing). As a result, our nanosystem can capture massive amounts of data every second, store it directly on-chip, perform in situ processing of the captured data, and produce 'highly processed' information. As a working prototype, our nanosystem senses and classifies ambient gases. Furthermore, because the layers are fabricated on top of silicon logic circuitry, our nanosystem is compatible with existing infrastructure for silicon-based technologies. Such complex nano-electronic systems will be essential for future high-performance and highly energy-efficient electronic systems.

  4. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Aderholdt, Ferrol; Caldwell, Blake A.; Hicks, Susan Elaine

    High performance computing environments are often used for a wide variety of workloads, ranging from simulation to data transformation and analysis to complex workflows, to name just a few. These systems may process data at various security levels, but in so doing are often enclaved at the highest security posture. This approach places significant restrictions on the users of the system, even when processing data at a lower security level, and exposes data at higher levels of confidentiality to a much broader population than otherwise necessary. The traditional approach of isolation, while effective in establishing security enclaves, poses significant challenges for the use of shared infrastructure in HPC environments. This report details the current state of the art in reconfigurable network enclaving through Software Defined Networking (SDN) and Network Function Virtualization (NFV) and their applicability to secure enclaves in HPC environments. SDN and NFV methods are based on a solid foundation of system-wide virtualization, whose purpose is straightforward: the system administrator can deploy networks that are more amenable to customer needs and at the same time achieve increased scalability, making it easier to increase overall capacity as needed without negatively affecting functionality. The network administration of both the server system and the virtual sub-systems is simplified, allowing control of the infrastructure through well-defined APIs (Application Programming Interfaces). While SDN and NFV technologies offer significant promise in meeting these goals, they also provide the ability to address a significant component of the multi-tenant challenge in HPC environments, namely resource isolation. Traditional HPC systems are built upon scalable high-performance networking technologies designed to meet specific application requirements. Dynamic isolation of resources within these environments has remained difficult to achieve. SDN and NFV methodologies provide relevant concepts and available open-standards-based APIs that isolate compute and storage resources within an otherwise common networking infrastructure. Additionally, the integration of the networking APIs within larger system frameworks such as OpenStack provides the tools necessary to establish isolated enclaves dynamically, allowing the benefits of HPC while providing a controlled security structure surrounding these systems.
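
    A hedged sketch of what dynamically carving out an isolated tenant network ("enclave") might look like through OpenStack's networking API, assuming the openstacksdk package and a configured cloud named "hpc"; the names and CIDR below are illustrative, not from the report.

      # Sketch: creating an isolated enclave network via openstacksdk.
      import openstack

      conn = openstack.connect(cloud="hpc")      # assumes clouds.yaml entry "hpc"

      net = conn.network.create_network(name="secure-enclave-net")
      conn.network.create_subnet(
          name="secure-enclave-subnet",
          network_id=net.id,
          ip_version=4,
          cidr="10.42.0.0/24",
      )
      # Compute and storage nodes attached only to this network are isolated
      # from other tenants sharing the same physical HPC fabric.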

  5. Big Data Solution for CTBT Monitoring Using Global Cross Correlation

    NASA Astrophysics Data System (ADS)

    Gaillard, P.; Bobrov, D.; Dupont, A.; Grenouille, A.; Kitov, I. O.; Rozhkov, M.

    2014-12-01

    Due to the mismatch between data volume and the performance of the Information Technology infrastructure used in seismic data centers, it is becoming more and more difficult to process all the data with traditional applications in a reasonable elapsed time. To fulfill their missions, the International Data Centre of the Comprehensive Nuclear-Test-Ban Treaty Organization (CTBTO/IDC) and the Département Analyse Surveillance Environnement of the Commissariat à l'Energie atomique et aux énergies alternatives (CEA/DASE) collect, process and produce complex data sets whose volume is growing exponentially. In the medium term, computer architectures, data management systems and application algorithms will require fundamental changes to meet the needs. This problem is well known and identified as a "Big Data" challenge. To tackle this major task, the CEA/DASE is taking part in the two-year "DataScale" project. Started in September 2013, DataScale gathers a large set of partners (research laboratories, SMEs and big companies). The common objective is to design efficient solutions exploiting the synergy between Big Data solutions and High Performance Computing (HPC). The project will evaluate the relevance of these technological solutions by implementing a demonstrator for seismic event detection based on massive waveform correlation. The IDC has developed expertise in such techniques, leading to an algorithm called "Master Event", and provides a high-quality dataset for an extensive cross correlation study. The objective of the project is to enhance the Master Event algorithm and to reanalyze 10 years of waveform data from the International Monitoring System (IMS) network using a dedicated HPC infrastructure operated by the "Centre de Calcul Recherche et Technologie" at the CEA site of Bruyères-le-Châtel. The dataset used for the demonstrator includes more than 300,000 seismic events, tens of millions of raw detections and more than 30 terabytes of continuous seismic data from the primary IMS stations. In this talk, we will present the Master Event algorithm and the associated workflow, give an overview of the designed technical solutions (from the building blocks to the global infrastructure), and show preliminary results at a regional scale.
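
    The waveform cross correlation at the heart of a master-event detector can be sketched in a few lines: normalized correlation of a template against continuous data, flagging windows above a threshold. The data below are synthetic and the threshold arbitrary; the IDC's actual algorithm adds station stacking and much more.

      # Sketch of template matching by normalized cross correlation.
      import numpy as np

      rng = np.random.default_rng(1)
      template = np.sin(2 * np.pi * np.arange(100) / 20)   # "master event"
      trace = rng.normal(0, 0.3, 5000)                      # continuous noise
      trace[2000:2100] += template                          # buried repeat event

      n = len(template)
      tz = (template - template.mean()) / template.std()
      scores = np.empty(len(trace) - n)
      for i in range(len(scores)):
          w = trace[i:i + n]
          scores[i] = np.dot(tz, (w - w.mean()) / w.std()) / n  # Pearson r

      detections = np.flatnonzero(scores > 0.6)             # arbitrary threshold
      print("candidate onsets near sample(s):", detections[:5])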

  6. A high performance computing framework for physics-based modeling and simulation of military ground vehicles

    NASA Astrophysics Data System (ADS)

    Negrut, Dan; Lamb, David; Gorsich, David

    2011-06-01

    This paper describes a software infrastructure made up of tools and libraries designed to assist developers in implementing computational dynamics applications running on heterogeneous and distributed computing environments. Together, these tools and libraries compose a so-called Heterogeneous Computing Template (HCT). The heterogeneous and distributed computing hardware infrastructure is assumed herein to be made up of a combination of CPUs and Graphics Processing Units (GPUs). The computational dynamics applications targeted to execute on such a hardware topology include many-body dynamics, smoothed-particle hydrodynamics (SPH) fluid simulation, and fluid-solid interaction analysis. The underlying theme of the solution approach embraced by HCT is that of partitioning the domain of interest into a number of subdomains that are each managed by a separate core/accelerator (CPU/GPU) pair. The components at the core of HCT enable the envisioned distributed computing approach to large-scale dynamical system simulation: (a) the ability to partition the problem according to the one-to-one mapping (i.e., the spatial subdivision discussed above) during pre-processing; (b) a protocol for passing data between any two co-processors; (c) algorithms for element proximity computation; and (d) the ability to carry out post-processing in a distributed fashion. In this contribution, components (a) and (b) of the HCT are demonstrated via the example of the Discrete Element Method (DEM) for rigid body dynamics with friction and contact. The collision detection task required in frictional-contact dynamics (task (c) above) is shown to gain two orders of magnitude in efficiency on the GPU when compared to traditional sequential implementations. Note: Reference herein to any specific commercial products, process, or service by trade name, trademark, manufacturer, or otherwise, does not imply its endorsement, recommendation, or favoring by the United States Army. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Army, and shall not be used for advertising or product endorsement purposes.
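
    Component (a), the one-to-one spatial subdivision, can be pictured with a few lines of Python: bodies are binned into subdomains, each owned by one CPU/GPU pair. The grid shape and device names below are illustrative, not taken from the paper.

      # Sketch of a one-to-one mapping of subdomains to CPU/GPU pairs.
      import numpy as np

      rng = np.random.default_rng(2)
      positions = rng.uniform(0, 100, size=(10000, 2))   # rigid-body centres
      grid = (2, 2)                                      # 2 x 2 subdomains
      devices = ["cpu0/gpu0", "cpu1/gpu1", "cpu2/gpu2", "cpu3/gpu3"]

      ix = np.minimum((positions[:, 0] / 100 * grid[0]).astype(int), grid[0] - 1)
      iy = np.minimum((positions[:, 1] / 100 * grid[1]).astype(int), grid[1] - 1)
      owner = ix * grid[1] + iy                          # subdomain per body

      for d, name in enumerate(devices):
          print(f"{name}: {np.sum(owner == d)} bodies")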

  7. High-Performance First-Principles Molecular Dynamics for Predictive Theory and Modeling

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gygi, Francois; Galli, Giulia; Schwegler, Eric

    This project focused on developing high-performance software tools for First-Principles Molecular Dynamics (FPMD) simulations, and applying them in investigations of materials relevant to energy conversion processes. FPMD is an atomistic simulation method that combines a quantum-mechanical description of electronic structure with the statistical description provided by molecular dynamics (MD) simulations. This reliance on fundamental principles allows FPMD simulations to provide a consistent description of structural, dynamical and electronic properties of a material. This is particularly useful in systems for which reliable empirical models are lacking. FPMD simulations are increasingly used as a predictive tool for applications such as batteries, solar energy conversion, light-emitting devices, electro-chemical energy conversion devices and other materials. During the course of the project, several new features were developed and added to the open-source Qbox FPMD code. The code was further optimized for scalable operation of large-scale, Leadership-Class DOE computers. When combined with Many-Body Perturbation Theory (MBPT) calculations, this infrastructure was used to investigate structural and electronic properties of liquid water, ice, aqueous solutions, nanoparticles and solid-liquid interfaces. Computing both ionic trajectories and electronic structure in a consistent manner enabled the simulation of several spectroscopic properties, such as Raman spectra, infrared spectra, and sum-frequency generation spectra. The accuracy of the approximations used allowed for direct comparisons of results with experimental data such as optical spectra, X-ray and neutron diffraction spectra. The software infrastructure developed in this project, as applied to various investigations of solids, liquids and interfaces, demonstrates that FPMD simulations can provide a detailed, atomic-scale picture of structural, vibrational and electronic properties of complex systems relevant to energy conversion devices.

  8. A quantitative assessment of the Hadoop framework for analyzing massively parallel DNA sequencing data.

    PubMed

    Siretskiy, Alexey; Sundqvist, Tore; Voznesenskiy, Mikhail; Spjuth, Ola

    2015-01-01

    New high-throughput technologies, such as massively parallel sequencing, have transformed the life sciences into a data-intensive field. The most common e-infrastructure for analyzing this data consists of batch systems that are based on high-performance computing resources; however, the bioinformatics software that is built on this platform does not scale well in the general case. Recently, the Hadoop platform has emerged as an interesting option to address the challenges of increasingly large datasets with distributed storage, distributed processing, built-in data locality, fault tolerance, and an appealing programming methodology. In this work we introduce metrics and report on a quantitative comparison between Hadoop and a single node of conventional high-performance computing resources for the tasks of short read mapping and variant calling. We calculate efficiency as a function of data size and observe that the Hadoop platform is more efficient for biologically relevant data sizes in terms of computing hours for both split and un-split data files. We also quantify the advantages of the data locality provided by Hadoop for NGS problems, and show that a classical architecture with network-attached storage will not scale when computing resources increase in number. Measurements were performed using ten datasets of different sizes, up to 100 gigabases, using the pipeline implemented in Crossbow. To make a fair comparison, we implemented an improved preprocessor for Hadoop with better performance for splittable data files. For improved usability, we implemented a graphical user interface for Crossbow in a private cloud environment using the CloudGene platform. All of the code and data in this study are freely available as open source in public repositories. From our experiments we can conclude that the improved Hadoop pipeline scales better than the same pipeline on high-performance computing resources; we also conclude that Hadoop is an economically viable option for the common data sizes that are currently used in massively parallel sequencing. Given that datasets are expected to increase over time, Hadoop is a framework that we envision will have an increasingly important role in future biological data analysis.
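
    The comparison metric, efficiency as a function of data size, can be sketched as computing hours per gigabase for each platform over growing datasets. The numbers below are invented placeholders, not the paper's measurements.

      # Hedged sketch of an efficiency-vs-data-size comparison; values invented.
      datasets_gbase = [1, 10, 50, 100]
      hours = {
          "hadoop": [2.0, 12.0, 45.0, 80.0],
          "hpc_single_node": [1.5, 14.0, 70.0, 160.0],
      }

      for platform, h in hours.items():
          eff = [t / g for t, g in zip(h, datasets_gbase)]  # hours per gigabase
          print(platform, ["%.2f h/Gbase" % e for e in eff])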

  9. Using the iPlant collaborative discovery environment.

    PubMed

    Oliver, Shannon L; Lenards, Andrew J; Barthelson, Roger A; Merchant, Nirav; McKay, Sheldon J

    2013-06-01

    The iPlant Collaborative is an academic consortium whose mission is to develop an informatics and social infrastructure to address the "grand challenges" in plant biology. Its cyberinfrastructure supports the computational needs of the research community and facilitates solving major challenges in plant science. The Discovery Environment provides a powerful and rich graphical interface to the iPlant Collaborative cyberinfrastructure by creating an accessible virtual workbench that enables users at all levels of expertise, ranging from students to traditional biology researchers and computational experts, to explore, analyze, and share their data. By providing access to iPlant's robust data-management system and high-performance computing resources, the Discovery Environment also creates a unified space in which researchers can access scalable tools. Researchers can use available Applications (Apps) to execute analyses on their data, as well as customize or integrate their own tools to better meet the specific needs of their research. These Apps can also be used in workflows that automate more complicated analyses. This module describes how to use the main features of the Discovery Environment, using bioinformatics workflows for high-throughput sequence data as examples. © 2013 by John Wiley & Sons, Inc.

  10. Integration and Exposure of Large Scale Computational Resources Across the Earth System Grid Federation (ESGF)

    NASA Astrophysics Data System (ADS)

    Duffy, D.; Maxwell, T. P.; Doutriaux, C.; Williams, D. N.; Chaudhary, A.; Ames, S.

    2015-12-01

    As the size of remote sensing observations and model output data grows, the volume of the data has become overwhelming, even to many scientific experts. As societies are forced to better understand, mitigate, and adapt to climate changes, the combination of Earth observation data and global climate model projections is crucial not only to scientists but to policy makers, downstream applications, and even the public. Scientific progress on understanding climate is critically dependent on the availability of a reliable infrastructure that promotes data access, management, and provenance. The Earth System Grid Federation (ESGF) has created such an environment for the Intergovernmental Panel on Climate Change (IPCC). ESGF provides a federated global cyber infrastructure for data access and management of model outputs generated for the IPCC Assessment Reports (AR). The current generation of the ESGF federated grid allows consumers of the data to find and download data with limited capabilities for server-side processing. Since the amount of data for future ARs is expected to grow dramatically, ESGF is working on integrating server-side analytics throughout the federation. The ESGF Compute Working Team (CWT) has created a Web Processing Service (WPS) Application Programming Interface (API) to enable access to scalable computational resources. The API is the exposure point to high performance computing resources across the federation. Specifically, the API allows users to execute simple operations, such as maximum, minimum, average, and anomalies, on ESGF data without having to download the data. These operations are executed at the ESGF data node site with access to large amounts of parallel computing capabilities. This presentation will highlight the WPS API, its capabilities, provide implementation details, and discuss future developments.
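
    A hedged sketch of what invoking such a server-side operation might look like from a client, assuming a WPS 1.0-style key-value Execute request; the endpoint URL, operation identifier, and input encoding below are illustrative, not the ESGF CWT API's actual form.

      # Sketch: requesting a server-side "average" instead of downloading data.
      import requests

      wps = "https://esgf-node.example.org/wps"   # hypothetical endpoint
      params = {
          "service": "WPS",
          "request": "Execute",
          "identifier": "CDAT.average",           # hypothetical operation name
          "datainputs": "variable=tas;axes=time", # schematic inputs
      }
      resp = requests.get(wps, params=params, timeout=60)
      print(resp.status_code)                     # the reply would be WPS XML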

  11. Analysis Report for Exascale Storage Requirements for Scientific Data.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ruwart, Thomas M.

    Over the next 10 years, the Department of Energy will be transitioning from Petascale to Exascale Computing, causing data storage, networking, and infrastructure requirements to increase by three orders of magnitude. The technologies and best practices used today are the result of a relatively slow evolution of ancestral technologies developed in the 1950s and 1960s. These include magnetic tape, magnetic disk, networking, databases, file systems, and operating systems. These technologies will continue to evolve over the next 10 to 15 years on a reasonably predictable path. Experience with the challenges involved in transitioning these fundamental technologies from Terascale to Petascale computing systems has raised questions about how they will scale another 3 or 4 orders of magnitude to meet the requirements imposed by Exascale computing systems. This report focuses on the most concerning scaling issues with data storage systems as they relate to High Performance Computing, and presents options for a path forward. Given the ability to store exponentially increasing amounts of data, far more advanced concepts and use of metadata will be critical to managing data in Exascale computing systems.

  12. Cloud Infrastructure & Applications - CloudIA

    NASA Astrophysics Data System (ADS)

    Sulistio, Anthony; Reich, Christoph; Doelitzscher, Frank

    The idea behind Cloud Computing is to deliver Infrastructure-as-a-Service and Software-as-a-Service over the Internet on an easy pay-per-use business model. To harness the potential of Cloud Computing for e-Learning and research purposes, as well as for small- and medium-sized enterprises, Hochschule Furtwangen University has established a new project called Cloud Infrastructure & Applications (CloudIA). The CloudIA project is a market-oriented cloud infrastructure that leverages different virtualization technologies by supporting Service-Level Agreements for various service offerings. This paper describes the CloudIA project in detail and reports our early experiences in building a private cloud using an existing infrastructure.

  13. Rapid prototyping of soil moisture estimates using the NASA Land Information System

    NASA Astrophysics Data System (ADS)

    Anantharaj, V.; Mostovoy, G.; Li, B.; Peters-Lidard, C.; Houser, P.; Moorhead, R.; Kumar, S.

    2007-12-01

    The Land Information System (LIS), developed at the NASA Goddard Space Flight Center, is a functional Land Data Assimilation System (LDAS) that incorporates a suite of land models in an interoperable computational framework. LIS has been integrated into a computational Rapid Prototyping Capabilities (RPC) infrastructure. LIS consists of a core, a number of community land models, data servers, and visualization systems, integrated in a high-performance computing environment. The land surface models (LSMs) in LIS incorporate surface and atmospheric parameters of temperature, snow/water, vegetation, albedo, soil conditions, topography, and radiation. Many of these parameters are available from in-situ observations, numerical model analysis, and NASA, NOAA, and other remote sensing satellite platforms at various spatial and temporal resolutions. The computational resources available to LIS via the RPC infrastructure support e-Science experiments involving global land-atmosphere modeling at 1 km spatial resolution as well as regional studies at finer resolutions. The Noah Land Surface Model, available within LIS, is being used to rapidly prototype soil moisture estimates in order to evaluate the viability of other science applications for decision making purposes. For example, LIS has been used to further extend the utility of the USDA Soil Climate Analysis Network of in-situ soil moisture observations. In addition, LIS also supports data assimilation capabilities that are used to assimilate remotely sensed soil moisture retrievals from the AMSR-E instrument onboard the Aqua satellite. The rapid prototyping of soil moisture estimates using LIS and their applications will be illustrated during the presentation.

  14. Cloud access to interoperable IVOA-compliant VOSpace storage

    NASA Astrophysics Data System (ADS)

    Bertocco, S.; Dowler, P.; Gaudet, S.; Major, B.; Pasian, F.; Taffoni, G.

    2018-07-01

    Handling, processing and archiving the huge amount of data produced by the new generation of experiments and instruments in Astronomy and Astrophysics are among the most exciting challenges to address in designing the future data management infrastructures and computing services. We investigated the feasibility of a data management and computation infrastructure, available world-wide, with the aim of merging the FAIR data management provided by IVOA standards with the efficiency and reliability of a cloud approach. Our work involved the Canadian Advanced Network for Astronomy Research (CANFAR) infrastructure and the European EGI federated cloud (EFC). We designed and deployed a pilot data management and computation infrastructure that provides IVOA-compliant VOSpace storage resources and wide access to interoperable federated clouds. In this paper, we detail the main user requirements covered, the technical choices, and the implemented solutions, and we describe the resulting hybrid cloud worldwide infrastructure, its benefits and limitations.
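
    As a rough sketch of the interoperability claim, the snippet below lists the contents of a container node through the REST interface defined by the IVOA VOSpace standard; the service URL and node name are hypothetical, and authentication (for example an X.509 proxy or token) is omitted for brevity.

      # Sketch: listing a VOSpace container node over its REST interface.
      import requests

      vospace = "https://vospace.example.org/vospace"  # hypothetical service
      resp = requests.get(f"{vospace}/nodes/myproject", timeout=30)
      print(resp.status_code)
      print(resp.text[:200])  # a VOSpace service replies with an XML node description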

  15. Fiber optic sensors for infrastructure applications : final report.

    DOT National Transportation Integrated Search

    1998-02-01

    Fiber optic sensor technology offers the possibility of implementing "nervous systems" for infrastructure elements that allow high performance, cost effective health and damage assessment systems to be achieved. This is possible, largely due to syner...

  16. Toward an automated parallel computing environment for geosciences

    NASA Astrophysics Data System (ADS)

    Zhang, Huai; Liu, Mian; Shi, Yaolin; Yuen, David A.; Yan, Zhenzhen; Liang, Guoping

    2007-08-01

    Software for geodynamic modeling has not kept up with the fast-growing computing hardware and network resources. In the past decade supercomputing power has become available to most researchers in the form of affordable Beowulf clusters and other parallel computer platforms. However, taking full advantage of such computing power requires developing parallel algorithms and associated software, a task that is often too daunting for geoscience modelers whose main expertise is in geosciences. We introduce here an automated parallel computing environment built on open-source algorithms and libraries. Users interact with this computing environment by specifying the partial differential equations, solvers, and model-specific properties using an English-like modeling language in the input files. The system then automatically generates the finite element codes that can be run on distributed or shared memory parallel machines. This system is dynamic and flexible, allowing users to address different problems in geosciences. It is capable of providing web-based services, enabling users to generate source codes online. This unique feature will facilitate the integration of high-performance computing with distributed data grids in the emerging cyber-infrastructures for geosciences. In this paper we discuss the principles of this automated modeling environment and provide examples to demonstrate its versatility.

  17. The cost of getting CCS wrong: Uncertainty, infrastructure design, and stranded CO2

    DOE PAGES

    Middleton, Richard Stephen; Yaw, Sean Patrick

    2018-01-11

    Carbon capture and storage (CCS) infrastructure will require industry—such as fossil-fuel power, ethanol production, and oil and gas extraction—to make massive investments in infrastructure. The cost of getting these investments wrong will be substantial and will impact the success of CCS technology. Multiple factors can and will impact the success of commercial-scale CCS, including significant uncertainties regarding capture, transport, and injection-storage decisions. Uncertainties throughout the CCS supply chain include policy, technology, engineering performance, economics, and market forces. In particular, large uncertainties exist for the injection and storage of CO2. Even taking into account upfront investment in site characterization, the final performance of the storage phase is largely unknown until commercial-scale injection has started. We explore and quantify the impact of getting CCS infrastructure decisions wrong under uncertain injection rates and uncertain CO2 storage capacities, using a case study managing CO2 emissions from the Canadian oil sands industry in Alberta. We use SimCCS, a widely used CCS infrastructure design framework, to develop multiple CCS infrastructure scenarios. Each scenario consists of a CCS infrastructure network that connects CO2 sources (oil sands extraction and processing) with CO2 storage reservoirs (acid gas storage reservoirs) using a dedicated CO2 pipeline network. Each scenario is analyzed under a range of uncertain storage estimates, and infrastructure performance is assessed and quantified in terms of the cost of building additional infrastructure to store all CO2. We also include the role of stranded CO2: CO2 that a source expected to capture but cannot, due to substandard performance in the transport and storage infrastructure. Results show that the cost of getting the original infrastructure design wrong is significant and that comprehensive planning will be required to ensure that CCS becomes a successful climate mitigation technology. Here, we show that the concept of stranded CO2 can transform a seemingly high-performing infrastructure design into the worst-case scenario.
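
    A toy expected-cost calculation captures the spirit of the analysis: an infrastructure sized for a nominal storage capacity is evaluated under uncertain realized capacity, with under-performance leaving stranded CO2 that must be handled at a penalty. All numbers below are invented for illustration.

      # Toy expected cost under uncertain realized storage capacity.
      planned_mtco2 = 10.0            # capacity the network was designed for (Mt/yr)
      build_cost = 500.0              # M$ for the designed network
      penalty_per_mt = 120.0          # M$ per Mt of stranded CO2 per year

      scenarios = [(0.25, 6.0), (0.50, 10.0), (0.25, 14.0)]  # (prob, realized Mt)

      expected_cost = 0.0
      for prob, realized in scenarios:
          stranded = max(0.0, planned_mtco2 - realized)      # Mt with nowhere to go
          expected_cost += prob * (build_cost + stranded * penalty_per_mt)

      print(f"expected annualized cost: {expected_cost:.0f} M$")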

  18. The cost of getting CCS wrong: Uncertainty, infrastructure design, and stranded CO2

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Middleton, Richard Stephen; Yaw, Sean Patrick

    Carbon capture and storage (CCS) infrastructure will require industry—such as fossil-fuel power, ethanol production, and oil and gas extraction—to make massive investments in infrastructure. The cost of getting these investments wrong will be substantial and will impact the success of CCS technology. Multiple factors can and will impact the success of commercial-scale CCS, including significant uncertainties regarding capture, transport, and injection-storage decisions. Uncertainties throughout the CCS supply chain include policy, technology, engineering performance, economics, and market forces. In particular, large uncertainties exist for the injection and storage of CO2. Even taking into account upfront investment in site characterization, the final performance of the storage phase is largely unknown until commercial-scale injection has started. We explore and quantify the impact of getting CCS infrastructure decisions wrong based on uncertain injection rates and uncertain CO2 storage capacities, using a case study managing CO2 emissions from the Canadian oil sands industry in Alberta. We use SimCCS, a widely used CCS infrastructure design framework, to develop multiple CCS infrastructure scenarios. Each scenario consists of a CCS infrastructure network that connects CO2 sources (oil sands extraction and processing) with CO2 storage reservoirs (acid gas storage reservoirs) using a dedicated CO2 pipeline network. Each scenario is analyzed under a range of uncertain storage estimates, and infrastructure performance is assessed and quantified in terms of the cost to build additional infrastructure to store all CO2. We also include the role of stranded CO2: CO2 that a source was expecting to capture but cannot, due to substandard performance in the transport and storage infrastructure. Results show that the cost of getting the original infrastructure design wrong is significant and that comprehensive planning will be required to ensure that CCS becomes a successful climate mitigation technology. Here, we show that the concept of stranded CO2 can transform a seemingly high-performing infrastructure design into the worst-case scenario.

  19. OpenSoC Fabric

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    2014-08-21

    Recent advancements in technology scaling have shown a trend towards greater integration, with large-scale chips containing thousands of processors connected to memories and other I/O devices using non-trivial network topologies. Software simulation proves insufficient to study the tradeoffs in such complex systems due to slow execution time, whereas hardware RTL development is too time-consuming. We present OpenSoC Fabric, an on-chip network generation infrastructure which aims to provide a parameterizable and powerful on-chip network generator for evaluating future high performance computing architectures based on SoC technology. OpenSoC Fabric leverages a new hardware DSL, Chisel, which contains powerful abstractions provided by its base language, Scala, and generates both software (C++) and hardware (Verilog) models from a single code base. The OpenSoC Fabric infrastructure is modeled after existing state-of-the-art simulators, offers a large and powerful collection of configuration options, and follows object-oriented design and functional programming to make functionality extension as easy as possible.

  20. Parallel Infrastructure Modeling and Inversion Module for E4D

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    2014-10-09

    Electrical resistivity tomography (ERT) is a method of imaging the electrical conductivity of the subsurface. Electrical conductivity is a useful metric for understanding the subsurface because it is governed by geomechanical and geochemical properties that drive subsurface systems. ERT works by injecting current into the subsurface across a pair of electrodes, and measuring the corresponding electrical potential response across another pair of electrodes. Many such measurements are strategically taken across an array of electrodes to produce an ERT data set. These data are then processed through a computationally demanding process known as inversion to produce an image of the subsurface conductivity structure that gave rise to the measurements. Data can be inverted to provide 2D images, 3D images, or, in the case of time-lapse 3D imaging, 4D images. ERT is generally not well suited for environments with buried electrically conductive infrastructure such as pipes, tanks, or well casings, because these features tend to dominate and degrade ERT images. This reduces or eliminates the utility of ERT imaging where it would otherwise be highly useful, for example, imaging fluid migration from leaking pipes, imaging soil contamination beneath leaking subsurface tanks, and monitoring contaminant migration in locations with a dense network of metal-cased monitoring wells. The location and dimension of buried metallic infrastructure is often known. If so, then the effects of the infrastructure can be explicitly modeled within the ERT imaging algorithm, and thereby removed from the corresponding ERT image. However, there are a number of obstacles limiting this application. 1) Metallic infrastructure cannot be accurately modeled with standard codes because of the large contrast in conductivity between the metal and host material. 2) Modeling infrastructure in true dimension requires the computational mesh to be highly refined near the metal inclusions, which increases computational demands. 3) The ERT imaging algorithm requires specialized modifications to accommodate high-conductivity inclusions within the computational mesh. The solution to each of these challenges was implemented within E4D (formerly FERM3D), a parallel ERT imaging code developed at PNNL (IPID #30249). The infrastructure modeling module implemented in E4D decouples the model at the metallic interface boundaries into several well-posed sub-problems (one for each distinct metallic inclusion) that are subsequently solved and recombined to form the global solution. The approach is based on the immersed interface method, which has been applied to similar problems in other fields (e.g., the semiconductor industry). Comparisons to analytic solutions have shown the results to be very accurate, addressing item 1 above. The solution is implemented on an unstructured mesh, which enables arbitrary shapes to be efficiently modeled, thereby addressing item 2 above. In addition, the algorithm is written in parallel and shows excellent scalability, which also addresses item 2 above. Finally, because only the boundaries of metallic inclusions are modeled, there are no high-conductivity cells within the modeling mesh, and the problem described by item 3 above is no longer applicable.

  1. Mantle convection on modern supercomputers

    NASA Astrophysics Data System (ADS)

    Weismüller, Jens; Gmeiner, Björn; Mohr, Marcus; Waluga, Christian; Wohlmuth, Barbara; Rüde, Ulrich; Bunge, Hans-Peter

    2015-04-01

    Mantle convection is the cause of plate tectonics, the formation of mountains and oceans, and the main driving mechanism behind earthquakes. The convection process is modeled by a system of partial differential equations describing the conservation of mass, momentum and energy. Characteristic of mantle flow is the vast disparity of length scales from global to microscopic, turning mantle convection simulations into a challenging application for high-performance computing. As system size and technical complexity of the simulations continue to increase, design and implementation of simulation models for next-generation large-scale architectures demand an interdisciplinary co-design. Here we report on recent advances of the TERRA-NEO project, which is part of the high-visibility SPPEXA program, and a joint effort of four research groups in computer science, mathematics and geophysical application under the leadership of FAU Erlangen. TERRA-NEO develops algorithms for future HPC infrastructures, focusing on high computational efficiency and resilience in next-generation mantle convection models. We present software that can resolve the Earth's mantle with up to 10^12 grid points and scales efficiently to massively parallel hardware with more than 50,000 processors. We use our simulations to explore the dynamic regime of mantle convection, assessing the impact of small-scale processes on global mantle flow.
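
    As a concrete anchor for the "conservation of mass, momentum and energy" mentioned above, one common non-dimensional form of these equations for incompressible, isoviscous (Boussinesq) mantle convection is sketched below; the actual TERRA-NEO formulation may be more general.

      \nabla \cdot \mathbf{u} = 0                                                                                  % mass
      -\nabla p + \nabla \cdot \left[ \eta \left( \nabla \mathbf{u} + \nabla \mathbf{u}^{\mathsf{T}} \right) \right] = Ra \, T \, \hat{\mathbf{e}}_r   % momentum (Stokes)
      \frac{\partial T}{\partial t} + \mathbf{u} \cdot \nabla T = \nabla^{2} T                                     % energy

    Here u is the velocity, p the pressure, η the viscosity, T the temperature, and Ra the Rayleigh number; inertial terms are absent because the mantle's extreme viscosity makes the Reynolds number effectively zero.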

  2. The Petascale Data Storage Institute

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gibson, Garth; Long, Darrell; Honeyman, Peter

    2013-07-01

    Petascale computing infrastructures for scientific discovery make petascale demands on information storage capacity, performance, concurrency, reliability, availability, and manageability. The Petascale Data Storage Institute focuses on the data storage problems found in petascale scientific computing environments, with special attention to community issues such as interoperability, community buy-in, and shared tools. The Petascale Data Storage Institute is a collaboration between researchers at Carnegie Mellon University, National Energy Research Scientific Computing Center, Pacific Northwest National Laboratory, Oak Ridge National Laboratory, Sandia National Laboratory, Los Alamos National Laboratory, University of Michigan, and the University of California at Santa Cruz.

  3. Processing LHC data in the UK

    PubMed Central

    Colling, D.; Britton, D.; Gordon, J.; Lloyd, S.; Doyle, A.; Gronbech, P.; Coles, J.; Sansum, A.; Patrick, G.; Jones, R.; Middleton, R.; Kelsey, D.; Cass, A.; Geddes, N.; Clark, P.; Barnby, L.

    2013-01-01

    The Large Hadron Collider (LHC) is one of the greatest scientific endeavours to date. The construction of the collider itself and the experiments that collect data from it represent a huge investment, both financially and in terms of human effort, in our hope to understand the way the Universe works at a deeper level. Yet the volumes of data produced are so large that they cannot be analysed at any single computing centre. Instead, the experiments have all adopted distributed computing models based on the LHC Computing Grid. Without the correct functioning of this grid infrastructure the experiments would not be able to understand the data that they have collected. Within the UK, the Grid infrastructure needed by the experiments is provided by the GridPP project. We report on the operations, performance and contributions made to the experiments by the GridPP project during the years of 2010 and 2011—the first two significant years of the running of the LHC. PMID:23230163

  4. Cloud Infrastructures for In Silico Drug Discovery: Economic and Practical Aspects

    PubMed Central

    Clematis, Andrea; Quarati, Alfonso; Cesini, Daniele; Milanesi, Luciano; Merelli, Ivan

    2013-01-01

    Cloud computing opens new perspectives for small-medium biotechnology laboratories that need to perform bioinformatics analysis in a flexible and effective way. This seems particularly true for hybrid clouds that couple the scalability offered by general-purpose public clouds with the greater control and ad hoc customizations supplied by the private ones. A hybrid cloud broker, acting as an intermediary between users and public providers, can support customers in the selection of the most suitable offers, optionally adding the provisioning of dedicated services with higher levels of quality. This paper analyses some economic and practical aspects of exploiting cloud computing in a real research scenario for the in silico drug discovery in terms of requirements, costs, and computational load based on the number of expected users. In particular, our work is aimed at supporting both the researchers and the cloud broker delivering an IaaS cloud infrastructure for biotechnology laboratories exposing different levels of nonfunctional requirements. PMID:24106693
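
    The broker's economic question can be made concrete with a small sketch: given an expected number of concurrent users, compare keeping reserved private nodes against bursting to on-demand public instances. All prices and capacities below are hypothetical placeholders, not figures from the paper.

      # Toy cost model (Python) for the hybrid reserved/on-demand decision.
      ONDEMAND_PER_HOUR = 0.50    # $/instance-hour, public cloud (invented)
      RESERVED_PER_HOUR = 0.20    # $/instance-hour, amortized private node (invented)
      USERS_PER_INSTANCE = 4      # concurrent analyses one instance sustains (invented)

      def hourly_cost(expected_users: int, reserved_nodes: int) -> float:
          needed = -(-expected_users // USERS_PER_INSTANCE)   # ceiling division
          burst = max(0, needed - reserved_nodes)             # overflow to public cloud
          return reserved_nodes * RESERVED_PER_HOUR + burst * ONDEMAND_PER_HOUR

      for reserved in (0, 2, 4):
          print(f"{reserved} reserved nodes -> ${hourly_cost(10, reserved):.2f}/hour")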

  5. Advances in Grid Computing for the Fabric for Frontier Experiments Project at Fermilab

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Herner, K.; Alba Hernandex, A. F.; Bhat, S.

    The Fabric for Frontier Experiments (FIFE) project is a major initiative within the Fermilab Scientific Computing Division charged with leading the computing model for Fermilab experiments. Work within the FIFE project creates close collaboration between experimenters and computing professionals to serve high-energy physics experiments of differing size, scope, and physics area. The FIFE project has worked to develop common tools for job submission, certificate management, software and reference data distribution through CVMFS repositories, robust data transfer, job monitoring, and databases for project tracking. Since the project's inception the experiments under the FIFE umbrella have significantly matured, and present an increasingly complex list of requirements to service providers. To meet these requirements, the FIFE project has been involved in transitioning the Fermilab General Purpose Grid cluster to support a partitionable slot model, expanding the resources available to experiments via the Open Science Grid, assisting with commissioning dedicated high-throughput computing resources for individual experiments, supporting the efforts of the HEP Cloud projects to provision a variety of back end resources, including public clouds and high performance computers, and developing rapid onboarding procedures for new experiments and collaborations. The larger demands also require enhanced job monitoring tools, which the project has developed using such tools as ElasticSearch and Grafana, to help experiments manage their large-scale production workflows. This group in turn requires a structured service to facilitate smooth management of experiment requests, which FIFE provides in the form of the Production Operations Management Service (POMS). POMS is designed to track and manage requests from the FIFE experiments to run particular workflows, and support troubleshooting and triage in case of problems. Recently a new certificate management infrastructure called Distributed Computing Access with Federated Identities (DCAFI) has been put in place that has eliminated our dependence on a Fermilab-specific third-party Certificate Authority service and better accommodates FIFE collaborators without a Fermilab Kerberos account. DCAFI integrates the existing InCommon federated identity infrastructure, CILogon Basic CA, and a MyProxy service using a new general purpose open source tool. We will discuss the general FIFE onboarding strategy, progress in expanding FIFE experiments' presence on the Open Science Grid, new tools for job monitoring, the POMS service, and the DCAFI project.

  6. Advances in Grid Computing for the Fabric for Frontier Experiments Project at Fermilab

    NASA Astrophysics Data System (ADS)

    Herner, K.; Alba Hernandez, A. F.; Bhat, S.; Box, D.; Boyd, J.; Di Benedetto, V.; Ding, P.; Dykstra, D.; Fattoruso, M.; Garzoglio, G.; Kirby, M.; Kreymer, A.; Levshina, T.; Mazzacane, A.; Mengel, M.; Mhashilkar, P.; Podstavkov, V.; Retzke, K.; Sharma, N.; Teheran, J.

    2017-10-01

    The Fabric for Frontier Experiments (FIFE) project is a major initiative within the Fermilab Scientific Computing Division charged with leading the computing model for Fermilab experiments. Work within the FIFE project creates close collaboration between experimenters and computing professionals to serve high-energy physics experiments of differing size, scope, and physics area. The FIFE project has worked to develop common tools for job submission, certificate management, software and reference data distribution through CVMFS repositories, robust data transfer, job monitoring, and databases for project tracking. Since the project's inception the experiments under the FIFE umbrella have significantly matured, and present an increasingly complex list of requirements to service providers. To meet these requirements, the FIFE project has been involved in transitioning the Fermilab General Purpose Grid cluster to support a partitionable slot model, expanding the resources available to experiments via the Open Science Grid, assisting with commissioning dedicated high-throughput computing resources for individual experiments, supporting the efforts of the HEP Cloud projects to provision a variety of back end resources, including public clouds and high performance computers, and developing rapid onboarding procedures for new experiments and collaborations. The larger demands also require enhanced job monitoring tools, which the project has developed using such tools as ElasticSearch and Grafana, to help experiments manage their large-scale production workflows. This group in turn requires a structured service to facilitate smooth management of experiment requests, which FIFE provides in the form of the Production Operations Management Service (POMS). POMS is designed to track and manage requests from the FIFE experiments to run particular workflows, and support troubleshooting and triage in case of problems. Recently a new certificate management infrastructure called Distributed Computing Access with Federated Identities (DCAFI) has been put in place that has eliminated our dependence on a Fermilab-specific third-party Certificate Authority service and better accommodates FIFE collaborators without a Fermilab Kerberos account. DCAFI integrates the existing InCommon federated identity infrastructure, CILogon Basic CA, and a MyProxy service using a new general purpose open source tool. We will discuss the general FIFE onboarding strategy, progress in expanding FIFE experiments' presence on the Open Science Grid, new tools for job monitoring, the POMS service, and the DCAFI project.

  7. Validation of Storm Water Management Model Storm Control Measures Modules

    NASA Astrophysics Data System (ADS)

    Simon, M. A.; Platz, M. C.

    2017-12-01

    EPA's Storm Water Management Model (SWMM) is a computational code heavily relied upon by industry for the simulation of wastewater and stormwater infrastructure performance. Many municipalities are relying on SWMM results to design multi-billion-dollar, multi-decade infrastructure upgrades. Since the 1970s, EPA and others have developed five major releases, the most recent ones containing storm control measures modules for green infrastructure. The main objective of this study was to quantify the accuracy with which SWMM v5.1.10 simulates the hydrologic activity of previously monitored low impact developments. Model performance was evaluated with a mathematical comparison of outflow hydrographs and total outflow volumes, using empirical data and a multi-event, multi-objective calibration method. The calibration methodology utilized PEST++ Version 3, a parameter estimation tool, which aided in the selection of unmeasured hydrologic parameters. From the validation study and sensitivity analysis, several model improvements were identified to advance SWMM LID Module performance for permeable pavements, infiltration units and green roofs; these were performed and are reported herein. Overall, it was determined that SWMM can successfully simulate low impact development controls given accurate model confirmation, parameter measurement, and model calibration.
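
    A sketch of the kind of hydrograph comparison such a validation relies on: the Nash-Sutcliffe efficiency (NSE), a standard goodness-of-fit metric for hydrologic models, equals 1 for a perfect match and drops to 0 or below when the model predicts no better than the observed mean. The abstract does not name the specific metric used, and the two series below are invented outflows.

      # Nash-Sutcliffe efficiency between observed and simulated hydrographs.
      def nash_sutcliffe(observed, simulated):
          mean_obs = sum(observed) / len(observed)
          sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
          var = sum((o - mean_obs) ** 2 for o in observed)
          return 1.0 - sse / var

      observed  = [0.0, 1.2, 3.5, 2.1, 0.8, 0.2]   # made-up outflow, L/s
      simulated = [0.0, 1.0, 3.9, 1.8, 0.9, 0.3]   # made-up model output, L/s
      print(f"NSE = {nash_sutcliffe(observed, simulated):.3f}")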

  8. Kepler Science Operations Center Architecture

    NASA Technical Reports Server (NTRS)

    Middour, Christopher; Klaus, Todd; Jenkins, Jon; Pletcher, David; Cote, Miles; Chandrasekaran, Hema; Wohler, Bill; Girouard, Forrest; Gunter, Jay P.; Uddin, Kamal; et al.

    2010-01-01

    We give an overview of the operational concepts and architecture of the Kepler Science Data Pipeline. Designed, developed, operated, and maintained by the Science Operations Center (SOC) at NASA Ames Research Center, the Kepler Science Data Pipeline is a central element of the Kepler Ground Data System. The SOC charter is to analyze stellar photometric data from the Kepler spacecraft and report results to the Kepler Science Office for further analysis. We describe how this is accomplished via the Kepler Science Data Pipeline, including the hardware infrastructure, scientific algorithms, and operational procedures. The SOC consists of an office at Ames Research Center, software development and operations departments, and a data center that hosts the computers required to perform data analysis. We discuss the high-performance, parallel computing software modules of the Kepler Science Data Pipeline that perform transit photometry, pixel-level calibration, systematic error correction, attitude determination, stellar target management, and instrument characterization. We explain how data processing environments are divided to support operational processing and test needs. We explain the operational timelines for data processing and the data constructs that flow into the Kepler Science Data Pipeline.

  9. Interactive Supercomputing’s Star-P Platform

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Edelman, Alan; Husbands, Parry; Leibman, Steve

    2006-09-19

    The thesis of this extended abstract is simple: high productivity comes from high level infrastructures. To measure this, we introduce a methodology that goes beyond the tradition of timing software in serial and tuned parallel modes. We perform a classroom productivity study involving 29 students who have written a homework exercise in a low level language (MPI message passing) and a high level language (Star-P with MATLAB client). Our conclusions indicate what perhaps should be of little surprise: (1) the high level language is always far easier on the students than the low level language; (2) the early versions of the high level language perform inadequately compared to the tuned low level language, but later versions substantially catch up. Asymptotically, the analogy must hold that message passing is to high level language parallel programming as assembler is to high level environments such as MATLAB, Mathematica, Maple, or even Python. We follow the Kepner method, which correctly realizes that traditional speedup numbers without some discussion of the human cost of reaching those numbers can fail to reflect the true human productivity cost of high performance computing. Traditional data compares low level message passing with serial computation. With the benefit of a high level language system in place, in our case Star-P running with MATLAB client, and with the benefit of a large data pool (29 students, each running the same code ten times on three evolutions of the same platform), we can methodically demonstrate the productivity gains. To date we are not aware of any high level system as extensive and interoperable as Star-P, nor are we aware of an experiment of this kind performed with this volume of data.
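
    One crude way to operationalize the Kepner-style argument, sketched here with invented numbers: divide the speedup an implementation achieves by the human hours it took to write, so that a slower but far cheaper high-level version can still win on productivity.

      # Toy productivity figure of merit (Python); all numbers are invented.
      def productivity(speedup: float, effort_hours: float) -> float:
          """Performance gained per hour of programming effort."""
          return speedup / effort_hours

      mpi = productivity(speedup=20.0, effort_hours=40.0)        # low-level version
      high_level = productivity(speedup=12.0, effort_hours=6.0)  # high-level version
      print(f"low level: {mpi:.2f}/h, high level: {high_level:.2f}/h")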

  10. Future Naval Use of COTS Networking Infrastructure

    DTIC Science & Technology

    2009-07-01

    user to benefit from Google's vast databases and computational resources. Obviously, the ability to harness the full power of the Cloud could be... Computing Impact Findings Action Items Take-Aways Appendices: Pages 54-68 A. Terms of Reference Document B. Sample Definitions of Cloud ...and definition of Cloud Computing. While Cloud Computing is developing in many variations – including Infrastructure as a Service (IaaS), Platform as

  11. Standard requirements for GCP-compliant data management in multinational clinical trials.

    PubMed

    Ohmann, Christian; Kuchinke, Wolfgang; Canham, Steve; Lauritsen, Jens; Salas, Nader; Schade-Brittinger, Carmen; Wittenberg, Michael; McPherson, Gladys; McCourt, John; Gueyffier, Francois; Lorimer, Andrea; Torres, Ferràn

    2011-03-22

    A recent survey has shown that data management in clinical trials performed by academic trial units still faces many difficulties (e.g. heterogeneity of software products, deficits in quality management, limited human and financial resources and the complexity of running a local computer centre). Unfortunately, no specific, practical and open standard for both GCP-compliant data management and the underlying IT-infrastructure is available to improve the situation. For that reason the "Working Group on Data Centres" of the European Clinical Research Infrastructures Network (ECRIN) has developed a standard specifying the requirements for high quality GCP-compliant data management in multinational clinical trials. International, European and national regulations and guidelines relevant to GCP, data security and IT infrastructures, as well as ECRIN documents produced previously, were evaluated to provide a starting point for the development of standard requirements. The requirements were produced by expert consensus of the ECRIN Working group on Data Centres, using a structured and standardised process. The requirements were divided into two main parts: an IT part covering standards for the underlying IT infrastructure and computer systems in general, and a Data Management (DM) part covering requirements for data management applications in clinical trials. The standard developed includes 115 IT requirements, split into 15 separate sections, 107 DM requirements (in 12 sections) and 13 other requirements (2 sections). Sections IT01 to IT05 deal with the basic IT infrastructure while IT06 and IT07 cover validation and local software development. IT08 to IT15 concern the aspects of IT systems that directly support clinical trial management. Sections DM01 to DM03 cover the implementation of a specific clinical data management application, i.e. for a specific trial, whilst DM04 to DM12 address the data management of trials across the unit. Section IN01 is dedicated to international aspects and ST01 to the competence of a trials unit's staff. The standard is intended to provide an open and widely used set of requirements for GCP-compliant data management, particularly in academic trial units. It is the intention that ECRIN will use these requirements as the basis for the certification of ECRIN data centres.

  12. SPARX, a new environment for Cryo-EM image processing.

    PubMed

    Hohn, Michael; Tang, Grant; Goodyear, Grant; Baldwin, P R; Huang, Zhong; Penczek, Pawel A; Yang, Chao; Glaeser, Robert M; Adams, Paul D; Ludtke, Steven J

    2007-01-01

    SPARX (single particle analysis for resolution extension) is a new image processing environment with a particular emphasis on transmission electron microscopy (TEM) structure determination. It includes a graphical user interface that provides a complete graphical programming environment with a novel data/process-flow infrastructure, an extensive library of Python scripts that perform specific TEM-related computational tasks, and a core library of fundamental C++ image processing functions. In addition, SPARX relies on the EMAN2 library and cctbx, the open-source computational crystallography library from PHENIX. The design of the system is such that future inclusion of other image processing libraries is a straightforward task. The SPARX infrastructure intelligently handles retention of intermediate values, even those inside programming structures such as loops and function calls. SPARX and all dependencies are free for academic use and available with complete source.

  13. The open science grid

    NASA Astrophysics Data System (ADS)

    Pordes, Ruth; OSG Consortium; Petravick, Don; Kramer, Bill; Olson, Doug; Livny, Miron; Roy, Alain; Avery, Paul; Blackburn, Kent; Wenaus, Torre; Würthwein, Frank; Foster, Ian; Gardner, Rob; Wilde, Mike; Blatecky, Alan; McGee, John; Quick, Rob

    2007-07-01

    The Open Science Grid (OSG) provides a distributed facility where the Consortium members provide guaranteed and opportunistic access to shared computing and storage resources. OSG provides support for and evolution of the infrastructure through activities that cover operations, security, software, troubleshooting, addition of new capabilities, and support for existing communities and engagement with new ones. The OSG SciDAC-2 project provides specific activities to manage and evolve the distributed infrastructure and support its use. The innovative aspects of the project are the maintenance and performance of a collaborative (shared & common) petascale national facility over tens of autonomous computing sites, for many hundreds of users, transferring terabytes of data a day, executing tens of thousands of jobs a day, and providing robust and usable resources for scientific groups of all types and sizes. More information can be found at the OSG web site: www.opensciencegrid.org.

  14. Cloud@Home: A New Enhanced Computing Paradigm

    NASA Astrophysics Data System (ADS)

    Distefano, Salvatore; Cunsolo, Vincenzo D.; Puliafito, Antonio; Scarpa, Marco

    Cloud computing is a distributed computing paradigm that mixes aspects of Grid computing ("… hardware and software infrastructure that provides dependable, consistent, pervasive, and inexpensive access to high-end computational capabilities" (Foster, 2002)), Internet Computing ("… a computing platform geographically distributed across the Internet" (Milenkovic et al., 2003)), Utility computing ("a collection of technologies and business practices that enables computing to be delivered seamlessly and reliably across multiple computers, ... available as needed and billed according to usage, much like water and electricity are today" (Ross & Westerman, 2004)), Autonomic computing ("computing systems that can manage themselves given high-level objectives from administrators" (Kephart & Chess, 2003)), Edge computing ("… provides a generic template facility for any type of application to spread its execution across a dedicated grid, balancing the load …" (Davis, Parikh, & Weihl, 2004)) and Green computing (a new frontier of Ethical computing, starting from the assumption that in the near future energy costs will be directly related to environmental pollution).

  15. NFFA-Europe: enhancing European competitiveness in nanoscience research and innovation (Conference Presentation)

    NASA Astrophysics Data System (ADS)

    Carsughi, Flavio; Fonseca, Luis

    2017-06-01

    NFFA-EUROPE is a European open-access resource for experimental and theoretical nanoscience that sets out a platform to carry out comprehensive projects for multidisciplinary research at the nanoscale, extending from synthesis to nano-characterization to theory and numerical simulation. Advanced infrastructures specialized in growth, nano-lithography, nano-characterization, theory and simulation, and fine analysis with synchrotron, FEL and neutron radiation sources are integrated in a multi-site combination to develop frontier research on methods for reproducible nanoscience research and to enable European and international researchers from diverse disciplines to carry out advanced proposals impacting science and innovation. NFFA-EUROPE will enable coordinated access to infrastructures covering different aspects of nanoscience research that is not currently available at single specialized facilities, without duplicating their specific scopes. Approved user projects will have access to the best-suited instruments and support competences for performing the research, including access to analytical large-scale facilities, theory and simulation, and high-performance computing facilities. Access is offered free of charge to European users, who will receive a financial contribution for their travel, accommodation and subsistence costs. User access will include several "installations" and will be coordinated through a single entry point portal that will activate an advanced user-infrastructure dialogue to build up a personalized access programme with an increasing return on science and innovation production. The research activity of NFFA-EUROPE itself will address key bottlenecks of nanoscience research: nanostructure traceability, protocol reproducibility, in-operando nano-manipulation and analysis, and open data.

  16. OpenTopography: Addressing Big Data Challenges Using Cloud Computing, HPC, and Data Analytics

    NASA Astrophysics Data System (ADS)

    Crosby, C. J.; Nandigam, V.; Phan, M.; Youn, C.; Baru, C.; Arrowsmith, R.

    2014-12-01

    OpenTopography (OT) is a geoinformatics-based data facility initiated in 2009 for democratizing access to high-resolution topographic data, derived products, and tools. Hosted at the San Diego Supercomputer Center (SDSC), OT utilizes cyberinfrastructure, including large-scale data management, high-performance computing, and service-oriented architectures, to provide efficient Web-based access to large, high-resolution topographic datasets. OT collocates data with processing tools to enable users to quickly access custom data and derived products for their application. OT's ongoing R&D efforts aim to solve emerging technical challenges associated with exponential growth in data, higher order data products, as well as the user base. Optimization of data management strategies can be informed by a comprehensive set of OT user access metrics that allows us to better understand usage patterns with respect to the data. By analyzing the spatiotemporal access patterns within the datasets, we can map areas of the data archive that are highly active (hot) versus the ones that are rarely accessed (cold). This enables us to architect a tiered storage environment consisting of high performance disk storage (SSD) for the hot areas and less expensive slower disk for the cold ones, thereby optimizing price to performance. From a compute perspective, OT is looking at cloud-based solutions such as the Microsoft Azure platform to handle sudden increases in load. An OT virtual machine image in Microsoft's VM Depot can be invoked and deployed quickly in response to increased system demand. OT has also integrated SDSC HPC systems like the Gordon supercomputer into our infrastructure tier to enable compute-intensive workloads like parallel computation of hydrologic routing on high resolution topography. This capability also allows OT to scale to HPC resources during high loads to meet user demand and provide more efficient processing. With a growing user base and a maturing scientific user community come new requests for algorithms and processing capabilities. To address this demand, OT is developing an extensible service-based architecture for integrating community-developed software. This "pluggable" approach to Web service deployment will enable new processing and analysis tools to run collocated with OT-hosted data.
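
    The hot/cold analysis described above can be sketched in a few lines: bucket logged accesses by spatial tile and flag the most-requested tiles as candidates for the SSD tier. The log format, tile size, and threshold below are hypothetical.

      # Toy hot/cold tiling of access logs (Python).
      from collections import Counter

      def tile_of(lon: float, lat: float, size_deg: float = 1.0):
          return (int(lon // size_deg), int(lat // size_deg))

      accesses = [(-116.2, 33.5), (-116.4, 33.7), (-105.0, 40.0), (-116.9, 33.1)]
      counts = Counter(tile_of(lon, lat) for lon, lat in accesses)

      hot = {tile for tile, n in counts.items() if n >= 2}   # SSD candidates
      cold = set(counts) - hot                               # cheaper slower disk
      print("hot:", hot, "cold:", cold)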

  17. A numerical study on swimming micro-organisms inside a capillary tube

    NASA Astrophysics Data System (ADS)

    Zhu, Lailai; Lauga, Eric; Brandt, Luca

    2011-11-01

    The locomotion of micro-organisms is highly dependent on the surrounding environment, including walls, free surfaces and neighbouring cells. In our current work, we perform simulations of swimming micro-organisms inside a capillary tube based on a boundary element method. We focus on the swimming speed, power consumption and locomotive trajectory of swimming cells for different levels of confinement. For a cell propelling itself by tangential surface deformation, we show that it will swim along a helical trajectory with a specified swimming gait. Such a helical trajectory was observed before in experiments on swimming Paramecium inside a capillary tube. Funding by VR (the Swedish Research Council) and the National Science Foundation (grant CBET-0746285 to E.L.) is gratefully acknowledged. Computer time provided by SNIC (Swedish National Infrastructure for Computing) is also acknowledged.

  18. FDA's Activities Supporting Regulatory Application of "Next Gen" Sequencing Technologies.

    PubMed

    Wilson, Carolyn A; Simonyan, Vahan

    2014-01-01

    Applications of next-generation sequencing (NGS) technologies require availability and access to an information technology (IT) infrastructure and bioinformatics tools for large amounts of data storage and analyses. The U.S. Food and Drug Administration (FDA) anticipates that the use of NGS data to support regulatory submissions will continue to increase as the scientific and clinical communities become more familiar with the technologies and identify more ways to apply these advanced methods to support development and evaluation of new biomedical products. FDA laboratories are conducting research on different NGS platforms and developing the IT infrastructure and bioinformatics tools needed to enable regulatory evaluation of the technologies and the data sponsors will submit. A High-performance Integrated Virtual Environment, or HIVE, has been launched, and development and refinement continues as a collaborative effort between the FDA and George Washington University to provide the tools to support these needs. The use of a highly parallelized environment facilitated by use of distributed cloud storage and computation has resulted in a platform that is both rapid and responsive to changing scientific needs. The FDA plans to further develop in-house capacity in this area, while also supporting engagement by the external community, by sponsoring an open, public workshop in September 2014 to discuss NGS technologies and data format standardization and to promote the adoption of interoperability protocols. Next-generation sequencing (NGS) technologies are enabling breakthroughs in how the biomedical community is developing and evaluating medical products. One example is the potential application of this method to the detection and identification of microbial contaminants in biologic products. In order for the U.S. Food and Drug Administration (FDA) to be able to evaluate the utility of this technology, we need to have the information technology infrastructure and bioinformatics tools to be able to store and analyze large amounts of data. To address this need, we have developed the High-performance Integrated Virtual Environment, or HIVE. HIVE uses a combination of distributed cloud storage and distributed cloud computations to provide a platform that is both rapid and responsive to support the growing and increasingly diverse scientific and regulatory needs of FDA scientists in their evaluation of NGS in research and ultimately for evaluation of NGS data in regulatory submissions. © PDA, Inc. 2014.

  19. Collaborative Working Architecture for IoT-Based Applications.

    PubMed

    Mora, Higinio; Signes-Pont, María Teresa; Gil, David; Johnsson, Magnus

    2018-05-23

    The new sensing applications need enhanced computing capabilities to handle the requirements of complex and huge data processing. The Internet of Things (IoT) concept brings processing and communication features to devices. In addition, the Cloud Computing paradigm provides resources and infrastructures for performing the computations and outsourcing the work from the IoT devices. This scenario opens new opportunities for designing advanced IoT-based applications; however, there is still much research to be done to properly gear all the systems for working together. This work proposes a collaborative model and an architecture to take advantage of the available computing resources. The resulting architecture involves a novel network design with different levels which combines sensing and processing capabilities based on the Mobile Cloud Computing (MCC) paradigm. An experiment is included to demonstrate that this approach can be used in diverse real applications. The results show the flexibility of the architecture to perform complex computational tasks of advanced applications.

  20. A General Purpose High Performance Linux Installation Infrastructure

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wachsmann, Alf

    2002-06-17

    With ever more, and ever larger, Linux clusters, the question arises of how to install them. This paper addresses this question by proposing a solution using only standard software components. The installation infrastructure scales well to a large number of nodes. It is also usable for installing desktop machines or diskless Linux clients; thus, it is not designed for cluster installations in particular but is nevertheless highly performant. The proposed infrastructure uses PXE as the network boot component on the nodes. It uses DHCP and TFTP servers to get IP addresses and a bootloader to all nodes. It then uses kickstart to install Red Hat Linux over NFS. We have implemented this installation infrastructure at SLAC with our given server hardware and installed a 256-node cluster in 30 minutes. This paper presents the measurements from this installation and discusses the bottlenecks in our installation.
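
    A sketch of how the per-node boot step of such an infrastructure can be automated: PXELINUX, a common PXE bootloader, looks for a configuration file named after the node's MAC address, which points the node at a kernel and a kickstart file. The hostnames, MACs, and paths below are invented; the 01-<mac> file naming and the ks=nfs: option are standard PXELINUX/kickstart conventions.

      # Generate per-MAC PXELINUX config files for automated kickstart installs.
      NODES = {"node001": "00:11:22:33:44:55", "node002": "00:11:22:33:44:56"}

      ENTRY = (
          "default ks\n"
          "label ks\n"
          "  kernel vmlinuz\n"
          "  append initrd=initrd.img ks=nfs:installserver:/ks/{host}.cfg\n"
      )

      for host, mac in NODES.items():
          fname = "pxelinux.cfg/01-" + mac.replace(":", "-")
          print(f"--- {fname} ---")
          print(ENTRY.format(host=host))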

  1. Interactive analysis of geographically distributed population imaging data collections over light-path data networks

    NASA Astrophysics Data System (ADS)

    van Lew, Baldur; Botha, Charl P.; Milles, Julien R.; Vrooman, Henri A.; van de Giessen, Martijn; Lelieveldt, Boudewijn P. F.

    2015-03-01

    The cohort size required in epidemiological imaging genetics studies often mandates the pooling of data from multiple hospitals. Patient data, however, is subject to strict privacy protection regimes, and physical data storage may be legally restricted to a hospital network. To enable biomarker discovery, fast data access and interactive data exploration must be combined with high-performance computing resources, while respecting privacy regulations. We present a system using fast and inherently secure light-paths to access distributed data, thereby obviating the need for a central data repository. A secure private cloud computing framework facilitates interactive, computationally intensive exploration of this geographically distributed, privacy sensitive data. As a proof of concept, MRI brain imaging data hosted at two remote sites were processed in response to a user command at a third site. The system was able to automatically start virtual machines, run a selected processing pipeline and write results to a user-accessible database, while keeping data locally stored in the hospitals. Individual tasks took approximately 50% longer compared to a locally hosted blade server, but the cloud infrastructure reduced the total elapsed time by a factor of 40 using 70 virtual machines in the cloud. We demonstrated that the combination of light-paths and a private cloud is a viable means of building an analysis infrastructure for secure data analysis. The system requires further work in the areas of error handling, load balancing and secure support of multiple users.
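
    The reported factor of 40 is consistent with simple back-of-envelope arithmetic: each task runs about 1.5x slower in the cloud, but 70 virtual machines run in parallel. The task count and per-task time below are invented; only the 1.5x overhead and the 70 VMs come from the text.

      # Sanity check of the elapsed-time speedup (Python).
      n_tasks, t_local = 4000, 60.0                     # hypothetical workload (s/task)
      serial_elapsed = n_tasks * t_local                # one local machine
      cloud_elapsed = (n_tasks / 70) * (t_local * 1.5)  # 70 VMs, 50% slower tasks
      print(f"speedup ~ {serial_elapsed / cloud_elapsed:.0f}x")  # ~47x, same order as reported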

  2. Monitoring techniques and alarm procedures for CMS services and sites in WLCG

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Molina-Perez, J.; Bonacorsi, D.; Gutsche, O.

    2012-01-01

    The CMS offline computing system is composed of roughly 80 sites (including the most experienced T3s) and a number of central services to distribute, process and analyze data worldwide. A high level of stability and reliability is required from the underlying infrastructure and services, partially covered by local or automated monitoring and alarming systems such as Lemon and SLS: the former collects metrics from sensors installed on computing nodes and triggers alarms when values are out of range; the latter measures the quality of service and warns managers when service is affected. CMS has established computing shift procedures with personnel operating worldwide from remote Computing Centers, under the supervision of the Computing Run Coordinator at CERN. This dedicated 24/7 computing shift personnel contributes to detecting and reacting timely to any unexpected error, and hence ensures that CMS workflows are carried out efficiently and in a sustained manner. Synergy among all the involved actors is exploited to ensure the 24/7 monitoring, alarming and troubleshooting of the CMS computing sites and services. We review the deployment of the monitoring and alarming procedures, and report on the experience gained throughout the first two years of LHC operation. We describe the efficiency of the communication tools employed, the coherent monitoring framework, the proactive alarming systems and the proficient troubleshooting procedures that helped the CMS Computing facilities and infrastructure to operate at high reliability levels.
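
    The out-of-range alarming pattern described above reduces to a simple check, sketched here in the spirit of Lemon-style sensor thresholds (this is not Lemon's actual API; the metric names and bounds are invented):

      # Toy metric threshold alarming (Python).
      THRESHOLDS = {"cpu_load": (0.0, 0.9), "disk_used_frac": (0.0, 0.85)}

      def check(metrics: dict) -> list:
          alarms = []
          for name, value in metrics.items():
              lo, hi = THRESHOLDS.get(name, (float("-inf"), float("inf")))
              if not lo <= value <= hi:
                  alarms.append(f"ALARM: {name}={value} outside [{lo}, {hi}]")
          return alarms

      print(check({"cpu_load": 0.97, "disk_used_frac": 0.40}))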

  3. Wide-area, real-time monitoring and visualization system

    DOEpatents

    Budhraja, Vikram S.; Dyer, James D.; Martinez Morales, Carlos A.

    2013-03-19

    A real-time performance monitoring system for monitoring an electric power grid. The electric power grid has a plurality of grid portions, each grid portion corresponding to one of a plurality of control areas. The real-time performance monitoring system includes a monitor computer for monitoring at least one of reliability metrics, generation metrics, transmission metrics, suppliers metrics, grid infrastructure security metrics, and markets metrics for the electric power grid. The data for metrics being monitored by the monitor computer are stored in a database, and a visualization of the metrics is displayed on at least one display computer having a monitor. The at least one display computer in one said control area enables an operator to monitor the grid portion corresponding to a different said control area.

  4. Wide-area, real-time monitoring and visualization system

    DOEpatents

    Budhraja, Vikram S [Los Angeles, CA; Dyer, James D [La Mirada, CA; Martinez Morales, Carlos A [Upland, CA

    2011-11-15

    A real-time performance monitoring system for monitoring an electric power grid. The electric power grid has a plurality of grid portions, each grid portion corresponding to one of a plurality of control areas. The real-time performance monitoring system includes a monitor computer for monitoring at least one of reliability metrics, generation metrics, transmission metrics, suppliers metrics, grid infrastructure security metrics, and markets metrics for the electric power grid. The data for metrics being monitored by the monitor computer are stored in a database, and a visualization of the metrics is displayed on at least one display computer having a monitor. The at least one display computer in one said control area enables an operator to monitor the grid portion corresponding to a different said control area.

  5. Information Power Grid: Distributed High-Performance Computing and Large-Scale Data Management for Science and Engineering

    NASA Technical Reports Server (NTRS)

    Johnston, William E.; Gannon, Dennis; Nitzberg, Bill

    2000-01-01

    We use the term "Grid" to refer to distributed, high performance computing and data handling infrastructure that incorporates geographically and organizationally dispersed, heterogeneous resources that are persistent and supported. This infrastructure includes: (1) Tools for constructing collaborative, application oriented Problem Solving Environments / Frameworks (the primary user interfaces for Grids); (2) Programming environments, tools, and services providing various approaches for building applications that use aggregated computing and storage resources, and federated data sources; (3) Comprehensive and consistent set of location independent tools and services for accessing and managing dynamic collections of widely distributed resources: heterogeneous computing systems, storage systems, real-time data sources and instruments, human collaborators, and communications systems; (4) Operational infrastructure including management tools for distributed systems and distributed resources, user services, accounting and auditing, strong and location independent user authentication and authorization, and overall system security services The vision for NASA's Information Power Grid - a computing and data Grid - is that it will provide significant new capabilities to scientists and engineers by facilitating routine construction of information based problem solving environments / frameworks. Such Grids will knit together widely distributed computing, data, instrument, and human resources into just-in-time systems that can address complex and large-scale computing and data analysis problems. Examples of these problems include: (1) Coupled, multidisciplinary simulations too large for single systems (e.g., multi-component NPSS turbomachine simulation); (2) Use of widely distributed, federated data archives (e.g., simultaneous access to metrological, topological, aircraft performance, and flight path scheduling databases supporting a National Air Space Simulation systems}; (3) Coupling large-scale computing and data systems to scientific and engineering instruments (e.g., realtime interaction with experiments through real-time data analysis and interpretation presented to the experimentalist in ways that allow direct interaction with the experiment (instead of just with instrument control); (5) Highly interactive, augmented reality and virtual reality remote collaborations (e.g., Ames / Boeing Remote Help Desk providing field maintenance use of coupled video and NDI to a remote, on-line airframe structures expert who uses this data to index into detailed design databases, and returns 3D internal aircraft geometry to the field); (5) Single computational problems too large for any single system (e.g. the rotocraft reference calculation). Grids also have the potential to provide pools of resources that could be called on in extraordinary / rapid response situations (such as disaster response) because they can provide common interfaces and access mechanisms, standardized management, and uniform user authentication and authorization, for large collections of distributed resources (whether or not they normally function in concert). IPG development and deployment is addressing requirements obtained by analyzing a number of different application areas, in particular from the NASA Aero-Space Technology Enterprise. This analysis has focussed primarily on two types of users: the scientist / design engineer whose primary interest is problem solving (e.g. 
determining wing aerodynamic characteristics in many different operating environments), and whose primary interface to IPG will be through various sorts of problem solving frameworks. The second type of user is the tool designer: the computational scientists who convert physics and mathematics into code that can simulate the physical world. These are the two primary users of IPG, and they have rather different requirements. The results of the analysis of the needs of these two types of users provides a broad set of requirements that gives rise to a general set of required capabilities. The IPG project is intended to address all of these requirements. In some cases the required computing technology exists, and in some cases it must be researched and developed. The project is using available technology to provide a prototype set of capabilities in a persistent distributed computing testbed. Beyond this, there are required capabilities that are not immediately available, and whose development spans the range from near-term engineering development (one to two years) to much longer term R&D (three to six years). Additional information is contained in the original.

  6. The computing and data infrastructure to interconnect EEE stations

    NASA Astrophysics Data System (ADS)

    Noferini, F.; EEE Collaboration

    2016-07-01

    The Extreme Energy Event (EEE) experiment is devoted to the search for high energy cosmic rays through a network of telescopes installed in about 50 high schools distributed throughout the Italian territory. This project requires a peculiar data management infrastructure to collect data registered in stations very far from each other and to allow a coordinated analysis. Such an infrastructure is realized at INFN-CNAF, which operates a Cloud facility based on the OpenStack open-source Cloud framework and provides Infrastructure as a Service (IaaS) for its users. In 2014 EEE started to use it for collecting, monitoring and reconstructing the data acquired in all the EEE stations. For the synchronization between the stations and the INFN-CNAF infrastructure we used BitTorrent Sync, free peer-to-peer software designed to optimize data synchronization between distributed nodes. All data folders are synchronized with the central repository in real time to allow an immediate reconstruction of the data and their publication on a monitoring webpage. We present the architecture and the functionalities of this data management system that provides a flexible environment for the specific needs of the EEE project.

  7. GSKY: A scalable distributed geospatial data server on the cloud

    NASA Astrophysics Data System (ADS)

    Rozas Larraondo, Pablo; Pringle, Sean; Antony, Joseph; Evans, Ben

    2017-04-01

    Earth systems, environmental and geophysical datasets are an extremely valuable source of information about the state and evolution of the Earth. Being able to combine information coming from different geospatial collections is in increasing demand in the scientific community, and requires managing and manipulating data with different formats and performing operations such as map reprojections, resampling and other transformations. Due to the large data volume inherent in these collections, storing multiple copies of them is unfeasible, and so such data manipulation must be performed on-the-fly using efficient, high performance techniques. Ideally this should be performed using a trusted data service and common system libraries to ensure wide use and reproducibility. Recent developments in distributed computing based on dynamic access to significant cloud infrastructure open the door for such new ways of processing geospatial data on demand. The National Computational Infrastructure (NCI), hosted at the Australian National University (ANU), has over 10 Petabytes of nationally significant research data collections. Some of these collections, which comprise a variety of observed and modelled geospatial data, are now made available via a highly distributed geospatial data server, called GSKY (pronounced [jee-skee]). GSKY supports on-demand processing of large geospatial data products such as satellite earth observation data as well as numerical weather products, allowing interactive exploration and analysis of the data. It dynamically and efficiently distributes the required computations among cloud nodes, providing a scalable analysis framework that can adapt to serve a large number of concurrent users. Typical geospatial workflows handling different file formats and data types, or blending data in different coordinate projections and spatio-temporal resolutions, are handled transparently by GSKY. This is achieved by decoupling the data ingestion and indexing process as an independent service. An indexing service crawls data collections, either locally or remotely, extracting, storing and indexing all spatio-temporal metadata associated with each individual record. GSKY provides the user with the ability to specify how ingested data should be aggregated, transformed and presented. It presents an OGC standards-compliant interface, allowing ready accessibility for users of the data via Web Map Services (WMS), Web Processing Services (WPS) or raw data arrays using Web Coverage Services (WCS). The presentation will show some cases where we have used this new capability to provide a significant improvement over previous approaches.
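
    In practice, "OGC standards-compliant access" means that a request such as WMS GetMap is an ordinary HTTP GET with standardized query parameters. The sketch below builds one; the endpoint and layer name are placeholders, not GSKY's actual ones, while the parameter names come from the WMS 1.3.0 standard.

      # Build a WMS 1.3.0 GetMap request URL (Python).
      from urllib.parse import urlencode

      params = {
          "service": "WMS", "version": "1.3.0", "request": "GetMap",
          "layers": "example_layer", "crs": "EPSG:4326",
          "bbox": "-44,112,-10,154",   # lat/lon axis order for EPSG:4326 in WMS 1.3.0
          "width": 512, "height": 512, "format": "image/png",
      }
      print("https://example.org/ows?" + urlencode(params))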

  8. Health care information infrastructure: what will it be and how will we get there?

    NASA Astrophysics Data System (ADS)

    Kun, Luis G.

    1996-02-01

    During the first Health Care Technology Policy (HCTP) conference last year, during Health Care Reform, four major issues were brought up with regard to the ongoing efforts to develop a Computer Based Patient Record (CBPR), the National Information Infrastructure (NII) as part of the High Performance Computing & Communications (HPCC) program, and the so-called "Patient Card". More specifically, it was explained how a national information system will greatly affect the way health care delivery is provided to the United States public and reduce its costs. These four issues were: constructing a National Information Infrastructure (NII); building a Computer Based Patient Record system; bringing the collective resources of our National Laboratories to bear in developing and implementing the NII and CBPR, as well as a security system with which to safeguard the privacy rights of patients and the physician-patient privilege; and utilizing Government (e.g. DOD, DOE) capabilities (technology and human resources) to maximize resource utilization, create new jobs and accelerate technology transfer to address health care issues. During the second HCTP conference, in mid-1995, a section of the meeting entitled "Health Care Technology Assets of the Federal Government" addressed benefits of the technology transfer which should occur to maximize already developed resources. A section entitled "Transfer and Utilization of Government Technology Assets to the Private Sector" looked at both Health Care and non-Health Care related technologies, since many areas such as Information Technologies (i.e. imaging, communications, archival/retrieval, systems integration, information display, multimedia, heterogeneous data bases, etc.) already exist within our National Labs and/or other federal agencies, e.g. ARPA. Although these technologies are not labeled under "Health Care" programs, they could provide enormous value in addressing technical needs. An additional issue deals with both the technical (hardware, software) and human expertise that resides within these labs and their possible role in creating cost-effective solutions.

  9. The CMS High Level Trigger System: Experience and Future Development

    NASA Astrophysics Data System (ADS)

    Bauer, G.; Behrens, U.; Bowen, M.; Branson, J.; Bukowiec, S.; Cittolin, S.; Coarasa, J. A.; Deldicque, C.; Dobson, M.; Dupont, A.; Erhan, S.; Flossdorf, A.; Gigi, D.; Glege, F.; Gomez-Reino, R.; Hartl, C.; Hegeman, J.; Holzner, A.; Hwong, Y. L.; Masetti, L.; Meijers, F.; Meschi, E.; Mommsen, R. K.; O'Dell, V.; Orsini, L.; Paus, C.; Petrucci, A.; Pieri, M.; Polese, G.; Racz, A.; Raginel, O.; Sakulin, H.; Sani, M.; Schwick, C.; Shpakov, D.; Simon, S.; Spataru, A. C.; Sumorok, K.

    2012-12-01

    The CMS experiment at the LHC features a two-level trigger system. Events accepted by the first level trigger, at a maximum rate of 100 kHz, are read out by the Data Acquisition system (DAQ), and subsequently assembled in memory in a farm of computers running a software high-level trigger (HLT), which selects interesting events for offline storage and analysis at a rate of the order of a few hundred Hz. The HLT algorithms consist of sequences of offline-style reconstruction and filtering modules, executed on a farm of O(10000) CPU cores built from commodity hardware. Experience from the operation of the HLT system in the 2010/2011 collider run is reported. The current architecture of the CMS HLT, and its integration with the CMS reconstruction framework and the CMS DAQ, are discussed in the light of future development. The possible short- and medium-term evolution of the HLT software infrastructure to support extensions of the HLT computing power, and to address remaining performance and maintenance issues, is discussed.
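
    To make the filtering structure concrete, the sketch below illustrates, in plain Python rather than CMS's actual framework, the general shape of an HLT "path": an ordered chain of filter modules in which the first rejection short-circuits the remaining, more expensive steps. All module names and cut values are invented for illustration.

        # Illustrative sketch (not CMS software) of a high-level-trigger "path":
        # a sequence of reconstruction/filter steps where any filter can reject
        # the event, short-circuiting the remaining, more expensive modules.
        from typing import Callable, List

        Event = dict
        Module = Callable[[Event], bool]  # returns False to reject the event

        def run_hlt_path(event: Event, path: List[Module]) -> bool:
            """Run modules in order; stop at the first rejection."""
            for module in path:
                if not module(event):
                    return False
            return True

        # Hypothetical filters with increasing cost and tightening cuts.
        def l1_seed_present(ev): return ev.get("l1_accept", False)
        def muon_pt_cut(ev):     return ev.get("muon_pt", 0.0) > 24.0  # GeV
        def isolation_cut(ev):   return ev.get("muon_iso", 1.0) < 0.15

        path = [l1_seed_present, muon_pt_cut, isolation_cut]
        event = {"l1_accept": True, "muon_pt": 31.2, "muon_iso": 0.07}
        print("stored for offline analysis" if run_hlt_path(event, path) else "rejected")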

  10. Approaches to a global quantum key distribution network

    NASA Astrophysics Data System (ADS)

    Islam, Tanvirul; Bedington, Robert; Ling, Alexander

    2017-10-01

    Progress in realising quantum computers threatens to weaken existing public key encryption infrastructure. A global quantum key distribution (QKD) network can play a role in computational-attack-resistant encryption. Such a network could use a constellation of high altitude platforms, such as airships and satellites, as trusted nodes to facilitate QKD between any two points on the globe on demand. This requires both space-to-ground and inter-platform links. However, the prohibitive cost of traditional satellite-based development limits the experimental work demonstrating relevant technologies. To accelerate progress towards a global network, we use an emerging class of shoe-box sized spacecraft known as CubeSats. We have designed a polarization entangled photon pair source that can operate on board CubeSats. The robustness and miniature form factor of our entanglement source make it especially suitable for pathfinder missions that study QKD between two high altitude platforms. The technological outcomes of such a mission would be the essential building blocks for a global QKD network.
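
    As a conceptual aside, the key step that entangled-photon links enable is basis sifting: both parties measure in randomly chosen bases and keep only the rounds where the bases match. The toy Python simulation below illustrates the idea under idealized, noise-free assumptions; it is not the authors' protocol or flight software.

        # Toy simulation (not flight software) of basis sifting in an
        # entanglement-based QKD exchange: both parties measure entangled
        # pairs in randomly chosen bases and keep rounds where bases match.
        import random

        N = 10_000
        rng = random.SystemRandom()

        alice_bases = [rng.choice("ZX") for _ in range(N)]
        bob_bases = [rng.choice("ZX") for _ in range(N)]

        # For ideally correlated pairs, matching bases yield identical bits,
        # so a single shared outcome per round stands in for both parties.
        outcomes = [rng.choice("01") for _ in range(N)]

        sifted_key = [bit for a, b, bit in zip(alice_bases, bob_bases, outcomes) if a == b]
        print(f"kept {len(sifted_key)}/{N} rounds (~50% expected after sifting)")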

  11. OCCAM: a flexible, multi-purpose and extendable HPC cluster

    NASA Astrophysics Data System (ADS)

    Aldinucci, M.; Bagnasco, S.; Lusso, S.; Pasteris, P.; Rabellino, S.; Vallero, S.

    2017-10-01

    The Open Computing Cluster for Advanced data Manipulation (OCCAM) is a multipurpose flexible HPC cluster designed and operated by a collaboration between the University of Torino and the Sezione di Torino of the Istituto Nazionale di Fisica Nucleare. It is aimed at providing a flexible, reconfigurable and extendable infrastructure to cater to a wide range of different scientific computing use cases, including ones from solid-state chemistry, high-energy physics, computer science, big data analytics, computational biology, genomics and many others. Furthermore, it will serve as a platform for R&D activities on computational technologies themselves, with topics ranging from GPU acceleration to Cloud Computing technologies. A heterogeneous and reconfigurable system like this poses a number of challenges related to the frequency at which heterogeneous hardware resources might change their availability and shareability status, which in turn affects the methods and means to allocate, manage, optimize, bill for, and monitor VMs, containers, virtual farms, jobs, interactive bare-metal sessions, etc. This work describes some of the use cases that prompted the design and construction of the HPC cluster, its architecture and resource provisioning model, along with a first characterization of its performance using synthetic benchmark tools and a few realistic use-case tests.

  12. Automation of multi-agent control for complex dynamic systems in heterogeneous computational network

    NASA Astrophysics Data System (ADS)

    Oparin, Gennady; Feoktistov, Alexander; Bogdanova, Vera; Sidorov, Ivan

    2017-01-01

    The rapid progress of high-performance computing entails new challenges related to solving large scientific problems in various subject domains within a heterogeneous distributed computing environment (e.g., a network, Grid system, or Cloud infrastructure). Specialists in the field of parallel and distributed computing pay special attention to the scalability of applications for problem solving. Effective management of a scalable application in a heterogeneous distributed computing environment is still a non-trivial issue; control systems that operate in networks are especially affected by it. We propose a new approach to multi-agent management of scalable applications in a heterogeneous computational network. The fundamentals of our approach are the integrated use of conceptual programming, simulation modeling, network monitoring, multi-agent management, and service-oriented programming. We developed a special framework for automating the problem solving. The advantages of the proposed approach are demonstrated using the example of parametric synthesis of a static linear regulator for complex dynamic systems. The benefits of the scalable application for solving this problem include automation of the multi-agent control of the systems in a parallel mode at various degrees of detail.

  13. Spatial 3D infrastructure: display-independent software framework, high-speed rendering electronics, and several new displays

    NASA Astrophysics Data System (ADS)

    Chun, Won-Suk; Napoli, Joshua; Cossairt, Oliver S.; Dorval, Rick K.; Hall, Deirdre M.; Purtell, Thomas J., II; Schooler, James F.; Banker, Yigal; Favalora, Gregg E.

    2005-03-01

    We present a software and hardware foundation to enable the rapid adoption of 3-D displays. Different 3-D displays - such as multiplanar, multiview, and electroholographic displays - naturally require different rendering methods. The adoption of these displays in the marketplace will be accelerated by a common software framework. The authors designed the SpatialGL API, a new rendering framework that unifies these display methods under one interface. SpatialGL enables complementary visualization assets to coexist through a uniform infrastructure. Also, SpatialGL supports legacy interfaces such as the OpenGL API. The authors' first implementation of SpatialGL uses multiview and multislice rendering algorithms to exploit the performance of modern graphics processing units (GPUs) to enable real-time visualization of 3-D graphics from medical imaging, oil & gas exploration, and homeland security. At the time of writing, SpatialGL runs on COTS workstations (both Windows and Linux) and on Actuality's high-performance embedded computational engine that couples an NVIDIA GeForce 6800 Ultra GPU, an AMD Athlon 64 processor, and a proprietary, high-speed, programmable volumetric frame buffer that interfaces to a 1024 x 768 x 3 digital projector. Progress is illustrated using an off-the-shelf multiview display, Actuality's multiplanar Perspecta Spatial 3D System, and an experimental multiview display. The experimental display is a quasi-holographic view-sequential system that generates aerial imagery measuring 30 mm x 25 mm x 25 mm, providing 198 horizontal views.

  14. Consolidation and development roadmap of the EMI middleware

    NASA Astrophysics Data System (ADS)

    Kónya, B.; Aiftimiei, C.; Cecchi, M.; Field, L.; Fuhrmann, P.; Nilsen, J. K.; White, J.

    2012-12-01

    Scientific research communities have benefited recently from the increasing availability of computing and data infrastructures with unprecedented capabilities for large scale distributed initiatives. These infrastructures are largely defined and enabled by the middleware they deploy. One of the major issues in the current usage of research infrastructures is the need to use similar but often incompatible middleware solutions. The European Middleware Initiative (EMI) is a collaboration of the major European middleware providers ARC, dCache, gLite and UNICORE. EMI aims to: deliver a consolidated set of middleware components for deployment in EGI, PRACE and other Distributed Computing Infrastructures; extend the interoperability between grids and other computing infrastructures; strengthen the reliability of the services; establish a sustainable model to maintain and evolve the middleware; fulfil the requirements of the user communities. This paper presents the consolidation and development objectives of the EMI software stack covering the last two years. The EMI development roadmap is introduced along the four technical areas of compute, data, security and infrastructure. The compute area plan focuses on consolidation of standards and agreements through a unified interface for job submission and management, a common format for accounting, the wide adoption of GLUE schema version 2.0 and the provision of a common framework for the execution of parallel jobs. The security area is working towards a unified security model and lowering the barriers to Grid usage by allowing users to gain access with their own credentials. The data area is focusing on implementing standards to ensure interoperability with other grids and industry components and to reuse already existing clients in operating systems and open source distributions. One of the highlights of the infrastructure area is the consolidation of the information system services via the creation of a common information backbone.

  15. Minimizing Overhead for Secure Computation and Fully Homomorphic Encryption: Overhead

    DTIC Science & Technology

    2015-11-01

    many inputs. We also improved our compiler infrastructure to handle very large circuits in a more scalable way. In Jan’13, we employed the AESNI and...Amazon’s elastic compute infrastructure , and is running under a Xen hypervisor. Since we do not have direct access to the bare metal, we cannot...creating novel opportunities for compressing au- thentication overhead. It is especially compelling that existing public key infrastructures can be used

  16. Academia Sinica, TW E-science to Assistant Seismic Observations for Earthquake Research, Monitor and Hazard Reduction Surrounding the South China Sea

    NASA Astrophysics Data System (ADS)

    Huang, Bor-Shouh; Liu, Chun-Chi; Yen, Eric; Liang, Wen-Tzong; Lin, Simon C.; Huang, Win-Gee; Lee, Shiann-Jong; Chen, Hsin-Yen

    Experience from the 2004 giant Sumatra earthquake has made seismic and tsunami hazard important issues in the South China Sea and its surrounding region, attracting the interest of many seismologists. Currently, more than 25 broadband seismic instruments are operated by the Institute of Earth Sciences, Academia Sinica in northern Vietnam to study the geodynamic evolution of the Red River fracture zone; they have recently been redistributed to southern Vietnam to study the geodynamic evolution and deep structures of the South China Sea. Similar stations are planned for deployment in the Philippines in the near future. Under this plan, some high-quality stations may become permanent stations with added continuous GPS observations, with instruments maintained and operated by several cooperating institutes, for instance, the Institute of Geophysics, Vietnamese Academy of Sciences and Technology in Vietnam and the Philippine Institute of Volcanology and Seismology in the Philippines. Eventually, those stations will be upgraded to real-time transmission stations for earthquake monitoring and tsunami warning. However, high-speed data transfer among different agencies is always a critical issue for successful network operation. By taking advantage of both the EGEE and EUAsiaGrid e-Infrastructures, the Academia Sinica Grid Computing Centre coordinates researchers from various Asian countries to construct a platform for high-performance data transfer and large parallel computations. Efforts from this data service and a newly built earthquake data centre for data management may greatly improve seismic network performance. Implementation of Grid infrastructure and e-science in this region may assist the development of earthquake research, monitoring and natural hazard reduction. In the near future, we will continue to seek new cooperation from the countries surrounding the South China Sea, to install new seismic stations, to construct a complete seismic network of the South China Sea, and to encourage studies in earthquake science and natural hazard reduction.

  17. High Speed Internet Access Using Cellular Infrastructure

    DTIC Science & Technology

    2004-09-01

    62 Figure 50 Initial Connection from Laptop to Mobile Phone via Bluetooth . ....................64 Figure 51 Mobile Phone Discovered by the Laptop...tailor them to their needs. Other hardware improvements include Infrared and Bluetooth capabilities for external connectivity with a computer, as well...mobile phone can communicate with a computer through a phone’s vendor specific cable, Infrared or Bluetooth . The latter two depend on both devices

  18. Pilot-in-the-Loop CFD Method Development

    DTIC Science & Technology

    2017-04-20

    the methods on the NAVAIR Manned Flight Simulator. Activities this period During this report period, we implemented the CRAFT CFD code on the...Penn State VLRCROE Flight simulator and performed the first Pilot-in-the-Loop PILCFD tests at Penn State using the COCOA5 clusters. The initial tests...integration of the flight simulator and Penn State computing infrastructure. Initial tests showed slower performance than real-time (3x slower than real

  19. Large-Scale Data Collection Metadata Management at the National Computation Infrastructure

    NASA Astrophysics Data System (ADS)

    Wang, J.; Evans, B. J. K.; Bastrakova, I.; Ryder, G.; Martin, J.; Duursma, D.; Gohar, K.; Mackey, T.; Paget, M.; Siddeswara, G.

    2014-12-01

    Data Collection management has become an essential activity at the National Computational Infrastructure (NCI) in Australia. NCI's partners (CSIRO, Bureau of Meteorology, Australian National University, and Geoscience Australia), supported by the Australian Government and Research Data Storage Infrastructure (RDSI), have established a national data resource that is co-located with high-performance computing. This paper addresses the metadata management of these data assets over their lifetime. NCI manages 36 data collections (10+ PB) categorised as earth system sciences, climate and weather model data assets and products, earth and marine observations and products, geosciences, terrestrial ecosystem, water management and hydrology, astronomy, social science and biosciences. The data is largely sourced from NCI partners, the custodians of many of the national scientific records, and major research community organisations. The data is made available in an HPC and data-intensive environment - a ~56000 core supercomputer, virtual labs on a 3000 core cloud system, and data services. By assembling these large national assets, new opportunities have arisen to harmonise the data collections, making a powerful cross-disciplinary resource. To support the overall management, a Data Management Plan (DMP) has been developed to record the workflows, procedures, key contacts and responsibilities. The DMP has fields that can be exported to the ISO 19115 schema and to the collection-level catalogue of GeoNetwork. The subset- or file-level metadata catalogues are linked with the collection level through a parent-child relationship defined using UUIDs. A number of tools have been developed that support interactive metadata management, bulk loading of data, and support for computational workflows or data pipelines. NCI creates persistent identifiers for each of the assets. The data collection is tracked over its lifetime, and the recognition of the data providers, data owners, data generators and data aggregators is updated. A Digital Object Identifier (DOI) is assigned using the Australian National Data Service (ANDS). Once the data has been quality assured, the DOI is minted and the metadata record updated. NCI's data citation policy establishes the relationship between research outcomes, data providers, and the data.
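
    The parent-child linkage pattern described above can be made concrete with a small sketch. The Python below is illustrative only: the field names and structure are hypothetical stand-ins, not NCI's actual DMP or ISO 19115 schema.

        # Hedged sketch of the parent-child linkage described above: file-level
        # metadata records reference their collection-level record by UUID.
        # Field names are illustrative, not NCI's actual schema.
        import uuid
        from dataclasses import dataclass, field

        @dataclass
        class CollectionRecord:
            title: str
            uid: str = field(default_factory=lambda: str(uuid.uuid4()))
            doi: str | None = None  # minted once the data is quality assured

        @dataclass
        class FileRecord:
            path: str
            parent_uid: str  # UUID of the owning collection-level record
            temporal_extent: tuple[str, str] | None = None

        collection = CollectionRecord(title="Earth observation products")
        granule = FileRecord(
            path="/g/data/obs/scene_2014_001.nc",  # hypothetical path
            parent_uid=collection.uid,
            temporal_extent=("2014-01-01", "2014-01-02"),
        )
        assert granule.parent_uid == collection.uid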

  20. The GENIUS Grid Portal and robot certificates: a new tool for e-Science

    PubMed Central

    Barbera, Roberto; Donvito, Giacinto; Falzone, Alberto; La Rocca, Giuseppe; Milanesi, Luciano; Maggi, Giorgio Pietro; Vicario, Saverio

    2009-01-01

    Background Grid technology is the computing model which allows users to share a wide plethora of distributed computational resources regardless of their geographical location. Up to now, the strict security policy required in order to access distributed computing resources has been a rather big limiting factor when trying to broaden the usage of Grids into a wide community of users. Grid security is indeed based on the Public Key Infrastructure (PKI) of X.509 certificates and the procedure to get and manage those certificates is unfortunately not straightforward. A first step to make Grids more appealing for new users has recently been achieved with the adoption of robot certificates. Methods Robot certificates have recently been introduced to perform automated tasks on Grids on behalf of users. They are extremely useful, for instance, to automate grid service monitoring, data processing production, and distributed data collection systems. Basically these certificates can be used to identify a person responsible for an unattended service or process acting as client and/or server. Robot certificates can be installed on a smart card and used behind a portal by everyone interested in running the related applications in a Grid environment using a user-friendly graphic interface. In this work, the GENIUS Grid Portal, powered by EnginFrame, has been extended in order to support the new authentication based on the adoption of these robot certificates. Results The work carried out and reported in this manuscript is particularly relevant for all users who are not familiar with personal digital certificates and the technical aspects of the Grid Security Infrastructure (GSI). The valuable benefits introduced by robot certificates in e-Science can thus be extended to users belonging to several scientific domains, providing an asset in raising Grid awareness among a wide number of potential users. Conclusion The adoption of Grid portals extended with robot certificates can really contribute to creating transparent access to the computational resources of Grid Infrastructures, enhancing the spread of this new paradigm in researchers' working life to address new global scientific challenges. The evaluated solution can of course be extended to other portals, applications and scientific communities. PMID:19534747

  1. The GENIUS Grid Portal and robot certificates: a new tool for e-Science.

    PubMed

    Barbera, Roberto; Donvito, Giacinto; Falzone, Alberto; La Rocca, Giuseppe; Milanesi, Luciano; Maggi, Giorgio Pietro; Vicario, Saverio

    2009-06-16

    Grid technology is the computing model which allows users to share a wide plethora of distributed computational resources regardless of their geographical location. Up to now, the strict security policy required in order to access distributed computing resources has been a rather big limiting factor when trying to broaden the usage of Grids into a wide community of users. Grid security is indeed based on the Public Key Infrastructure (PKI) of X.509 certificates and the procedure to get and manage those certificates is unfortunately not straightforward. A first step to make Grids more appealing for new users has recently been achieved with the adoption of robot certificates. Robot certificates have recently been introduced to perform automated tasks on Grids on behalf of users. They are extremely useful, for instance, to automate grid service monitoring, data processing production, and distributed data collection systems. Basically these certificates can be used to identify a person responsible for an unattended service or process acting as client and/or server. Robot certificates can be installed on a smart card and used behind a portal by everyone interested in running the related applications in a Grid environment using a user-friendly graphic interface. In this work, the GENIUS Grid Portal, powered by EnginFrame, has been extended in order to support the new authentication based on the adoption of these robot certificates. The work carried out and reported in this manuscript is particularly relevant for all users who are not familiar with personal digital certificates and the technical aspects of the Grid Security Infrastructure (GSI). The valuable benefits introduced by robot certificates in e-Science can thus be extended to users belonging to several scientific domains, providing an asset in raising Grid awareness among a wide number of potential users. The adoption of Grid portals extended with robot certificates can really contribute to creating transparent access to the computational resources of Grid Infrastructures, enhancing the spread of this new paradigm in researchers' working life to address new global scientific challenges. The evaluated solution can of course be extended to other portals, applications and scientific communities.
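
    The portal-side mechanism lends itself to a brief sketch: one long-lived "robot" credential authenticates the portal's requests, while the portal records which of its own authenticated users triggered each action. The Python below is a hedged illustration; the file paths, endpoint and payload fields are hypothetical, and this is not the GENIUS/EnginFrame implementation.

        # Hedged sketch: a portal process using a single "robot" X.509 credential
        # (client certificate + key) to submit work on behalf of an already
        # portal-authenticated user. Paths and the service URL are hypothetical.
        import requests

        ROBOT_CERT = ("/etc/grid-security/robot-cert.pem",
                      "/etc/grid-security/robot-key.pem")  # hypothetical paths

        def submit_on_behalf_of(portal_user: str, job_payload: dict) -> str:
            """Submit a job under the robot identity, tagging the portal user
            so actions stay attributable to a responsible person."""
            job_payload = {**job_payload, "portal_user": portal_user}
            r = requests.post(
                "https://grid.example.org/jobs",  # hypothetical endpoint
                json=job_payload,
                cert=ROBOT_CERT,                  # TLS client authentication
                timeout=30,
            )
            r.raise_for_status()
            return r.json()["job_id"]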

  2. 75 FR 70899 - Submission for OMB Review; Comment Request

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-11-19

    ... submit to the Office of Management and Budget (OMB) for clearance the following proposal for collection... Annual Burden Hours: 2,952. Public Computer Center Reports (Quarterly and Annually) Number of Respondents... specific to Infrastructure and Comprehensive Community Infrastructure, Public Computer Center, and...

  3. Enterprise Cloud Architecture for Chinese Ministry of Railway

    NASA Astrophysics Data System (ADS)

    Shan, Xumei; Liu, Hefeng

    Enterprises like the PRC Ministry of Railways (MOR) face various challenges, ranging from a highly distributed computing environment to low legacy-system utilization; Cloud Computing is increasingly regarded as a workable solution to address this. This article describes a full-scale cloud solution with Intel Tashi as the virtual machine infrastructure layer, Hadoop HDFS as the computing platform, and a self-developed SaaS interface, gluing the virtual machines and HDFS together with the Xen hypervisor. As a result, on-demand computing task application and deployment are addressed for MOR's real working scenarios at the end of the article.

  4. AGIS: The ATLAS Grid Information System

    NASA Astrophysics Data System (ADS)

    Anisenkov, A.; Di Girolamo, A.; Klimentov, A.; Oleynik, D.; Petrosyan, A.; Atlas Collaboration

    2014-06-01

    ATLAS, a particle physics experiment at the Large Hadron Collider at CERN, produces petabytes of data annually through simulation production and tens of petabytes of data per year from the detector itself. The ATLAS computing model embraces the Grid paradigm and a high degree of decentralization, with computing resources able to meet the ATLAS requirements of petabyte-scale data operations. In this paper we describe the ATLAS Grid Information System (AGIS), designed to integrate configuration and status information about the resources, services and topology of the computing infrastructure used by the ATLAS Distributed Computing applications and services.

  5. High-speed and high-fidelity system and method for collecting network traffic

    DOEpatents

    Weigle, Eric H. [Los Alamos, NM]

    2010-08-24

    A system is provided for the high-speed and high-fidelity collection of network traffic. The system can collect traffic at gigabit-per-second (Gbps) speeds, scale to terabit-per-second (Tbps) speeds, and support additional functions such as real-time network intrusion detection. The present system uses a dedicated operating system for traffic collection to maximize efficiency, scalability, and performance. A scalable infrastructure and apparatus for the present system is provided by splitting the work performed on one host onto multiple hosts. The present system simultaneously addresses the issues of scalability, performance, cost, and adaptability with respect to network monitoring, collection, and other network tasks. In addition to high-speed and high-fidelity network collection, the present system provides a flexible infrastructure to perform virtually any function at high speeds such as real-time network intrusion detection and wide-area network emulation for research purposes.
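
    The work-splitting idea generalizes naturally to flow-aware load balancing: hash each packet's flow identifier so that every packet of a given flow reaches the same collection host. The Python sketch below illustrates the concept only; it is not the patented system's implementation, and the collector names are placeholders.

        # Illustrative sketch of splitting one host's collection work across
        # multiple hosts: assign each packet's flow (5-tuple) to a collector
        # by a stable hash so all packets of a flow land on the same host.
        import hashlib

        COLLECTORS = ["collector-0", "collector-1", "collector-2", "collector-3"]

        def assign_collector(src_ip, dst_ip, src_port, dst_port, proto):
            flow = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
            digest = hashlib.sha1(flow).digest()
            index = int.from_bytes(digest[:4], "big") % len(COLLECTORS)
            return COLLECTORS[index]

        print(assign_collector("10.0.0.1", "10.0.0.2", 51515, 443, "tcp"))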

  6. The ASCI Network for SC 2000: Gigabyte Per Second Networking

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    PRATT, THOMAS J.; NAEGLE, JOHN H.; MARTINEZ JR., LUIS G.

    2001-11-01

    This document highlights the DISCOM Distance Computing and Communication team's activities at the 2000 Supercomputing conference in Dallas, Texas. This conference is sponsored by the IEEE and ACM. Sandia's participation in the conference has now spanned a decade; for the last five years, Sandia National Laboratories, Los Alamos National Laboratory and Lawrence Livermore National Laboratory have come together at the conference under the DOE's ASCI (Accelerated Strategic Computing Initiative) program rubric to demonstrate ASCI's emerging capabilities in computational science and our combined expertise in high performance computer science and communication networking developments within the program. At SC 2000, DISCOM demonstrated an infrastructure including a pre-standard implementation of 10 Gigabit Ethernet, the first gigabyte-per-second IP network data transfer application, and VPN technology that enabled a remote Distributed Resource Management tools demonstration. Additionally, a national OC48 POS network was constructed to support applications running between the show floor and home facilities. This network created the opportunity to test PSE's Parallel File Transfer Protocol (PFTP) across a network that had similar speed and distances as the then-proposed DISCOM WAN. SCinet at SC 2000 showcased wireless networking, and the networking team had the opportunity to explore this emerging technology while on the booth. We also supported the production networking needs of the convention exhibit floor. This paper documents those accomplishments, discusses the details of their implementation, and describes how these demonstrations support DISCOM's overall strategies in high performance computing and networking.

  7. A Low-Cost and Energy-Efficient Multiprocessor System-on-Chip for UWB MAC Layer

    NASA Astrophysics Data System (ADS)

    Xiao, Hao; Isshiki, Tsuyoshi; Khan, Arif Ullah; Li, Dongju; Kunieda, Hiroaki; Nakase, Yuko; Kimura, Sadahiro

    Ultra-wideband (UWB) technology has attracted much attention recently due to its high data rate and low emission power. Its media access control (MAC) protocol, WiMedia MAC, promises a lot of facilities for high-speed and high-quality wireless communication. However, these benefits in turn involve a large computational load, which challenges traditional uniprocessor-based implementation methods to provide the required performance. At the same time, the constrained cost and power budget make using commercial multiprocessor solutions unrealistic. In this paper, a low-cost and energy-efficient multiprocessor system-on-chip (MPSoC), which tackles at once the aspects of system design, software migration and hardware architecture, is presented for the implementation of the UWB MAC layer. Experimental results show that the proposed MPSoC, based on four simple RISC processors and a shared-memory infrastructure, achieves up to 45% performance improvement and 65% power saving, while taking 15% less area than the uniprocessor implementation.

  8. Cloud computing approaches to accelerate drug discovery value chain.

    PubMed

    Garg, Vibhav; Arora, Suchir; Gupta, Chitra

    2011-12-01

    Continued advancements in technology have helped high throughput screening (HTS) evolve from a linear to a parallel approach by performing system-level screening. Advanced experimental methods used for HTS at various steps of drug discovery (i.e. target identification, target validation, lead identification and lead validation) can generate data of the order of terabytes. As a consequence, there is a pressing need to store, manage, mine and analyze this data to identify informational tags. This need in turn challenges computer scientists to offer matching hardware and software infrastructure, while managing the varying degrees of desired computational power. Therefore, the potential of "On-Demand Hardware" and "Software as a Service (SAAS)" delivery mechanisms cannot be denied. This on-demand computing, largely referred to as Cloud Computing, is now transforming drug discovery research. Integration of Cloud computing with parallel computing is also certainly expanding its footprint in the life sciences community. The speed, efficiency and cost effectiveness have made cloud computing a 'good to have tool' for researchers, providing them significant flexibility and allowing them to focus on the 'what' of science and not the 'how'. Once it reaches maturity, the Discovery-Cloud would fit best to manage drug discovery and clinical development data, generated using advanced HTS techniques, hence supporting the vision of personalized medicine.

  9. Brilliant gamma beams for industrial applications: new opportunities, new challenges

    NASA Astrophysics Data System (ADS)

    Iancu, V.; Suliman, G.; Turturica, G. V.; Iovea, M.; Daito, I.; Ohgaki, H.; Matei, C.; Ur, C. A.; Balabanski, D. L.

    2016-10-01

    The Nuclear Physics oriented pillar of the pan-European Extreme Light Infrastructure (ELI-NP) will host an ultra-bright, energy-tunable, and quasi-monochromatic gamma-ray beam system in the range of 0.2-19.5 MeV produced by the laser-Compton backscattering technique. The applied research program envisioned at ELI-NP aims to use nuclear resonance fluorescence (NRF) and computed tomography to provide new opportunities for industry and society. High-sensitivity NRF-based investigations can be successfully applied to safeguards applications and the management of radioactive wastes, as well as to uncharted fields like cultural heritage and medical imaging. Gamma-ray radioscopy and computed tomography performed at ELI-NP have the potential to achieve high resolution in industrial-sized objects, provided the detection challenges introduced by the unique characteristics of the gamma beam are overcome. Here we discuss the foreseen industrial applications that will benefit from the high quality and unique characteristics of the ELI-NP gamma beam and the challenges they present. We present the experimental setups proposed to be implemented for this goal, discuss their performance based on analytical calculations and numerical Monte-Carlo simulations, and comment on constraints imposed by the limitations of current scintillator detectors. Several gamma-beam monitoring devices based on scintillator detectors will also be discussed.

  10. New security infrastructure model for distributed computing systems

    NASA Astrophysics Data System (ADS)

    Dubenskaya, J.; Kryukov, A.; Demichev, A.; Prikhodko, N.

    2016-02-01

    In this paper we propose a new approach to setting up a user-friendly and yet secure authentication and authorization procedure in a distributed computing system. The security concept of most heterogeneous distributed computing systems is based on a public key infrastructure along with proxy certificates, which are used for rights delegation. In practice, the contradiction between the limited lifetime of proxy certificates and the unpredictable time of request processing is a big issue for the end users of the system. We propose to use hashes of unlimited lifetime, individual to each request, instead of proxy certificates. Our approach avoids the use of proxy certificates altogether. Thus the security infrastructure of a distributed computing system becomes easier to develop, support and use.
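
    A hedged sketch of the per-request idea follows: each request carries its own non-expiring authenticator derived from a secret shared between the user's agent and the service. This is an illustration of the general concept (here using HMAC), not the authors' actual scheme.

        # Conceptual sketch: instead of a time-limited proxy certificate, each
        # request carries an individual, non-expiring token derived from a
        # shared secret. Illustrative only, not the authors' protocol.
        import hashlib
        import hmac
        import secrets

        SHARED_SECRET = secrets.token_bytes(32)  # established out of band

        def make_request_token(user: str, request_id: str) -> str:
            msg = f"{user}:{request_id}".encode()
            return hmac.new(SHARED_SECRET, msg, hashlib.sha256).hexdigest()

        def verify_request_token(user: str, request_id: str, token: str) -> bool:
            expected = make_request_token(user, request_id)
            return hmac.compare_digest(expected, token)  # constant-time compare

        tok = make_request_token("alice", "job-42")
        assert verify_request_token("alice", "job-42", tok)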

  11. Cloud-Based Numerical Weather Prediction for Near Real-Time Forecasting and Disaster Response

    NASA Technical Reports Server (NTRS)

    Molthan, Andrew; Case, Jonathan; Venners, Jason; Schroeder, Richard; Checchi, Milton; Zavodsky, Bradley; Limaye, Ashutosh; O'Brien, Raymond

    2015-01-01

    The use of cloud computing resources continues to grow within the public and private sector components of the weather enterprise as users become more familiar with cloud-computing concepts, and competition among service providers continues to reduce costs and other barriers to entry. Cloud resources can also provide capabilities similar to high-performance computing environments, supporting the multi-node systems required for near real-time, regional weather predictions. Referred to as "Infrastructure as a Service", or IaaS, the use of cloud-based computing hardware in an on-demand payment system allows for rapid deployment of a modeling system in environments lacking access to a large supercomputing infrastructure. Use of IaaS capabilities to support regional weather prediction may be of particular interest to developing countries that have not yet established large supercomputing resources, but would otherwise benefit from a regional weather forecasting capability. Recently, collaborators from NASA Marshall Space Flight Center and Ames Research Center have developed a scripted, on-demand capability for launching the NOAA/NWS Science and Training Resource Center (STRC) Environmental Modeling System (EMS), which includes pre-compiled binaries of the latest version of the Weather Research and Forecasting (WRF) model. The WRF-EMS provides scripting for downloading appropriate initial and boundary conditions from global models, along with higher-resolution vegetation, land surface, and sea surface temperature data sets provided by the NASA Short-term Prediction Research and Transition (SPoRT) Center. This presentation will provide an overview of the modeling system capabilities and benchmarks performed on the Amazon Elastic Compute Cloud (EC2) environment. In addition, the presentation will discuss future opportunities to deploy the system in support of weather prediction in developing countries supported by NASA's SERVIR Project, which provides capacity building activities in environmental monitoring and prediction across a growing number of regional hubs throughout the world. Capacity-building applications that extend numerical weather prediction to developing countries are intended to provide near real-time applications to benefit public health, safety, and economic interests, but may have a greater impact during disaster events by providing a source for local predictions of weather-related hazards, or of impacts that local weather events may have during the recovery phase.
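
    The on-demand IaaS pattern described here reduces, at its core, to programmatically launching a preconfigured compute node. A minimal, hedged boto3 sketch is below; the AMI ID, instance type, region and key name are placeholders, and this is not the NASA/STRC deployment scripting itself.

        # Hedged sketch of an IaaS-style, on-demand launch of a modeling node
        # with boto3. All identifiers below are placeholders.
        import boto3

        ec2 = boto3.client("ec2", region_name="us-east-1")

        response = ec2.run_instances(
            ImageId="ami-0123456789abcdef0",  # placeholder WRF-ready image
            InstanceType="c5.9xlarge",         # compute-optimized node
            MinCount=1,
            MaxCount=1,
            KeyName="wrf-ops",                 # placeholder key pair
            TagSpecifications=[{
                "ResourceType": "instance",
                "Tags": [{"Key": "purpose", "Value": "nwp-forecast"}],
            }],
        )
        instance_id = response["Instances"][0]["InstanceId"]
        print(f"launched {instance_id}; bootstrap the model via user data or SSH")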

  12. Critical infrastructure protection : significant challenges in developing national capabilities

    DOT National Transportation Integrated Search

    2001-04-01

    To address concerns about protecting the nation's critical computer-dependent infrastructure, this General Accounting Office (GAO) report describes the progress of the National Infrastructure Protection Center (NIPC) in (1) developing national ca...

  13. SEQADAPT: an adaptable system for the tracking, storage and analysis of high throughput sequencing experiments.

    PubMed

    Burdick, David B; Cavnor, Chris C; Handcock, Jeremy; Killcoyne, Sarah; Lin, Jake; Marzolf, Bruz; Ramsey, Stephen A; Rovira, Hector; Bressler, Ryan; Shmulevich, Ilya; Boyle, John

    2010-07-14

    High throughput sequencing has become an increasingly important tool for biological research. However, the existing software systems for managing and processing these data have not provided the flexible infrastructure that research requires. Existing software solutions provide static and well-established algorithms in a restrictive package. However as high throughput sequencing is a rapidly evolving field, such static approaches lack the ability to readily adopt the latest advances and techniques which are often required by researchers. We have used a loosely coupled, service-oriented infrastructure to develop SeqAdapt. This system streamlines data management and allows for rapid integration of novel algorithms. Our approach also allows computational biologists to focus on developing and applying new methods instead of writing boilerplate infrastructure code. The system is based around the Addama service architecture and is available at our website as a demonstration web application, an installable single download and as a collection of individual customizable services.

  14. SEQADAPT: an adaptable system for the tracking, storage and analysis of high throughput sequencing experiments

    PubMed Central

    2010-01-01

    Background High throughput sequencing has become an increasingly important tool for biological research. However, the existing software systems for managing and processing these data have not provided the flexible infrastructure that research requires. Results Existing software solutions provide static and well-established algorithms in a restrictive package. However as high throughput sequencing is a rapidly evolving field, such static approaches lack the ability to readily adopt the latest advances and techniques which are often required by researchers. We have used a loosely coupled, service-oriented infrastructure to develop SeqAdapt. This system streamlines data management and allows for rapid integration of novel algorithms. Our approach also allows computational biologists to focus on developing and applying new methods instead of writing boilerplate infrastructure code. Conclusion The system is based around the Addama service architecture and is available at our website as a demonstration web application, an installable single download and as a collection of individual customizable services. PMID:20630057

  15. Workflow4Metabolomics: a collaborative research infrastructure for computational metabolomics

    PubMed Central

    Giacomoni, Franck; Le Corguillé, Gildas; Monsoor, Misharl; Landi, Marion; Pericard, Pierre; Pétéra, Mélanie; Duperier, Christophe; Tremblay-Franco, Marie; Martin, Jean-François; Jacob, Daniel; Goulitquer, Sophie; Thévenot, Etienne A.; Caron, Christophe

    2015-01-01

    Summary: The complex, rapidly evolving field of computational metabolomics calls for collaborative infrastructures where the large volume of new algorithms for data pre-processing, statistical analysis and annotation can be readily integrated whatever the language, evaluated on reference datasets and chained to build ad hoc workflows for users. We have developed Workflow4Metabolomics (W4M), the first fully open-source and collaborative online platform for computational metabolomics. W4M is a virtual research environment built upon the Galaxy web-based platform technology. It enables ergonomic integration, exchange and running of individual modules and workflows. Alternatively, the whole W4M framework and computational tools can be downloaded as a virtual machine for local installation. Availability and implementation: http://workflow4metabolomics.org homepage enables users to open a private account and access the infrastructure. W4M is developed and maintained by the French Bioinformatics Institute (IFB) and the French Metabolomics and Fluxomics Infrastructure (MetaboHUB). Contact: contact@workflow4metabolomics.org PMID:25527831

  16. Workflow4Metabolomics: a collaborative research infrastructure for computational metabolomics.

    PubMed

    Giacomoni, Franck; Le Corguillé, Gildas; Monsoor, Misharl; Landi, Marion; Pericard, Pierre; Pétéra, Mélanie; Duperier, Christophe; Tremblay-Franco, Marie; Martin, Jean-François; Jacob, Daniel; Goulitquer, Sophie; Thévenot, Etienne A; Caron, Christophe

    2015-05-01

    The complex, rapidly evolving field of computational metabolomics calls for collaborative infrastructures where the large volume of new algorithms for data pre-processing, statistical analysis and annotation can be readily integrated whatever the language, evaluated on reference datasets and chained to build ad hoc workflows for users. We have developed Workflow4Metabolomics (W4M), the first fully open-source and collaborative online platform for computational metabolomics. W4M is a virtual research environment built upon the Galaxy web-based platform technology. It enables ergonomic integration, exchange and running of individual modules and workflows. Alternatively, the whole W4M framework and computational tools can be downloaded as a virtual machine for local installation. http://workflow4metabolomics.org homepage enables users to open a private account and access the infrastructure. W4M is developed and maintained by the French Bioinformatics Institute (IFB) and the French Metabolomics and Fluxomics Infrastructure (MetaboHUB). contact@workflow4metabolomics.org. © The Author 2014. Published by Oxford University Press.

  17. XVis: Visualization for the Extreme-Scale Scientific-Computation Ecosystem: Mid-year report FY17 Q2

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Moreland, Kenneth D.; Pugmire, David; Rogers, David

    The XVis project brings together the key elements of research to enable scientific discovery at extreme scale. Scientific computing will no longer be purely about how fast computations can be performed. Energy constraints, processor changes, and I/O limitations necessitate significant changes in both the software applications used in scientific computation and the ways in which scientists use them. Components for modeling, simulation, analysis, and visualization must work together in a computational ecosystem, rather than working independently as they have in the past. This project provides the necessary research and infrastructure for scientific discovery in this new computational ecosystem by addressing four interlocking challenges: emerging processor technology, in situ integration, usability, and proxy analysis.

  18. XVis: Visualization for the Extreme-Scale Scientific-Computation Ecosystem: Year-end report FY17.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Moreland, Kenneth D.; Pugmire, David; Rogers, David

    The XVis project brings together the key elements of research to enable scientific discovery at extreme scale. Scientific computing will no longer be purely about how fast computations can be performed. Energy constraints, processor changes, and I/O limitations necessitate significant changes in both the software applications used in scientific computation and the ways in which scientists use them. Components for modeling, simulation, analysis, and visualization must work together in a computational ecosystem, rather than working independently as they have in the past. This project provides the necessary research and infrastructure for scientific discovery in this new computational ecosystem by addressing four interlocking challenges: emerging processor technology, in situ integration, usability, and proxy analysis.

  19. XVis: Visualization for the Extreme-Scale Scientific-Computation Ecosystem. Mid-year report FY16 Q2

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Moreland, Kenneth D.; Sewell, Christopher; Childs, Hank

    The XVis project brings together the key elements of research to enable scientific discovery at extreme scale. Scientific computing will no longer be purely about how fast computations can be performed. Energy constraints, processor changes, and I/O limitations necessitate significant changes in both the software applications used in scientific computation and the ways in which scientists use them. Components for modeling, simulation, analysis, and visualization must work together in a computational ecosystem, rather than working independently as they have in the past. This project provides the necessary research and infrastructure for scientific discovery in this new computational ecosystem by addressing four interlocking challenges: emerging processor technology, in situ integration, usability, and proxy analysis.

  20. XVis: Visualization for the Extreme-Scale Scientific-Computation Ecosystem: Year-end report FY15 Q4.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Moreland, Kenneth D.; Sewell, Christopher; Childs, Hank

    The XVis project brings together the key elements of research to enable scientific discovery at extreme scale. Scientific computing will no longer be purely about how fast computations can be performed. Energy constraints, processor changes, and I/O limitations necessitate significant changes in both the software applications used in scientific computation and the ways in which scientists use them. Components for modeling, simulation, analysis, and visualization must work together in a computational ecosystem, rather than working independently as they have in the past. This project provides the necessary research and infrastructure for scientific discovery in this new computational ecosystem by addressing four interlocking challenges: emerging processor technology, in situ integration, usability, and proxy analysis.

  1. Effective Team Support: From Modeling to Software Agents

    NASA Technical Reports Server (NTRS)

    Remington, Roger W. (Technical Monitor); John, Bonnie; Sycara, Katia

    2003-01-01

    The purpose of this research contract was to perform multidisciplinary research among CMU psychologists, computer scientists and engineers, and NASA researchers to design a next-generation collaborative system to support a team of human experts and intelligent agents. To achieve robust performance enhancement of such a system, we had proposed to perform task and cognitive modeling to thoroughly understand the impact technology makes on the organization and on key individual personnel. Guided by cognitively-inspired requirements, we would then develop software agents that support the human team in decision making, information filtering, and information distribution and integration to enhance team situational awareness. During the period covered by this final report, we made substantial progress in modeling infrastructure and task infrastructure. Work is continuing under a different contract to complete empirical data collection, cognitive modeling, and the building of software agents to support the team's task.

  2. Waggle: A Framework for Intelligent Attentive Sensing and Actuation

    NASA Astrophysics Data System (ADS)

    Sankaran, R.; Jacob, R. L.; Beckman, P. H.; Catlett, C. E.; Keahey, K.

    2014-12-01

    Advances in sensor-driven computation and computationally steered sensing will greatly enable future research in fields including environmental and atmospheric sciences. We will present "Waggle," an open-source hardware and software infrastructure developed with two goals: (1) reducing the separation and latency between sensing and computing and (2) improving the reliability and longevity of sensing-actuation platforms in challenging and costly deployments. Inspired by "deep-space probe" systems, the Waggle platform design includes features that can support longitudinal studies, deployments with varying communication links, and remote management capabilities. Waggle lowers the barrier for scientists to incorporate real-time data from their sensors into their computations and to manipulate the sensors or provide feedback through actuators. A standardized software and hardware design allows quick addition of new sensors/actuators and associated software in the nodes and enables them to be coupled with computational codes both in situ and on external compute infrastructure. The Waggle framework currently drives the deployment of two observational systems - a portable and self-sufficient weather platform for study of small-scale effects in Chicago's urban core and an open-ended distributed instrument in Chicago that aims to support several research pursuits across a broad range of disciplines including urban planning, microbiology and computer science. Built around open-source software, hardware, and Linux OS, the Waggle system comprises two components - the Waggle field-node and the Waggle cloud-computing infrastructure. The Waggle field-node affords a modular, scalable, fault-tolerant, secure, and extensible platform for hosting sensors and actuators in the field. It supports in situ computation and data storage, and integration with cloud-computing infrastructure. The Waggle cloud infrastructure is designed with the goal of scaling to several hundreds of thousands of Waggle nodes. It supports aggregating data from sensors hosted by the nodes, staging computation, relaying feedback to the nodes and serving data to end-users. We will discuss the Waggle design principles and their applicability to various observational research pursuits, and demonstrate its capabilities.

  3. Pilots 2.0: DIRAC pilots for all the skies

    NASA Astrophysics Data System (ADS)

    Stagni, F.; Tsaregorodtsev, A.; McNab, A.; Luzzi, C.

    2015-12-01

    In the last few years, new types of computing infrastructures, such as IAAS (Infrastructure as a Service) and IAAC (Infrastructure as a Client), gained popularity. New resources may come as part of pledged resources, while others are opportunistic. Most of these new infrastructures are based on virtualization techniques. Meanwhile, some concepts, such as distributed queues, lost appeal, while still supporting a vast amount of resources. Virtual Organizations are therefore facing heterogeneity of the available resources, and the use of an Interware software like DIRAC to hide the diversity of underlying resources has become essential. The DIRAC WMS is based on the concept of pilot jobs that was introduced back in 2004. A pilot is what creates the possibility to run jobs on a worker node. Within DIRAC, we developed a new generation of pilot jobs, which we dubbed Pilots 2.0. Pilots 2.0 are not tied to a specific infrastructure; rather they are generic, fully configurable and extendible pilots. A Pilot 2.0 can be sent as a script to be run, or it can be fetched from a remote location. A Pilot 2.0 can run on every computing resource, e.g.: on CREAM Computing elements, on DIRAC Computing elements, on Virtual Machines as part of the contextualization script, or on IAAC resources, provided that these machines are properly configured, hiding all the details of the Worker Nodes (WNs) infrastructure. Pilots 2.0 can be generated server- and client-side. Pilots 2.0 are the "pilots to fly in all the skies", aiming at easy use of computing power in whatever form it is presented. Another aim is the unification and simplification of the monitoring infrastructure for all kinds of computing resources, by using pilots as a network of distributed sensors coordinated by a central resource monitoring system. Pilots 2.0 have been developed using the command pattern. VOs using DIRAC can tune Pilots 2.0 as they need, and extend or replace each and every pilot command in an easy way. In this paper we describe how Pilots 2.0 work with distributed and heterogeneous resources, providing the necessary abstraction to deal with different kinds of computing resources.
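
    The command-pattern structure mentioned above can be sketched briefly: each pilot step is a self-contained command object executed in sequence over a shared context, so a VO can extend or replace any step. The Python below is purely illustrative; the class and step names are invented and are not DIRAC's actual pilot commands.

        # Hedged sketch of a command-pattern pilot: each step is a command
        # object, and the command list itself is configurable/extensible.
        from abc import ABC, abstractmethod

        class PilotCommand(ABC):
            @abstractmethod
            def execute(self, context: dict) -> None: ...

        class CheckWorkerNode(PilotCommand):
            def execute(self, context):
                context["cpu_ok"] = True  # would probe CPU, disk, environment

        class ConfigureSite(PilotCommand):
            def execute(self, context):
                context["site"] = context.get("site", "ANY")

        class LaunchJobAgent(PilotCommand):
            def execute(self, context):
                print(f"starting job agent at site {context['site']}")

        def run_pilot(commands: list[PilotCommand]) -> None:
            context: dict = {}
            for command in commands:  # a VO may reorder, extend or replace steps
                command.execute(context)

        run_pilot([CheckWorkerNode(), ConfigureSite(), LaunchJobAgent()])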

  4. Evolution of the Virtualized HPC Infrastructure of Novosibirsk Scientific Center

    NASA Astrophysics Data System (ADS)

    Adakin, A.; Anisenkov, A.; Belov, S.; Chubarov, D.; Kalyuzhny, V.; Kaplin, V.; Korol, A.; Kuchin, N.; Lomakin, S.; Nikultsev, V.; Skovpen, K.; Sukharev, A.; Zaytsev, A.

    2012-12-01

    Novosibirsk Scientific Center (NSC), also known worldwide as Akademgorodok, is one of the largest Russian scientific centers hosting Novosibirsk State University (NSU) and more than 35 research organizations of the Siberian Branch of the Russian Academy of Sciences, including Budker Institute of Nuclear Physics (BINP), Institute of Computational Technologies, and Institute of Computational Mathematics and Mathematical Geophysics (ICM&MG). Since each institute has specific requirements on the architecture of the computing farms involved in its research field, there are currently several computing facilities hosted by NSC institutes, each optimized for a particular set of tasks, of which the largest are the NSU Supercomputer Center, the Siberian Supercomputer Center (ICM&MG), and the Grid Computing Facility of BINP. A dedicated optical network with an initial bandwidth of 10 Gb/s connecting these three facilities was built in order to make it possible to share the computing resources among the research communities, thus increasing the efficiency of operating the existing computing facilities and offering a common platform for building the computing infrastructure for future scientific projects. Unification of the computing infrastructure is achieved by extensive use of virtualization technology based on the XEN and KVM platforms. This contribution gives a thorough review of the present status and future development prospects for the NSC virtualized computing infrastructure and the experience gained while using it for running production data analysis jobs related to HEP experiments being carried out at BINP, especially the KEDR detector experiment at the VEPP-4M electron-positron collider.

  5. The ARES High-level Intermediate Representation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Moss, Nicholas David

    The LLVM intermediate representation (IR) lacks semantic constructs for depicting common high-performance operations such as parallel and concurrent execution, communication and synchronization. Currently, representing such semantics in LLVM requires either extending the intermediate form (a significant undertaking) or the use of ad hoc indirect means such as encoding them as intrinsics and/or the use of metadata constructs. In this paper we discuss a work in progress to explore the design and implementation of a new compilation stage and associated high-level intermediate form that is placed between the abstract syntax tree and when it is lowered to LLVM's IR. This high-level representation is a superset of LLVM IR and supports the direct representation of these common parallel computing constructs along with the infrastructure for supporting analysis and transformation passes on this representation.
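
    To illustrate the kind of construct such a higher-level form can express directly, the Python sketch below models a tiny IR with explicit parallel-loop and barrier nodes and a toy lowering step. The node and runtime-call names are invented for illustration; they are not the ARES project's actual classes or interfaces.

        # Hedged sketch of a higher-level IR that encodes parallel semantics
        # directly before lowering; names are invented for illustration.
        from dataclasses import dataclass

        @dataclass
        class HIRNode:
            pass

        @dataclass
        class ParallelFor(HIRNode):  # explicit parallel-loop construct
            induction_var: str
            trip_count: int
            body: list

        @dataclass
        class Barrier(HIRNode):  # explicit synchronization point
            scope: str

        def lower(node: HIRNode) -> str:
            """Toy lowering: parallel constructs become runtime calls, which a
            plain LLVM-IR-level form would need intrinsics/metadata to express."""
            if isinstance(node, ParallelFor):
                return f"call @runtime_parallel_for(n={node.trip_count})"
            if isinstance(node, Barrier):
                return f"call @runtime_barrier(scope={node.scope!r})"
            raise NotImplementedError(type(node).__name__)

        print(lower(ParallelFor("i", 1024, body=[])))
        print(lower(Barrier("team")))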

  6. Corridor One:An Integrated Distance Visualization Enuronments for SSI+ASCI Applications

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Christopher R. Johnson, Charles D. Hansen

    2001-10-29

    The goal of Corridor One: An Integrated Distance Visualization Environment for ASCI and SSI Applications was to combine the forces of six leading-edge laboratories working in the areas of visualization, distributed computing and high performance networking (Argonne National Laboratory, Lawrence Berkeley National Laboratory, Los Alamos National Laboratory, University of Illinois, University of Utah and Princeton University) to develop and deploy the most advanced integrated distance visualization environment for large-scale scientific visualization and demonstrate it on applications relevant to the DOE SSI and ASCI programs. The Corridor One team brought world-class expertise in parallel rendering, deep image-based rendering, immersive environment technology, large-format multi-projector wall-based displays, volume and surface visualization algorithms, collaboration tools and streaming media technology, network protocols for image transmission, high-performance networking, quality of service technology and distributed computing middleware. Our strategy was to build on the very successful teams that produced the I-WAY, "Computational Grids" and CAVE technology and to add these to the teams that have developed the fastest parallel visualization systems and the most widely used networking infrastructure for multicast and distributed media. Unfortunately, just as we were getting going on the Corridor One project, DOE cut the program after the first year. As such, our final report consists of our progress during year one of the grant.

  7. Cloud computing can simplify HIT infrastructure management.

    PubMed

    Glaser, John

    2011-08-01

    Software as a Service (SaaS), built on cloud computing technology, is emerging as the forerunner in IT infrastructure because it helps healthcare providers reduce capital investments. Cloud computing leads to predictable, monthly, fixed operating expenses for hospital IT staff. Outsourced cloud computing facilities are state-of-the-art data centers boasting some of the most sophisticated networking equipment on the market. The SaaS model helps hospitals safeguard against technology obsolescence, minimizes maintenance requirements, and simplifies management.

  8. Bringing the CMS distributed computing system into scalable operations

    NASA Astrophysics Data System (ADS)

    Belforte, S.; Fanfani, A.; Fisk, I.; Flix, J.; Hernández, J. M.; Kress, T.; Letts, J.; Magini, N.; Miccio, V.; Sciabà, A.

    2010-04-01

    Establishing efficient and scalable operations of the CMS distributed computing system critically relies on the proper integration, commissioning and scale testing of the data and workload management tools, the various computing workflows and the underlying computing infrastructure, located at more than 50 computing centres worldwide and interconnected by the Worldwide LHC Computing Grid. Computing challenges periodically undertaken by CMS in the past years with increasing scale and complexity have revealed the need for a sustained effort on computing integration and commissioning activities. The Processing and Data Access (PADA) Task Force was established at the beginning of 2008 within the CMS Computing Program with the mandate of validating the infrastructure for organized processing and user analysis including the sites and the workload and data management tools, validating the distributed production system by performing functionality, reliability and scale tests, helping sites to commission, configure and optimize the networking and storage through scale testing data transfers and data processing, and improving the efficiency of accessing data across the CMS computing system from global transfers to local access. This contribution reports on the tools and procedures developed by CMS for computing commissioning and scale testing as well as the improvements accomplished towards efficient, reliable and scalable computing operations. The activities include the development and operation of load generators for job submission and data transfers with the aim of stressing the experiment and Grid data management and workload management systems, site commissioning procedures and tools to monitor and improve site availability and reliability, as well as activities targeted to the commissioning of the distributed production, user analysis and monitoring systems.

  9. Cloud computing for comparative genomics

    PubMed Central

    2010-01-01

    Background Large comparative genomics studies and tools are becoming increasingly compute-expensive as the number of available genome sequences continues to rise. The capacity and cost of local computing infrastructures are likely to become prohibitive with the increase, especially as the breadth of questions continues to rise. Alternative computing architectures, in particular cloud computing environments, may help alleviate this increasing pressure and enable fast, large-scale, and cost-effective comparative genomics strategies going forward. To test this, we redesigned a typical comparative genomics algorithm, the reciprocal smallest distance algorithm (RSD), to run within Amazon's Elastic Compute Cloud (EC2). We then employed the RSD-cloud for ortholog calculations across a wide selection of fully sequenced genomes. Results We ran more than 300,000 RSD-cloud processes within the EC2. These jobs were farmed simultaneously to 100 high-capacity compute nodes using Amazon Web Services Elastic MapReduce and included a wide mix of large and small genomes. The total computation time took just under 70 hours and cost a total of $6,302 USD. Conclusions The effort to transform existing comparative genomics algorithms from local compute infrastructures is not trivial. However, the speed and flexibility of cloud computing environments provide a substantial boost with manageable cost. The procedure designed to transform the RSD algorithm into a cloud-ready application is readily adaptable to similar comparative genomics problems. PMID:20482786
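
    Each genome-pair comparison in such a run is independent, which is what makes the workload embarrassingly parallel: at roughly $6,302 / 300,000, the cost works out to about two cents per process. A minimal local sketch of that farming pattern, using Python's standard concurrent.futures as a stand-in for Elastic MapReduce and a placeholder rsd_compare function (the real RSD step runs reciprocal BLAST searches and maximum-likelihood distance estimation), might look like this:

        from concurrent.futures import ProcessPoolExecutor
        from itertools import combinations

        def rsd_compare(pair):
            # Placeholder for the real RSD step, which runs BLAST in both
            # directions and estimates maximum-likelihood distances per pair.
            genome_a, genome_b = pair
            return (genome_a, genome_b, f"orthologs({genome_a}, {genome_b})")

        if __name__ == "__main__":
            genomes = ["ecoli", "bsubtilis", "scerevisiae", "celegans"]
            pairs = list(combinations(genomes, 2))   # every job is independent
            # Farm all pairwise jobs to a worker pool, as EMR farms them to
            # compute nodes; results are collected as each job finishes.
            with ProcessPoolExecutor(max_workers=4) as pool:
                for result in pool.map(rsd_compare, pairs):
                    print(result)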

  10. Cloud computing for comparative genomics.

    PubMed

    Wall, Dennis P; Kudtarkar, Parul; Fusaro, Vincent A; Pivovarov, Rimma; Patil, Prasad; Tonellato, Peter J

    2010-05-18

    Large comparative genomics studies and tools are becoming increasingly compute-expensive as the number of available genome sequences continues to rise. The capacity and cost of local computing infrastructures are likely to become prohibitive with the increase, especially as the breadth of questions continues to rise. Alternative computing architectures, in particular cloud computing environments, may help alleviate this increasing pressure and enable fast, large-scale, and cost-effective comparative genomics strategies going forward. To test this, we redesigned a typical comparative genomics algorithm, the reciprocal smallest distance algorithm (RSD), to run within Amazon's Elastic Compute Cloud (EC2). We then employed the RSD-cloud for ortholog calculations across a wide selection of fully sequenced genomes. We ran more than 300,000 RSD-cloud processes within the EC2. These jobs were farmed simultaneously to 100 high-capacity compute nodes using Amazon Web Services Elastic MapReduce and included a wide mix of large and small genomes. The total computation time took just under 70 hours and cost a total of $6,302 USD. The effort to transform existing comparative genomics algorithms from local compute infrastructures is not trivial. However, the speed and flexibility of cloud computing environments provide a substantial boost with manageable cost. The procedure designed to transform the RSD algorithm into a cloud-ready application is readily adaptable to similar comparative genomics problems.

  11. Progress in Machine Learning Studies for the CMS Computing Infrastructure

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bonacorsi, Daniele; Kuznetsov, Valentin; Magini, Nicolo

    Here, computing systems for LHC experiments have developed together with Grids worldwide. While a complete description of the original Grid-based infrastructure and services for LHC experiments and its recent evolutions can be found elsewhere, it is worth mentioning here the scale of the computing resources needed to fulfill the needs of LHC experiments in Run-1 and Run-2 so far.

  12. Progress in Machine Learning Studies for the CMS Computing Infrastructure

    DOE PAGES

    Bonacorsi, Daniele; Kuznetsov, Valentin; Magini, Nicolo; ...

    2017-12-06

    Here, computing systems for LHC experiments have developed together with Grids worldwide. While a complete description of the original Grid-based infrastructure and services for LHC experiments and its recent evolutions can be found elsewhere, it is worth mentioning here the scale of the computing resources needed to fulfill the needs of LHC experiments in Run-1 and Run-2 so far.

  13. The NASA Science Internet: An integrated approach to networking

    NASA Technical Reports Server (NTRS)

    Rounds, Fred

    1991-01-01

    An integrated approach to building a networking infrastructure is an absolute necessity for meeting the multidisciplinary science networking requirements of the Office of Space Science and Applications (OSSA) science community. These networking requirements include communication connectivity between computational resources, databases, and library systems, as well as to other scientists and researchers around the world. A consolidated networking approach allows strategic use of the existing science networking within the Federal government, and it provides networking capability that takes into consideration national and international trends towards multivendor and multiprotocol service. It also offers a practical vehicle for optimizing costs and maximizing performance. Finally, and perhaps most importantly for the development of high-speed computing, an integrated network constitutes a focus for phasing to the National Research and Education Network (NREN). The NASA Science Internet (NSI) program, established in mid-1988, is structured to provide just such an integrated network. A description of the NSI is presented.

  14. Jflow: a workflow management system for web applications.

    PubMed

    Mariette, Jérôme; Escudié, Frédéric; Bardou, Philippe; Nabihoudine, Ibouniyamine; Noirot, Céline; Trotard, Marie-Stéphane; Gaspin, Christine; Klopp, Christophe

    2016-02-01

    Biologists produce large data sets and demand rich and simple web portals in which they can upload and analyze their files. Providing such tools requires masking the complexity induced by the underlying High Performance Computing (HPC) environment. The connection between interface and computing infrastructure is usually specific to each portal. With Jflow, we introduce a Workflow Management System (WMS) composed of jQuery plug-ins, which can easily be embedded in any web application, and a Python library providing all the features required to set up, run and monitor workflows. Jflow is available under the GNU General Public License (GPL) at http://bioinfo.genotoul.fr/jflow. The package comes with full documentation, a quick start and a running test portal. Contact: Jerome.Mariette@toulouse.inra.fr.

  15. Diamond Eye: a distributed architecture for image data mining

    NASA Astrophysics Data System (ADS)

    Burl, Michael C.; Fowlkes, Charless; Roden, Joe; Stechert, Andre; Mukhtar, Saleem

    1999-02-01

    Diamond Eye is a distributed software architecture, which enables users (scientists) to analyze large image collections by interacting with one or more custom data mining servers via a Java applet interface. Each server is coupled with an object-oriented database and a computational engine, such as a network of high-performance workstations. The database provides persistent storage and supports querying of the 'mined' information. The computational engine provides parallel execution of expensive image processing, object recognition, and query-by-content operations. Key benefits of the Diamond Eye architecture are: (1) the design promotes trial evaluation of advanced data mining and machine learning techniques by potential new users (all that is required is to point a web browser to the appropriate URL), (2) software infrastructure that is common across a range of science mining applications is factored out and reused, and (3) the system facilitates closer collaborations between algorithm developers and domain experts.

  16. The OptIPuter microscopy demonstrator: enabling science through a transatlantic lightpath

    PubMed Central

    Ellisman, M.; Hutton, T.; Kirkland, A.; Lin, A.; Lin, C.; Molina, T.; Peltier, S.; Singh, R.; Tang, K.; Trefethen, A.E.; Wallom, D.C.H.; Xiong, X.

    2009-01-01

    The OptIPuter microscopy demonstrator project has been designed to enable concurrent and remote usage of world-class electron microscopes located in Oxford and San Diego. The project has constructed a network consisting of microscopes and computational and data resources that are all connected by a dedicated network infrastructure using the UK Lightpath and US Starlight systems. Key science drivers include examples from both materials and biological science. The resulting system is now a permanent link between the Oxford and San Diego microscopy centres. This will form the basis of further projects between the sites and expansion of the types of systems that can be remotely controlled, including optical, as well as electron, microscopy. Other improvements will include updating the Microsoft cluster software to Windows HPC Server 2008, which includes the HPC Basic Profile implementation that will enable the development of interoperable clients. PMID:19487201

  17. The OptIPuter microscopy demonstrator: enabling science through a transatlantic lightpath.

    PubMed

    Ellisman, M; Hutton, T; Kirkland, A; Lin, A; Lin, C; Molina, T; Peltier, S; Singh, R; Tang, K; Trefethen, A E; Wallom, D C H; Xiong, X

    2009-07-13

    The OptIPuter microscopy demonstrator project has been designed to enable concurrent and remote usage of world-class electron microscopes located in Oxford and San Diego. The project has constructed a network consisting of microscopes and computational and data resources that are all connected by a dedicated network infrastructure using the UK Lightpath and US Starlight systems. Key science drivers include examples from both materials and biological science. The resulting system is now a permanent link between the Oxford and San Diego microscopy centres. This will form the basis of further projects between the sites and expansion of the types of systems that can be remotely controlled, including optical, as well as electron, microscopy. Other improvements will include updating the Microsoft cluster software to Windows HPC Server 2008, which includes the HPC Basic Profile implementation that will enable the development of interoperable clients.

  18. A Comprehensive and Cost-Effective Computer Infrastructure for K-12 Schools

    NASA Technical Reports Server (NTRS)

    Warren, G. P.; Seaton, J. M.

    1996-01-01

    Since 1993, NASA Langley Research Center has been developing and implementing a low-cost Internet connection model, including system architecture, training, and support, to provide Internet access for an entire network of computers. This infrastructure allows local area networks which exceed 50 machines per school to independently access the complete functionality of the Internet by connecting to a central site, using state-of-the-art commercial modem technology, through a single standard telephone line. By locating high-cost resources at this central site and sharing these resources and their costs among the school districts throughout a region, a practical, efficient, and affordable infrastructure for providing scalable Internet connectivity has been developed. As the demand for faster Internet access grows, the model has a simple expansion path that eliminates the need to replace major system components and re-train personnel. Observations of actual Internet usage within an environment, particularly school classrooms, have shown that after an initial period of 'surfing,' the Internet traffic becomes repetitive. By automatically storing requested Internet information on a high-capacity networked disk drive at the local site (network-based disk caching), then updating this information only when it changes, well over 80 percent of the Internet traffic that leaves a location can be eliminated by retrieving the information from the local disk cache.
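
    The caching behaviour described above maps naturally onto standard HTTP validation; as a minimal sketch (not NASA Langley's actual implementation, and using only Python's standard library), a local cache can revalidate a stored page with a conditional request and re-download it only when the origin reports a change:

        import urllib.error
        import urllib.request
        import email.utils, time

        cache = {}  # url -> (last_modified_header, body); stand-in for the networked disk cache

        def fetch(url):
            req = urllib.request.Request(url)
            if url in cache:
                # Revalidate instead of re-downloading: send If-Modified-Since.
                req.add_header("If-Modified-Since", cache[url][0])
            try:
                with urllib.request.urlopen(req) as resp:
                    body = resp.read()
                    stamp = resp.headers.get(
                        "Last-Modified",
                        email.utils.formatdate(time.time(), usegmt=True))
                    cache[url] = (stamp, body)   # refresh the cached copy
                    return body
            except urllib.error.HTTPError as err:
                if err.code == 304:              # unchanged: serve from the cache
                    return cache[url][1]
                raise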

  19. Secure Large-Scale Airport Simulations Using Distributed Computational Resources

    NASA Technical Reports Server (NTRS)

    McDermott, William J.; Maluf, David A.; Gawdiak, Yuri; Tran, Peter; Clancy, Dan (Technical Monitor)

    2001-01-01

    To fully conduct research that will support the far-term concepts, technologies and methods required to improve the safety of Air Transportation, a simulation environment of the requisite degree of fidelity must first be in place. The Virtual National Airspace Simulation (VNAS) will provide the underlying infrastructure necessary for such a simulation system. Aerospace-specific knowledge management services, such as intelligent data-integration middleware, will support the management of information associated with this complex and critically important operational environment. This simulation environment, in conjunction with a distributed network of supercomputers and high-speed network connections to aircraft and to Federal Aviation Administration (FAA), airline and other data sources, will provide the capability to continuously monitor and measure operational performance against expected performance. The VNAS will also provide the tools to use this performance baseline to obtain a perspective of what is happening today and of the potential impact of proposed changes before they are introduced into the system.

  20. e-Infrastructures for e-Sciences 2013 A CHAIN-REDS Workshop organised under the aegis of the European Commission

    NASA Astrophysics Data System (ADS)

    The CHAIN-REDS Project is organising a workshop on "e-Infrastructures for e-Sciences" focusing on Cloud Computing and Data Repositories under the aegis of the European Commission and in co-location with the International Conference on e-Science 2013 (IEEE2013) that will be held in Beijing, P.R. of China on October 17-22, 2013. The core objective of the CHAIN-REDS project is to promote, coordinate and support the effort of a critical mass of non-European e-Infrastructures for Research and Education to collaborate with Europe addressing interoperability and interoperation of Grids and other Distributed Computing Infrastructures (DCI). From this perspective, CHAIN-REDS will optimise the interoperation of European infrastructures with those present in 6 other regions of the world, both from a development and use point of view, and catering to different communities. Overall, CHAIN-REDS will provide input for future strategies and decision-making regarding collaboration with other regions on e-Infrastructure deployment and availability of related data; it will raise the visibility of e-Infrastructures towards intercontinental audiences, covering most of the world and will provide support to establish globally connected and interoperable infrastructures, in particular between the EU and the developing regions. Organised by IHEP, INFN and Sigma Orionis with the support of all project partners, this workshop will aim at: - Presenting the state of the art of Cloud computing in Europe and in China and discussing the opportunities offered by having interoperable and federated e-Infrastructures; - Exploring the existing initiatives of Data Infrastructures in Europe and China, and highlighting the Data Repositories of interest for the Virtual Research Communities in several domains such as Health, Agriculture, Climate, etc.

  1. Cultural and Technological Issues and Solutions for Geodynamics Software Citation

    NASA Astrophysics Data System (ADS)

    Heien, E. M.; Hwang, L.; Fish, A. E.; Smith, M.; Dumit, J.; Kellogg, L. H.

    2014-12-01

    Computational software and custom-written codes play a key role in scientific research and teaching, providing tools to perform data analysis and forward modeling through numerical computation. However, development of these codes is often hampered by the fact that there is no well-defined way for the authors to receive credit or professional recognition for their work through the standard methods of scientific publication and subsequent citation of the work. This in turn may discourage researchers from publishing their codes or making them easier for other scientists to use. We investigate the issues involved in citing software in a scientific context, and introduce features that should be components of a citation infrastructure, particularly oriented towards the codes and scientific culture in the area of geodynamics research. The codes used in geodynamics are primarily specialized numerical modeling codes for continuum mechanics problems; they may be developed by individual researchers, teams of researchers, geophysicists in collaboration with computational scientists and applied mathematicians, or by coordinated community efforts such as the Computational Infrastructure for Geodynamics. Some but not all geodynamics codes are open-source. These characteristics are common to many areas of geophysical software development and use. We provide background on the problem of software citation and discuss some of the barriers preventing adoption of such citations, including social/cultural barriers, insufficient technological support infrastructure, and an overall lack of agreement about what a software citation should consist of. We suggest solutions in an initial effort to create a system to support citation of software and promotion of scientific software development.

  2. Waste IPSC : Thermal-Hydrologic-Chemical-Mechanical (THCM) modeling and simulation.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Freeze, Geoffrey A.; Wang, Yifeng; Arguello, Jose Guadalupe, Jr.

    2010-10-01

    The Waste IPSC objective is to develop an integrated suite of high performance computing capabilities to simulate radionuclide movement through the engineered components and geosphere of a radioactive waste storage or disposal system: (1) with robust thermal-hydrologic-chemical-mechanical (THCM) coupling; (2) for a range of disposal system alternatives (concepts, waste form types, engineered designs, geologic settings); (3) for long time scales and associated large uncertainties; (4) at multiple model fidelities (sub-continuum, high-fidelity continuum, PA); and (5) in accordance with V&V and software quality requirements. THCM Modeling collaborates with: (1) other Waste IPSC activities: Sub-Continuum Processes (and FMM), Frameworks and Infrastructure (and VU, ECT, and CT); (2) the Waste Form Campaign; (3) the Used Fuel Disposition (UFD) Campaign; and (4) ASCEM.

  3. Defense Strategies for Asymmetric Networked Systems with Discrete Components.

    PubMed

    Rao, Nageswara S V; Ma, Chris Y T; Hausken, Kjell; He, Fei; Yau, David K Y; Zhuang, Jun

    2018-05-03

    We consider infrastructures consisting of a network of systems, each composed of discrete components. The network provides the vital connectivity between the systems and hence plays a critical, asymmetric role in the infrastructure operations. The individual components of the systems can be attacked by cyber and physical means and can be appropriately reinforced to withstand these attacks. We formulate the problem of ensuring the infrastructure performance as a game between an attacker and a provider, who choose the numbers of the components of the systems and network to attack and reinforce, respectively. The costs and benefits of attacks and reinforcements are characterized using the sum-form, product-form and composite utility functions, each composed of a survival probability term and a component cost term. We present a two-level characterization of the correlations within the infrastructure: (i) the aggregate failure correlation function specifies the infrastructure failure probability given the failure of an individual system or network, and (ii) the survival probabilities of the systems and network satisfy first-order differential conditions that capture the component-level correlations using multiplier functions. We derive Nash equilibrium conditions that provide expressions for individual system survival probabilities and also the expected infrastructure capacity specified by the total number of operational components. We apply these results to derive and analyze defense strategies for distributed cloud computing infrastructures using cyber-physical models.
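
    The abstract names sum-form utilities built from a survival probability term and a component cost term without writing them out; as an illustrative sketch only (all notation below is assumed, not taken from the paper), a sum-form pair of utilities over x attacked and y reinforced components might read:

        % Hedged sketch; P(x, y) is the system survival probability given
        % x attacked and y reinforced components, w and w_A are the values
        % the two players place on survival and failure, and c_R, c_A are
        % per-component reinforcement and attack costs.
        U_{\mathrm{provider}}(x, y) = w \, P(x, y) - c_R \, y
        U_{\mathrm{attacker}}(x, y) = w_A \bigl(1 - P(x, y)\bigr) - c_A \, x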

  4. Defense Strategies for Asymmetric Networked Systems with Discrete Components

    PubMed Central

    Rao, Nageswara S. V.; Ma, Chris Y. T.; Hausken, Kjell; He, Fei; Yau, David K. Y.

    2018-01-01

    We consider infrastructures consisting of a network of systems, each composed of discrete components. The network provides the vital connectivity between the systems and hence plays a critical, asymmetric role in the infrastructure operations. The individual components of the systems can be attacked by cyber and physical means and can be appropriately reinforced to withstand these attacks. We formulate the problem of ensuring the infrastructure performance as a game between an attacker and a provider, who choose the numbers of the components of the systems and network to attack and reinforce, respectively. The costs and benefits of attacks and reinforcements are characterized using the sum-form, product-form and composite utility functions, each composed of a survival probability term and a component cost term. We present a two-level characterization of the correlations within the infrastructure: (i) the aggregate failure correlation function specifies the infrastructure failure probability given the failure of an individual system or network, and (ii) the survival probabilities of the systems and network satisfy first-order differential conditions that capture the component-level correlations using multiplier functions. We derive Nash equilibrium conditions that provide expressions for individual system survival probabilities and also the expected infrastructure capacity specified by the total number of operational components. We apply these results to derive and analyze defense strategies for distributed cloud computing infrastructures using cyber-physical models. PMID:29751588

  5. Critical Infrastructure Protection II, The International Federation for Information Processing, Volume 290.

    NASA Astrophysics Data System (ADS)

    Papa, Mauricio; Shenoi, Sujeet

    The information infrastructure -- comprising computers, embedded devices, networks and software systems -- is vital to day-to-day operations in every sector: information and telecommunications, banking and finance, energy, chemicals and hazardous materials, agriculture, food, water, public health, emergency services, transportation, postal and shipping, government and defense. Global business and industry, governments, indeed society itself, cannot function effectively if major components of the critical information infrastructure are degraded, disabled or destroyed. Critical Infrastructure Protection II describes original research results and innovative applications in the interdisciplinary field of critical infrastructure protection. Also, it highlights the importance of weaving science, technology and policy in crafting sophisticated, yet practical, solutions that will help secure information, computer and network assets in the various critical infrastructure sectors. Areas of coverage include: - Themes and Issues - Infrastructure Security - Control Systems Security - Security Strategies - Infrastructure Interdependencies - Infrastructure Modeling and Simulation This book is the second volume in the annual series produced by the International Federation for Information Processing (IFIP) Working Group 11.10 on Critical Infrastructure Protection, an international community of scientists, engineers, practitioners and policy makers dedicated to advancing research, development and implementation efforts focused on infrastructure protection. The book contains a selection of twenty edited papers from the Second Annual IFIP WG 11.10 International Conference on Critical Infrastructure Protection held at George Mason University, Arlington, Virginia, USA in the spring of 2008.

  6. DOE Office of Scientific and Technical Information (OSTI.GOV)

    McCaskey, Alexander J.

    Hybrid programming models for beyond-CMOS technologies will prove critical for integrating new computing technologies alongside our existing infrastructure. Unfortunately, the software infrastructure required to enable this is lacking or not available. XACC is a programming framework for extreme-scale, post-exascale accelerator architectures that integrates with existing conventional applications. It is a pluggable framework for programming languages developed for next-generation computing hardware architectures such as quantum and neuromorphic computing. It lets computational scientists efficiently off-load classically intractable work to attached accelerators through user-friendly kernel definitions. XACC makes post-exascale hybrid programming approachable for domain computational scientists.

  7. Machine learning patterns for neuroimaging-genetic studies in the cloud.

    PubMed

    Da Mota, Benoit; Tudoran, Radu; Costan, Alexandru; Varoquaux, Gaël; Brasche, Goetz; Conrod, Patricia; Lemaitre, Herve; Paus, Tomas; Rietschel, Marcella; Frouin, Vincent; Poline, Jean-Baptiste; Antoniu, Gabriel; Thirion, Bertrand

    2014-01-01

    Brain imaging is a natural intermediate phenotype to understand the link between genetic information and behavior or brain pathologies risk factors. Massive efforts have been made in the last few years to acquire high-dimensional neuroimaging and genetic data on large cohorts of subjects. The statistical analysis of such data is carried out with increasingly sophisticated techniques and represents a great computational challenge. Fortunately, increasing computational power in distributed architectures can be harnessed, if new neuroinformatics infrastructures are designed and training to use these new tools is provided. Combining a MapReduce framework (TomusBLOB) with machine learning algorithms (Scikit-learn library), we design a scalable analysis tool that can deal with non-parametric statistics on high-dimensional data. End-users describe the statistical procedure to perform and can then test the model on their own computers before running the very same code in the cloud at a larger scale. We illustrate the potential of our approach on real data with an experiment showing how the functional signal in subcortical brain regions can be significantly fit with genome-wide genotypes. This experiment demonstrates the scalability and the reliability of our framework in the cloud with a 2 weeks deployment on hundreds of virtual machines.
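
    The abstract's combination of scikit-learn with non-parametric statistics can be illustrated with a small stand-alone example on synthetic data (a generic sketch of a permutation-based significance test, not the paper's actual TomusBLOB pipeline):

        import numpy as np
        from sklearn.linear_model import Ridge
        from sklearn.model_selection import permutation_test_score

        rng = np.random.default_rng(0)
        n_subjects, n_snps = 200, 50
        X = rng.integers(0, 3, size=(n_subjects, n_snps)).astype(float)  # genotypes 0/1/2
        # Synthetic subcortical signal weakly driven by a few variants plus noise.
        y = X[:, :5] @ rng.normal(size=5) + rng.normal(scale=5.0, size=n_subjects)

        # Non-parametric assessment: refit on label-permuted data to build the
        # null distribution of the cross-validated score.
        score, perm_scores, pvalue = permutation_test_score(
            Ridge(alpha=1.0), X, y, cv=5, n_permutations=200, random_state=0)
        print(f"R^2 = {score:.3f}, permutation p-value = {pvalue:.3f}")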

  8. Robonaut's Flexible Information Technology Infrastructure

    NASA Technical Reports Server (NTRS)

    Askew, Scott; Bluethmann, William; Alder, Ken; Ambrose, Robert

    2003-01-01

    Robonaut, NASA's humanoid robot, is designed to work as both an astronaut assistant and, in certain situations, an astronaut surrogate. This highly dexterous robot performs complex tasks under telepresence control that could previously only be carried out directly by humans. Currently with 47 degrees of freedom (DOF), Robonaut is a state-of-the-art human-size telemanipulator system. While many of Robonaut's embedded components have been custom designed to meet packaging or environmental requirements, the primary computing systems used in Robonaut are currently commercial-off-the-shelf (COTS) products which have some correlation to flight-qualified computer systems. This loose coupling of information technology (IT) resources allows Robonaut to exploit cost-effective solutions while floating the technology base to take advantage of the rapid pace of IT advances. These IT systems utilize a software development environment which is compatible both with COTS hardware and with flight-proven computing systems, preserving the majority of software development for a flight system. The ability to use highly integrated and flexible COTS software development tools improves productivity while minimizing redesign for a space flight system. Further, the flexibility of Robonaut's software and communication architecture has allowed it to become a widely used distributed development testbed for integrating new capabilities and furthering experimental research.

  9. Software Reuse Methods to Improve Technological Infrastructure for e-Science

    NASA Technical Reports Server (NTRS)

    Marshall, James J.; Downs, Robert R.; Mattmann, Chris A.

    2011-01-01

    Social computing has the potential to contribute to scientific research. Ongoing developments in information and communications technology improve capabilities for enabling scientific research, including research fostered by social computing capabilities. The recent emergence of e-Science practices has demonstrated the benefits from improvements in the technological infrastructure, or cyber-infrastructure, that has been developed to support science. Cloud computing is one example of this e-Science trend. Our own work in the area of software reuse offers methods that can be used to improve new technological development, including cloud computing capabilities, to support scientific research practices. In this paper, we focus on software reuse and its potential to contribute to the development and evaluation of information systems and related services designed to support new capabilities for conducting scientific research.

  10. Parallel Processing of Images in Mobile Devices using BOINC

    NASA Astrophysics Data System (ADS)

    Curiel, Mariela; Calle, David F.; Santamaría, Alfredo S.; Suarez, David F.; Flórez, Leonardo

    2018-04-01

    Medical image processing helps health professionals make decisions for the diagnosis and treatment of patients. Since some algorithms for processing images require substantial amounts of resources, one can take advantage of distributed or parallel computing. A mobile grid can be an adequate computing infrastructure for this problem; a mobile grid is a grid that includes mobile devices as resource providers. In a previous step of this research, we selected BOINC as the infrastructure on which to build our mobile grid. However, parallel processing of images on mobile devices poses at least two important challenges: the execution of standard libraries for processing images, and obtaining adequate performance when compared to grids of desktop computers. When we started our research, the use of BOINC on mobile devices also involved two issues: a) executing programs on mobile devices required modifying their code to insert calls to the BOINC API, and b) dividing the image among the mobile devices, as well as merging the results, required additional code in some BOINC components. This article presents answers to these four challenges.
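
    The tiling problem mentioned in issue b) is easy to state concretely; as a minimal sketch (plain NumPy on synthetic data, independent of BOINC's actual work-unit format), an image can be split into row bands for independent processing on separate devices and then reassembled:

        import numpy as np

        def split_into_bands(image, n_workers):
            """Divide an image into horizontal bands, one per device."""
            return np.array_split(image, n_workers, axis=0)

        def merge_bands(bands):
            """Reassemble processed bands in their original order."""
            return np.concatenate(bands, axis=0)

        image = np.random.rand(512, 512)          # stand-in for a medical image
        bands = split_into_bands(image, 4)
        processed = [np.clip(b * 1.5, 0, 1) for b in bands]  # placeholder per-device filter
        result = merge_bands(processed)
        assert result.shape == image.shape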

  11. Tackling some of the most intricate geophysical challenges via high-performance computing

    NASA Astrophysics Data System (ADS)

    Khosronejad, A.

    2016-12-01

    Recently, the world has been witnessing significant enhancements in the computing power of supercomputers. Computer clusters, in conjunction with advanced mathematical algorithms, have set the stage for developing and applying powerful numerical tools to tackle some of the most intricate geophysical challenges that today's engineers face. One such challenge is to understand how turbulent flows in real-world settings interact with (a) rigid and/or mobile complex bed bathymetry of waterways and sea-beds in coastal areas; (b) objects with complex geometry that are fully or partially immersed; and (c) the free surface of waterways and water-surface waves in the coastal area. This understanding is especially important because turbulent flows in real-world environments are often bounded by geometrically complex boundaries, which dynamically deform and give rise to multi-scale and multi-physics transport phenomena, and are characterized by multi-lateral interactions among various phases (e.g. air/water/sediment phases). Herein, I present some of the multi-scale and multi-physics geophysical fluid mechanics processes that I have attempted to study using an in-house high-performance computational model, the so-called VFS-Geophysics. More specifically, I will present the simulation results of turbulence/sediment/solute/turbine interactions in real-world settings. Parts of the simulations I present are performed to gain scientific insights into processes such as sand wave formation (A. Khosronejad and F. Sotiropoulos (2014), Numerical simulation of sand waves in a turbulent open channel flow, Journal of Fluid Mechanics, 753:150-216), while others are carried out to predict the effects of climate change and large flood events on societal infrastructure (A. Khosronejad et al. (2016), Large eddy simulation of turbulence and solute transport in a forested headwater stream, Journal of Geophysical Research, doi:10.1002/2014JF003423).

  12. Beating the tyranny of scale with a private cloud configured for Big Data

    NASA Astrophysics Data System (ADS)

    Lawrence, Bryan; Bennett, Victoria; Churchill, Jonathan; Juckes, Martin; Kershaw, Philip; Pepler, Sam; Pritchard, Matt; Stephens, Ag

    2015-04-01

    The Joint Analysis System, JASMIN, consists of five significant hardware components: a batch computing cluster, a hypervisor cluster, bulk disk storage, high-performance disk storage, and access to a tape robot. Each of the computing clusters consists of a heterogeneous set of servers supporting a range of possible data analysis tasks, and a unique network environment makes it relatively trivial to migrate servers between the two clusters. The high-performance disk storage will include the world's largest (publicly visible) deployment of the Panasas parallel disk system. Initially deployed in April 2012, JASMIN has already undergone two major upgrades, culminating in a system which, by April 2015, will have in excess of 16 PB of disk and 4,000 cores. Layered on the basic hardware are a range of services, from managed services, such as the curated archives of the Centre for Environmental Data Archival or the data analysis environment for the National Centres for Atmospheric Science and Earth Observation, to a generic Infrastructure as a Service (IaaS) offering for the UK environmental science community. Here we present examples of some of the big-data workloads being supported in this environment, ranging from data management tasks, such as checksumming 3 PB of data held in over one hundred million files, to science tasks, such as re-processing satellite observations with new algorithms or calculating new diagnostics on petascale climate simulation outputs. We will demonstrate how the provision of a cloud environment closely coupled to a batch computing environment, all sharing the same high-performance disk system, allows massively parallel processing without the necessity to shuffle data excessively, even as it supports many different virtual communities, each with guaranteed performance. We will discuss the advantages of having a heterogeneous range of servers with available memory from tens of GB at the low end to (currently) two TB at the high end. There are some limitations of the JASMIN environment: the high-performance disk environment is not fully available in the IaaS environment, and a planned ability to burst compute-heavy jobs into the public cloud is not yet fully available. There are load balancing and performance issues that need to be understood. We will conclude with projections for future usage and our plans to meet those requirements.
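
    Data-management workloads like the checksumming example parallelize naturally over files; a minimal sketch of hashing a file tree concurrently (standard-library Python with a thread pool, not JASMIN's actual tooling) might look like this:

        import hashlib
        from concurrent.futures import ThreadPoolExecutor
        from pathlib import Path

        def checksum(path, chunk_size=1 << 20):
            """Stream a file through SHA-256 without loading it whole."""
            digest = hashlib.sha256()
            with open(path, "rb") as fh:
                for chunk in iter(lambda: fh.read(chunk_size), b""):
                    digest.update(chunk)
            return path, digest.hexdigest()

        files = [p for p in Path(".").rglob("*") if p.is_file()]
        # Threads suffice here: the work is I/O-bound against the file system.
        with ThreadPoolExecutor(max_workers=16) as pool:
            for path, digest in pool.map(checksum, files):
                print(digest, path)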

  13. Performance Studies on Distributed Virtual Screening

    PubMed Central

    Krüger, Jens; de la Garza, Luis; Kohlbacher, Oliver; Nagel, Wolfgang E.

    2014-01-01

    Virtual high-throughput screening (vHTS) is an invaluable method in modern drug discovery. It permits screening large datasets or databases of chemical structures for those structures that may bind to a drug target. Virtual screening is typically performed by docking code, which often runs sequentially. Processing of huge vHTS datasets can be parallelized by chunking the data, because individual docking runs are independent of each other. The goal of this work is to find an optimal splitting that maximizes the speedup while considering overhead and the available cores on Distributed Computing Infrastructures (DCIs). We have conducted thorough performance studies accounting not only for the runtime of the docking itself, but also for structure preparation. Performance studies were conducted via the workflow-enabled science gateway MoSGrid (Molecular Simulation Grid). As input we used benchmark datasets for protein kinases. Our performance studies show that docking workflows can be made to scale almost linearly up to 500 concurrent processes distributed even over large DCIs, thus accelerating vHTS campaigns significantly. PMID:25032219
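
    A simple way to see the trade-off being optimized is to model the runtime explicitly. In the hedged sketch below (notation assumed, not taken from the paper), N docking runs of mean duration t_d are split into k chunks that run concurrently, each paying a fixed overhead t_o for submission, staging and structure preparation; overhead caps the useful number of chunks:

        % Illustrative runtime and speedup model for k chunks:
        T(k) = \frac{N \, t_d}{k} + t_o,
        \qquad
        S(k) = \frac{T(1)}{T(k)} = \frac{k}{1 + \dfrac{k \, t_o}{N \, t_d}}

    As k grows, S(k) saturates near N t_d / t_o, which is consistent with the near-linear scaling the authors observe up to several hundred concurrent processes.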

  14. Infrastructure for the Global Village.

    ERIC Educational Resources Information Center

    Gore, Al

    1991-01-01

    Technologies that enhance the ability to create and understand information have led to changes in the traditional legal concepts of property, ownership, originality, privacy, and intellectual freedom. The importance of government addressing such issues and investing in development of a high-capacity computer network is discussed. (KR)

  15. PGAS in-memory data processing for the Processing Unit of the Upgraded Electronics of the Tile Calorimeter of the ATLAS Detector

    NASA Astrophysics Data System (ADS)

    Ohene-Kwofie, Daniel; Otoo, Ekow

    2015-10-01

    The ATLAS detector, operated at the Large Hadron Collider (LHC), records proton-proton collisions at CERN every 50 ns, resulting in a sustained data flow of up to PB/s. The upgraded Tile Calorimeter of the ATLAS experiment will sustain about 5 PB/s of digital throughput. These massive data rates require extremely fast data capture and processing. Although there has been a steady increase in the processing speed of CPU/GPGPU systems assembled for high performance computing, the rate of data input and output, even under parallel I/O, has not kept up with the general increase in computing speeds. The problem then is whether one can implement an I/O subsystem infrastructure capable of meeting the computational speeds of the advanced computing systems at the petascale and exascale level. We propose a system architecture that leverages the Partitioned Global Address Space (PGAS) model of computing to maintain an in-memory data store for the Processing Unit (PU) of the upgraded electronics of the Tile Calorimeter, which is proposed to be used as a high-throughput general-purpose co-processor to the sROD of the upgraded Tile Calorimeter. The physical memory of the PUs is aggregated into a large global logical address space using RDMA-capable interconnects such as PCI-Express to enhance data processing throughput.
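
    As a rough single-node illustration of the PGAS idea (processes addressing one logical memory space in place, rather than exchanging messages), Python's standard multiprocessing.shared_memory can stand in for the RDMA-backed global address space; this is a conceptual sketch only, not the sROD co-processor design:

        import numpy as np
        from multiprocessing import Process
        from multiprocessing.shared_memory import SharedMemory

        def worker(shm_name, start, stop):
            # Attach to the shared segment by name: no data is copied or
            # messaged, mirroring how PGAS ranks address remote memory directly.
            shm = SharedMemory(name=shm_name)
            view = np.ndarray((1024,), dtype=np.float64, buffer=shm.buf)
            view[start:stop] *= 2.0          # in-place update of the shared store
            shm.close()

        if __name__ == "__main__":
            shm = SharedMemory(create=True, size=1024 * 8)
            data = np.ndarray((1024,), dtype=np.float64, buffer=shm.buf)
            data[:] = 1.0
            ps = [Process(target=worker, args=(shm.name, i * 256, (i + 1) * 256))
                  for i in range(4)]
            for p in ps:
                p.start()
            for p in ps:
                p.join()
            print(data.sum())                # 2048.0: every slice was doubled in place
            shm.close()
            shm.unlink()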

  16. JINR cloud infrastructure evolution

    NASA Astrophysics Data System (ADS)

    Baranov, A. V.; Balashov, N. A.; Kutovskiy, N. A.; Semenov, R. N.

    2016-09-01

    To fulfil JINR commitments in different national and international projects related to the use of modern information technologies such as cloud and grid computing, as well as to provide a modern tool for JINR users for their scientific research, a cloud infrastructure was deployed at the Laboratory of Information Technologies of the Joint Institute for Nuclear Research. OpenNebula software was chosen as the cloud platform. Initially it was set up in a simple configuration with a single front-end host and a few cloud nodes. Some custom development was done to tune the JINR cloud installation to fit local needs: a web form in the cloud web interface for resource requests, a menu item with cloud utilization statistics, user authentication via Kerberos, and a custom driver for OpenVZ containers. Because of high demand for the cloud service and over-utilization of its resources, it was re-designed to cover increasing users' needs in capacity, availability and reliability. Recently a new cloud instance has been deployed in a high-availability configuration with a distributed network file system and additional computing power.

  17. Introducing high performance distributed logging service for ACS

    NASA Astrophysics Data System (ADS)

    Avarias, Jorge A.; López, Joao S.; Maureira, Cristián; Sommer, Heiko; Chiozzi, Gianluca

    2010-07-01

    The ALMA Common Software (ACS) is a software framework that provides the infrastructure for the Atacama Large Millimeter Array and other projects. ACS, based on CORBA, offers basic services and common design patterns for distributed software. Every properly built system needs to be able to log status and error information. Logging in a single-computer scenario can be as easy as using fprintf statements. However, in a distributed system, it must provide a way to centralize all logging data in a single place without overloading the network or complicating the applications. ACS provides a complete logging service infrastructure in which every log has an associated priority and timestamp, allowing filtering at different levels of the system (application, service and clients). Currently the ACS logging service uses an implementation of the CORBA Telecom Log Service in a customized way, using only a minimal subset of the features provided by the standard. The most relevant feature used by ACS is the ability to treat the logs as event data that gets distributed over the network in a publisher-subscriber paradigm. For this purpose the CORBA Notification Service, which is resource intensive, is used. On the other hand, the Data Distribution Service (DDS) provides an alternative standard for publisher-subscriber communication for real-time systems, offering better performance and featuring decentralized message processing. This document describes how the new high performance logging service of ACS has been modeled and developed using DDS, replacing the Telecom Log Service. Benefits and drawbacks are analyzed. A benchmark is presented comparing the differences between the implementations.
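
    Neither the CORBA Notification Service nor DDS is needed to illustrate the core pattern of shipping prioritized, timestamped log records to a central collector; Python's standard logging handlers provide a minimal stand-in (this sketch shows the pattern only, not the ACS or DDS API, and the collector hostname is made up):

        import logging
        import logging.handlers

        # Client side: every record carries a priority (level) and timestamp
        # and is forwarded over the network instead of being written locally.
        log = logging.getLogger("acs.demo")
        log.setLevel(logging.DEBUG)
        # "collector.example.org" is a hypothetical central log server;
        # 9020 is the logging module's default TCP logging port.
        log.addHandler(logging.handlers.SocketHandler("collector.example.org", 9020))

        # Filtering can happen before anything is sent, analogous to ACS
        # filtering at the application, service and client levels.
        log.info("antenna %s: tracking started", "DA41")
        log.debug("fine-grained state dump")   # dropped if the level is raised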

  18. State of Technology for Rehabilitation of Water Distribution Systems

    EPA Science Inventory

    The impact that the lack of investment in water infrastructure will have on the performance of aging underground infrastructure over time is well documented and the needed funding estimates range as high as $325 billion over the next 20 years. With the current annual replacement...

  19. Elastic Cloud Computing Architecture and System for Heterogeneous Spatiotemporal Computing

    NASA Astrophysics Data System (ADS)

    Shi, X.

    2017-10-01

    Spatiotemporal computation implements a variety of different algorithms. When big data are involved, a desktop computer or standalone application may not be able to complete the computation task due to limited memory and computing power. Now that a variety of hardware accelerators and computing platforms are available to improve the performance of geocomputation, different algorithms may behave differently on different computing infrastructures and platforms. Some are perfect for implementation on a cluster of graphics processing units (GPUs), while GPUs may not be useful for certain kinds of spatiotemporal computation. The same situation arises in utilizing a cluster of Intel's many-integrated-core (MIC) processors, or Xeon Phi, as well as Hadoop or Spark platforms, to handle big spatiotemporal data. Furthermore, considering the energy efficiency requirement in general computation, a Field Programmable Gate Array (FPGA) may be the better solution for energy efficiency when its computational performance is similar to or better than that of GPUs and MICs. It is expected that an elastic cloud computing architecture and system that integrates all of GPUs, MICs, and FPGAs could be developed and deployed to support spatiotemporal computing over heterogeneous data types and computational problems.

  20. S3DB core: a framework for RDF generation and management in bioinformatics infrastructures

    PubMed Central

    2010-01-01

    Background Biomedical research is set to greatly benefit from the use of semantic web technologies in the design of computational infrastructure. However, beyond well-defined research initiatives, substantial issues of data heterogeneity, source distribution, and privacy currently stand in the way of the personalization of Medicine. Results A computational framework for bioinformatic infrastructure was designed to deal with the heterogeneous data sources and the sensitive mixture of public and private data that characterizes the biomedical domain. This framework consists of a logical model built with semantic web tools, coupled with a Markov process that propagates user operator states. An accompanying open source prototype was developed to meet a series of applications that range from collaborative multi-institution data acquisition efforts to data analysis applications that need to quickly traverse complex data structures. This report describes the two abstractions underlying the S3DB-based infrastructure, logical and numerical, and discusses its generality beyond the immediate confines of existing implementations. Conclusions The emergence of the "web as a computer" requires a formal model for the different functionalities involved in reading and writing to it. The S3DB core model proposed was found to address the design criteria of biomedical computational infrastructure, such as those supporting large-scale multi-investigator research, clinical trials, and molecular epidemiology. PMID:20646315
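
    For readers unfamiliar with RDF generation of the kind such a framework manages, a minimal stand-alone sketch with the rdflib library follows (generic RDF triples, not the S3DB core model itself; the example namespace and terms are made up):

        from rdflib import Graph, Literal, Namespace, RDF, URIRef

        # Hypothetical namespace for illustration; S3DB defines its own model.
        EX = Namespace("http://example.org/biomed/")

        g = Graph()
        g.bind("ex", EX)

        sample = URIRef(EX["sample42"])
        g.add((sample, RDF.type, EX.TissueSample))
        g.add((sample, EX.collectedFrom, EX.patient7))
        g.add((sample, EX.storageTemperature, Literal(-80)))

        # Serialize the triples; Turtle keeps the statements human-readable.
        print(g.serialize(format="turtle"))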

  1. A PACS archive architecture supported on cloud services.

    PubMed

    Silva, Luís A Bastião; Costa, Carlos; Oliveira, José Luis

    2012-05-01

    Diagnostic imaging procedures have continuously increased over the last decade, and this trend may continue in coming years, creating a great impact on the storage and retrieval capabilities of current PACS. Moreover, many smaller centers do not have the financial resources or requirements that justify the acquisition of a traditional infrastructure. Alternative solutions, such as cloud computing, may help address this emerging need. A tremendous amount of ubiquitous computational power, such as that provided by Google and Amazon, is used every day as a normal commodity. Taking advantage of this new paradigm, an architecture for a Cloud-based PACS archive that provides data privacy, integrity, and availability is proposed. The solution is independent of the cloud provider, and the core modules were successfully instantiated with two cloud computing providers. Operational metrics for several medical imaging modalities were tabulated and compared for Google Storage, Amazon S3, and a LAN PACS. A PACS-as-a-Service archive that provides storage of medical studies using the Cloud was developed. The results show that the solution is robust and that it is possible to store, query, and retrieve all desired studies in a similar way as with a local PACS approach. Cloud computing is an emerging solution that promises high scalability of infrastructures, software, and applications, according to a "pay-as-you-go" business model. The presented architecture uses the cloud to set up medical data repositories and can have a significant impact on healthcare institutions by reducing IT infrastructures.
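
    Provider independence of the kind described usually comes from coding against a small storage interface and supplying one adapter per backend; a hedged sketch of that design follows (class and method names here are illustrative, not the authors' actual modules):

        from abc import ABC, abstractmethod

        class StudyStore(ABC):
            """Minimal provider-agnostic contract for a PACS archive backend."""
            @abstractmethod
            def put(self, study_id: str, blob: bytes) -> None: ...
            @abstractmethod
            def get(self, study_id: str) -> bytes: ...

        class LocalStore(StudyStore):
            """LAN PACS stand-in; a cloud adapter would wrap S3 or Google
            Storage behind the exact same two calls."""
            def __init__(self):
                self._data = {}
            def put(self, study_id, blob):
                self._data[study_id] = blob
            def get(self, study_id):
                return self._data[study_id]

        archive: StudyStore = LocalStore()      # swap in a cloud adapter here
        archive.put("CT-2012-0001", b"\x00dicom-bytes")
        print(len(archive.get("CT-2012-0001")))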

  2. An infrastructure for the integration of geoscience instruments and sensors on the Grid

    NASA Astrophysics Data System (ADS)

    Pugliese, R.; Prica, M.; Kourousias, G.; Del Linz, A.; Curri, A.

    2009-04-01

    The Grid, as a computing paradigm, has long held the attention of both academia and industry [1]. The distributed and expandable nature of its general architecture results in scalability and more efficient utilisation of the computing infrastructures. The scientific community, including that of the geosciences, often handles problems with very high requirements in data processing, transfer, and storage [2,3]. This has raised interest in Grid technologies, but these are often viewed solely as an access gateway to HPC. Suitable Grid infrastructures could provide the geoscience community with additional benefits such as sharing, remote access and control of scientific systems. These systems can be scientific instruments, sensors, robots, cameras and any other device used in the geosciences. A practical, general, and feasible solution for Grid-enabling such devices requires non-intrusive extensions to core parts of the current Grid architecture. We propose an extended version of an architecture [4] that can serve as the solution to this problem. The solution we propose is called the Grid Instrument Element (IE) [5]. It is an addition to the existing core Grid parts, the Computing Element (CE) and the Storage Element (SE), which serve the purposes their names suggest. The IE we refer to, and the related technologies, have been developed in the EU project on the Deployment of Remote Instrumentation Infrastructure (DORII; see Note 1). In DORII, partners from various scientific communities, including those of earthquake, environmental and experimental science, have adopted the technology of the Instrument Element in order to integrate their devices into the Grid. The Oceanographic and coastal observation and modelling Mediterranean Ocean Observing Network (OGS; see Note 2), a DORII partner, is in the process of deploying the above-mentioned Grid technologies on two types of observational modules: Argo profiling floats and a novel Autonomous Underwater Vehicle (AUV). In this paper i) we define the need for integration of instrumentation in the Grid, ii) we introduce the solution of the Instrument Element, iii) we demonstrate a suitable end-user web portal for accessing Grid resources, and iv) we describe, from the Grid-technological point of view, the process of integrating two advanced environmental monitoring devices into the Grid. References: [1] M. Surridge, S. Taylor, D. De Roure, and E. Zaluska, "Experiences with GRIA—Industrial Applications on a Web Services Grid," First International Conference on e-Science and Grid Computing, 2005, pp. 98-105. [2] A. Chervenak, I. Foster, C. Kesselman, C. Salisbury, and S. Tuecke, "The data grid: Towards an architecture for the distributed management and analysis of large scientific datasets," Journal of Network and Computer Applications, vol. 23, 2000, pp. 187-200. [3] B. Allcock, J. Bester, J. Bresnahan, A. L. Chervenak, I. Foster, C. Kesselman, S. Meder, V. Nefedova, D. Quesnel, and S. Tuecke, "Data management and transfer in high-performance computational grid environments," Parallel Computing, vol. 28, 2002, pp. 749-771. [4] E. Frizziero, M. Gulmini, F. Lelli, G. Maron, A. Oh, S. Orlando, A. Petrucci, S. Squizzato, and S. Traldi, "Instrument Element: A New Grid component that Enables the Control of Remote Instrumentation," Proceedings of the Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGRID'06), IEEE Computer Society, Washington, DC, USA, 2006. [5] R. Ranon, L. De Marco, A. Senerchia, S. Gabrielli, L. Chittaro, R. Pugliese, L. Del Cano, F. Asnicar, and M. Prica, "A Web-based Tool for Collaborative Access to Scientific Instruments in Cyberinfrastructures." Notes: 1. The DORII project is supported by the European Commission within the 7th Framework Programme (FP7/2007-2013) under grant agreement no. RI-213110. URL: http://www.dorii.eu 2. Istituto Nazionale di Oceanografia e di Geofisica Sperimentale. URL: http://www.ogs.trieste.it

  3. The Czech National Grid Infrastructure

    NASA Astrophysics Data System (ADS)

    Chudoba, J.; Křenková, I.; Mulač, M.; Ruda, M.; Sitera, J.

    2017-10-01

    The Czech National Grid Infrastructure is operated by MetaCentrum, a CESNET department responsible for coordinating and managing activities related to distributed computing. CESNET, as the Czech National Research and Education Network (NREN), provides many e-infrastructure services, which are used by 94% of the scientific and research community in the Czech Republic. Computing and storage resources owned by different organizations are connected by a sufficiently fast network to provide transparent access to all resources. We describe in more detail the computing infrastructure, which is based on several different technologies and covers grid, cloud and map-reduce environments. While the largest share of CPUs is still accessible via distributed Torque servers, providing an environment for long batch jobs, part of the infrastructure is available via standard EGI tools, a subset of NGI resources is provided to the EGI FedCloud environment with a cloud interface, and there is also a Hadoop cluster provided by the same e-infrastructure. A broad spectrum of computing servers is offered; users can choose from standard 2-CPU servers to large SMP machines with up to 6 TB of RAM or servers with GPU cards. Different groups have different priorities on various resources, and resource owners can even have exclusive access. The software is distributed via AFS. Storage servers offering up to tens of terabytes of disk space to individual users are connected via NFS4 on top of GPFS, and access to long-term HSM storage with petabyte capacity is also provided. An overview of available resources and recent usage statistics will be given.

  4. Computational Infrastructure for Geodynamics (CIG)

    NASA Astrophysics Data System (ADS)

    Gurnis, M.; Kellogg, L. H.; Bloxham, J.; Hager, B. H.; Spiegelman, M.; Willett, S.; Wysession, M. E.; Aivazis, M.

    2004-12-01

    Solid earth geophysicists have a long tradition of writing scientific software to address a wide range of problems. In particular, computer simulations came into wide use in geophysics during the decade after the plate tectonic revolution. Solution schemes and numerical algorithms that developed in other areas of science, most notably engineering, fluid mechanics, and physics, were adapted with considerable success to geophysics. This software has largely been the product of individual efforts, and although that approach has proven successful, it is now starting to show its limitations as we try to share codes and algorithms, or to recombine codes in novel ways to produce new science. With funding from the NSF, the US community has embarked on a Computational Infrastructure for Geodynamics (CIG) that will develop, support, and disseminate community-accessible software for the greater geodynamics community, from model developers to end-users. The software is being developed for problems involving mantle and core dynamics, crustal and earthquake dynamics, magma migration, seismology, and other related topics. With a high level of community participation, CIG is leveraging state-of-the-art scientific computing into a suite of open-source tools and codes. The infrastructure that we are now starting to develop will consist of: (a) a coordinated effort to develop reusable, well-documented and open-source geodynamics software; (b) the basic building blocks - an infrastructure layer - of software by which state-of-the-art modeling codes can be quickly assembled; (c) extension of existing software frameworks to interlink multiple codes and data through a superstructure layer; (d) strategic partnerships with the larger world of computational science and geoinformatics; and (e) specialized training and workshops for both the geodynamics and broader Earth science communities. The CIG initiative has already started to leverage and develop long-term strategic partnerships with open-source development efforts within the larger thrusts of scientific computing and geoinformatics. These strategic partnerships are essential as the frontier has moved to multi-scale and multi-physics problems in which many investigators now want to use simulation software for data interpretation, data assimilation, and hypothesis testing.
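
    To make the idea of an infrastructure layer assembling codes more concrete, here is a toy sketch (not CIG code) of two modelling components recombined behind one shared stepping interface. All class names and the physics placeholders are invented.

```python
from typing import Protocol


class Solver(Protocol):
    """The narrow, shared interface an infrastructure layer standardises so
    that independently written modelling codes can be recombined."""
    def step(self, state: dict, dt: float) -> dict: ...


class MantleFlow:
    def step(self, state, dt):
        state["T"] -= 0.01 * dt            # placeholder cooling term
        return state


class PlateMotion:
    def step(self, state, dt):
        state["v"] = 0.05 * state["T"]     # placeholder coupling to temperature
        return state


def run(components: list[Solver], state: dict, dt: float, nsteps: int) -> dict:
    # The superstructure layer's job: march every coupled component on one clock.
    for _ in range(nsteps):
        for c in components:
            state = c.step(state, dt)
    return state


print(run([MantleFlow(), PlateMotion()], {"T": 1600.0, "v": 0.0}, 1.0, 10))
```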

  5. An infrastructure with a unified control plane to integrate IP into optical metro networks to provide flexible and intelligent bandwidth on demand for cloud computing

    NASA Astrophysics Data System (ADS)

    Yang, Wei; Hall, Trevor

    2012-12-01

    The Internet is entering an era of cloud computing to provide more cost-effective, eco-friendly and reliable services to consumer and business users, and the nature of Internet traffic will undergo a fundamental transformation. Consequently, the current Internet will no longer suffice for serving cloud traffic in metro areas. This work proposes an infrastructure with a unified control plane that integrates simple packet aggregation technology with optical express, through the interoperation of IP routers and electrical traffic controllers in optical metro networks. The proposed infrastructure provides flexible, intelligent, and eco-friendly bandwidth on demand for cloud computing in metro areas.
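
    A toy sketch of the kind of placement decision such a unified control plane makes: heavy flows bypass the IP layer on an optical express lightpath, light ones are electrically aggregated. The threshold, capacities, and names are invented for illustration.

```python
from dataclasses import dataclass

WAVELENGTH_GBPS = 100    # illustrative capacity of one optical channel
EXPRESS_THRESHOLD = 0.5  # fraction above which a flow bypasses the IP layer


@dataclass
class Flow:
    src: str
    dst: str
    gbps: float


def place(flow: Flow) -> str:
    """Toy decision rule for a unified control plane."""
    if flow.gbps >= EXPRESS_THRESHOLD * WAVELENGTH_GBPS:
        return f"lightpath {flow.src}->{flow.dst} ({flow.gbps} Gb/s, optical express)"
    return f"aggregate {flow.src}->{flow.dst} ({flow.gbps} Gb/s, packet layer)"


for f in [Flow("metro-A", "dc-1", 80), Flow("metro-B", "dc-1", 3)]:
    print(place(f))
```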

  6. EUROPLANET-RI modelling service for the planetary science community: European Modelling and Data Analysis Facility (EMDAF)

    NASA Astrophysics Data System (ADS)

    Khodachenko, Maxim; Miller, Steven; Stoeckler, Robert; Topf, Florian

    2010-05-01

    Computational modeling and observational data analysis are two major aspects of the modern scientific research. Both appear nowadays under extensive development and application. Many of the scientific goals of planetary space missions require robust models of planetary objects and environments as well as efficient data analysis algorithms, to predict conditions for mission planning and to interpret the experimental data. Europe has great strength in these areas, but it is insufficiently coordinated; individual groups, models, techniques and algorithms need to be coupled and integrated. Existing level of scientific cooperation and the technical capabilities for operative communication, allow considerable progress in the development of a distributed international Research Infrastructure (RI) which is based on the existing in Europe computational modelling and data analysis centers, providing the scientific community with dedicated services in the fields of their computational and data analysis expertise. These services will appear as a product of the collaborative communication and joint research efforts of the numerical and data analysis experts together with planetary scientists. The major goal of the EUROPLANET-RI / EMDAF is to make computational models and data analysis algorithms associated with particular national RIs and teams, as well as their outputs, more readily available to their potential user community and more tailored to scientific user requirements, without compromising front-line specialized research on model and data analysis algorithms development and software implementation. This objective will be met through four keys subdivisions/tasks of EMAF: 1) an Interactive Catalogue of Planetary Models; 2) a Distributed Planetary Modelling Laboratory; 3) a Distributed Data Analysis Laboratory, and 4) enabling Models and Routines for High Performance Computing Grids. Using the advantages of the coordinated operation and efficient communication between the involved computational modelling, research and data analysis expert teams and their related research infrastructures, EMDAF will provide a 1) flexible, 2) scientific user oriented, 3) continuously developing and fast upgrading computational and data analysis service to support and intensify the European planetary scientific research. At the beginning EMDAF will create a set of demonstrators and operational tests of this service in key areas of European planetary science. This work will aim at the following objectives: (a) Development and implementation of tools for distant interactive communication between the planetary scientists and computing experts (including related RIs); (b) Development of standard routine packages, and user-friendly interfaces for operation of the existing numerical codes and data analysis algorithms by the specialized planetary scientists; (c) Development of a prototype of numerical modelling services "on demand" for space missions and planetary researchers; (d) Development of a prototype of data analysis services "on demand" for space missions and planetary researchers; (e) Development of a prototype of coordinated interconnected simulations of planetary phenomena and objects (global multi-model simulators); (f) Providing the demonstrators of a coordinated use of high performance computing facilities (super-computer networks), done in cooperation with European HPC Grid DEISA.

  7. Autonomous Robotic Inspection in Tunnels

    NASA Astrophysics Data System (ADS)

    Protopapadakis, E.; Stentoumis, C.; Doulamis, N.; Doulamis, A.; Loupos, K.; Makantasis, K.; Kopsiaftis, G.; Amditis, A.

    2016-06-01

    In this paper, an automatic robotic inspector for tunnel assessment is presented. The proposed platform is able to autonomously navigate within civil infrastructure, grab stereo images, and process/analyse them in order to identify defect types. First, cracks are detected via deep learning approaches. Then, a detailed 3D model of the cracked area is created using photogrammetric methods. Finally, laser profiling of the tunnel's lining is performed for a narrow region close to the detected crack, allowing potential deformations to be deduced. The robotic platform consists of an autonomous mobile vehicle and a crane arm, guided by the computer-vision-based crack detector, carrying ultrasound sensors, the stereo cameras, and the laser scanner. Visual inspection is based on convolutional neural networks, which support the creation of high-level discriminative features for complex non-linear pattern classification. Real-time 3D information is then accurately calculated, and the crack position and orientation are passed to the robotic platform. The entire system has been evaluated in railway and road tunnels, namely the Egnatia Highway and London Underground infrastructure.
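
    As a rough illustration of the visual-inspection stage, here is a minimal convolutional network for crack / no-crack patch classification in PyTorch. The architecture is invented for this sketch and is not the network used on the platform.

```python
import torch
import torch.nn as nn


class CrackNet(nn.Module):
    """Small CNN of the kind used for crack / no-crack classification of
    image patches; layer sizes are illustrative only."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 16 * 16, 2)  # 2 classes: crack / intact

    def forward(self, x):
        x = self.features(x)  # (N, 32, 16, 16) for 64x64 input patches
        return self.classifier(x.flatten(1))


# One forward pass over a batch of random 64x64 RGB patches.
logits = CrackNet()(torch.randn(4, 3, 64, 64))
print(logits.shape)  # torch.Size([4, 2])
```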

  8. Belle II grid computing: An overview of the distributed data management system.

    NASA Astrophysics Data System (ADS)

    Bansal, Vikas; Schram, Malachi; Belle II Collaboration

    2017-01-01

    The Belle II experiment at the SuperKEKB collider in Tsukuba, Japan, will start physics data taking in 2018 and will accumulate 50 ab⁻¹ of e⁺e⁻ collision data, about 50 times larger than the data set of the Belle experiment. The computing requirements of Belle II are comparable to those of a Run I LHC experiment. Computing at this scale requires efficient use of the compute grids in North America, Asia and Europe, and will take advantage of upgrades to the high-speed global network. We present the architecture of the data flow and data handling as a part of the Belle II computing infrastructure.
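
    A toy sketch of the replica bookkeeping at the heart of such a distributed data management system: which grid sites hold a copy of which dataset. The class and names below are invented for illustration; Belle II's production system is reportedly built on the DIRAC framework.

```python
from collections import defaultdict


class ReplicaCatalogue:
    """Toy model of distributed data management bookkeeping: it maps a
    logical file name (LFN) to the set of sites holding a replica."""

    def __init__(self):
        self._replicas = defaultdict(set)

    def register(self, lfn: str, site: str) -> None:
        self._replicas[lfn].add(site)

    def sites_for(self, lfn: str) -> set:
        # .get() does not create an entry, unlike defaultdict indexing.
        return self._replicas.get(lfn, set())


cat = ReplicaCatalogue()
cat.register("/belle/raw/exp1/run42", "KEK")
cat.register("/belle/raw/exp1/run42", "PNNL")  # second copy on another continent
print(cat.sites_for("/belle/raw/exp1/run42"))
```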

  9. INDIGO-DataCloud solutions for Earth Sciences

    NASA Astrophysics Data System (ADS)

    Aguilar Gómez, Fernando; de Lucas, Jesús Marco; Fiore, Sandro; Monna, Stephen; Chen, Yin

    2017-04-01

    INDIGO-DataCloud (https://www.indigo-datacloud.eu/) is a European Commission funded project aiming to develop a data and computing platform targeting scientific communities, deployable on multiple hardware and provisioned over hybrid (private or public) e-infrastructures. The development of INDIGO solutions covers the different layers in cloud computing (IaaS, PaaS, SaaS) and provides tools to exploit resources like HPC or GPGPUs. INDIGO is oriented towards supporting European scientific research communities, which are well represented in the project. Twelve different case studies from different fields have been analyzed in detail: Biological & Medical sciences, Social sciences & Humanities, Environmental and Earth sciences, and Physics & Astrophysics. INDIGO-DataCloud provides solutions to emerging challenges in Earth Science such as: - Enabling easy deployment of community services at different cloud sites. Many Earth Science research infrastructures involve observation stations distributed across countries, with distributed data centers to support the corresponding data acquisition and curation. There is a need to easily deploy new data center services as the research infrastructure continues to expand. As an example, LifeWatch (ESFRI, Ecosystems and Biodiversity) uses INDIGO solutions to manage the deployment of services to perform complex hydrodynamics and water quality modelling in a cloud computing environment, predicting algal blooms, using Docker technology: TOSCA requirement description, a Docker repository, an Orchestrator for deployment, AAI (AuthN, AuthZ) and OneData (a distributed storage system). - Supporting big data analysis. Nowadays, many Earth Science research communities produce large amounts of data and are challenged by the difficulties of processing and analysing it. A climate-model intercomparison data analysis case study for the European Network for Earth System Modelling (ENES) community has been set up, based on the Ophidia big data analysis framework and the Kepler workflow management system. Such services normally involve a large and distributed set of data and computing resources; in this regard, the case study exploits the INDIGO PaaS for a flexible and dynamic allocation of resources at the infrastructural level. - Providing distributed data storage solutions. To allow scientific communities to perform heavy computation on huge datasets, INDIGO provides global data access solutions allowing researchers to access data in a distributed environment regardless of its location, and to publish and share their research results with public or closed communities. INDIGO solutions that support access to distributed data storage (OneData) are being tested on data from the EMSO infrastructure (Ocean Sciences and Geohazards). Another aspect of interest for the EMSO community is efficient data processing, exploiting INDIGO services like the PaaS Orchestrator. Further, for HPC exploitation, a new solution named udocker has been implemented, enabling users to execute Docker containers on supercomputers without requiring administration privileges. This presentation will give an overview of the INDIGO solutions that are of interest and use to Earth science communities and show how they can be applied to other case studies.
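
    For the HPC case, udocker's basic pull/create/run cycle can be driven from Python as sketched below. This assumes udocker is installed and on PATH; the image and container names are illustrative.

```python
import subprocess


def sh(*args):
    """Run a command, raising on failure."""
    subprocess.run(args, check=True)


# udocker executes Docker images entirely in user space, so this works on an
# HPC login node without root privileges. The image name is illustrative.
sh("udocker", "pull", "ubuntu:22.04")
sh("udocker", "create", "--name=analysis", "ubuntu:22.04")
sh("udocker", "run", "analysis", "/bin/echo", "hello from user space")
```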

  10. Conditions for Ubiquitous Computing: What Can Be Learned from a Longitudinal Study

    ERIC Educational Resources Information Center

    Lei, Jing

    2010-01-01

    Based on survey data and interview data collected over four academic years, this longitudinal study examined how a ubiquitous computing project evolved along with the changes in teachers, students, the human infrastructure, and technology infrastructure in the school. This study also investigated what conditions were necessary for successful…

  11. Reconfiguring practice: the interdependence of experimental procedure and computing infrastructure in distributed earthquake engineering.

    PubMed

    De La Flor, Grace; Ojaghi, Mobin; Martínez, Ignacio Lamata; Jirotka, Marina; Williams, Martin S; Blakeborough, Anthony

    2010-09-13

    When transitioning local laboratory practices into distributed environments, the interdependent relationship between experimental procedure and the technologies used to execute experiments becomes highly visible and a focal point for system requirements. We present an analysis of ways in which this reciprocal relationship is reconfiguring laboratory practices in earthquake engineering as a new computing infrastructure is embedded within three laboratories in order to facilitate the execution of shared experiments across geographically distributed sites. The system has been developed as part of the UK Network for Earthquake Engineering Simulation e-Research project, which links together three earthquake engineering laboratories at the universities of Bristol, Cambridge and Oxford. We consider the ways in which researchers have successfully adapted their local laboratory practices through the modification of experimental procedure so that they may meet the challenges of coordinating distributed earthquake experiments.

  12. OOI CyberInfrastructure - Next Generation Oceanographic Research

    NASA Astrophysics Data System (ADS)

    Farcas, C.; Fox, P.; Arrott, M.; Farcas, E.; Klacansky, I.; Krueger, I.; Meisinger, M.; Orcutt, J.

    2008-12-01

    Software has become a key enabling technology for scientific discovery, observation, modeling, and the exploitation of natural phenomena. New value emerges from the integration of individual subsystems into networked federations of capabilities exposed to the scientific community. Such data-intensive interoperability networks are crucial for future collaborative scientific research, as they open up new ways of fusing data from different sources and across various domains, and of analysis over wide geographic areas. The recently established NSF OOI program, through its CyberInfrastructure component, addresses this challenge by providing broad access from sensor networks for data acquisition up to computational grids for massive computations, and binding infrastructure facilitating policy management and governance of the emerging system-of-scientific-systems. We provide insight into the integration core of this effort, namely a hierarchical service-oriented architecture for a robust, performant, and maintainable implementation. We first discuss the relationship between data management and CI crosscutting concerns such as identity management, policy, and governance, which define the organizational contexts for data access and usage. Next, we detail critical services including data ingestion, transformation, preservation, inventory, and presentation. To address interoperability issues between data represented in various formats, we employ a semantic framework derived from Earth System Grid technology, a canonical representation for scientific data based on DAP/OPeNDAP, and related data publishers such as ERDDAP. Finally, we briefly present the underlying transport, based on a messaging infrastructure over the AMQP protocol, and preservation, based on a distributed file system through SDSC iRODS.
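
    A minimal sketch of the AMQP transport pattern named above, using the pika client against a broker assumed to be running on localhost. The exchange name, routing key, and payload are invented for illustration.

```python
import json

import pika  # AMQP 0-9-1 client; any broker such as RabbitMQ will do

# Sketch of the messaging pattern described: instrument data published over
# AMQP for downstream ingestion services to consume.
connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.exchange_declare(exchange="ooi.ingest", exchange_type="topic")

sample = {"platform": "glider-07", "sensor": "ctd", "temp_c": 11.3}
channel.basic_publish(exchange="ooi.ingest",
                      routing_key="ctd.glider-07",
                      body=json.dumps(sample))
connection.close()
```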

  13. A Short History of Performance Assessment: Lessons Learned.

    ERIC Educational Resources Information Center

    Madaus, George F.; O'Dwyer, Laura M.

    1999-01-01

    Places performance assessment in the context of high-stakes uses, describes underlying technologies, and outlines the history of performance testing from 210 B.C.E. to the present. Historical issues of fairness, efficiency, cost, and infrastructure influence contemporary efforts to use performance assessments in large-scale, high-stakes testing…

  14. Standard requirements for GCP-compliant data management in multinational clinical trials

    PubMed Central

    2011-01-01

    Background A recent survey has shown that data management in clinical trials performed by academic trial units still faces many difficulties (e.g. heterogeneity of software products, deficits in quality management, limited human and financial resources and the complexity of running a local computer centre). Unfortunately, no specific, practical and open standard for both GCP-compliant data management and the underlying IT infrastructure is available to improve the situation. For that reason the "Working Group on Data Centres" of the European Clinical Research Infrastructures Network (ECRIN) has developed a standard specifying the requirements for high quality GCP-compliant data management in multinational clinical trials. Methods International, European and national regulations and guidelines relevant to GCP, data security and IT infrastructures, as well as ECRIN documents produced previously, were evaluated to provide a starting point for the development of standard requirements. The requirements were produced by expert consensus of the ECRIN Working Group on Data Centres, using a structured and standardised process. The requirements were divided into two main parts: an IT part covering standards for the underlying IT infrastructure and computer systems in general, and a Data Management (DM) part covering requirements for data management applications in clinical trials. Results The standard developed includes 115 IT requirements, split into 15 separate sections, 107 DM requirements (in 12 sections) and 13 other requirements (2 sections). Sections IT01 to IT05 deal with the basic IT infrastructure, while IT06 and IT07 cover validation and local software development. IT08 to IT15 concern the aspects of IT systems that directly support clinical trial management. Sections DM01 to DM03 cover the implementation of a specific clinical data management application, i.e. for a specific trial, whilst DM04 to DM12 address the data management of trials across the unit. Section IN01 is dedicated to international aspects and ST01 to the competence of a trials unit's staff. Conclusions The standard is intended to provide an open and widely used set of requirements for GCP-compliant data management, particularly in academic trial units. It is the intention that ECRIN will use these requirements as the basis for the certification of ECRIN data centres. PMID:21426576

  15. Use of Emerging Grid Computing Technologies for the Analysis of LIGO Data

    NASA Astrophysics Data System (ADS)

    Koranda, Scott

    2004-03-01

    The LIGO Scientific Collaboration (LSC) today faces the challenge of enabling the analysis of terabytes of LIGO data by hundreds of scientists from institutions all around the world. To meet this challenge the LSC is developing tools, infrastructure, applications, and expertise leveraging the Grid Computing technologies available today, making compute resources at sites across the United States and Europe available to LSC scientists. We use digital credentials for strong and secure authentication and authorization to compute resources and data. Building on top of products from the Globus project for high-speed data transfer and information discovery, we have created the Lightweight Data Replicator (LDR) to securely and robustly replicate data to resource sites. We have deployed at our computing sites the Virtual Data Toolkit (VDT) Server and Client packages, developed in collaboration with our partners in the GriPhyN and iVDGL projects, providing uniform access to distributed resources for users and their applications. Taken together, these Grid Computing technologies and infrastructure have formed the LSC DataGrid: a coherent and uniform environment across two continents for the analysis of gravitational-wave detector data. Much work remains, however, to scale current analyses, and recent lessons learned need to be integrated into the next generation of Grid middleware.
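
    A minimal local stand-in for the core guarantee a replicator such as LDR must provide: copy plus checksum verification. Paths and file names below are invented, and a real transfer would go over GridFTP rather than a local copy.

```python
import hashlib
import shutil
from pathlib import Path


def replicate(src: Path, dst: Path) -> None:
    """Copy a file and verify the replica by checksum: the basic robustness
    guarantee of a data replicator, sketched with a local copy."""
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dst)
    digest = lambda p: hashlib.sha256(p.read_bytes()).hexdigest()
    if digest(src) != digest(dst):
        dst.unlink()
        raise IOError(f"replica of {src} failed verification")


# Demo with a throwaway file; the frame-file name is purely illustrative.
Path("frames").mkdir(exist_ok=True)
Path("frames/H-R1-0.gwf").write_bytes(b"demo payload")
replicate(Path("frames/H-R1-0.gwf"), Path("mirror/frames/H-R1-0.gwf"))
print("replica verified")
```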

  16. Bond Behavior of Reinforcing Steel in Ultra-High Performance Concrete

    DOT National Transportation Integrated Search

    2014-11-01

    Ultra-high performance concrete (UHPC) has garnered interest from the highway infrastructure community for its greatly enhanced mechanical and durability properties. The objective of this research is to extensively evaluate the factors that affect bo...

  17. DRIHM: Distributed Research Infrastructure for Hydro-Meteorology

    NASA Astrophysics Data System (ADS)

    Parodi, A.; Rebora, N.; Kranzlmueller, D.; Schiffers, M.; Clematis, A.; Tafferner, A.; Garrote, L. M.; Llasat Botija, M.; Caumont, O.; Richard, E.; Cros, P.; Dimitrijevic, V.; Jagers, B.; Harpham, Q.; Hooper, R. P.

    2012-12-01

    Hydro-Meteorology Research (HMR) is an area of critical scientific importance and of high societal relevance. It plays a key role in guiding predictions relevant to the safety and prosperity of humans and ecosystems, from highly urbanized areas to coastal zones and agricultural landscapes. Of special interest and urgency within HMR is the problem of understanding and predicting the impacts on humans and the environment of severe hydro-meteorological events, such as flash floods and landslides in areas of complex orography, under the effects of climate change. At the heart of this challenge lies the ability to access hydro-meteorological data and models easily, and to facilitate collaboration between meteorologists, hydrologists, and Earth science experts for accelerated scientific advances in this field. To face these problems, the DRIHM (Distributed Research Infrastructure for Hydro-Meteorology) project is developing a prototype e-Science environment to facilitate this collaboration and provide end-to-end HMR services (models, datasets and post-processing tools) at the European level, with the ability to expand to global scale (e.g. cooperation with Earth Cube-related initiatives). The objectives of DRIHM are to lead the definition of a common long-term strategy, to foster the development of new HMR models and observational archives for the study of severe hydro-meteorological events, to promote the execution and analysis of high-end simulations, and to support the dissemination of predictive models as decision analysis tools. DRIHM combines European expertise in HMR, Grid computing and High Performance Computing (HPC). Joint research activities will improve the efficient use of the European e-Infrastructures, notably Grid and HPC, for HMR modelling and observational databases, model evaluation tool sets and access to HMR model results. Networking activities will disseminate DRIHM results at the European and global levels in order to increase the cohesion of European and possibly worldwide HMR communities and increase awareness of the potential of ICT for HMR. Service activities will deploy the end-to-end DRIHM services and tools in support of HMR networks and virtual organizations on top of the existing European e-Infrastructures.

  18. Proceedings Second Annual Cyber Security and Information Infrastructure Research Workshop

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sheldon, Frederick T; Krings, Axel; Yoo, Seong-Moo

    2006-01-01

    The workshop theme is "Cyber Security: Beyond the Maginot Line". Recently the FBI reported that computer crime has skyrocketed, costing over $67 billion in 2005 alone and affecting 2.8M+ businesses and organizations. Attack sophistication is unprecedented, along with the availability of open-source concomitant tools. The private, academic, and public sectors invest significant resources in cyber security. Industry primarily performs cyber security research as an investment in future products and services. While the public sector also funds cyber security R&D, the majority of this activity focuses on the specific mission(s) of the funding agency. Thus, broad areas of cyber security remain neglected or underdeveloped. Consequently, this workshop endeavors to explore issues involving cyber security and related technologies, toward strengthening such areas and enabling the development of new tools and methods for securing our information infrastructure's critical assets. We aim to assemble new ideas and proposals about robust models on which we can build the architecture of a secure cyberspace, including but not limited to: * Knowledge discovery and management * Critical infrastructure protection * De-obfuscating tools for the validation and verification of tamper-proofed software * Computer network defense technologies * Scalable information assurance strategies * Assessment-driven design for trust * Security metrics and testing methodologies * Validation of security and survivability properties * Threat assessment and risk analysis * Early accurate detection of the insider threat * Security-hardened sensor networks and ubiquitous computing environments * Mobile software authentication protocols * A new "model" of the threat to replace the "Maginot Line" model, and more.

  19. POSIX and Object Distributed Storage Systems Performance Comparison Studies With Real-Life Scenarios in an Experimental Data Taking Context Leveraging OpenStack Swift & Ceph

    NASA Astrophysics Data System (ADS)

    Poat, M. D.; Lauret, J.; Betts, W.

    2015-12-01

    The STAR online computing infrastructure has become an intensive, dynamic system used for first-hand data collection and analysis, resulting in a dense collection of data output. As we have transitioned to our current state, inefficient, limited storage systems have become an impediment to fast feedback to online shift crews, and a centrally accessible, scalable, and redundant distributed storage system has become a necessity in this environment. OpenStack Swift Object Storage and Ceph Object Storage are two promising technologies whose community use and development have led to success elsewhere. In this contribution, OpenStack Swift and Ceph have been put to the test with single and parallel I/O tests emulating real-world scenarios for data processing and workflows. The Ceph file system, offering a POSIX-compliant file system mounted similarly to an NFS share, was of particular interest as it aligned with our requirements and was retained as our solution. I/O performance tests run against the Ceph POSIX file system presented surprising results, indicating true potential for fast I/O and reliability. STAR's online compute farm has historically been used for job submission and first-hand data analysis; reusing it to maintain a storage cluster alongside job submission will be an efficient use of the current infrastructure.
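
    A sketch of the kind of parallel write test described, run against a hypothetical CephFS POSIX mount. The mount point, file count, and sizes are invented, and the final throughput estimate is deliberately crude (total bytes over the slowest writer's time).

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor

MOUNT = "/mnt/cephfs"                # hypothetical CephFS POSIX mount point
NFILES, CHUNK = 8, 64 * 1024 * 1024  # 8 concurrent writers, 64 MiB each


def write_one(i: int) -> float:
    """Write one file and return the elapsed wall time."""
    path = os.path.join(MOUNT, f"iotest-{i}.bin")
    start = time.perf_counter()
    with open(path, "wb") as f:
        f.write(os.urandom(CHUNK))
        f.flush()
        os.fsync(f.fileno())  # force the data out of the page cache
    return time.perf_counter() - start


with ThreadPoolExecutor(max_workers=NFILES) as pool:
    times = list(pool.map(write_one, range(NFILES)))

# Crude aggregate: all bytes written, divided by the slowest writer's time.
print(f"aggregate ~{NFILES * CHUNK / max(times) / 2**20:.0f} MiB/s")
```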

  20. Automatically generated code for relativistic inhomogeneous cosmologies

    NASA Astrophysics Data System (ADS)

    Bentivegna, Eloisa

    2017-02-01

    The applications of numerical relativity to cosmology are on the rise, contributing insight into such cosmological problems as structure formation, primordial phase transitions, gravitational-wave generation, and inflation. In this paper, I present the infrastructure for the computation of inhomogeneous dust cosmologies which was used recently to measure the effect of nonlinear inhomogeneity on the cosmic expansion rate. I illustrate the code's architecture, provide evidence for its correctness in a number of familiar cosmological settings, and evaluate its parallel performance for grids of up to several billion points. The code, which is available as free software, is based on the Einstein Toolkit infrastructure, and in particular leverages the automated code generation capabilities provided by its component Kranc.
