Sample records for large computing resources

  1. Flexible services for the support of research.

    PubMed

    Turilli, Matteo; Wallom, David; Williams, Chris; Gough, Steve; Curran, Neal; Tarrant, Richard; Bretherton, Dan; Powell, Andy; Johnson, Matt; Harmer, Terry; Wright, Peter; Gordon, John

    2013-01-28

    Cloud computing has been increasingly adopted by users and providers to promote flexible, scalable and tailored access to computing resources. Nonetheless, the consolidation of this paradigm has uncovered some of its limitations. Initially devised by corporations with direct control over large amounts of computational resources, cloud computing is now being endorsed by organizations with limited resources or with a more articulated, less direct control over these resources. The challenge for these organizations is to leverage the benefits of cloud computing while dealing with limited and often widely distributed computing resources. This study focuses on the adoption of cloud computing by higher education institutions and addresses two main issues: flexible and on-demand access to a large amount of storage resources, and scalability across a heterogeneous set of cloud infrastructures. The proposed solutions leverage a federated approach to cloud resources in which users access multiple, largely independent cloud infrastructures through a highly customizable broker layer. This approach allows for a uniform authentication and authorization infrastructure, fine-grained policy specification, and aggregated accounting and monitoring. Within a loosely coupled federation of cloud infrastructures, users can access vast amounts of data without copying them across cloud infrastructures and can scale up their resource provision when local cloud resources become insufficient.

  2. Exploring Cloud Computing for Large-scale Scientific Applications

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lin, Guang; Han, Binh; Yin, Jian

    This paper explores cloud computing for large-scale data-intensive scientific applications. Cloud computing is attractive because it provides hardware and software resources on demand, which relieves the burden of acquiring and maintaining a huge amount of resources that may be used only once by a scientific application. However, unlike typical commercial applications that often require only a moderate amount of ordinary resources, large-scale scientific applications often need to process enormous amounts of data in the terabyte or even petabyte range and require special high-performance hardware with low-latency connections to complete computation in a reasonable amount of time. To address these challenges, we build an infrastructure that can dynamically select high-performance computing hardware across institutions and dynamically adapt the computation to the selected resources to achieve high performance. We have also demonstrated the effectiveness of our infrastructure by building a systems biology application and an uncertainty quantification application for carbon sequestration, which can efficiently utilize data and computation resources across several institutions.

  3. Experience in using commercial clouds in CMS

    NASA Astrophysics Data System (ADS)

    Bauerdick, L.; Bockelman, B.; Dykstra, D.; Fuess, S.; Garzoglio, G.; Girone, M.; Gutsche, O.; Holzman, B.; Hufnagel, D.; Kim, H.; Kennedy, R.; Mason, D.; Spentzouris, P.; Timm, S.; Tiradani, A.; Vaandering, E.; CMS Collaboration

    2017-10-01

    Historically, high energy physics computing has been performed on large purpose-built computing systems. In the beginning there were single-site computing facilities, which evolved into the Worldwide LHC Computing Grid (WLCG) used today. The vast majority of the WLCG resources are used for LHC computing, and the resources are scheduled to be continuously used throughout the year. In the last several years there has been an explosion in the capacity and capability of commercial and academic computing clouds. Cloud resources are highly virtualized and intended to be flexibly deployed for a variety of computing tasks. There is a growing interest amongst the cloud providers to demonstrate the capability to perform large-scale scientific computing. In this presentation we will discuss results from the CMS experiment using the Fermilab HEPCloud Facility, which utilized both local Fermilab resources and Amazon Web Services (AWS). The goal was to work with AWS through a matching grant to demonstrate a sustained scale approximately equal to half of the worldwide processing resources available to CMS. We will discuss the planning and technical challenges involved in organizing the most I/O-intensive CMS workflows on a large-scale set of virtualized resources provisioned by the Fermilab HEPCloud. We will describe the data handling and data management challenges. We will also discuss the economic issues and compare cost and operational efficiency with our dedicated resources. Finally, we will consider how the working model of HEP computing changes when large-scale resources scheduled at peak times become available.

  4. Experience in using commercial clouds in CMS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bauerdick, L.; Bockelman, B.; Dykstra, D.

    Historically, high energy physics computing has been performed on large purpose-built computing systems. In the beginning there were single-site computing facilities, which evolved into the Worldwide LHC Computing Grid (WLCG) used today. The vast majority of the WLCG resources are used for LHC computing, and the resources are scheduled to be continuously used throughout the year. In the last several years there has been an explosion in the capacity and capability of commercial and academic computing clouds. Cloud resources are highly virtualized and intended to be flexibly deployed for a variety of computing tasks. There is a growing interest amongst the cloud providers to demonstrate the capability to perform large-scale scientific computing. In this presentation we will discuss results from the CMS experiment using the Fermilab HEPCloud Facility, which utilized both local Fermilab resources and Amazon Web Services (AWS). The goal was to work with AWS through a matching grant to demonstrate a sustained scale approximately equal to half of the worldwide processing resources available to CMS. We will discuss the planning and technical challenges involved in organizing the most I/O-intensive CMS workflows on a large-scale set of virtualized resources provisioned by the Fermilab HEPCloud. We will describe the data handling and data management challenges. We will also discuss the economic issues and compare cost and operational efficiency with our dedicated resources. Finally, we will consider how the working model of HEP computing changes when large-scale resources scheduled at peak times become available.

  5. Using Mosix for Wide-Area Computational Resources

    USGS Publications Warehouse

    Maddox, Brian G.

    2004-01-01

    One of the problems with using traditional Beowulf-type distributed processing clusters is that they require an investment in dedicated computer resources. These resources are usually needed in addition to pre-existing ones such as desktop computers and file servers. Mosix is a series of modifications to the Linux kernel that creates a virtual computer, featuring automatic load balancing by migrating processes from heavily loaded nodes to less-used ones. An extension of the Beowulf concept is to run a Mosix-enabled Linux kernel on a large number of computer resources in an organization. This configuration would provide a very large amount of computational resources based on pre-existing equipment. The advantage of this method is that it provides much more processing power than a traditional Beowulf cluster without the added costs of dedicating resources.

  6. HEPCloud, a New Paradigm for HEP Facilities: CMS Amazon Web Services Investigation

    DOE PAGES

    Holzman, Burt; Bauerdick, Lothar A. T.; Bockelman, Brian; ...

    2017-09-29

    Historically, high energy physics computing has been performed on large purpose-built computing systems. These began as single-site compute facilities, but have evolved into the distributed computing grids used today. Recently, there has been an exponential increase in the capacity and capability of commercial clouds. Cloud resources are highly virtualized and intended to be able to be flexibly deployed for a variety of computing tasks. There is a growing interest among the cloud providers to demonstrate the capability to perform large-scale scientific computing. In this paper, we discuss results from the CMS experiment using the Fermilab HEPCloud facility, which utilized both local Fermilab resources and virtual machines in the Amazon Web Services Elastic Compute Cloud. We discuss the planning, technical challenges, and lessons learned involved in performing physics workflows on a large-scale set of virtualized resources. Additionally, we will discuss the economics and operational efficiencies when executing workflows both in the cloud and on dedicated resources.

  7. HEPCloud, a New Paradigm for HEP Facilities: CMS Amazon Web Services Investigation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Holzman, Burt; Bauerdick, Lothar A. T.; Bockelman, Brian

    Historically, high energy physics computing has been performed on large purpose-built computing systems. These began as single-site compute facilities, but have evolved into the distributed computing grids used today. Recently, there has been an exponential increase in the capacity and capability of commercial clouds. Cloud resources are highly virtualized and intended to be able to be flexibly deployed for a variety of computing tasks. There is a growing interest among the cloud providers to demonstrate the capability to perform large-scale scientific computing. In this paper, we discuss results from the CMS experiment using the Fermilab HEPCloud facility, which utilized both local Fermilab resources and virtual machines in the Amazon Web Services Elastic Compute Cloud. We discuss the planning, technical challenges, and lessons learned involved in performing physics workflows on a large-scale set of virtualized resources. Additionally, we will discuss the economics and operational efficiencies when executing workflows both in the cloud and on dedicated resources.

  8. Parallel computing method for simulating hydrological processes of large rivers under climate change

    NASA Astrophysics Data System (ADS)

    Wang, H.; Chen, Y.

    2016-12-01

    Climate change is one of the most widely recognized global environmental problems. It has altered watershed hydrological processes in both their temporal and spatial distribution, especially in the world's large rivers. Simulating watershed hydrological processes with physically based distributed hydrological models can give better results than lumped models. However, such simulations involve a very large amount of computation, especially for large rivers, and therefore need huge computing resources that may not be steadily available to researchers, or only at high expense; this has seriously restricted research and application. Existing parallel methods mostly parallelize in the space and time dimensions, calculating the natural features of a distributed hydrological model unit by unit (grid or sub-basin) from upstream to downstream. This article proposes a high-performance computing method for hydrological process simulation with a high speed-up ratio and parallel efficiency. It combines the temporal and spatial runoff characteristics of the distributed hydrological model with distributed data storage, an in-memory database, distributed computing, and parallel computing based on units of computing power. The method is highly adaptable and extensible: it makes full use of computing and storage resources when computing resources are limited, and computing efficiency improves roughly linearly as computing resources increase. This method can satisfy the parallel computing requirements of hydrological process simulation in small, medium and large rivers.
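
    The upstream-to-downstream parallelism described above can be illustrated with a short scheduling sketch. The following Python fragment is not from the paper; it is a minimal, hypothetical example in which each sub-basin is simulated as soon as all of its upstream sub-basins have finished, so independent branches of the river network run concurrently. The network layout, the simulate_subbasin placeholder, and the worker count are illustrative assumptions.

      # Minimal sketch of dependency-aware parallel simulation of sub-basins.
      # Assumes the river network is acyclic (a DAG).
      from concurrent.futures import ProcessPoolExecutor, wait, FIRST_COMPLETED

      # Hypothetical network: sub-basin id -> ids of its upstream sub-basins.
      UPSTREAM = {"A": [], "B": [], "C": ["A", "B"], "D": [], "E": ["C", "D"]}

      def simulate_subbasin(basin_id, inflows):
          """Placeholder for the per-unit runoff/routing computation."""
          return sum(inflows) + 1.0  # stand-in for the routed discharge

      def run_network(upstream, workers=4):
          done_flow = {}                     # basin id -> simulated outflow
          pending = dict(upstream)           # basins not yet submitted
          with ProcessPoolExecutor(max_workers=workers) as pool:
              running = {}
              while pending or running:
                  # Submit every basin whose upstream dependencies are done.
                  ready = [b for b, ups in pending.items()
                           if all(u in done_flow for u in ups)]
                  for b in ready:
                      inflows = [done_flow[u] for u in upstream[b]]
                      running[pool.submit(simulate_subbasin, b, inflows)] = b
                      del pending[b]
                  finished, _ = wait(running, return_when=FIRST_COMPLETED)
                  for fut in finished:
                      done_flow[running.pop(fut)] = fut.result()
          return done_flow

      if __name__ == "__main__":
          print(run_network(UPSTREAM))

    Independent headwater basins (A, B and D in this toy network) run in parallel, while downstream basins wait only for their own upstream inputs; that is where the speed-up for large river networks comes from.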

  9. Using Amazon's Elastic Compute Cloud to dynamically scale CMS computational resources

    NASA Astrophysics Data System (ADS)

    Evans, D.; Fisk, I.; Holzman, B.; Melo, A.; Metson, S.; Pordes, R.; Sheldon, P.; Tiradani, A.

    2011-12-01

    Large international scientific collaborations such as the Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider have traditionally addressed their data reduction and analysis needs by building and maintaining dedicated computational infrastructure. Emerging cloud computing services such as Amazon's Elastic Compute Cloud (EC2) offer short-term CPU and storage resources with costs based on usage. These services allow experiments to purchase computing resources as needed, without significant prior planning and without long-term investments in facilities and their management. We have demonstrated that services such as EC2 can successfully be integrated into the production-computing model of CMS, and find that they work very well as worker nodes. The cost structure and transient nature of EC2 services make them inappropriate for some CMS production services and functions. We also found that the resources are not truly "on-demand" as limits and caps on usage are imposed. Our trial workflows allow us to make a cost comparison between EC2 resources and dedicated CMS resources at a University, and conclude that it is most cost effective to purchase dedicated resources for the "base-line" needs of experiments such as CMS. However, if the ability to use cloud computing resources is built into an experiment's software framework before demand requires their use, cloud computing resources make sense for bursting during times when spikes in usage are required.
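
    The baseline-versus-burst conclusion above comes down to simple cost arithmetic: owned hardware amortizes well only when it is kept busy. The sketch below uses purely hypothetical prices, not figures from the study, to show how one might estimate the break-even point between dedicated resources and on-demand cloud capacity.

      # Back-of-the-envelope comparison with illustrative, made-up numbers.
      def dedicated_cost_per_core_hour(purchase_per_core, lifetime_years,
                                       overhead_per_core_year, utilization):
          """Effective cost of one used core-hour on owned hardware."""
          used_hours = lifetime_years * 365 * 24 * utilization
          total_cost = purchase_per_core + overhead_per_core_year * lifetime_years
          return total_cost / used_hours

      CLOUD_PRICE = 0.05  # hypothetical on-demand price per core-hour (USD)

      for util in (0.25, 0.50, 0.90):
          owned = dedicated_cost_per_core_hour(purchase_per_core=300,
                                               lifetime_years=4,
                                               overhead_per_core_year=60,
                                               utilization=util)
          cheaper = "dedicated" if owned < CLOUD_PRICE else "cloud"
          print(f"utilization {util:.0%}: owned core-hour ${owned:.3f} -> {cheaper}")

    With these illustrative numbers the owned core-hour becomes cheaper once average utilization is high, which mirrors the paper's conclusion that base-line load belongs on dedicated resources while cloud capacity makes sense for bursting.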

  10. The Computing and Data Grid Approach: Infrastructure for Distributed Science Applications

    NASA Technical Reports Server (NTRS)

    Johnston, William E.

    2002-01-01

    With the advent of Grids - infrastructure for using and managing widely distributed computing and data resources in the science environment - there is now an opportunity to provide a standard, large-scale, computing, data, instrument, and collaboration environment for science that spans many different projects and provides the required infrastructure and services in a relatively uniform and supportable way. Grid technology has evolved over the past several years to provide the services and infrastructure needed for building 'virtual' systems and organizations. We argue that Grid technology provides an excellent basis for the creation of the integrated environments that can combine the resources needed to support the large-scale science projects located at multiple laboratories and universities. We present some science case studies that indicate that a paradigm shift in the process of science will come about as a result of Grids providing transparent and secure access to advanced and integrated information and technology infrastructure: powerful computing systems, large-scale data archives, scientific instruments, and collaboration tools. These changes will be in the form of services that can be integrated with the user's work environment, and that enable uniform and highly capable access to these computers, data, and instruments, regardless of the location or exact nature of these resources. These services will integrate transient-use resources like computing systems, scientific instruments, and data caches (e.g., as they are needed to perform a simulation or analyze data from a single experiment); persistent-use resources, such as databases, data catalogues, and archives; and collaborators, whose involvement will continue for the lifetime of a project or longer. While we largely address large-scale science in this paper, Grids, particularly when combined with Web Services, will address a broad spectrum of science scenarios, both large and small scale.

  11. Campus Computing Environment: University of Kentucky.

    ERIC Educational Resources Information Center

    CAUSE/EFFECT, 1989

    1989-01-01

    A dramatic growth in computing and communications was precipitated largely by the leadership of President David Roselle at the University of Kentucky. A new operational structure of information resource management includes not only computing (academic and administrative) and communications, instructional resources, and printing/mailing services,…

  12. Framework Resources Multiply Computing Power

    NASA Technical Reports Server (NTRS)

    2010-01-01

    As an early proponent of grid computing, Ames Research Center awarded Small Business Innovation Research (SBIR) funding to 3DGeo Development Inc., of Santa Clara, California, (now FusionGeo Inc., of The Woodlands, Texas) to demonstrate a virtual computer environment that linked geographically dispersed computer systems over the Internet to help solve large computational problems. By adding to an existing product, FusionGeo enabled access to resources for calculation- or data-intensive applications whenever and wherever they were needed. Commercially available as Accelerated Imaging and Modeling, the product is used by oil companies and seismic service companies, which require large processing and data storage capacities.

  13. Contextuality as a Resource for Models of Quantum Computation with Qubits

    NASA Astrophysics Data System (ADS)

    Bermejo-Vega, Juan; Delfosse, Nicolas; Browne, Dan E.; Okay, Cihan; Raussendorf, Robert

    2017-09-01

    A central question in quantum computation is to identify the resources that are responsible for quantum speed-up. Quantum contextuality has been recently shown to be a resource for quantum computation with magic states for odd-prime dimensional qudits and two-dimensional systems with real wave functions. The phenomenon of state-independent contextuality poses a priori an obstruction to characterizing the case of regular qubits, the fundamental building block of quantum computation. Here, we establish contextuality of magic states as a necessary resource for a large class of quantum computation schemes on qubits. We illustrate our result with a concrete scheme related to measurement-based quantum computation.

  14. NASA's Information Power Grid: Large Scale Distributed Computing and Data Management

    NASA Technical Reports Server (NTRS)

    Johnston, William E.; Vaziri, Arsi; Hinke, Tom; Tanner, Leigh Ann; Feiereisen, William J.; Thigpen, William; Tang, Harry (Technical Monitor)

    2001-01-01

    Large-scale science and engineering are done through the interaction of people, heterogeneous computing resources, information systems, and instruments, all of which are geographically and organizationally dispersed. The overall motivation for Grids is to facilitate the routine interactions of these resources in order to support large-scale science and engineering. Multi-disciplinary simulations provide a good example of a class of applications that are very likely to require aggregation of widely distributed computing, data, and intellectual resources. Such simulations - e.g. whole system aircraft simulation and whole system living cell simulation - require integrating applications and data that are developed by different teams of researchers, frequently in different locations. The research teams are the only ones that have the expertise to maintain and improve the simulation code and/or the body of experimental data that drives the simulations. This results in an inherently distributed computing and data management environment.

  15. Performance of distributed multiscale simulations

    PubMed Central

    Borgdorff, J.; Ben Belgacem, M.; Bona-Casas, C.; Fazendeiro, L.; Groen, D.; Hoenen, O.; Mizeranschi, A.; Suter, J. L.; Coster, D.; Coveney, P. V.; Dubitzky, W.; Hoekstra, A. G.; Strand, P.; Chopard, B.

    2014-01-01

    Multiscale simulations model phenomena across natural scales using monolithic or component-based code, running on local or distributed resources. In this work, we investigate the performance of distributed multiscale computing of component-based models, guided by six multiscale applications with different characteristics and from several disciplines. Three modes of distributed multiscale computing are identified: supplementing local dependencies with large-scale resources, load distribution over multiple resources, and load balancing of small- and large-scale resources. We find that the first mode has the apparent benefit of increasing simulation speed, and the second mode can increase simulation speed if local resources are limited. Depending on resource reservation and model coupling topology, the third mode may result in a reduction of resource consumption. PMID:24982258

  16. Information Power Grid Posters

    NASA Technical Reports Server (NTRS)

    Vaziri, Arsi

    2003-01-01

    This document is a summary of the accomplishments of the Information Power Grid (IPG). Grids are an emerging technology that provide seamless and uniform access to the geographically dispersed computational, data storage, networking, instrument, and software resources needed for solving large-scale scientific and engineering problems. The goal of the NASA IPG is to use NASA's remotely located computing and data system resources to build distributed systems that can address problems that are too large or complex for a single site. The accomplishments outlined in this poster presentation are: access to distributed data, IPG heterogeneous computing, integration of a large-scale computing node into a distributed environment, remote access to high data rate instruments, and an exploratory grid environment.

  17. New design for interfacing computers to the Octopus network

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sloan, L.J.

    1977-03-14

    The Lawrence Livermore Laboratory has several large-scale computers which are connected to the Octopus network. Several difficulties arise in providing adequate resources along with reliable performance. To alleviate some of these problems a new method of bringing large computers into the Octopus environment is proposed.

  18. Stream-based Hebbian eigenfilter for real-time neuronal spike discrimination

    PubMed Central

    2012-01-01

    Background: Principal component analysis (PCA) has been widely employed for automatic neuronal spike sorting. Calculating principal components (PCs) is computationally expensive, and requires complex numerical operations and large memory resources. Substantial hardware resources are therefore needed for hardware implementations of PCA. The generalized Hebbian algorithm (GHA) has been proposed for calculating PCs of neuronal spikes in our previous work, which eliminates the need for the computationally expensive covariance analysis and eigenvalue decomposition of conventional PCA algorithms. However, large memory resources are still inherently required for storing a large volume of aligned spikes for training PCs. The large memory consumes substantial hardware resources and contributes significant power dissipation, making GHA difficult to implement in portable or implantable multi-channel recording micro-systems. Method: In this paper, we present a new algorithm for PCA-based spike sorting based on GHA, namely the stream-based Hebbian eigenfilter, which eliminates the inherent memory requirements of GHA while keeping the accuracy of spike sorting by utilizing the pseudo-stationarity of neuronal spikes. Because the large hardware storage requirements are removed, the proposed algorithm can lead to ultra-low hardware resource usage and power consumption in hardware implementations, which is critical for future multi-channel micro-systems. Both clinical and synthetic neural recording data sets were employed to evaluate the accuracy of the stream-based Hebbian eigenfilter. The performance of spike sorting using the stream-based eigenfilter and the computational complexity of the eigenfilter were rigorously evaluated and compared with conventional PCA algorithms. Field-programmable gate arrays (FPGAs) were employed to implement the proposed algorithm, evaluate the hardware implementations, and demonstrate the reduction in both power consumption and hardware memory achieved by the streaming computation. Results and discussion: Results demonstrate that the stream-based eigenfilter can achieve the same accuracy and is 10 times more computationally efficient when compared with conventional PCA algorithms. Hardware evaluations show that 90.3% of logic resources, 95.1% of power consumption and 86.8% of computing latency can be reduced by the stream-based eigenfilter when compared with PCA hardware. By utilizing the streaming method, 92% of memory resources and 67% of power consumption can be saved when compared with the direct implementation of GHA. Conclusion: The stream-based Hebbian eigenfilter presents a novel approach to enable real-time spike sorting with reduced computational complexity and hardware costs. This new design can be further utilized for multi-channel neuro-physiological experiments or chronic implants. PMID:22490725
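
    As a rough illustration of the streaming idea (not the paper's FPGA design, and with arbitrary dimensions, data and learning rate), the generalized Hebbian algorithm can update its principal-component estimates one spike at a time, so no buffer of aligned spikes ever needs to be stored:

      # Streaming GHA (Sanger's rule) sketch in NumPy.
      import numpy as np

      def gha_update(W, x, lr):
          """One GHA step. W: (k, d) current estimates of the top-k PCs;
          x: (d,) one aligned spike waveform; lr: learning rate."""
          y = W @ x                                      # project spike onto PCs
          W += lr * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)
          return W

      rng = np.random.default_rng(0)
      d, k = 32, 3                                       # spike length, PCs kept
      W = rng.normal(scale=0.1, size=(k, d))
      for spike in rng.normal(size=(5000, d)):           # stand-in spike stream
          W = gha_update(W, spike, lr=1e-3)
      print(np.round(W @ W.T, 2))                        # rows become roughly orthonormal

    Each update touches only the current spike and the small (k, d) weight matrix, which is why the memory required is independent of the number of spikes used for training.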

  19. A Web-based Distributed Voluntary Computing Platform for Large Scale Hydrological Computations

    NASA Astrophysics Data System (ADS)

    Demir, I.; Agliamzanov, R.

    2014-12-01

    Distributed volunteer computing can enable researchers and scientists to form large parallel computing environments that utilize the computing power of the millions of computers on the Internet, and use them towards running large-scale environmental simulations and models to serve the common good of local communities and the world. Recent developments in web technologies and standards allow client-side scripting languages to run at speeds close to native applications, and to utilize the power of Graphics Processing Units (GPUs). Using a client-side scripting language like JavaScript, we have developed an open distributed computing framework that makes it easy for researchers to write their own hydrologic models and run them on volunteer computers. Users can easily enable their websites so that visitors can volunteer their computer resources to help run advanced hydrological models and simulations. Using a web-based system allows users to start volunteering their computational resources within seconds, without installing any software. The framework distributes the model simulation to thousands of nodes in small spatial and computational units. A relational database system is utilized for managing data connections and queue management for the distributed computing nodes. In this paper, we present a web-based distributed volunteer computing platform to enable large-scale hydrological simulations and model runs in an open and integrated environment.
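
    The queue management mentioned above can be sketched with a small database-backed task table. This is an assumption-level illustration rather than the authors' implementation: SQLite stands in for the relational database, and the two functions mimic what the server side would do when a browser client asks for a work unit or posts back a result.

      # Minimal task-queue sketch for a volunteer-computing portal.
      import sqlite3
      import time

      db = sqlite3.connect("volunteer_queue.db")
      db.execute("""CREATE TABLE IF NOT EXISTS tasks (
          id INTEGER PRIMARY KEY,
          params TEXT,                         -- small spatial/computational unit
          status TEXT DEFAULT 'pending',       -- pending / leased / done
          leased_at REAL,
          result TEXT)""")
      db.commit()

      def lease_task():
          """Hand the next pending work unit to a volunteer client."""
          row = db.execute("SELECT id, params FROM tasks "
                           "WHERE status = 'pending' LIMIT 1").fetchone()
          if row is not None:
              db.execute("UPDATE tasks SET status = 'leased', leased_at = ? "
                         "WHERE id = ?", (time.time(), row[0]))
              db.commit()
          return row  # (task id, parameters) or None if the queue is empty

      def complete_task(task_id, result):
          """Record a result posted back by a client."""
          db.execute("UPDATE tasks SET status = 'done', result = ? WHERE id = ?",
                     (result, task_id))
          db.commit()

    A production system would also re-queue leased tasks whose results never arrive (using leased_at) and validate results from untrusted volunteers, but the pending/leased/done life cycle is the core of such a queue.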

  20. Tools and Techniques for Measuring and Improving Grid Performance

    NASA Technical Reports Server (NTRS)

    Biswas, Rupak; Frumkin, M.; Smith, W.; VanderWijngaart, R.; Wong, P.; Biegel, Bryan (Technical Monitor)

    2001-01-01

    This viewgraph presentation provides information on NASA's geographically dispersed computing resources, and the various methods by which the disparate technologies are integrated within a nationwide computational grid. Many large-scale science and engineering projects are accomplished through the interaction of people, heterogeneous computing resources, information systems and instruments at different locations. The overall goal is to facilitate the routine interactions of these resources to reduce the time spent in design cycles, particularly for NASA's mission critical projects. The IPG (Information Power Grid) seeks to implement NASA's diverse computing resources in a fashion similar to the way in which electric power is made available.

  1. Bionimbus: a cloud for managing, analyzing and sharing large genomics datasets

    PubMed Central

    Heath, Allison P; Greenway, Matthew; Powell, Raymond; Spring, Jonathan; Suarez, Rafael; Hanley, David; Bandlamudi, Chai; McNerney, Megan E; White, Kevin P; Grossman, Robert L

    2014-01-01

    Background: As large genomics and phenotypic datasets are becoming more common, it is increasingly difficult for most researchers to access, manage, and analyze them. One possible approach is to provide the research community with several petabyte-scale cloud-based computing platforms containing these data, along with tools and resources to analyze them. Methods: Bionimbus is an open source cloud-computing platform that is based primarily upon OpenStack, which manages on-demand virtual machines that provide the required computational resources, and GlusterFS, which is a high-performance clustered file system. Bionimbus also includes Tukey, a portal and associated middleware that provides a single entry point and a single sign-on for the various Bionimbus resources, and Yates, which automates the installation, configuration, and maintenance of the required software infrastructure. Results: Bionimbus is used by a variety of projects to process genomics and phenotypic data. For example, it is used by an acute myeloid leukemia resequencing project at the University of Chicago. The project requires several computational pipelines, including pipelines for quality control, alignment, variant calling, and annotation. For each sample, the alignment step requires eight CPUs for about 12 h. BAM file sizes ranged from 5 GB to 10 GB for each sample. Conclusions: Most members of the research community have difficulty downloading large genomics datasets and obtaining sufficient storage and computer resources to manage and analyze the data. Cloud computing platforms, such as Bionimbus, with data commons that contain large genomics datasets, are one choice for broadening access to research data in genomics. PMID:24464852

  2. Scaling predictive modeling in drug development with cloud computing.

    PubMed

    Moghadam, Behrooz Torabi; Alvarsson, Jonathan; Holm, Marcus; Eklund, Martin; Carlsson, Lars; Spjuth, Ola

    2015-01-26

    Growing data sets with increased time for analysis is hampering predictive modeling in drug discovery. Model building can be carried out on high-performance computer clusters, but these can be expensive to purchase and maintain. We have evaluated ligand-based modeling on cloud computing resources where computations are parallelized and run on the Amazon Elastic Cloud. We trained models on open data sets of varying sizes for the end points logP and Ames mutagenicity and compare with model building parallelized on a traditional high-performance computing cluster. We show that while high-performance computing results in faster model building, the use of cloud computing resources is feasible for large data sets and scales well within cloud instances. An additional advantage of cloud computing is that the costs of predictive models can be easily quantified, and a choice can be made between speed and economy. The easy access to computational resources with no up-front investments makes cloud computing an attractive alternative for scientists, especially for those without access to a supercomputer, and our study shows that it enables cost-efficient modeling of large data sets on demand within reasonable time.

  3. ACToR: Aggregated Computational Toxicology Resource (T)

    EPA Science Inventory

    The EPA Aggregated Computational Toxicology Resource (ACToR) is a set of databases compiling information on chemicals in the environment from a large number of public and in-house EPA sources. ACToR has 3 main goals: (1) to serve as a repository of public toxicology information ...

  4. A distributed computing approach to mission operations support. [for spacecraft

    NASA Technical Reports Server (NTRS)

    Larsen, R. L.

    1975-01-01

    Computing mission operation support includes orbit determination, attitude processing, maneuver computation, resource scheduling, etc. The large-scale third-generation distributed computer network discussed is capable of fulfilling these dynamic requirements. It is shown that distribution of resources and control leads to increased reliability, and exhibits potential for incremental growth. Through functional specialization, a distributed system may be tuned to very specific operational requirements. Fundamental to the approach is the notion of process-to-process communication, which is effected through a high-bandwidth communications network. Both resource-sharing and load-sharing may be realized in the system.

  5. The OSG Open Facility: an on-ramp for opportunistic scientific computing

    NASA Astrophysics Data System (ADS)

    Jayatilaka, B.; Levshina, T.; Sehgal, C.; Gardner, R.; Rynge, M.; Würthwein, F.

    2017-10-01

    The Open Science Grid (OSG) is a large, robust computing grid that started primarily as a collection of sites associated with large HEP experiments such as ATLAS, CDF, CMS, and DZero, but has evolved in recent years to a much larger user and resource platform. In addition to meeting the US LHC community’s computational needs, the OSG continues to be one of the largest providers of distributed high-throughput computing (DHTC) to researchers from a wide variety of disciplines via the OSG Open Facility. The Open Facility consists of OSG resources that are available opportunistically to users other than resource owners and their collaborators. In the past two years, the Open Facility has doubled its annual throughput to over 200 million wall hours. More than half of these resources are used by over 100 individual researchers from over 60 institutions in fields such as biology, medicine, math, economics, and many others. Over 10% of these individual users utilized in excess of 1 million computational hours each in the past year. The largest source of these cycles is temporary unused capacity at institutions affiliated with US LHC computational sites. An increasing fraction, however, comes from university HPC clusters and large national infrastructure supercomputers offering unused capacity. Such expansions have allowed the OSG to provide ample computational resources to both individual researchers and small groups as well as sizable international science collaborations such as LIGO, AMS, IceCube, and sPHENIX. Opening up access to the Fermilab FabrIc for Frontier Experiments (FIFE) project has also allowed experiments such as mu2e and NOvA to make substantial use of Open Facility resources, the former with over 40 million wall hours in a year. We present how this expansion was accomplished as well as future plans for keeping the OSG Open Facility at the forefront of enabling scientific research by way of DHTC.

  6. The OSG Open Facility: An On-Ramp for Opportunistic Scientific Computing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jayatilaka, B.; Levshina, T.; Sehgal, C.

    The Open Science Grid (OSG) is a large, robust computing grid that started primarily as a collection of sites associated with large HEP experiments such as ATLAS, CDF, CMS, and DZero, but has evolved in recent years to a much larger user and resource platform. In addition to meeting the US LHC community’s computational needs, the OSG continues to be one of the largest providers of distributed high-throughput computing (DHTC) to researchers from a wide variety of disciplines via the OSG Open Facility. The Open Facility consists of OSG resources that are available opportunistically to users other than resource owners and their collaborators. In the past two years, the Open Facility has doubled its annual throughput to over 200 million wall hours. More than half of these resources are used by over 100 individual researchers from over 60 institutions in fields such as biology, medicine, math, economics, and many others. Over 10% of these individual users utilized in excess of 1 million computational hours each in the past year. The largest source of these cycles is temporary unused capacity at institutions affiliated with US LHC computational sites. An increasing fraction, however, comes from university HPC clusters and large national infrastructure supercomputers offering unused capacity. Such expansions have allowed the OSG to provide ample computational resources to both individual researchers and small groups as well as sizable international science collaborations such as LIGO, AMS, IceCube, and sPHENIX. Opening up access to the Fermilab FabrIc for Frontier Experiments (FIFE) project has also allowed experiments such as mu2e and NOvA to make substantial use of Open Facility resources, the former with over 40 million wall hours in a year. We present how this expansion was accomplished as well as future plans for keeping the OSG Open Facility at the forefront of enabling scientific research by way of DHTC.

  7. Interoperability of GADU in using heterogeneous Grid resources for bioinformatics applications.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sulakhe, D.; Rodriguez, A.; Wilde, M.

    2008-03-01

    Bioinformatics tools used for efficient and computationally intensive analysis of genetic sequences require large-scale computational resources to accommodate the growing data. Grid computational resources such as the Open Science Grid and TeraGrid have proved useful for scientific discovery. The genome analysis and database update system (GADU) is a high-throughput computational system developed to automate the steps involved in accessing the Grid resources for running bioinformatics applications. This paper describes the requirements for building an automated scalable system such as GADU that can run jobs on different Grids. The paper describes the resource-independent configuration of GADU using the Pegasus-based virtual data system that makes high-throughput computational tools interoperable on heterogeneous Grid resources. The paper also highlights the features implemented to make GADU a gateway to computationally intensive bioinformatics applications on the Grid. The paper will not go into the details of problems involved or the lessons learned in using individual Grid resources as it has already been published in our paper on genome analysis research environment (GNARE) and will focus primarily on the architecture that makes GADU resource independent and interoperable across heterogeneous Grid resources.

  8. Design and performance of the virtualization platform for offline computing on the ATLAS TDAQ Farm

    NASA Astrophysics Data System (ADS)

    Ballestrero, S.; Batraneanu, S. M.; Brasolin, F.; Contescu, C.; Di Girolamo, A.; Lee, C. J.; Pozo Astigarraga, M. E.; Scannicchio, D. A.; Twomey, M. S.; Zaytsev, A.

    2014-06-01

    With the LHC collider at CERN currently going through the period of Long Shutdown 1 there is an opportunity to use the computing resources of the experiments' large trigger farms for other data processing activities. In the case of the ATLAS experiment, the TDAQ farm, consisting of more than 1500 compute nodes, is suitable for running Monte Carlo (MC) production jobs that are mostly CPU and not I/O bound. This contribution gives a thorough review of the design and deployment of a virtualized platform running on this computing resource and of its use to run large groups of CernVM based virtual machines operating as a single CERN-P1 WLCG site. This platform has been designed to guarantee the security and the usability of the ATLAS private network, and to minimize interference with TDAQ's usage of the farm. Openstack has been chosen to provide a cloud management layer. The experience gained in the last 3.5 months shows that the use of the TDAQ farm for the MC simulation contributes to the ATLAS data processing at the level of a large Tier-1 WLCG site, despite the opportunistic nature of the underlying computing resources being used.

  9. Processing Shotgun Proteomics Data on the Amazon Cloud with the Trans-Proteomic Pipeline*

    PubMed Central

    Slagel, Joseph; Mendoza, Luis; Shteynberg, David; Deutsch, Eric W.; Moritz, Robert L.

    2015-01-01

    Cloud computing, where scalable, on-demand compute cycles and storage are available as a service, has the potential to accelerate mass spectrometry-based proteomics research by providing simple, expandable, and affordable large-scale computing to all laboratories regardless of location or information technology expertise. We present new cloud computing functionality for the Trans-Proteomic Pipeline, a free and open-source suite of tools for the processing and analysis of tandem mass spectrometry datasets. Enabled with Amazon Web Services cloud computing, the Trans-Proteomic Pipeline now accesses large scale computing resources, limited only by the available Amazon Web Services infrastructure, for all users. The Trans-Proteomic Pipeline runs in an environment fully hosted on Amazon Web Services, where all software and data reside on cloud resources to tackle large search studies. In addition, it can also be run on a local computer with computationally intensive tasks launched onto the Amazon Elastic Compute Cloud service to greatly decrease analysis times. We describe the new Trans-Proteomic Pipeline cloud service components, compare the relative performance and costs of various Elastic Compute Cloud service instance types, and present on-line tutorials that enable users to learn how to deploy cloud computing technology rapidly with the Trans-Proteomic Pipeline. We provide tools for estimating the necessary computing resources and costs given the scale of a job and demonstrate the use of the cloud-enabled Trans-Proteomic Pipeline by processing over 1100 tandem mass spectrometry files through four proteomic search engines in 9 h and at a very low cost. PMID:25418363

  10. Processing shotgun proteomics data on the Amazon cloud with the trans-proteomic pipeline.

    PubMed

    Slagel, Joseph; Mendoza, Luis; Shteynberg, David; Deutsch, Eric W; Moritz, Robert L

    2015-02-01

    Cloud computing, where scalable, on-demand compute cycles and storage are available as a service, has the potential to accelerate mass spectrometry-based proteomics research by providing simple, expandable, and affordable large-scale computing to all laboratories regardless of location or information technology expertise. We present new cloud computing functionality for the Trans-Proteomic Pipeline, a free and open-source suite of tools for the processing and analysis of tandem mass spectrometry datasets. Enabled with Amazon Web Services cloud computing, the Trans-Proteomic Pipeline now accesses large scale computing resources, limited only by the available Amazon Web Services infrastructure, for all users. The Trans-Proteomic Pipeline runs in an environment fully hosted on Amazon Web Services, where all software and data reside on cloud resources to tackle large search studies. In addition, it can also be run on a local computer with computationally intensive tasks launched onto the Amazon Elastic Compute Cloud service to greatly decrease analysis times. We describe the new Trans-Proteomic Pipeline cloud service components, compare the relative performance and costs of various Elastic Compute Cloud service instance types, and present on-line tutorials that enable users to learn how to deploy cloud computing technology rapidly with the Trans-Proteomic Pipeline. We provide tools for estimating the necessary computing resources and costs given the scale of a job and demonstrate the use of the cloud-enabled Trans-Proteomic Pipeline by processing over 1100 tandem mass spectrometry files through four proteomic search engines in 9 h and at a very low cost. © 2015 by The American Society for Biochemistry and Molecular Biology, Inc.

  11. Bionimbus: a cloud for managing, analyzing and sharing large genomics datasets.

    PubMed

    Heath, Allison P; Greenway, Matthew; Powell, Raymond; Spring, Jonathan; Suarez, Rafael; Hanley, David; Bandlamudi, Chai; McNerney, Megan E; White, Kevin P; Grossman, Robert L

    2014-01-01

    As large genomics and phenotypic datasets are becoming more common, it is increasingly difficult for most researchers to access, manage, and analyze them. One possible approach is to provide the research community with several petabyte-scale cloud-based computing platforms containing these data, along with tools and resources to analyze them. Bionimbus is an open source cloud-computing platform that is based primarily upon OpenStack, which manages on-demand virtual machines that provide the required computational resources, and GlusterFS, which is a high-performance clustered file system. Bionimbus also includes Tukey, a portal and associated middleware that provides a single entry point and a single sign-on for the various Bionimbus resources, and Yates, which automates the installation, configuration, and maintenance of the required software infrastructure. Bionimbus is used by a variety of projects to process genomics and phenotypic data. For example, it is used by an acute myeloid leukemia resequencing project at the University of Chicago. The project requires several computational pipelines, including pipelines for quality control, alignment, variant calling, and annotation. For each sample, the alignment step requires eight CPUs for about 12 h. BAM file sizes ranged from 5 GB to 10 GB for each sample. Most members of the research community have difficulty downloading large genomics datasets and obtaining sufficient storage and computer resources to manage and analyze the data. Cloud computing platforms, such as Bionimbus, with data commons that contain large genomics datasets, are one choice for broadening access to research data in genomics.

  12. TethysCluster: A comprehensive approach for harnessing cloud resources for hydrologic modeling

    NASA Astrophysics Data System (ADS)

    Nelson, J.; Jones, N.; Ames, D. P.

    2015-12-01

    Advances in water resources modeling are improving the information that can be supplied to support decisions affecting the safety and sustainability of society. However, as water resources models become more sophisticated and data-intensive they require more computational power to run. Purchasing and maintaining the computing facilities needed to support certain modeling tasks has been cost-prohibitive for many organizations. With the advent of the cloud, the computing resources needed to address this challenge are now available and cost-effective, yet there still remains a significant technical barrier to leverage these resources. This barrier inhibits many decision makers and even trained engineers from taking advantage of the best science and tools available. Here we present the Python tools TethysCluster and CondorPy, that have been developed to lower the barrier to model computation in the cloud by providing (1) programmatic access to dynamically scalable computing resources, (2) a batch scheduling system to queue and dispatch the jobs to the computing resources, (3) data management for job inputs and outputs, and (4) the ability to dynamically create, submit, and monitor computing jobs. These Python tools leverage the open source, computing-resource management, and job management software, HTCondor, to offer a flexible and scalable distributed-computing environment. While TethysCluster and CondorPy can be used independently to provision computing resources and perform large modeling tasks, they have also been integrated into Tethys Platform, a development platform for water resources web apps, to enable computing support for modeling workflows and decision-support systems deployed as web apps.
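
    As a rough sketch of the kind of job scheduling these tools wrap (using the htcondor Python bindings rather than TethysCluster or CondorPy themselves, and with hypothetical script and log paths), a batch of model runs can be queued to an HTCondor pool in a few lines:

      # Queue ten hypothetical model runs on an HTCondor scheduler.
      # Assumes the htcondor Python bindings (version 9 or later) and a
      # reachable schedd; executable, arguments and paths are placeholders.
      import htcondor

      submit = htcondor.Submit({
          "executable": "/usr/bin/python3",
          "arguments": "run_model.py --scenario $(Process)",
          "output": "logs/model_$(Process).out",
          "error": "logs/model_$(Process).err",
          "log": "logs/model.log",
          "request_cpus": "1",
      })

      schedd = htcondor.Schedd()              # connect to the local scheduler
      result = schedd.submit(submit, count=10)
      print("submitted cluster", result.cluster())

    Tools like CondorPy layer job and workflow objects on top of this kind of submission so that web applications can create, submit, and monitor computing jobs programmatically.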

  13. BNL ATLAS Grid Computing

    ScienceCinema

    Michael Ernst

    2017-12-09

    As the sole Tier-1 computing facility for ATLAS in the United States and the largest ATLAS computing center worldwide Brookhaven provides a large portion of the overall computing resources for U.S. collaborators and serves as the central hub for storing,

  14. Grid site availability evaluation and monitoring at CMS

    DOE PAGES

    Lyons, Gaston; Maciulaitis, Rokas; Bagliesi, Giuseppe; ...

    2017-10-01

    The Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider (LHC) uses distributed grid computing to store, process, and analyse the vast quantity of scientific data recorded every year. The computing resources are grouped into sites and organized in a tiered structure. Each site provides computing and storage to the CMS computing grid. Over a hundred sites worldwide contribute resources ranging from hundreds to well over ten thousand computing cores and storage from tens of TBytes to tens of PBytes. In such a large computing setup, scheduled and unscheduled outages occur continually and are not allowed to significantly impact data handling, processing, and analysis. Unscheduled capacity and performance reductions need to be detected promptly and corrected. CMS developed a sophisticated site evaluation and monitoring system for Run 1 of the LHC based on tools of the Worldwide LHC Computing Grid. For Run 2 of the LHC the site evaluation and monitoring system is being overhauled to enable faster detection/reaction to failures and a more dynamic handling of computing resources. Furthermore, enhancements to better distinguish site from central service issues and to make evaluations more transparent and informative to site support staff are planned.

  15. Grid site availability evaluation and monitoring at CMS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lyons, Gaston; Maciulaitis, Rokas; Bagliesi, Giuseppe

    The Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider (LHC) uses distributed grid computing to store, process, and analyse the vast quantity of scientific data recorded every year. The computing resources are grouped into sites and organized in a tiered structure. Each site provides computing and storage to the CMS computing grid. Over a hundred sites worldwide contribute resources ranging from hundreds to well over ten thousand computing cores and storage from tens of TBytes to tens of PBytes. In such a large computing setup, scheduled and unscheduled outages occur continually and are not allowed to significantly impact data handling, processing, and analysis. Unscheduled capacity and performance reductions need to be detected promptly and corrected. CMS developed a sophisticated site evaluation and monitoring system for Run 1 of the LHC based on tools of the Worldwide LHC Computing Grid. For Run 2 of the LHC the site evaluation and monitoring system is being overhauled to enable faster detection/reaction to failures and a more dynamic handling of computing resources. Furthermore, enhancements to better distinguish site from central service issues and to make evaluations more transparent and informative to site support staff are planned.

  16. Grid site availability evaluation and monitoring at CMS

    NASA Astrophysics Data System (ADS)

    Lyons, Gaston; Maciulaitis, Rokas; Bagliesi, Giuseppe; Lammel, Stephan; Sciabà, Andrea

    2017-10-01

    The Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider (LHC) uses distributed grid computing to store, process, and analyse the vast quantity of scientific data recorded every year. The computing resources are grouped into sites and organized in a tiered structure. Each site provides computing and storage to the CMS computing grid. Over a hundred sites worldwide contribute resources ranging from hundreds to well over ten thousand computing cores and storage from tens of TBytes to tens of PBytes. In such a large computing setup, scheduled and unscheduled outages occur continually and are not allowed to significantly impact data handling, processing, and analysis. Unscheduled capacity and performance reductions need to be detected promptly and corrected. CMS developed a sophisticated site evaluation and monitoring system for Run 1 of the LHC based on tools of the Worldwide LHC Computing Grid. For Run 2 of the LHC the site evaluation and monitoring system is being overhauled to enable faster detection/reaction to failures and a more dynamic handling of computing resources. Enhancements to better distinguish site from central service issues and to make evaluations more transparent and informative to site support staff are planned.

  17. Managing large-scale workflow execution from resource provisioning to provenance tracking: The CyberShake example

    USGS Publications Warehouse

    Deelman, E.; Callaghan, S.; Field, E.; Francoeur, H.; Graves, R.; Gupta, N.; Gupta, V.; Jordan, T.H.; Kesselman, C.; Maechling, P.; Mehringer, J.; Mehta, G.; Okaya, D.; Vahi, K.; Zhao, L.

    2006-01-01

    This paper discusses the process of building an environment where large-scale, complex, scientific analysis can be scheduled onto a heterogeneous collection of computational and storage resources. The example application is the Southern California Earthquake Center (SCEC) CyberShake project, an analysis designed to compute probabilistic seismic hazard curves for sites in the Los Angeles area. We explain which software tools were used to build the system, and describe their functionality and interactions. We show the results of running the CyberShake analysis, which included over 250,000 jobs using resources available through SCEC and the TeraGrid. © 2006 IEEE.

  18. Large Data at Small Universities: Astronomical processing using a computer classroom

    NASA Astrophysics Data System (ADS)

    Fuller, Nathaniel James; Clarkson, William I.; Fluharty, Bill; Belanger, Zach; Dage, Kristen

    2016-06-01

    The use of large computing clusters for astronomy research is becoming more commonplace as datasets expand, but access to these required resources is sometimes difficult for research groups working at smaller Universities. As an alternative to purchasing processing time on an off-site computing cluster, or purchasing dedicated hardware, we show how one can easily build a crude on-site cluster by utilizing idle cycles on instructional computers in computer-lab classrooms. Since these computers are maintained as part of the educational mission of the University, the resource impact on the investigator is generally low. By using open source Python routines, it is possible to have a large number of desktop computers working together via a local network to sort through large data sets. By running traditional analysis routines in an “embarrassingly parallel” manner, gains in speed are accomplished without requiring the investigator to learn how to write routines using highly specialized methodology. We demonstrate this concept here applied to (1) photometry of large-format images and (2) statistical significance tests for X-ray lightcurve analysis. In these scenarios, we see a speed-up factor which scales almost linearly with the number of cores in the cluster. Additionally, we show that the usage of the cluster does not severely limit performance for a local user, and indeed the processing can be performed while the computers are in use for classroom purposes.
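
    The pattern described above is straightforward to reproduce. The sketch below is a hypothetical single-machine version: each image is processed independently, so a plain worker pool gives near-linear speed-up, and the same pattern extends across networked lab machines. The file pattern and the photometry step are placeholders, not the authors' code.

      # Embarrassingly parallel per-image processing with a worker pool.
      import glob
      from multiprocessing import Pool

      def measure_frame(path):
          """Placeholder for the per-image work (e.g. source detection and
          aperture photometry); returns the file name and a result."""
          # ... load the frame and run the usual serial routine here ...
          return path, 0

      if __name__ == "__main__":
          frames = sorted(glob.glob("night1/*.fits"))   # hypothetical data set
          with Pool(processes=8) as pool:               # one worker per idle core
              for name, n_sources in pool.map(measure_frame, frames):
                  print(name, n_sources)

    Because no frame depends on any other, doubling the number of workers roughly halves the wall-clock time until I/O or the number of frames becomes the limit.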

  19. Cloud computing for genomic data analysis and collaboration.

    PubMed

    Langmead, Ben; Nellore, Abhinav

    2018-04-01

    Next-generation sequencing has made major strides in the past decade. Studies based on large sequencing data sets are growing in number, and public archives for raw sequencing data have been doubling in size every 18 months. Leveraging these data requires researchers to use large-scale computational resources. Cloud computing, a model whereby users rent computers and storage from large data centres, is a solution that is gaining traction in genomics research. Here, we describe how cloud computing is used in genomics for research and large-scale collaborations, and argue that its elasticity, reproducibility and privacy features make it ideally suited for the large-scale reanalysis of publicly available archived data, including privacy-protected data.

  20. Traffic Simulations on Parallel Computers Using Domain Decomposition Techniques

    DOT National Transportation Integrated Search

    1995-01-01

    Large scale simulations of Intelligent Transportation Systems (ITS) can only be achieved by using the computing resources offered by parallel computing architectures. Domain decomposition techniques are proposed which allow the performance of traffic...

  1. iTools: a framework for classification, categorization and integration of computational biology resources.

    PubMed

    Dinov, Ivo D; Rubin, Daniel; Lorensen, William; Dugan, Jonathan; Ma, Jeff; Murphy, Shawn; Kirschner, Beth; Bug, William; Sherman, Michael; Floratos, Aris; Kennedy, David; Jagadish, H V; Schmidt, Jeanette; Athey, Brian; Califano, Andrea; Musen, Mark; Altman, Russ; Kikinis, Ron; Kohane, Isaac; Delp, Scott; Parker, D Stott; Toga, Arthur W

    2008-05-28

    The advancement of the computational biology field hinges on progress in three fundamental directions--the development of new computational algorithms, the availability of informatics resource management infrastructures and the capability of tools to interoperate and synergize. There is an explosion in algorithms and tools for computational biology, which makes it difficult for biologists to find, compare and integrate such resources. We describe a new infrastructure, iTools, for managing the query, traversal and comparison of diverse computational biology resources. Specifically, iTools stores information about three types of resources--data, software tools and web-services. The iTools design, implementation and resource meta-data content reflect the broad research, computational, applied and scientific expertise available at the seven National Centers for Biomedical Computing. iTools provides a system for classification, categorization and integration of different computational biology resources across space-and-time scales, biomedical problems, computational infrastructures and mathematical foundations. A large number of resources are already iTools-accessible to the community and this infrastructure is rapidly growing. iTools includes human and machine interfaces to its resource meta-data repository. Investigators or computer programs may utilize these interfaces to search, compare, expand, revise and mine meta-data descriptions of existent computational biology resources. We propose two ways to browse and display the iTools dynamic collection of resources. The first one is based on an ontology of computational biology resources, and the second one is derived from hyperbolic projections of manifolds or complex structures onto planar discs. iTools is an open source project both in terms of the source code development as well as its meta-data content. iTools employs a decentralized, portable, scalable and lightweight framework for long-term resource management. We demonstrate several applications of iTools as a framework for integrated bioinformatics. iTools and the complete details about its specifications, usage and interfaces are available at the iTools web page http://iTools.ccb.ucla.edu.

  2. Semantics-based distributed I/O with the ParaMEDIC framework.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Balaji, P.; Feng, W.; Lin, H.

    2008-01-01

    Many large-scale applications simultaneously rely on multiple resources for efficient execution. For example, such applications may require both large compute and storage resources; however, very few supercomputing centers can provide large quantities of both. Thus, data generated at the compute site oftentimes has to be moved to a remote storage site for either storage or visualization and analysis. Clearly, this is not an efficient model, especially when the two sites are distributed over a wide-area network. Thus, we present a framework called 'ParaMEDIC: Parallel Metadata Environment for Distributed I/O and Computing' which uses application-specific semantic information to convert the generated data to orders-of-magnitude smaller metadata at the compute site, transfer the metadata to the storage site, and re-process the metadata at the storage site to regenerate the output. Specifically, ParaMEDIC trades a small amount of additional computation (in the form of data post-processing) for a potentially significant reduction in data that needs to be transferred in distributed environments.
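
    The ParaMEDIC interfaces themselves are not described in the abstract; the following hedged sketch only illustrates the underlying trade-off with made-up data: because both sites hold the same input, the compute site can ship compact "metadata" (here, record indices) instead of the full output, and the storage site regenerates the output with a little extra post-processing.

        # Illustrative sketch (not the ParaMEDIC API): trade a little recomputation
        # for a much smaller transfer. Both sites hold the same input database, so
        # the compute site ships only indices of matching records, and the storage
        # site rebuilds the full output locally.
        database = [f"record-{i}" for i in range(100_000)]   # shared at both sites

        def matches(record):
            return record.endswith("7")          # hypothetical selection predicate

        # --- compute site ----------------------------------------------------
        full_output = [r for r in database if matches(r)]             # large
        metadata = [i for i, r in enumerate(database) if matches(r)]  # small

        # --- only `metadata` crosses the wide-area network --------------------

        # --- storage site ------------------------------------------------------
        regenerated = [database[i] for i in metadata]   # cheap post-processing
        assert regenerated == full_output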

  3. A Review of High-Performance Computational Strategies for Modeling and Imaging of Electromagnetic Induction Data

    NASA Astrophysics Data System (ADS)

    Newman, Gregory A.

    2014-01-01

    Many geoscientific applications exploit electrostatic and electromagnetic fields to interrogate and map subsurface electrical resistivity—an important geophysical attribute for characterizing mineral, energy, and water resources. In complex three-dimensional geologies, where many of these resources remain to be found, resistivity mapping requires large-scale modeling and imaging capabilities, as well as the ability to treat significant data volumes, which can easily overwhelm single-core and modest multicore computing hardware. To treat such problems requires large-scale parallel computational resources, necessary for reducing the time to solution to a time frame acceptable to the exploration process. The recognition that significant parallel computing processes must be brought to bear on these problems gives rise to choices that must be made in parallel computing hardware and software. In this review, some of these choices are presented, along with the resulting trade-offs. We also discuss future trends in high-performance computing and the anticipated impact on electromagnetic (EM) geophysics. Topics discussed in this review article include a survey of parallel computing platforms, from graphics processing units to multicore CPUs with a fast interconnect, along with effective parallel solvers and associated solver libraries for inductive EM modeling and imaging.

  4. Implementing Parquet equations using HPX

    NASA Astrophysics Data System (ADS)

    Kellar, Samuel; Wagle, Bibek; Yang, Shuxiang; Tam, Ka-Ming; Kaiser, Hartmut; Moreno, Juana; Jarrell, Mark

    A new C++ runtime system (HPX) enables simulations of complex systems to run more efficiently on parallel and heterogeneous systems. This increased efficiency allows for solutions to larger simulations of the parquet approximation for a system with impurities. The relevancy of the parquet equations depends upon the ability to solve systems which require long runs and large amounts of memory. These limitations, in addition to numerical complications arising from the stability of the solutions, necessitate running on large distributed systems. As computational resources trend towards the exascale and the limitations arising from computational resources vanish, the efficiency of large-scale simulations becomes a focus. HPX facilitates efficient simulations through intelligent overlapping of computation and communication. Simulations such as the parquet equations, which require the transfer of large amounts of data, should benefit from HPX implementations. Supported by the NSF EPSCoR Cooperative Agreement No. EPS-1003897 with additional support from the Louisiana Board of Regents.
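
    HPX is a C++ runtime, so the snippet below is only a language-neutral illustration, written in Python, of the overlap between computation and communication that the abstract credits HPX with; fetch_block() and process_block() are hypothetical stand-ins for a data transfer and a local computation.

        # Conceptual sketch of overlapping computation with communication: the
        # next block of data is fetched in a background thread while the current
        # block is being processed. Both functions are hypothetical placeholders.
        from concurrent.futures import ThreadPoolExecutor
        import time

        def fetch_block(i):
            time.sleep(0.1)                       # pretend to move data over the network
            return list(range(i * 10, (i + 1) * 10))

        def process_block(block):
            return sum(x * x for x in block)

        total = 0
        with ThreadPoolExecutor(max_workers=1) as io:
            future = io.submit(fetch_block, 0)    # start the first transfer
            for i in range(1, 6):
                block = future.result()           # wait for the current block
                future = io.submit(fetch_block, i)  # prefetch the next block...
                total += process_block(block)       # ...while computing on this one
            total += process_block(future.result())  # last block
        print(total)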

  5. AIMES Final Technical Report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Katz, Daniel S; Jha, Shantenu; Weissman, Jon

    2017-01-31

    This is the final technical report for the AIMES project. Many important advances in science and engineering are due to large-scale distributed computing. Notwithstanding this reliance, we are still learning how to design and deploy large-scale production Distributed Computing Infrastructures (DCI). This is evidenced by missing design principles for DCI, and an absence of generally acceptable and usable distributed computing abstractions. The AIMES project was conceived against this backdrop, following on the heels of a comprehensive survey of scientific distributed applications. AIMES laid the foundations to address the tripartite challenge of dynamic resource management, integrating information, and portable and interoperable distributed applications. Four abstractions were defined and implemented: skeleton, resource bundle, pilot, and execution strategy. The four abstractions were implemented into software modules and then aggregated into the AIMES middleware. This middleware successfully integrates information across the application layer (skeletons) and resource layer (Bundles), derives a suitable execution strategy for the given skeleton and enacts its execution by means of pilots on one or more resources, depending on the application requirements, and resource availabilities and capabilities.

  6. AIMES Final Technical Report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Weissman, Jon; Katz, Dan; Jha, Shantenu

    2017-01-31

    This is the final technical report for the AIMES project. Many important advances in science and engineering are due to large scale distributed computing. Notwithstanding this reliance, we are still learning how to design and deploy large-scale production Distributed Computing Infrastructures (DCI). This is evidenced by missing design principles for DCI, and an absence of generally acceptable and usable distributed computing abstractions. The AIMES project was conceived against this backdrop, following on the heels of a comprehensive survey of scientific distributed applications. AIMES laid the foundations to address the tripartite challenge of dynamic resource management, integrating information, and portable and interoperable distributed applications. Four abstractions were defined and implemented: skeleton, resource bundle, pilot, and execution strategy. The four abstractions were implemented into software modules and then aggregated into the AIMES middleware. This middleware successfully integrates information across the application layer (skeletons) and resource layer (Bundles), derives a suitable execution strategy for the given skeleton and enacts its execution by means of pilots on one or more resources, depending on the application requirements, and resource availabilities and capabilities.

  7. Step-by-step magic state encoding for efficient fault-tolerant quantum computation

    PubMed Central

    Goto, Hayato

    2014-01-01

    Quantum error correction allows one to make quantum computers fault-tolerant against unavoidable errors due to decoherence and imperfect physical gate operations. However, the fault-tolerant quantum computation requires impractically large computational resources for useful applications. This is a current major obstacle to the realization of a quantum computer. In particular, magic state distillation, which is a standard approach to universality, consumes the most resources in fault-tolerant quantum computation. For the resource problem, here we propose step-by-step magic state encoding for concatenated quantum codes, where magic states are encoded step by step from the physical level to the logical one. To manage errors during the encoding, we carefully use error detection. Since the sizes of intermediate codes are small, it is expected that the resource overheads will become lower than previous approaches based on the distillation at the logical level. Our simulation results suggest that the resource requirements for a logical magic state will become comparable to those for a single logical controlled-NOT gate. Thus, the present method opens a new possibility for efficient fault-tolerant quantum computation. PMID:25511387

  8. Step-by-step magic state encoding for efficient fault-tolerant quantum computation.

    PubMed

    Goto, Hayato

    2014-12-16

    Quantum error correction allows one to make quantum computers fault-tolerant against unavoidable errors due to decoherence and imperfect physical gate operations. However, the fault-tolerant quantum computation requires impractically large computational resources for useful applications. This is a current major obstacle to the realization of a quantum computer. In particular, magic state distillation, which is a standard approach to universality, consumes the most resources in fault-tolerant quantum computation. For the resource problem, here we propose step-by-step magic state encoding for concatenated quantum codes, where magic states are encoded step by step from the physical level to the logical one. To manage errors during the encoding, we carefully use error detection. Since the sizes of intermediate codes are small, it is expected that the resource overheads will become lower than previous approaches based on the distillation at the logical level. Our simulation results suggest that the resource requirements for a logical magic state will become comparable to those for a single logical controlled-NOT gate. Thus, the present method opens a new possibility for efficient fault-tolerant quantum computation.

  9. Lambda Data Grid: Communications Architecture in Support of Grid Computing

    DTIC Science & Technology

    2006-12-21

    number of paradigm shifts in the 20th century, including the growth of large geographically dispersed teams and the use of simulations and computational...get results. The work in this thesis automates the orchestration of networks with other resources, better utilizing all resources in a time efficient...domains, over transatlantic links in around a minute. The main goal of this thesis is to build a new grid-computing paradigm that fully harnesses the

  10. iTools: A Framework for Classification, Categorization and Integration of Computational Biology Resources

    PubMed Central

    Dinov, Ivo D.; Rubin, Daniel; Lorensen, William; Dugan, Jonathan; Ma, Jeff; Murphy, Shawn; Kirschner, Beth; Bug, William; Sherman, Michael; Floratos, Aris; Kennedy, David; Jagadish, H. V.; Schmidt, Jeanette; Athey, Brian; Califano, Andrea; Musen, Mark; Altman, Russ; Kikinis, Ron; Kohane, Isaac; Delp, Scott; Parker, D. Stott; Toga, Arthur W.

    2008-01-01

    The advancement of the computational biology field hinges on progress in three fundamental directions – the development of new computational algorithms, the availability of informatics resource management infrastructures and the capability of tools to interoperate and synergize. There is an explosion in algorithms and tools for computational biology, which makes it difficult for biologists to find, compare and integrate such resources. We describe a new infrastructure, iTools, for managing the query, traversal and comparison of diverse computational biology resources. Specifically, iTools stores information about three types of resources–data, software tools and web-services. The iTools design, implementation and resource meta-data content reflect the broad research, computational, applied and scientific expertise available at the seven National Centers for Biomedical Computing. iTools provides a system for classification, categorization and integration of different computational biology resources across space-and-time scales, biomedical problems, computational infrastructures and mathematical foundations. A large number of resources are already iTools-accessible to the community and this infrastructure is rapidly growing. iTools includes human and machine interfaces to its resource meta-data repository. Investigators or computer programs may utilize these interfaces to search, compare, expand, revise and mine meta-data descriptions of existent computational biology resources. We propose two ways to browse and display the iTools dynamic collection of resources. The first one is based on an ontology of computational biology resources, and the second one is derived from hyperbolic projections of manifolds or complex structures onto planar discs. iTools is an open source project both in terms of the source code development as well as its meta-data content. iTools employs a decentralized, portable, scalable and lightweight framework for long-term resource management. We demonstrate several applications of iTools as a framework for integrated bioinformatics. iTools and the complete details about its specifications, usage and interfaces are available at the iTools web page http://iTools.ccb.ucla.edu. PMID:18509477

  11. LaRC local area networks to support distributed computing

    NASA Technical Reports Server (NTRS)

    Riddle, E. P.

    1984-01-01

    The Langley Research Center's (LaRC) Local Area Network (LAN) effort is discussed. LaRC initiated the development of a LAN to support a growing distributed computing environment at the Center. The purpose of the network is to provide an improved capability (over interactive and RJE terminal access) for sharing multivendor computer resources. Specifically, the network will provide a data highway for the transfer of files between mainframe computers, minicomputers, work stations, and personal computers. An important influence on the overall network design was the vital need of LaRC researchers to efficiently utilize the large CDC mainframe computers in the central scientific computing facility. Although there was a steady migration from a centralized to a distributed computing environment at LaRC in recent years, the work load on the central resources increased. Major emphasis in the network design was on communication with the central resources within the distributed environment. The network to be implemented will allow researchers to utilize the central resources, distributed minicomputers, work stations, and personal computers to obtain the proper level of computing power to efficiently perform their jobs.

  12. Design for Run-Time Monitor on Cloud Computing

    NASA Astrophysics Data System (ADS)

    Kang, Mikyung; Kang, Dong-In; Yun, Mira; Park, Gyung-Leen; Lee, Junghoon

    Cloud computing is a new information technology trend that moves computing and data away from desktops and portable PCs into large data centers. The basic principle of cloud computing is to deliver applications as services over the Internet as well as infrastructure. A cloud is a type of parallel and distributed system consisting of a collection of inter-connected and virtualized computers that are dynamically provisioned and presented as one or more unified computing resources. The large-scale distributed applications on a cloud require adaptive service-based software, which has the capability of monitoring the system status change, analyzing the monitored information, and adapting its service configuration while considering tradeoffs among multiple QoS features simultaneously. In this paper, we design Run-Time Monitor (RTM), which is system software to monitor application behavior at run-time, analyze the collected information, and optimize resources on cloud computing. RTM monitors application software through library instrumentation, as well as the underlying hardware through performance counters, optimizing its computing configuration based on the analyzed data.
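
    The snippet below is not the authors' RTM; it is a hedged illustration of a run-time monitoring loop that samples system status and reacts when thresholds are crossed, using the third-party psutil package (assumed to be installed) in place of library instrumentation and hardware performance counters.

        # Illustrative monitoring loop, not the authors' RTM: periodically sample
        # system status and react when thresholds are crossed. Requires the
        # third-party psutil package (pip install psutil).
        import time
        import psutil

        CPU_LIMIT = 90.0      # percent; hypothetical reconfiguration thresholds
        MEM_LIMIT = 85.0

        def adapt_configuration(cpu, mem):
            # Placeholder for the "optimize resources" step, e.g. asking the cloud
            # manager for another VM or throttling a background task.
            print(f"adapting: cpu={cpu:.1f}% mem={mem:.1f}%")

        for _ in range(10):                       # monitor for a few sampling periods
            cpu = psutil.cpu_percent(interval=1)  # blocks for 1 s while sampling
            mem = psutil.virtual_memory().percent
            if cpu > CPU_LIMIT or mem > MEM_LIMIT:
                adapt_configuration(cpu, mem)
            time.sleep(1)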

  13. Grid computing in large pharmaceutical molecular modeling.

    PubMed

    Claus, Brian L; Johnson, Stephen R

    2008-07-01

    Most major pharmaceutical companies have employed grid computing to expand their compute resources with the intention of minimizing additional financial expenditure. Historically, one of the issues restricting widespread utilization of the grid resources in molecular modeling is the limited set of suitable applications amenable to coarse-grained parallelization. Recent advances in grid infrastructure technology coupled with advances in application research and redesign will enable fine-grained parallel problems, such as quantum mechanics and molecular dynamics, which were previously inaccessible to the grid environment. This will enable new science as well as increase resource flexibility to load balance and schedule existing workloads.

  14. Changing from computing grid to knowledge grid in life-science grid.

    PubMed

    Talukdar, Veera; Konar, Amit; Datta, Ayan; Choudhury, Anamika Roy

    2009-09-01

    Grid computing has a great potential to become a standard cyber infrastructure for life sciences that often require high-performance computing and large data handling, which exceeds the computing capacity of a single institution. Grid computing applies the resources of many computers in a network to a single problem at the same time. It is useful for scientific problems that require a great number of computer processing cycles or access to a large amount of data. As biologists, we are constantly discovering millions of genes and genome features, which are assembled in a library and distributed on computers around the world. This means that new, innovative methods must be developed that exploit the resources available for extensive calculations - for example, grid computing. This survey reviews the latest grid technologies from the viewpoints of computing grid, data grid and knowledge grid. Computing grid technologies have matured enough to solve high-throughput real-world life scientific problems. Data grid technologies are strong candidates for realizing a "resourceome" for bioinformatics. Knowledge grids should be designed not only for sharing explicit knowledge on computers but also for community formation for sharing tacit knowledge among a community. By extending the concept of grid from computing grid to knowledge grid, it is possible to make use of a grid not only as sharable computing resources, but also as a time and place in which people work together, create knowledge, and share knowledge and experiences in a community.

  15. Polyphony: A Workflow Orchestration Framework for Cloud Computing

    NASA Technical Reports Server (NTRS)

    Shams, Khawaja S.; Powell, Mark W.; Crockett, Tom M.; Norris, Jeffrey S.; Rossi, Ryan; Soderstrom, Tom

    2010-01-01

    Cloud Computing has delivered unprecedented compute capacity to NASA missions at affordable rates. Missions like the Mars Exploration Rovers (MER) and Mars Science Lab (MSL) are enjoying the elasticity that enables them to leverage hundreds, if not thousands, of machines for short durations without making any hardware procurements. In this paper, we describe Polyphony, a resilient, scalable, and modular framework that efficiently leverages a large set of computing resources to perform parallel computations. Polyphony can employ resources on the cloud, excess capacity on local machines, as well as spare resources at the supercomputing center, and it enables these resources to work in concert to accomplish a common goal. Polyphony is resilient to node failures, even if they occur in the middle of a transaction. We will conclude with an evaluation of a production-ready application built on top of Polyphony to perform image-processing operations on images from around the solar system, including Mars, Saturn, and Titan.
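
    Polyphony's implementation is not described in enough detail here to reproduce; the following sketch only illustrates the resilience idea with hypothetical tasks: workers pull work items from a shared queue, and a task whose worker fails is re-queued, so a failure in the middle of a transaction does not lose the work.

        # Sketch of the fault-tolerance idea only (not Polyphony's implementation):
        # workers pull tasks from a shared queue and a task whose worker "fails"
        # is re-queued for another worker. Task names are hypothetical.
        import queue
        import random
        import threading

        tasks = queue.Queue()
        for i in range(20):
            tasks.put(f"image_{i:03d}")

        def worker(name):
            while True:
                try:
                    task = tasks.get_nowait()
                except queue.Empty:
                    return
                try:
                    if random.random() < 0.2:        # simulate a node failure mid-task
                        raise RuntimeError("node lost")
                    print(f"{name} processed {task}")
                except RuntimeError:
                    tasks.put(task)                  # re-queue so the work is not lost

        threads = [threading.Thread(target=worker, args=(f"worker-{n}",)) for n in range(4)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()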

  16. Study on the application of mobile internet cloud computing platform

    NASA Astrophysics Data System (ADS)

    Gong, Songchun; Fu, Songyin; Chen, Zheng

    2012-04-01

    The innovative development of computer technology promotes the application of the cloud computing platform, which is essentially a resource service model that meets users' needs for different resources after adjustments in multiple aspects. Cloud computing has advantages in many respects: it not merely reduces the difficulty of operating the system but also makes it easy for users to search, acquire and process resources. In accordance with this point, the author takes the management of digital libraries as the research focus of this paper and analyzes the key technologies of the mobile internet cloud computing platform in the operation process. The popularization and promotion of computer technology drive people to create digital library models, whose core idea is to strengthen the management of library resource information through computers and to construct a high-performance inquiry and search platform, allowing users to access the necessary information resources at any time. Cloud computing distributes computations across a large number of distributed computers and hence implements the connection service of multiple computers. Digital libraries, as a typical representative of cloud computing applications, can be used to carry out an analysis of the key technologies of cloud computing.

  17. Networked Microcomputers--The Next Generation in College Computing.

    ERIC Educational Resources Information Center

    Harris, Albert L.

    The evolution of computer hardware for college computing has mirrored the industry's growth. When computers were introduced into the educational environment, they had limited capacity and served one user at a time. Then came large mainframes with many terminals sharing the resource. Next, the use of computers in office automation emerged. As…

  18. AGIS: The ATLAS Grid Information System

    NASA Astrophysics Data System (ADS)

    Anisenkov, A.; Di Girolamo, A.; Klimentov, A.; Oleynik, D.; Petrosyan, A.; Atlas Collaboration

    2014-06-01

    ATLAS, a particle physics experiment at the Large Hadron Collider at CERN, produces petabytes of data annually through simulation production and tens of petabytes of data per year from the detector itself. The ATLAS computing model embraces the Grid paradigm and a high degree of decentralization, with computing resources able to meet ATLAS requirements for petabyte-scale data operations. In this paper we describe the ATLAS Grid Information System (AGIS), designed to integrate configuration and status information about resources, services and topology of the computing infrastructure used by the ATLAS Distributed Computing applications and services.

  19. Computer User's Guide to the Protection of Information Resources. NIST Special Publication 500-171.

    ERIC Educational Resources Information Center

    Helsing, Cheryl; And Others

    Computers have changed the way information resources are handled. Large amounts of information are stored in one central place and can be accessed from remote locations. Users have a personal responsibility for the security of the system and the data stored in it. This document outlines the user's responsibilities and provides security and control…

  20. Simple Linux Utility for Resource Management

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ali, Amjad Majid; Albert, Don; Andersson, Par

    SLURM is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small computer clusters. As a cluster resource manager, SLURM has three key functions. First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes. Finally, it arbitrates conflicting requests for resources by managing a queue of pending work.

  1. A System for Monitoring and Management of Computational Grids

    NASA Technical Reports Server (NTRS)

    Smith, Warren; Biegel, Bryan (Technical Monitor)

    2002-01-01

    As organizations begin to deploy large computational grids, it has become apparent that systems for observation and control of the resources, services, and applications that make up such grids are needed. Administrators must observe the operation of resources and services to ensure that they are operating correctly and they must control the resources and services to ensure that their operation meets the needs of users. Users are also interested in the operation of resources and services so that they can choose the most appropriate ones to use. In this paper we describe a prototype system to monitor and manage computational grids and describe the general software framework for control and observation in distributed environments that it is based on.

  2. Job Management and Task Bundling

    NASA Astrophysics Data System (ADS)

    Berkowitz, Evan; Jansen, Gustav R.; McElvain, Kenneth; Walker-Loud, André

    2018-03-01

    High Performance Computing is often performed on scarce and shared computing resources. To ensure computers are used to their full capacity, administrators often incentivize large workloads that are not possible on smaller systems. Measurements in Lattice QCD frequently do not scale to machine-size workloads. By bundling tasks together we can create large jobs suitable for gigantic partitions. We discuss METAQ and mpi_jm, software developed to dynamically group computational tasks together, that can intelligently backfill to consume idle time without substantial changes to users' current workflows or executables.
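
    METAQ and mpi_jm themselves are not shown here; the sketch below is a hedged illustration of the bundling idea in Python: many small, independent tasks are launched concurrently inside one large allocation so that otherwise idle cores are backfilled. The task script (measure.py) is a hypothetical placeholder.

        # Sketch of the bundling idea (not METAQ or mpi_jm themselves): inside one
        # large batch allocation, many small independent tasks run concurrently so
        # cores stay busy. "measure.py" is a hypothetical script.
        from concurrent.futures import ThreadPoolExecutor
        import subprocess
        import sys

        tasks = [[sys.executable, "measure.py", f"--config={i}"] for i in range(64)]
        SLOTS = 16                            # tasks running at once in the allocation

        def launch(cmd):
            # Each task is an ordinary executable; a real bundler would also pin it
            # to specific nodes or cores of the allocation.
            return subprocess.run(cmd, capture_output=True).returncode

        with ThreadPoolExecutor(max_workers=SLOTS) as pool:
            codes = list(pool.map(launch, tasks))
        print(f"{codes.count(0)} of {len(tasks)} tasks finished successfully")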

  3. Large-scale high-throughput computer-aided discovery of advanced materials using cloud computing

    NASA Astrophysics Data System (ADS)

    Bazhirov, Timur; Mohammadi, Mohammad; Ding, Kevin; Barabash, Sergey

    Recent advances in cloud computing made it possible to access large-scale computational resources completely on-demand in a rapid and efficient manner. When combined with high fidelity simulations, they serve as an alternative pathway to enable computational discovery and design of new materials through large-scale high-throughput screening. Here, we present a case study for a cloud platform implemented at Exabyte Inc. We perform calculations to screen lightweight ternary alloys for thermodynamic stability. Due to the lack of experimental data for most such systems, we rely on theoretical approaches based on first-principles pseudopotential density functional theory. We calculate the formation energies for a set of ternary compounds approximated by special quasirandom structures. During an example run we were able to scale to 10,656 CPUs within 7 minutes from the start, and obtain results for 296 compounds within 38 hours. The results indicate that the ultimate formation enthalpy of ternary systems can be negative for some lightweight alloys, including Li and Mg compounds. We conclude that, compared to the traditional capital-intensive approach that requires on-premises hardware resources, cloud computing is agile and cost-effective, yet scalable and delivers similar performance.

  4. Design and Development of a Run-Time Monitor for Multi-Core Architectures in Cloud Computing

    PubMed Central

    Kang, Mikyung; Kang, Dong-In; Crago, Stephen P.; Park, Gyung-Leen; Lee, Junghoon

    2011-01-01

    Cloud computing is a new information technology trend that moves computing and data away from desktops and portable PCs into large data centers. The basic principle of cloud computing is to deliver applications as services over the Internet as well as infrastructure. A cloud is a type of parallel and distributed system consisting of a collection of inter-connected and virtualized computers that are dynamically provisioned and presented as one or more unified computing resources. The large-scale distributed applications on a cloud require adaptive service-based software, which has the capability of monitoring system status changes, analyzing the monitored information, and adapting its service configuration while considering tradeoffs among multiple QoS features simultaneously. In this paper, we design and develop a Run-Time Monitor (RTM) which is a system software to monitor the application behavior at run-time, analyze the collected information, and optimize cloud computing resources for multi-core architectures. RTM monitors application software through library instrumentation as well as underlying hardware through a performance counter optimizing its computing configuration based on the analyzed data. PMID:22163811

  5. Design and development of a run-time monitor for multi-core architectures in cloud computing.

    PubMed

    Kang, Mikyung; Kang, Dong-In; Crago, Stephen P; Park, Gyung-Leen; Lee, Junghoon

    2011-01-01

    Cloud computing is a new information technology trend that moves computing and data away from desktops and portable PCs into large data centers. The basic principle of cloud computing is to deliver applications as services over the Internet as well as infrastructure. A cloud is a type of parallel and distributed system consisting of a collection of inter-connected and virtualized computers that are dynamically provisioned and presented as one or more unified computing resources. The large-scale distributed applications on a cloud require adaptive service-based software, which has the capability of monitoring system status changes, analyzing the monitored information, and adapting its service configuration while considering tradeoffs among multiple QoS features simultaneously. In this paper, we design and develop a Run-Time Monitor (RTM) which is a system software to monitor the application behavior at run-time, analyze the collected information, and optimize cloud computing resources for multi-core architectures. RTM monitors application software through library instrumentation as well as underlying hardware through a performance counter optimizing its computing configuration based on the analyzed data.

  6. Surfer: An Extensible Pull-Based Framework for Resource Selection and Ranking

    NASA Technical Reports Server (NTRS)

    Zolano, Paul Z.

    2004-01-01

    Grid computing aims to connect large numbers of geographically and organizationally distributed resources to increase computational power, resource utilization, and resource accessibility. In order to effectively utilize grids, users need to be connected to the best available resources at any given time. As grids are in constant flux, users cannot be expected to keep up with the configuration and status of the grid, thus they must be provided with automatic resource brokering for selecting and ranking resources meeting constraints and preferences they specify. This paper presents a new OGSI-compliant resource selection and ranking framework called Surfer that has been implemented as part of NASA's Information Power Grid (IPG) project. Surfer is highly extensible and may be integrated into any grid environment by adding information providers knowledgeable about that environment.
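
    Surfer's OGSI interfaces and information providers are not reproduced here; the following sketch only illustrates the select-then-rank pattern the abstract describes, with a made-up resource catalogue, hard constraints, and a preference function.

        # Illustrative sketch of resource selection and ranking (not Surfer's actual
        # schema or interfaces): filter a catalogue by hard constraints, then rank
        # the survivors by a preference score. All attribute names are made up.
        resources = [
            {"name": "clusterA", "cpus": 512,  "mem_gb": 2048, "queue_wait_min": 30},
            {"name": "clusterB", "cpus": 128,  "mem_gb": 512,  "queue_wait_min": 5},
            {"name": "clusterC", "cpus": 4096, "mem_gb": 8192, "queue_wait_min": 120},
        ]

        def satisfies(r):                     # hard constraints from the user
            return r["cpus"] >= 256 and r["mem_gb"] >= 1024

        def preference(r):                    # lower is better: prefer short queues
            return r["queue_wait_min"] + 0.001 * r["cpus"]

        ranked = sorted((r for r in resources if satisfies(r)), key=preference)
        for r in ranked:
            print(r["name"], round(preference(r), 2))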

  7. Using Cloud Computing infrastructure with CloudBioLinux, CloudMan and Galaxy

    PubMed Central

    Afgan, Enis; Chapman, Brad; Jadan, Margita; Franke, Vedran; Taylor, James

    2012-01-01

    Cloud computing has revolutionized availability and access to computing and storage resources; making it possible to provision a large computational infrastructure with only a few clicks in a web browser. However, those resources are typically provided in the form of low-level infrastructure components that need to be procured and configured before use. In this protocol, we demonstrate how to utilize cloud computing resources to perform open-ended bioinformatics analyses, with fully automated management of the underlying cloud infrastructure. By combining three projects, CloudBioLinux, CloudMan, and Galaxy into a cohesive unit, we have enabled researchers to gain access to more than 100 preconfigured bioinformatics tools and gigabytes of reference genomes on top of the flexible cloud computing infrastructure. The protocol demonstrates how to setup the available infrastructure and how to use the tools via a graphical desktop interface, a parallel command line interface, and the web-based Galaxy interface. PMID:22700313

  8. Using cloud computing infrastructure with CloudBioLinux, CloudMan, and Galaxy.

    PubMed

    Afgan, Enis; Chapman, Brad; Jadan, Margita; Franke, Vedran; Taylor, James

    2012-06-01

    Cloud computing has revolutionized availability and access to computing and storage resources, making it possible to provision a large computational infrastructure with only a few clicks in a Web browser. However, those resources are typically provided in the form of low-level infrastructure components that need to be procured and configured before use. In this unit, we demonstrate how to utilize cloud computing resources to perform open-ended bioinformatic analyses, with fully automated management of the underlying cloud infrastructure. By combining three projects, CloudBioLinux, CloudMan, and Galaxy, into a cohesive unit, we have enabled researchers to gain access to more than 100 preconfigured bioinformatics tools and gigabytes of reference genomes on top of the flexible cloud computing infrastructure. The protocol demonstrates how to set up the available infrastructure and how to use the tools via a graphical desktop interface, a parallel command-line interface, and the Web-based Galaxy interface.

  9. Infrastructures for Distributed Computing: the case of BESIII

    NASA Astrophysics Data System (ADS)

    Pellegrino, J.

    2018-05-01

    The BESIII is an electron-positron collision experiment hosted at BEPCII in Beijing and aimed to investigate Tau-Charm physics. BESIII has now been running for several years and has gathered more than 1 PB of raw data. In order to analyze these data and perform massive Monte Carlo simulations, a large amount of computing and storage resources is needed. The distributed computing system is based upon DIRAC and has been in production since 2012. It integrates computing and storage resources from different institutes and a variety of resource types such as cluster, grid, cloud or volunteer computing. About 15 sites from the BESIII Collaboration from all over the world joined this distributed computing infrastructure, giving a significant contribution to the IHEP computing facility. Nowadays cloud computing is playing a key role in the HEP computing field, due to its scalability and elasticity. Cloud infrastructures take advantage of several tools, such as VMDirac, to manage virtual machines through cloud managers according to the job requirements. With the virtually unlimited resources from commercial clouds, the computing capacity could scale accordingly in order to deal with any burst demands. General computing models are discussed and addressed herewith, with particular focus on the BESIII infrastructure. Moreover, new computing tools and upcoming infrastructures will be addressed.

  10. Data Center Consolidation: A Step towards Infrastructure Clouds

    NASA Astrophysics Data System (ADS)

    Winter, Markus

    Application service providers face enormous challenges and rising costs in managing and operating a growing number of heterogeneous system and computing landscapes. Limitations of traditional computing environments force IT decision-makers to reorganize computing resources within the data center, as continuous growth leads to an inefficient utilization of the underlying hardware infrastructure. This paper discusses a way for infrastructure providers to improve data center operations based on the findings of a case study on resource utilization of very large business applications and presents an outlook beyond server consolidation endeavors, transforming corporate data centers into compute clouds.

  11. A Grid Infrastructure for Supporting Space-based Science Operations

    NASA Technical Reports Server (NTRS)

    Bradford, Robert N.; Redman, Sandra H.; McNair, Ann R. (Technical Monitor)

    2002-01-01

    Emerging technologies for computational grid infrastructures have the potential for revolutionizing the way computers are used in all aspects of our lives. Computational grids are currently being implemented to provide a large-scale, dynamic, and secure research and engineering environments based on standards and next-generation reusable software, enabling greater science and engineering productivity through shared resources and distributed computing for less cost than traditional architectures. Combined with the emerging technologies of high-performance networks, grids provide researchers, scientists and engineers the first real opportunity for an effective distributed collaborative environment with access to resources such as computational and storage systems, instruments, and software tools and services for the most computationally challenging applications.

  12. Workflow Management Systems for Molecular Dynamics on Leadership Computers

    NASA Astrophysics Data System (ADS)

    Wells, Jack; Panitkin, Sergey; Oleynik, Danila; Jha, Shantenu

    Molecular Dynamics (MD) simulations play an important role in a range of disciplines from materials science to biophysical systems and account for a large fraction of cycles consumed on computing resources. Increasingly, science problems require the successful execution of "many" MD simulations as opposed to a single MD simulation. There is a need to provide scalable and flexible approaches to the execution of the workload. We present preliminary results on the Titan computer at the Oak Ridge Leadership Computing Facility that demonstrate a general capability to manage workload execution agnostic of a specific MD simulation kernel or execution pattern, and in a manner that integrates disparate grid-based and supercomputing resources. Our results build upon our extensive experience of distributed workload management in the high-energy physics ATLAS project using PanDA (Production and Distributed Analysis System), coupled with recent conceptual advances in our understanding of workload management on heterogeneous resources. We will discuss how we will generalize these initial capabilities towards a more production-level service on DOE leadership resources. This research is sponsored by US DOE/ASCR and used resources of the OLCF computing facility.

  13. Federated data storage system prototype for LHC experiments and data intensive science

    NASA Astrophysics Data System (ADS)

    Kiryanov, A.; Klimentov, A.; Krasnopevtsev, D.; Ryabinkin, E.; Zarochentsev, A.

    2017-10-01

    The rapid increase of data volume from the experiments running at the Large Hadron Collider (LHC) prompted the physics computing community to evaluate new data handling and processing solutions. Russian grid sites and universities’ clusters scattered over a large area aim at the task of uniting their resources for future productive work, at the same time giving an opportunity to support large physics collaborations. In our project we address the fundamental problem of designing a computing architecture to integrate distributed storage resources for LHC experiments and other data-intensive science applications and to provide access to data from heterogeneous computing facilities. Studies include development and implementation of a federated data storage prototype for Worldwide LHC Computing Grid (WLCG) centres of different levels and university clusters within one National Cloud. The prototype is based on computing resources located in Moscow, Dubna, Saint Petersburg, Gatchina and Geneva. This project intends to implement a federated distributed storage for all kinds of operations such as read/write/transfer and access via WAN from Grid centres, university clusters, supercomputers, academic and commercial clouds. The efficiency and performance of the system are demonstrated using synthetic and experiment-specific tests including real data processing and analysis workflows from the ATLAS and ALICE experiments, as well as compute-intensive bioinformatics applications (PALEOMIX) running on supercomputers. We present the topology and architecture of the designed system, report performance and statistics for different access patterns and show how federated data storage can be used efficiently by physicists and biologists. We also describe how sharing data on a widely distributed storage system can lead to a new computing model and reformations of computing style, for instance how a bioinformatics program running on supercomputers can read/write data from the federated storage.

  14. Concepts and Plans towards fast large scale Monte Carlo production for the ATLAS Experiment

    NASA Astrophysics Data System (ADS)

    Ritsch, E.; Atlas Collaboration

    2014-06-01

    The huge success of the physics program of the ATLAS experiment at the Large Hadron Collider (LHC) during Run 1 relies upon a great number of simulated Monte Carlo events. This Monte Carlo production takes the biggest part of the computing resources being in use by ATLAS as of now. In this document we describe the plans to overcome the computing resource limitations for large scale Monte Carlo production in the ATLAS Experiment for Run 2, and beyond. A number of fast detector simulation, digitization and reconstruction techniques are being discussed, based upon a new flexible detector simulation framework. To optimally benefit from these developments, a redesigned ATLAS MC production chain is presented at the end of this document.

  15. Cognitive Model Exploration and Optimization: A New Challenge for Computational Science

    DTIC Science & Technology

    2010-03-01

    the generation and analysis of computational cognitive models to explain various aspects of cognition. Typically the behavior of these models...computational scale of a workstation, so we have turned to high performance computing (HPC) clusters and volunteer computing for large-scale...computational resources. The majority of applications on the Department of Defense HPC clusters focus on solving partial differential equations (Post

  16. Simple Linux Utility for Resource Management

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jette, M.

    2009-09-09

    SLURM is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small computer clusters. As a cluster resource manager, SLURM has three key functions. First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes. Finally, it arbitrates conflicting requests for resources by managing a queue of pending work.
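
    As a small usage illustration (not part of the record above), the snippet below submits a job to SLURM from Python by piping a batch script to sbatch, which reads a job script from standard input when no file name is given; the #SBATCH options shown are standard SLURM directives, and ./my_parallel_app is a hypothetical executable.

        # Minimal example of handing work to SLURM from Python. Requires a system
        # with SLURM installed; sbatch reads the job script from standard input.
        import subprocess

        job_script = """#!/bin/bash
        #SBATCH --job-name=demo
        #SBATCH --nodes=2
        #SBATCH --ntasks=32
        #SBATCH --time=00:30:00
        srun ./my_parallel_app
        """

        result = subprocess.run(["sbatch"], input=job_script, text=True,
                                capture_output=True)
        print(result.stdout.strip())   # e.g. "Submitted batch job 12345" on success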

  17. Overview of Computer Simulation Modeling Approaches and Methods

    Treesearch

    Robert E. Manning; Robert M. Itami; David N. Cole; Randy Gimblett

    2005-01-01

    The field of simulation modeling has grown greatly with recent advances in computer hardware and software. Much of this work has involved large scientific and industrial applications for which substantial financial resources are available. However, advances in object-oriented programming and simulation methodology, concurrent with dramatic increases in computer...

  18. Some Measurement and Instruction Related Considerations Regarding Computer Assisted Testing.

    ERIC Educational Resources Information Center

    Oosterhof, Albert C.; Salisbury, David F.

    The Assessment Resource Center (ARC) at Florida State University provides computer assisted testing (CAT) for approximately 4,000 students each term. Computer capabilities permit a small proctoring staff to administer tests simultaneously to large numbers of students. Programs provide immediate feedback for students and generate a variety of…

  19. Efficient multi-objective calibration of a computationally intensive hydrologic model with parallel computing software in Python

    USDA-ARS?s Scientific Manuscript database

    With enhanced data availability, distributed watershed models for large areas with high spatial and temporal resolution are increasingly used to understand water budgets and examine effects of human activities and climate change/variability on water resources. Developing parallel computing software...
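
    The abstract above is truncated, so the sketch below only illustrates the general pattern it points to: evaluating many candidate parameter sets of a watershed model in parallel with Python's multiprocessing and keeping the non-dominated ones for a multi-objective calibration. run_model() and the two objectives are hypothetical stand-ins for the real simulator.

        # Sketch of parallel multi-objective calibration: evaluate candidate
        # parameter sets in parallel, then keep the non-dominated (Pareto) set.
        # run_model() is a hypothetical stand-in for one hydrologic simulation.
        from multiprocessing import Pool
        import random

        def run_model(params):
            # Returns two objectives to minimize, e.g. streamflow and sediment errors.
            a, b = params
            return (a - 0.3) ** 2, (b - 0.7) ** 2

        def dominated(p, q):
            # True if objective vector q is at least as good as p in every
            # objective and strictly better in at least one.
            return all(qi <= pi for pi, qi in zip(p, q)) and q != p

        if __name__ == "__main__":
            candidates = [(random.random(), random.random()) for _ in range(200)]
            with Pool(processes=8) as pool:
                objectives = pool.map(run_model, candidates)
            front = [c for c, o in zip(candidates, objectives)
                     if not any(dominated(o, other) for other in objectives)]
            print(f"{len(front)} non-dominated parameter sets")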

  20. Graded Lexicons: New Resources for Educational Purposes and Much More

    ERIC Educational Resources Information Center

    Gala, Núria; Billami, Mokhtar B.; François, Thomas; Bernhard, Delphine

    2015-01-01

    Computational tools and resources play an important role for vocabulary acquisition. Although a large variety of dictionaries and learning games are available, few resources provide information about the complexity of a word, either for learning or for comprehension. The idea here is to use frequency counts combined with intralexical variables to…

  1. Developing science gateways for drug discovery in a grid environment.

    PubMed

    Pérez-Sánchez, Horacio; Rezaei, Vahid; Mezhuyev, Vitaliy; Man, Duhu; Peña-García, Jorge; den-Haan, Helena; Gesing, Sandra

    2016-01-01

    Methods for in silico screening of large databases of molecules increasingly complement and replace experimental techniques to discover novel compounds to combat diseases. As these techniques become more complex and computationally costly we are faced with an increasing problem to provide the research community of life sciences with a convenient tool for high-throughput virtual screening on distributed computing resources. To this end, we recently integrated the biophysics-based drug-screening program FlexScreen into a service, applicable for large-scale parallel screening and reusable in the context of scientific workflows. Our implementation is based on Pipeline Pilot and Simple Object Access Protocol and provides an easy-to-use graphical user interface to construct complex workflows, which can be executed on distributed computing resources, thus accelerating the throughput by several orders of magnitude.

  2. AGIS: The ATLAS Grid Information System

    NASA Astrophysics Data System (ADS)

    Anisenkov, Alexey; Belov, Sergey; Di Girolamo, Alessandro; Gayazov, Stavro; Klimentov, Alexei; Oleynik, Danila; Senchenko, Alexander

    2012-12-01

    ATLAS is a particle physics experiment at the Large Hadron Collider at CERN. The experiment produces petabytes of data annually through simulation production and tens of petabytes of data per year from the detector itself. The ATLAS Computing model embraces the Grid paradigm and a high degree of decentralization and computing resources able to meet ATLAS requirements of petabyte-scale data operations. In this paper we present the ATLAS Grid Information System (AGIS), designed to integrate configuration and status information about resources, services and topology of the whole ATLAS Grid needed by ATLAS Distributed Computing applications and services.

  3. Common Accounting System for Monitoring the ATLAS Distributed Computing Resources

    NASA Astrophysics Data System (ADS)

    Karavakis, E.; Andreeva, J.; Campana, S.; Gayazov, S.; Jezequel, S.; Saiz, P.; Sargsyan, L.; Schovancova, J.; Ueda, I.; Atlas Collaboration

    2014-06-01

    This paper covers in detail a variety of accounting tools used to monitor the utilisation of the available computational and storage resources within the ATLAS Distributed Computing during the first three years of Large Hadron Collider data taking. The Experiment Dashboard provides a set of common accounting tools that combine monitoring information originating from many different information sources; either generic or ATLAS specific. This set of tools provides quality and scalable solutions that are flexible enough to support the constantly evolving requirements of the ATLAS user community.

  4. Large Scale Computing and Storage Requirements for High Energy Physics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gerber, Richard A.; Wasserman, Harvey

    2010-11-24

    The National Energy Research Scientific Computing Center (NERSC) is the leading scientific computing facility for the Department of Energy's Office of Science, providing high-performance computing (HPC) resources to more than 3,000 researchers working on about 400 projects. NERSC provides large-scale computing resources and, crucially, the support and expertise needed for scientists to make effective use of them. In November 2009, NERSC, DOE's Office of Advanced Scientific Computing Research (ASCR), and DOE's Office of High Energy Physics (HEP) held a workshop to characterize the HPC resources needed at NERSC to support HEP research through the next three to five years. The effort is part of NERSC's legacy of anticipating users' needs and deploying resources to meet those demands. The workshop revealed several key points, in addition to achieving its goal of collecting and characterizing computing requirements. The chief findings: (1) Science teams need access to a significant increase in computational resources to meet their research goals; (2) Research teams need to be able to read, write, transfer, store online, archive, analyze, and share huge volumes of data; (3) Science teams need guidance and support to implement their codes on future architectures; and (4) Projects need predictable, rapid turnaround of their computational jobs to meet mission-critical time constraints. This report expands upon these key points and includes others. It also presents a number of case studies as representative of the research conducted within HEP. Workshop participants were asked to codify their requirements in this case study format, summarizing their science goals, methods of solution, current and three-to-five year computing requirements, and software and support needs. Participants were also asked to describe their strategy for computing in the highly parallel, multi-core environment that is expected to dominate HPC architectures over the next few years. The report includes a section that describes efforts already underway or planned at NERSC that address requirements collected at the workshop. NERSC has many initiatives in progress that address key workshop findings and are aligned with NERSC's strategic plans.

  5. Nick Grue | NREL

    Science.gov Websites

    Research topics include geospatial data analysis using parallel processing, high-performance computing, renewable resource technical potential and supply curve analysis, spatial database utilization, and rapid analysis of large geospatial datasets for energy and geospatial analysis products. Research interests: rapid, web-based renewable resource analysis.

  6. Integrating Cloud-Computing-Specific Model into Aircraft Design

    NASA Astrophysics Data System (ADS)

    Zhimin, Tian; Qi, Lin; Guangwen, Yang

    Cloud Computing is becoming increasingly relevant, as it will enable companies involved in spreading this technology to open the door to Web 3.0. The new categories of services introduced will slowly replace many types of computational resources currently used. In this perspective, grid computing, the basic element for the large-scale supply of cloud services, will play a fundamental role in defining how those services will be provided. The paper tries to integrate a cloud-computing-specific model into aircraft design. This work has achieved good results in sharing licenses of large-scale and expensive software, such as CFD (Computational Fluid Dynamics), UG, CATIA, and so on.

  7. Managing competing elastic Grid and Cloud scientific computing applications using OpenNebula

    NASA Astrophysics Data System (ADS)

    Bagnasco, S.; Berzano, D.; Lusso, S.; Masera, M.; Vallero, S.

    2015-12-01

    Elastic cloud computing applications, i.e. applications that automatically scale according to computing needs, work on the ideal assumption of infinite resources. While large public cloud infrastructures may be a reasonable approximation of this condition, scientific computing centres like WLCG Grid sites usually work in a saturated regime, in which applications compete for scarce resources through queues, priorities and scheduling policies, and keeping a fraction of the computing cores idle to allow for headroom is usually not an option. In our particular environment one of the applications (a WLCG Tier-2 Grid site) is much larger than all the others and cannot autoscale easily. Nevertheless, other smaller applications can benefit from automatic elasticity; the implementation of this property in our infrastructure, based on the OpenNebula cloud stack, will be described and the very first operational experiences with a small number of strategies for timely allocation and release of resources will be discussed.

  8. Parallel high-performance grid computing: capabilities and opportunities of a novel demanding service and business class allowing highest resource efficiency.

    PubMed

    Kepper, Nick; Ettig, Ramona; Dickmann, Frank; Stehr, Rene; Grosveld, Frank G; Wedemann, Gero; Knoch, Tobias A

    2010-01-01

    Especially in the life-science and the health-care sectors the huge IT requirements are imminent due to the large and complex systems to be analysed and simulated. Grid infrastructures play here a rapidly increasing role for research, diagnostics, and treatment, since they provide the necessary large-scale resources efficiently. Whereas grids were first used for huge number crunching of trivially parallelizable problems, increasingly parallel high-performance computing is required. Here, we show for the prime example of molecular dynamics simulations how the presence of large grid clusters including very fast network interconnects within grid infrastructures now allows efficient parallel high-performance grid computing and thus combines the benefits of dedicated supercomputing centres and grid infrastructures. The demands for this service class are the highest, since the user group has very heterogeneous requirements: i) two to many thousands of CPUs, ii) different memory architectures, iii) huge storage capabilities, and iv) fast communication via network interconnects, are all needed in different combinations and must be considered in a highly dedicated manner to reach the highest performance efficiency. Beyond this, advanced and dedicated i) interaction with users, ii) management of jobs, iii) accounting, and iv) billing not only combines classic with parallel high-performance grid usage, but more importantly is also able to increase the efficiency of IT resource providers. Consequently, the mere "yes-we-can" becomes a huge opportunity for sectors such as life science and health care, as well as for grid infrastructures, by reaching a higher level of resource efficiency.

  9. Parallel workflow manager for non-parallel bioinformatic applications to solve large-scale biological problems on a supercomputer.

    PubMed

    Suplatov, Dmitry; Popova, Nina; Zhumatiy, Sergey; Voevodin, Vladimir; Švedas, Vytas

    2016-04-01

    Rapid expansion of online resources providing access to genomic, structural, and functional information associated with biological macromolecules opens an opportunity to gain a deeper understanding of the mechanisms of biological processes through the systematic analysis of large datasets. This, however, requires novel strategies to optimally utilize computer processing power. Some methods in bioinformatics and molecular modeling require extensive computational resources. Other algorithms have fast implementations which take at most several hours to analyze a common input on a modern desktop station; however, due to multiple invocations over a large number of subtasks, the full task requires significant computing power. Therefore, an efficient computational solution to large-scale biological problems requires both a wise parallel implementation of resource-hungry methods and a smart workflow to manage multiple invocations of relatively fast algorithms. In this work, a new computer software mpiWrapper has been developed to accommodate non-parallel implementations of scientific algorithms within the parallel supercomputing environment. The Message Passing Interface has been implemented to exchange information between nodes. Two specialized threads - one for task management and communication, and another for subtask execution - are invoked on each processing unit to avoid deadlock while using blocking calls to MPI. mpiWrapper can be used to launch all conventional Linux applications without the need to modify their original source codes and supports resubmission of subtasks on node failure. We show that this approach can be used to process huge amounts of biological data efficiently by running non-parallel programs in parallel mode on a supercomputer. The C++ source code and documentation are available from http://biokinet.belozersky.msu.ru/mpiWrapper .
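
    The master/worker pattern described above can be sketched with mpi4py. This is an illustrative simplification (a single-threaded worker rather than the two threads per processing unit used by mpiWrapper), and the input file names and the launched command are hypothetical:

```python
import subprocess

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
TASK_TAG, STOP_TAG = 1, 2

if rank == 0:
    # Master: hand one subtask (here, an input file name) to each idle worker.
    tasks = [f"input_{i:04d}.fasta" for i in range(100)]  # hypothetical inputs
    status = MPI.Status()
    active = comm.Get_size() - 1
    while active > 0:
        comm.recv(source=MPI.ANY_SOURCE, tag=MPI.ANY_TAG, status=status)
        worker = status.Get_source()
        if tasks:
            comm.send(tasks.pop(), dest=worker, tag=TASK_TAG)
        else:
            comm.send(None, dest=worker, tag=STOP_TAG)
            active -= 1
else:
    # Worker: request work, run an unmodified serial program on it, repeat.
    status = MPI.Status()
    while True:
        comm.send(("ready", rank), dest=0)
        task = comm.recv(source=0, tag=MPI.ANY_TAG, status=status)
        if status.Get_tag() == STOP_TAG:
            break
        # Launch a conventional (non-parallel) Linux application on the subtask.
        subprocess.run(["echo", task], check=True)
```

    Such a script would be started under an MPI launcher, e.g. mpirun -n 16 python wrapper_sketch.py, assuming an MPI installation and mpi4py are available.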

  10. CloVR: a virtual machine for automated and portable sequence analysis from the desktop using cloud computing.

    PubMed

    Angiuoli, Samuel V; Matalka, Malcolm; Gussman, Aaron; Galens, Kevin; Vangala, Mahesh; Riley, David R; Arze, Cesar; White, James R; White, Owen; Fricke, W Florian

    2011-08-30

    Next-generation sequencing technologies have decentralized sequence acquisition, increasing the demand for new bioinformatics tools that are easy to use, portable across multiple platforms, and scalable for high-throughput applications. Cloud computing platforms provide on-demand access to computing infrastructure over the Internet and can be used in combination with custom-built virtual machines to distribute pre-packaged, pre-configured software. We describe the Cloud Virtual Resource, CloVR, a new desktop application for push-button automated sequence analysis that can utilize cloud computing resources. CloVR is implemented as a single portable virtual machine (VM) that provides several automated analysis pipelines for microbial genomics, including 16S, whole genome and metagenome sequence analysis. The CloVR VM runs on a personal computer, utilizes local computer resources and requires minimal installation, addressing key challenges in deploying bioinformatics workflows. In addition, CloVR supports use of remote cloud computing resources to improve performance for large-scale sequence processing. In a case study, we demonstrate the use of CloVR to automatically process next-generation sequencing data on multiple cloud computing platforms. The CloVR VM and associated architecture lowers the barrier of entry for utilizing complex analysis protocols on both local single- and multi-core computers and cloud systems for high-throughput data processing.

  11. Exploring the Universe with WISE and Cloud Computing

    NASA Technical Reports Server (NTRS)

    Benford, Dominic J.

    2011-01-01

    WISE is a recently-completed astronomical survey mission that has imaged the entire sky in four infrared wavelength bands. The large quantity of science images returned consists of 2,776,922 individual snapshots in various locations in each band which, along with ancillary data, totals around 110TB of raw, uncompressed data. Making the most use of this data requires advanced computing resources. I will discuss some initial attempts in the use of cloud computing to make this large problem tractable.

  12. Digital Image Access & Retrieval.

    ERIC Educational Resources Information Center

    Heidorn, P. Bryan, Ed.; Sandore, Beth, Ed.

    Recent technological advances in computing and digital imaging technology have had immediate and permanent consequences for visual resource collections. Libraries are involved in organizing and managing large visual resource collections. The central challenges in working with digital image collections mirror those that libraries have sought to…

  13. Genomics Virtual Laboratory: A Practical Bioinformatics Workbench for the Cloud

    PubMed Central

    Afgan, Enis; Sloggett, Clare; Goonasekera, Nuwan; Makunin, Igor; Benson, Derek; Crowe, Mark; Gladman, Simon; Kowsar, Yousef; Pheasant, Michael; Horst, Ron; Lonie, Andrew

    2015-01-01

    Background: Analyzing high throughput genomics data is a complex and compute intensive task, generally requiring numerous software tools and large reference data sets, tied together in successive stages of data transformation and visualisation. A computational platform enabling best practice genomics analysis ideally meets a number of requirements, including: a wide range of analysis and visualisation tools, closely linked to large user and reference data sets; workflow platform(s) enabling accessible, reproducible, portable analyses, through a flexible set of interfaces; highly available, scalable computational resources; and flexibility and versatility in the use of these resources to meet demands and expertise of a variety of users. Access to an appropriate computational platform can be a significant barrier to researchers, as establishing such a platform requires a large upfront investment in hardware, experience, and expertise. Results: We designed and implemented the Genomics Virtual Laboratory (GVL) as a middleware layer of machine images, cloud management tools, and online services that enable researchers to build arbitrarily sized compute clusters on demand, pre-populated with fully configured bioinformatics tools, reference datasets and workflow and visualisation options. The platform is flexible in that users can conduct analyses through web-based (Galaxy, RStudio, IPython Notebook) or command-line interfaces, and add/remove compute nodes and data resources as required. Best-practice tutorials and protocols provide a path from introductory training to practice. The GVL is available on the OpenStack-based Australian Research Cloud (http://nectar.org.au) and the Amazon Web Services cloud. The principles, implementation and build process are designed to be cloud-agnostic. Conclusions: This paper provides a blueprint for the design and implementation of a cloud-based Genomics Virtual Laboratory. We discuss scope, design considerations and technical and logistical constraints, and explore the value added to the research community through the suite of services and resources provided by our implementation. PMID:26501966

  14. Genomics Virtual Laboratory: A Practical Bioinformatics Workbench for the Cloud.

    PubMed

    Afgan, Enis; Sloggett, Clare; Goonasekera, Nuwan; Makunin, Igor; Benson, Derek; Crowe, Mark; Gladman, Simon; Kowsar, Yousef; Pheasant, Michael; Horst, Ron; Lonie, Andrew

    2015-01-01

    Analyzing high throughput genomics data is a complex and compute intensive task, generally requiring numerous software tools and large reference data sets, tied together in successive stages of data transformation and visualisation. A computational platform enabling best practice genomics analysis ideally meets a number of requirements, including: a wide range of analysis and visualisation tools, closely linked to large user and reference data sets; workflow platform(s) enabling accessible, reproducible, portable analyses, through a flexible set of interfaces; highly available, scalable computational resources; and flexibility and versatility in the use of these resources to meet demands and expertise of a variety of users. Access to an appropriate computational platform can be a significant barrier to researchers, as establishing such a platform requires a large upfront investment in hardware, experience, and expertise. We designed and implemented the Genomics Virtual Laboratory (GVL) as a middleware layer of machine images, cloud management tools, and online services that enable researchers to build arbitrarily sized compute clusters on demand, pre-populated with fully configured bioinformatics tools, reference datasets and workflow and visualisation options. The platform is flexible in that users can conduct analyses through web-based (Galaxy, RStudio, IPython Notebook) or command-line interfaces, and add/remove compute nodes and data resources as required. Best-practice tutorials and protocols provide a path from introductory training to practice. The GVL is available on the OpenStack-based Australian Research Cloud (http://nectar.org.au) and the Amazon Web Services cloud. The principles, implementation and build process are designed to be cloud-agnostic. This paper provides a blueprint for the design and implementation of a cloud-based Genomics Virtual Laboratory. We discuss scope, design considerations and technical and logistical constraints, and explore the value added to the research community through the suite of services and resources provided by our implementation.

  15. The OSG open facility: A sharing ecosystem

    DOE PAGES

    Jayatilaka, B.; Levshina, T.; Rynge, M.; ...

    2015-12-23

    The Open Science Grid (OSG) ties together individual experiments' computing power, connecting their resources to create a large, robust computing grid. This computing infrastructure started primarily as a collection of sites associated with large HEP experiments such as ATLAS, CDF, CMS, and DZero. In the years since, the OSG has broadened its focus to also address the needs of other US researchers and increased delivery of Distributed High Throughput Computing (DHTC) to users from a wide variety of disciplines via the OSG Open Facility. Presently, the Open Facility delivers about 100 million computing wall hours per year to researchers who are not already associated with the owners of the computing sites. This is primarily accomplished by harvesting and organizing the temporarily unused capacity (i.e. opportunistic cycles) from the sites in the OSG. Using these methods, OSG resource providers and scientists share computing hours with researchers in many other fields to enable their science, striving to make sure that this computing power is used with maximal efficiency. Furthermore, we believe that expanded access to DHTC is an essential tool for scientific innovation, and work continues on expanding this service.

  16. HyperCard K-12: Classroom Computer Learning Special Supplement Sponsored by Apple Computer.

    ERIC Educational Resources Information Center

    Classroom Computer Learning, 1989

    1989-01-01

    Follows the development of hypertext which is the electronic movement of large amounts of text. Probes the use of the Macintosh HyperCard and its applications in education. Notes programs are stackable in the computer. Provides tool, resource, and stack directory along with tips for using HyperCard. (MVL)

  17. The National Special Education Alliance: One Year Later.

    ERIC Educational Resources Information Center

    Green, Peter

    1988-01-01

    The National Special Education Alliance (a national network of local computer resource centers associated with Apple Computer, Inc.) consists, one year after formation, of 24 non-profit support centers staffed largely by volunteers. The NSEA now reaches more than 1000 disabled computer users each month and more growth in the future is expected.…

  18. Two pass method and radiation interchange processing when applied to thermal-structural analysis of large space truss structures

    NASA Technical Reports Server (NTRS)

    Warren, Andrew H.; Arelt, Joseph E.; Lalicata, Anthony L.; Rogers, Karen M.

    1993-01-01

    A method of efficient and automated thermal-structural processing of very large space structures is presented. The method interfaces the finite element and finite difference techniques. It also results in a pronounced reduction of the quantity of computations, computer resources and manpower required for the task, while assuring the desired accuracy of the results.

  19. MOLNs: A CLOUD PLATFORM FOR INTERACTIVE, REPRODUCIBLE, AND SCALABLE SPATIAL STOCHASTIC COMPUTATIONAL EXPERIMENTS IN SYSTEMS BIOLOGY USING PyURDME.

    PubMed

    Drawert, Brian; Trogdon, Michael; Toor, Salman; Petzold, Linda; Hellander, Andreas

    2016-01-01

    Computational experiments using spatial stochastic simulations have led to important new biological insights, but they require specialized tools and a complex software stack, as well as large and scalable compute and data analysis resources due to the large computational cost associated with Monte Carlo computational workflows. The complexity of setting up and managing a large-scale distributed computation environment to support productive and reproducible modeling can be prohibitive for practitioners in systems biology. This results in a barrier to the adoption of spatial stochastic simulation tools, effectively limiting the type of biological questions addressed by quantitative modeling. In this paper, we present PyURDME, a new, user-friendly spatial modeling and simulation package, and MOLNs, a cloud computing appliance for distributed simulation of stochastic reaction-diffusion models. MOLNs is based on IPython and provides an interactive programming platform for development of sharable and reproducible distributed parallel computational experiments.
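
    As a concrete illustration of why such Monte Carlo workflows need scalable compute, independent stochastic realizations can be farmed out in parallel. The sketch below runs a simple birth-death Gillespie simulation with Python's multiprocessing; it is a generic illustration rather than PyURDME/MOLNs code, and the rate constants and realization count are made up:

```python
import random
from multiprocessing import Pool

# Hypothetical birth-death rates, for illustration only.
BIRTH_RATE = 1.0   # molecules created per unit time
DEATH_RATE = 0.1   # per-molecule degradation rate


def gillespie_final_count(seed: int, t_end: float = 100.0) -> int:
    """One stochastic realization; returns the molecule count at t_end."""
    rng = random.Random(seed)
    t, n = 0.0, 0
    while t < t_end:
        total_rate = BIRTH_RATE + DEATH_RATE * n
        t += rng.expovariate(total_rate)          # time to next reaction
        if rng.random() < BIRTH_RATE / total_rate:
            n += 1                                # birth event
        else:
            n -= 1                                # death event
    return n


if __name__ == "__main__":
    # Thousands of independent realizations parallelize trivially across cores
    # (or, at larger scale, across cloud workers, which is what MOLNs manages).
    with Pool() as pool:
        counts = pool.map(gillespie_final_count, range(10_000))
    print("mean copy number:", sum(counts) / len(counts))
```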

  20. Dynamic Collaboration Infrastructure for Hydrologic Science

    NASA Astrophysics Data System (ADS)

    Tarboton, D. G.; Idaszak, R.; Castillo, C.; Yi, H.; Jiang, F.; Jones, N.; Goodall, J. L.

    2016-12-01

    Data and modeling infrastructure is becoming increasingly accessible to water scientists. HydroShare is a collaborative environment that currently offers water scientists the ability to access modeling and data infrastructure in support of data intensive modeling and analysis. It supports the sharing of and collaboration around "resources" which are social objects defined to include both data and models in a structured standardized format. Users collaborate around these objects via comments, ratings, and groups. HydroShare also supports web services and cloud based computation for the execution of hydrologic models and analysis and visualization of hydrologic data. However, the quantity and variety of data and modeling infrastructure available that can be accessed from environments like HydroShare is increasing. Storage infrastructure can range from one's local PC to campus or organizational storage to storage in the cloud. Modeling or computing infrastructure can range from one's desktop to departmental clusters to national HPC resources to grid and cloud computing resources. How does one orchestrate this vast number of data and computing infrastructure without needing to correspondingly learn each new system? A common limitation across these systems is the lack of efficient integration between data transport mechanisms and the corresponding high-level services to support large distributed data and compute operations. A scientist running a hydrology model from their desktop may require processing a large collection of files across the aforementioned storage and compute resources and various national databases. To address these community challenges a proof-of-concept prototype was created integrating HydroShare with RADII (Resource Aware Data-centric collaboration Infrastructure) to provide software infrastructure to enable the comprehensive and rapid dynamic deployment of what we refer to as "collaborative infrastructure." In this presentation we discuss the results of this proof-of-concept prototype which enabled HydroShare users to readily instantiate virtual infrastructure marshaling arbitrary combinations, varieties, and quantities of distributed data and computing infrastructure in addressing big problems in hydrology.

  1. A Development of Lightweight Grid Interface

    NASA Astrophysics Data System (ADS)

    Iwai, G.; Kawai, Y.; Sasaki, T.; Watase, Y.

    2011-12-01

    In order to help the rapid development of Grid/Cloud-aware applications, we have developed an API to abstract distributed computing infrastructures based on SAGA (A Simple API for Grid Applications). SAGA, which is standardized in the OGF (Open Grid Forum), defines API specifications to access distributed computing infrastructures, such as Grid, Cloud and local computing resources. The Universal Grid API (UGAPI), which is a set of command line interfaces (CLI) and APIs, aims to offer a simpler API combining several SAGA interfaces with richer functionalities. These UGAPI CLIs offer typical functionalities required by end users for job management and file access to the different distributed computing infrastructures as well as local computing resources. We have also built a web interface for the particle therapy simulation and demonstrated a large-scale calculation using the different infrastructures at the same time. In this paper, we present how the web interface based on UGAPI and SAGA achieves more efficient utilization of computing resources over the different infrastructures, with technical details and practical experiences.
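
    The value of such an abstraction layer is that application code is written once against a small job interface, and the backend (local machine, batch cluster, cloud) is chosen at run time. The sketch below is a generic Python illustration of that idea, not the actual UGAPI or SAGA API; the backend classes and the submitted command are hypothetical:

```python
import subprocess
from abc import ABC, abstractmethod


class JobBackend(ABC):
    """Minimal job-submission interface an application codes against."""

    @abstractmethod
    def submit(self, command: list[str]) -> str:
        """Submit a job and return a backend-specific job identifier."""


class LocalBackend(JobBackend):
    """Runs the job directly on the local machine."""

    def submit(self, command: list[str]) -> str:
        proc = subprocess.Popen(command)
        return f"local-{proc.pid}"


class BatchBackend(JobBackend):
    """Placeholder for a batch/Grid submission via a site's middleware."""

    def submit(self, command: list[str]) -> str:
        # A real implementation would build and submit a job description
        # through the site's middleware; here we only log the intent.
        print("would submit to batch system:", " ".join(command))
        return "batch-0001"


def run_simulation(backend: JobBackend) -> str:
    # The application only sees JobBackend; infrastructures are interchangeable.
    return backend.submit(["echo", "simulate", "--events", "1000"])


if __name__ == "__main__":
    print(run_simulation(LocalBackend()))
    print(run_simulation(BatchBackend()))
```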

  2. On-demand provisioning of HEP compute resources on cloud sites and shared HPC centers

    NASA Astrophysics Data System (ADS)

    Erli, G.; Fischer, F.; Fleig, G.; Giffels, M.; Hauth, T.; Quast, G.; Schnepf, M.; Heese, J.; Leppert, K.; Arnaez de Pedro, J.; Sträter, R.

    2017-10-01

    This contribution reports on solutions, experiences and recent developments with the dynamic, on-demand provisioning of remote computing resources for analysis and simulation workflows. Local resources of a physics institute are extended by private and commercial cloud sites, ranging from the inclusion of desktop clusters over institute clusters to HPC centers. Rather than relying on dedicated HEP computing centers, it is nowadays more reasonable and flexible to utilize remote computing capacity via virtualization techniques or container concepts. We report on recent experience from incorporating a remote HPC center (NEMO Cluster, Freiburg University) and resources dynamically requested from the commercial provider 1&1 Internet SE into our institute's computing infrastructure. The Freiburg HPC resources are requested via the standard batch system, allowing HPC and HEP applications to be executed simultaneously, such that regular batch jobs run side by side with virtual machines managed via OpenStack [1]. For the inclusion of the 1&1 commercial resources, a Python API and SDK as well as the possibility to upload images were available. Large scale tests prove the capability to serve the scientific use case in the European 1&1 datacenters. The described environment at the Institute of Experimental Nuclear Physics (IEKP) at KIT serves the needs of researchers participating in the CMS and Belle II experiments. In total, resources exceeding half a million CPU hours have been provided by remote sites.

  3. Climate@Home: Crowdsourcing Climate Change Research

    NASA Astrophysics Data System (ADS)

    Xu, C.; Yang, C.; Li, J.; Sun, M.; Bambacus, M.

    2011-12-01

    Climate change deeply impacts human wellbeing. Significant amounts of resources have been invested in building super-computers that are capable of running advanced climate models, which help scientists understand climate change mechanisms and predict its trend. Although climate change influences all human beings, the general public is largely excluded from the research. On the other hand, scientists are eagerly seeking communication mediums for effectively enlightening the public on climate change and its consequences. The Climate@Home project is devoted to connecting the two ends with an innovative solution: crowdsourcing climate computing to the general public by harvesting volunteered computing resources from the participants. A distributed web-based computing platform will be built to support climate computing, and the general public can 'plug in' their personal computers to participate in the research. People contribute the spare computing power of their computers to run a computer model, which is used by scientists to predict climate change. Traditionally, only super-computers could handle such a large computing processing load. By orchestrating massive amounts of personal computers to perform atomized data processing tasks, investments in new super-computers, energy consumed by super-computers, and carbon release from super-computers are reduced. Meanwhile, the platform forms a social network of climate researchers and the general public, which may be leveraged to raise climate awareness among the participants. A portal is to be built as the gateway to the Climate@Home project. Three types of roles and the corresponding functionalities are designed and supported. The end users include the citizen participants, climate scientists, and project managers. Citizen participants connect their computing resources to the platform by downloading and installing a computing engine on their personal computers. Computer climate models are defined at the server side. Climate scientists configure computer model parameters through the portal user interface. After model configuration, scientists then launch the computing task. Next, data is atomized and distributed to computing engines that are running on citizen participants' computers. Scientists will receive notifications on the completion of computing tasks, and examine modeling results via visualization modules of the portal. Computing tasks, computing resources, and participants are managed by project managers via portal tools. A portal prototype has been built for proof of concept. Three forums have been set up for different groups of users to share information on the science aspect, technology aspect, and educational outreach aspect. A Facebook account has been set up to distribute messages via the most popular social networking platform. New threads are synchronized from the forums to Facebook. A mapping tool displays geographic locations of the participants and the status of tasks on each client node. A group of users have been invited to test functions such as forums, blogs, and computing resource monitoring.

  4. Formation of Virtual Organizations in Grids: A Game-Theoretic Approach

    NASA Astrophysics Data System (ADS)

    Carroll, Thomas E.; Grosu, Daniel

    The execution of large scale grid applications requires the use of several computational resources owned by various Grid Service Providers (GSPs). GSPs must form Virtual Organizations (VOs) to be able to provide the composite resource to these applications. We consider grids as self-organizing systems composed of autonomous, self-interested GSPs that will organize themselves into VOs with every GSP having the objective of maximizing its profit. We formulate the resource composition among GSPs as a coalition formation problem and propose a game-theoretic framework based on cooperation structures to model it. Using this framework, we design a resource management system that supports the VO formation among GSPs in a grid computing system.
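
    A toy illustration of the coalition-formation idea: given a characteristic function assigning a value to each group of GSPs, a Virtual Organization is attractive only if every member can be paid at least its stand-alone profit. The numbers and the proportional payoff split below are purely illustrative assumptions, not the paper's framework:

```python
from itertools import combinations

# Hypothetical stand-alone profits and coalition values (arbitrary units).
standalone = {"GSP-A": 4.0, "GSP-B": 3.0, "GSP-C": 2.0}
coalition_value = {
    frozenset({"GSP-A", "GSP-B"}): 9.0,
    frozenset({"GSP-A", "GSP-C"}): 6.0,
    frozenset({"GSP-B", "GSP-C"}): 5.5,
    frozenset({"GSP-A", "GSP-B", "GSP-C"}): 11.0,
}


def proportional_payoffs(members: frozenset) -> dict:
    """Split the coalition value proportionally to stand-alone profits."""
    total = sum(standalone[m] for m in members)
    return {m: coalition_value[members] * standalone[m] / total for m in members}


def is_profitable(members: frozenset) -> bool:
    """A VO forms only if every member gains at least its stand-alone profit."""
    payoffs = proportional_payoffs(members)
    return all(payoffs[m] >= standalone[m] for m in members)


if __name__ == "__main__":
    providers = sorted(standalone)
    for size in (2, 3):
        for group in combinations(providers, size):
            members = frozenset(group)
            print(set(group), "profitable:", is_profitable(members))
```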

  5. Distributed intrusion detection system based on grid security model

    NASA Astrophysics Data System (ADS)

    Su, Jie; Liu, Yahui

    2008-03-01

    Grid computing has developed rapidly with the development of network technology, and it can solve the problem of large-scale complex computing by sharing large-scale computing resources. In a grid environment, we can realize a distributed, load-balanced intrusion detection system. This paper first discusses the security mechanism in grid computing and the function of PKI/CA in the grid security system, then describes how the characteristics of grid computing can be applied in a distributed intrusion detection system (IDS) based on an Artificial Immune System. Finally, it presents a distributed intrusion detection system based on the grid security system that can reduce processing delay and assure detection rates.

  6. Role of the ATLAS Grid Information System (AGIS) in Distributed Data Analysis and Simulation

    NASA Astrophysics Data System (ADS)

    Anisenkov, A. V.

    2018-03-01

    In modern high-energy physics experiments, particular attention is paid to the global integration of information and computing resources into a unified system for efficient storage and processing of experimental data. Annually, the ATLAS experiment performed at the Large Hadron Collider at the European Organization for Nuclear Research (CERN) produces tens of petabytes of raw data from the recording electronics and several petabytes of data from the simulation system. For processing and storage of such super-large volumes of data, the computing model of the ATLAS experiment is based on a heterogeneous, geographically distributed computing environment, which includes the worldwide LHC computing grid (WLCG) infrastructure and is able to meet the requirements of the experiment for processing huge data sets and provide a high degree of their accessibility (hundreds of petabytes). The paper considers the ATLAS grid information system (AGIS) used by the ATLAS collaboration to describe the topology and resources of the computing infrastructure, to configure and connect the high-level software systems of computer centers, and to describe and store all possible parameters, control, configuration, and other auxiliary information required for the effective operation of the ATLAS distributed computing applications and services. The role of the AGIS system in the development of a unified description of the computing resources provided by grid sites, supercomputer centers, and cloud computing into a consistent information model for the ATLAS experiment is outlined. This approach has allowed the collaboration to extend the computing capabilities of the WLCG project and integrate supercomputers and cloud computing platforms into the software components of the production and distributed analysis workload management system (PanDA, ATLAS).

  7. Developing eThread pipeline using SAGA-pilot abstraction for large-scale structural bioinformatics.

    PubMed

    Ragothaman, Anjani; Boddu, Sairam Chowdary; Kim, Nayong; Feinstein, Wei; Brylinski, Michal; Jha, Shantenu; Kim, Joohyun

    2014-01-01

    While most computational annotation approaches are sequence-based, threading methods are becoming increasingly attractive because of the predicted structural information that could uncover the underlying function. However, threading tools are generally compute-intensive, and the number of protein sequences from even small genomes such as prokaryotes is large, typically containing many thousands, prohibiting their application as a genome-wide structural systems biology tool. To leverage its utility, we have developed a pipeline for eThread, a meta-threading protein structure modeling tool, that can use computational resources efficiently and effectively. We employ a pilot-based approach that supports seamless data and task-level parallelism and manages large variation in workload and computational requirements. Our scalable pipeline is deployed on Amazon EC2 and can efficiently select resources based upon task requirements. We present runtime analysis to characterize the computational complexity of eThread and the EC2 infrastructure. Based on the results, we suggest a pathway to an optimized solution with respect to metrics such as time-to-solution or cost-to-solution. Our eThread pipeline can scale to support a large number of sequences and is expected to be a viable solution for genome-scale structural bioinformatics and structure-based annotation, particularly amenable for small genomes such as prokaryotes. The developed pipeline is easily extensible to other types of distributed cyberinfrastructure.
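
    The time-to-solution versus cost-to-solution trade-off mentioned above can be made concrete with a little arithmetic: for an embarrassingly parallel workload, wall time shrinks roughly with the number of instances while cloud cost is billed per started instance-hour. The workload size and instance price below are hypothetical, not measurements from the paper:

```python
import math

# Hypothetical workload and pricing assumptions, for illustration only.
TOTAL_CORE_HOURS = 2_000        # serial work contained in the whole pipeline
PRICE_PER_INSTANCE_HOUR = 0.40  # USD, made-up on-demand rate
CORES_PER_INSTANCE = 8


def time_and_cost(n_instances: int) -> tuple[float, float]:
    """Return (wall-clock hours, USD cost) assuming near-perfect scaling."""
    wall_hours = TOTAL_CORE_HOURS / (n_instances * CORES_PER_INSTANCE)
    # Billing is per started instance-hour, so round the wall time up.
    billed_hours = math.ceil(wall_hours)
    cost = n_instances * billed_hours * PRICE_PER_INSTANCE_HOUR
    return wall_hours, cost


if __name__ == "__main__":
    for n in (1, 10, 50, 100):
        t, c = time_and_cost(n)
        print(f"{n:>3} instances: {t:7.1f} h wall time, ${c:7.2f}")
```

    Beyond a certain point, adding instances keeps shrinking time-to-solution but starts raising cost-to-solution because of per-instance billing granularity, which is the kind of trade-off the runtime analysis above is meant to expose.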

  8. Developing eThread Pipeline Using SAGA-Pilot Abstraction for Large-Scale Structural Bioinformatics

    PubMed Central

    Ragothaman, Anjani; Feinstein, Wei; Jha, Shantenu; Kim, Joohyun

    2014-01-01

    While most computational annotation approaches are sequence-based, threading methods are becoming increasingly attractive because of the predicted structural information that could uncover the underlying function. However, threading tools are generally compute-intensive, and the number of protein sequences from even small genomes such as prokaryotes is large, typically containing many thousands, prohibiting their application as a genome-wide structural systems biology tool. To leverage its utility, we have developed a pipeline for eThread, a meta-threading protein structure modeling tool, that can use computational resources efficiently and effectively. We employ a pilot-based approach that supports seamless data and task-level parallelism and manages large variation in workload and computational requirements. Our scalable pipeline is deployed on Amazon EC2 and can efficiently select resources based upon task requirements. We present runtime analysis to characterize the computational complexity of eThread and the EC2 infrastructure. Based on the results, we suggest a pathway to an optimized solution with respect to metrics such as time-to-solution or cost-to-solution. Our eThread pipeline can scale to support a large number of sequences and is expected to be a viable solution for genome-scale structural bioinformatics and structure-based annotation, particularly amenable for small genomes such as prokaryotes. The developed pipeline is easily extensible to other types of distributed cyberinfrastructure. PMID:24995285

  9. Information Power Grid: Distributed High-Performance Computing and Large-Scale Data Management for Science and Engineering

    NASA Technical Reports Server (NTRS)

    Johnston, William E.; Gannon, Dennis; Nitzberg, Bill

    2000-01-01

    We use the term "Grid" to refer to distributed, high performance computing and data handling infrastructure that incorporates geographically and organizationally dispersed, heterogeneous resources that are persistent and supported. This infrastructure includes: (1) Tools for constructing collaborative, application oriented Problem Solving Environments / Frameworks (the primary user interfaces for Grids); (2) Programming environments, tools, and services providing various approaches for building applications that use aggregated computing and storage resources, and federated data sources; (3) Comprehensive and consistent set of location independent tools and services for accessing and managing dynamic collections of widely distributed resources: heterogeneous computing systems, storage systems, real-time data sources and instruments, human collaborators, and communications systems; (4) Operational infrastructure including management tools for distributed systems and distributed resources, user services, accounting and auditing, strong and location independent user authentication and authorization, and overall system security services The vision for NASA's Information Power Grid - a computing and data Grid - is that it will provide significant new capabilities to scientists and engineers by facilitating routine construction of information based problem solving environments / frameworks. Such Grids will knit together widely distributed computing, data, instrument, and human resources into just-in-time systems that can address complex and large-scale computing and data analysis problems. Examples of these problems include: (1) Coupled, multidisciplinary simulations too large for single systems (e.g., multi-component NPSS turbomachine simulation); (2) Use of widely distributed, federated data archives (e.g., simultaneous access to metrological, topological, aircraft performance, and flight path scheduling databases supporting a National Air Space Simulation systems}; (3) Coupling large-scale computing and data systems to scientific and engineering instruments (e.g., realtime interaction with experiments through real-time data analysis and interpretation presented to the experimentalist in ways that allow direct interaction with the experiment (instead of just with instrument control); (5) Highly interactive, augmented reality and virtual reality remote collaborations (e.g., Ames / Boeing Remote Help Desk providing field maintenance use of coupled video and NDI to a remote, on-line airframe structures expert who uses this data to index into detailed design databases, and returns 3D internal aircraft geometry to the field); (5) Single computational problems too large for any single system (e.g. the rotocraft reference calculation). Grids also have the potential to provide pools of resources that could be called on in extraordinary / rapid response situations (such as disaster response) because they can provide common interfaces and access mechanisms, standardized management, and uniform user authentication and authorization, for large collections of distributed resources (whether or not they normally function in concert). IPG development and deployment is addressing requirements obtained by analyzing a number of different application areas, in particular from the NASA Aero-Space Technology Enterprise. This analysis has focussed primarily on two types of users: the scientist / design engineer whose primary interest is problem solving (e.g. 
determining wing aerodynamic characteristics in many different operating environments), and whose primary interface to IPG will be through various sorts of problem solving frameworks. The second type of user is the tool designer: the computational scientists who convert physics and mathematics into code that can simulate the physical world. These are the two primary users of IPG, and they have rather different requirements. The results of the analysis of the needs of these two types of users provides a broad set of requirements that gives rise to a general set of required capabilities. The IPG project is intended to address all of these requirements. In some cases the required computing technology exists, and in some cases it must be researched and developed. The project is using available technology to provide a prototype set of capabilities in a persistent distributed computing testbed. Beyond this, there are required capabilities that are not immediately available, and whose development spans the range from near-term engineering development (one to two years) to much longer term R&D (three to six years). Additional information is contained in the original.

  10. Enabling Large-Scale Biomedical Analysis in the Cloud

    PubMed Central

    Lin, Ying-Chih; Yu, Chin-Sheng; Lin, Yen-Jen

    2013-01-01

    Recent progress in high-throughput instrumentation has led to an astonishing growth in both the volume and complexity of biomedical data collected from various sources. This planet-size data brings serious challenges to storage and computing technologies. Cloud computing is an alternative to crack the nut because it simultaneously addresses storage and high-performance computing for large-scale data. This work briefly introduces data-intensive computing systems and summarizes existing cloud-based resources in bioinformatics. These developments and applications should facilitate biomedical research by making the vast amount of diverse data meaningful and usable. PMID:24288665

  11. Network gateway security method for enterprise Grid: a literature review

    NASA Astrophysics Data System (ADS)

    Sujarwo, A.; Tan, J.

    2017-03-01

    The computational Grid has brought large computational resources closer to scientists. It enables people to run large computational jobs anytime and anywhere, without physical borders. However, the large number and wide spread of computing participants, acting either as users or as computational providers, raise security problems. The challenge is how the security system, especially the part that filters data at the gateway, can work flexibly depending on the registered Grid participants. This paper surveys what has been done to approach this challenge, in order to find a better, new method for the enterprise Grid. The finding of this paper is a dynamically controlled enterprise firewall that secures Grid resources from unwanted connections, with a new firewall control method and components.

  12. Cognitive Model Exploration and Optimization: A New Challenge for Computational Science

    DTIC Science & Technology

    2010-01-01

    Introduction Research in cognitive science often involves the generation and analysis of computational cognitive models to explain various...HPC) clusters and volunteer computing for large-scale computational resources. The majority of applications on the Department of Defense HPC... clusters focus on solving partial differential equations (Post, 2009). These tend to be lean, fast models with little noise. While we lack specific

  13. Self-guaranteed measurement-based quantum computation

    NASA Astrophysics Data System (ADS)

    Hayashi, Masahito; Hajdušek, Michal

    2018-05-01

    In order to guarantee the output of a quantum computation, we usually assume that the component devices are trusted. However, when the total computation process is large, it is not easy to guarantee the whole system when we have scaling effects, unexpected noise, or unaccounted for correlations between several subsystems. If we do not trust the measurement basis or the prepared entangled state, we do need to be worried about such uncertainties. To this end, we propose a self-guaranteed protocol for verification of quantum computation under the scheme of measurement-based quantum computation where no prior-trusted devices (measurement basis or entangled state) are needed. The approach we present enables the implementation of verifiable quantum computation using the measurement-based model in the context of a particular instance of delegated quantum computation where the server prepares the initial computational resource and sends it to the client, who drives the computation by single-qubit measurements. Applying self-testing procedures, we are able to verify the initial resource as well as the operation of the quantum devices and hence the computation itself. The overhead of our protocol scales with the size of the initial resource state to the power of 4 times the natural logarithm of the initial state's size.

  14. APL: An Alternative to the Multi-Language Environment for Education. Systems Research Memo Number Four.

    ERIC Educational Resources Information Center

    Lippert, Henry T.; Harris, Edward V.

    The diverse requirements for computing facilities in education place heavy demands upon available resources. Although multiple or very large computers can supply such diverse needs, their cost makes them impractical for many institutions. Small computers which serve a few specific needs may be an economical answer. However, to serve operationally…

  15. PP-SWAT: A Python-based computing software for efficient multiobjective calibration of SWAT

    USDA-ARS?s Scientific Manuscript database

    With enhanced data availability, distributed watershed models for large areas with high spatial and temporal resolution are increasingly used to understand water budgets and examine effects of human activities and climate change/variability on water resources. Developing parallel computing software...

  16. Architecture Framework for Trapped-Ion Quantum Computer based on Performance Simulation Tool

    NASA Astrophysics Data System (ADS)

    Ahsan, Muhammad

    The challenge of building a scalable quantum computer lies in striking an appropriate balance between designing a reliable system architecture from a large number of faulty computational resources and improving the physical quality of system components. The detailed investigation of performance variation with the physics of the components and the system architecture requires an adequate performance simulation tool. In this thesis we demonstrate a software tool capable of (1) mapping and scheduling a quantum circuit on a realistic quantum hardware architecture with physical resource constraints, (2) evaluating performance metrics such as the execution time and the success probability of the algorithm execution, and (3) analyzing the constituents of these metrics and visualizing resource utilization to identify system components which crucially define the overall performance. Using this versatile tool, we explore the vast design space for a modular quantum computer architecture based on trapped ions. We find that while the success probability is uniformly determined by the fidelity of physical quantum operations, the execution time is a function of the system resources invested at various layers of the design hierarchy. At the physical level, the number of lasers performing quantum gates impacts the latency of executing the fault-tolerant circuit blocks. When these blocks are used to construct meaningful arithmetic circuits such as quantum adders, the number of ancilla qubits for complicated non-Clifford gates and the entanglement resources needed to establish long-distance communication channels become the major performance-limiting factors. Next, in order to factorize large integers, these adders are assembled into the modular exponentiation circuit comprising the bulk of Shor's algorithm. At this stage, the overall scaling of resource-constrained performance with the size of the problem describes the effectiveness of the chosen design. By matching the resource investment with the pace of advancement in hardware technology, we find optimal designs for different types of quantum adders. Conclusively, we show that 2,048-bit Shor's algorithm can be reliably executed within a resource budget of 1.5 million qubits.

  17. A large-scale solar dynamics observatory image dataset for computer vision applications.

    PubMed

    Kucuk, Ahmet; Banda, Juan M; Angryk, Rafal A

    2017-01-01

    The National Aeronautics and Space Administration (NASA) Solar Dynamics Observatory (SDO) mission has given us unprecedented insight into the Sun's activity. By capturing approximately 70,000 images a day, this mission has created one of the richest and biggest repositories of solar image data available to mankind. With such massive amounts of information, researchers have been able to produce great advances in detecting solar events. In this resource, we compile SDO solar data into a single repository in order to provide the computer vision community with a standardized and curated large-scale dataset of several hundred thousand solar events found on high resolution solar images. This publicly available resource, along with the generation source code, will accelerate computer vision research on NASA's solar image data by reducing the amount of time spent performing data acquisition and curation from the multiple sources we have compiled. By improving the quality of the data with thorough curation, we anticipate wider adoption and interest from the computer vision and solar physics communities.

  18. COMPUTATIONAL RESOURCES FOR BIOFUEL FEEDSTOCK SPECIES

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Buell, Carol Robin; Childs, Kevin L

    2013-05-07

    While current production of ethanol as a biofuel relies on starch and sugar inputs, it is anticipated that sustainable production of ethanol for biofuel use will utilize lignocellulosic feedstocks. Candidate plant species to be used for lignocellulosic ethanol production include a large number of species within the Grass, Pine and Birch plant families. For these biofuel feedstock species, there are variable amounts of genome sequence resources available, ranging from complete genome sequences (e.g. sorghum, poplar) to transcriptome data sets (e.g. switchgrass, pine). These data sets are not only dispersed in location but also disparate in content. It will be essential to leverage and improve these genomic data sets for the improvement of biofuel feedstock production. The objectives of this project were to provide computational tools and resources for data-mining genome sequence/annotation and large-scale functional genomic datasets available for biofuel feedstock species. We have created a Bioenergy Feedstock Genomics Resource that provides a web-based portal or clearing house for genomic data for plant species relevant to biofuel feedstock production. Sequence data from a total of 54 plant species are included in the Bioenergy Feedstock Genomics Resource, including model plant species that permit leveraging of knowledge across taxa to biofuel feedstock species. We have generated additional computational analyses of these data, including uniform annotation, to facilitate genomic approaches to improved biofuel feedstock production. These data have been centralized in the publicly available Bioenergy Feedstock Genomics Resource (http://bfgr.plantbiology.msu.edu/).

  19. CMS Connect

    NASA Astrophysics Data System (ADS)

    Balcas, J.; Bockelman, B.; Gardner, R., Jr.; Hurtado Anampa, K.; Jayatilaka, B.; Aftab Khan, F.; Lannon, K.; Larson, K.; Letts, J.; Marra Da Silva, J.; Mascheroni, M.; Mason, D.; Perez-Calero Yzquierdo, A.; Tiradani, A.

    2017-10-01

    The CMS experiment collects and analyzes large amounts of data coming from high energy particle collisions produced by the Large Hadron Collider (LHC) at CERN. This involves a huge amount of real and simulated data processing that needs to be handled in batch-oriented platforms. The CMS Global Pool of computing resources provides more than 100K dedicated CPU cores and another 50K to 100K CPU cores from opportunistic resources for these kinds of tasks, and even though production and event processing analysis workflows are already managed by existing tools, there is still a lack of support for submitting final-stage, condor-like analysis jobs familiar to Tier-3 or local Computing Facility users into these distributed resources in a way that is integrated with other CMS services and friendly to the user. CMS Connect is a set of computing tools and services designed to augment existing services in the CMS Physics community, focusing on these kinds of condor analysis jobs. It is based on the CI-Connect platform developed by the Open Science Grid and uses the CMS GlideInWMS infrastructure to transparently plug CMS global grid resources into a virtual pool accessed via a single submission machine. This paper describes the specific developments and deployment of CMS Connect beyond the CI-Connect platform in order to integrate the service with CMS-specific needs, including specific Site submission, accounting of jobs and automated reporting to standard CMS monitoring resources in an effortless way for their users.

  20. Requirements for fault-tolerant factoring on an atom-optics quantum computer.

    PubMed

    Devitt, Simon J; Stephens, Ashley M; Munro, William J; Nemoto, Kae

    2013-01-01

    Quantum information processing and its associated technologies have reached a pivotal stage in their development, with many experiments having established the basic building blocks. Moving forward, the challenge is to scale up to larger machines capable of performing computational tasks not possible today. This raises questions that need to be urgently addressed, such as what resources these machines will consume and how large will they be. Here we estimate the resources required to execute Shor's factoring algorithm on an atom-optics quantum computer architecture. We determine the runtime and size of the computer as a function of the problem size and physical error rate. Our results suggest that once the physical error rate is low enough to allow quantum error correction, optimization to reduce resources and increase performance will come mostly from integrating algorithms and circuits within the error correction environment, rather than from improving the physical hardware.

  1. CloVR: A virtual machine for automated and portable sequence analysis from the desktop using cloud computing

    PubMed Central

    2011-01-01

    Background: Next-generation sequencing technologies have decentralized sequence acquisition, increasing the demand for new bioinformatics tools that are easy to use, portable across multiple platforms, and scalable for high-throughput applications. Cloud computing platforms provide on-demand access to computing infrastructure over the Internet and can be used in combination with custom-built virtual machines to distribute pre-packaged, pre-configured software. Results: We describe the Cloud Virtual Resource, CloVR, a new desktop application for push-button automated sequence analysis that can utilize cloud computing resources. CloVR is implemented as a single portable virtual machine (VM) that provides several automated analysis pipelines for microbial genomics, including 16S, whole genome and metagenome sequence analysis. The CloVR VM runs on a personal computer, utilizes local computer resources and requires minimal installation, addressing key challenges in deploying bioinformatics workflows. In addition, CloVR supports use of remote cloud computing resources to improve performance for large-scale sequence processing. In a case study, we demonstrate the use of CloVR to automatically process next-generation sequencing data on multiple cloud computing platforms. Conclusion: The CloVR VM and associated architecture lowers the barrier of entry for utilizing complex analysis protocols on both local single- and multi-core computers and cloud systems for high throughput data processing. PMID:21878105

  2. The Nimrod computational workbench: a case study in desktop metacomputing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Abramson, D.; Sosic, R.; Foster, I.

    The coordinated use of geographically distributed computers, or metacomputing, can in principle provide more accessible and cost-effective supercomputing than conventional high-performance systems. However, we lack evidence that metacomputing systems can be made easily usable, or that there exist large numbers of applications able to exploit metacomputing resources. In this paper, we present work that addresses both these concerns. The basis for this work is a system called Nimrod that provides a desktop problem-solving environment for parametric experiments. We describe how Nimrod has been extended to support the scheduling of computational resources located in a wide-area environment, and report on an experiment in which Nimrod was used to schedule a large parametric study across the Australian Internet. The experiment provided both new scientific results and insights into Nimrod capabilities. We relate the results of this experiment to lessons learned from the I-WAY distributed computing experiment, and draw conclusions as to how Nimrod and I-WAY-like computing environments should be developed to support desktop metacomputing.

  3. Scaling up a CMS tier-3 site with campus resources and a 100 Gb/s network connection: what could go wrong?

    NASA Astrophysics Data System (ADS)

    Wolf, Matthias; Woodard, Anna; Li, Wenzhao; Hurtado Anampa, Kenyi; Tovar, Benjamin; Brenner, Paul; Lannon, Kevin; Hildreth, Mike; Thain, Douglas

    2017-10-01

    The University of Notre Dame (ND) CMS group operates a modest-sized Tier-3 site suitable for local, final-stage analysis of CMS data. However, through the ND Center for Research Computing (CRC), Notre Dame researchers have opportunistic access to roughly 25k CPU cores of computing and a 100 Gb/s WAN network link. To understand the limits of what might be possible in this scenario, we undertook to use these resources for a wide range of CMS computing tasks from user analysis through large-scale Monte Carlo production (including both detector simulation and data reconstruction.) We will discuss the challenges inherent in effectively utilizing CRC resources for these tasks and the solutions deployed to overcome them.

  4. The Eukaryotic Pathogen Databases: a functional genomic resource integrating data from human and veterinary parasites.

    PubMed

    Harb, Omar S; Roos, David S

    2015-01-01

    Over the past 20 years, advances in high-throughput biological techniques and the availability of computational resources including fast Internet access have resulted in an explosion of large genome-scale data sets "big data." While such data are readily available for download and personal use and analysis from a variety of repositories, often such analysis requires access to seldom-available computational skills. As a result a number of databases have emerged to provide scientists with online tools enabling the interrogation of data without the need for sophisticated computational skills beyond basic knowledge of Internet browser utility. This chapter focuses on the Eukaryotic Pathogen Databases (EuPathDB: http://eupathdb.org) Bioinformatic Resource Center (BRC) and illustrates some of the available tools and methods.

  5. Scheduling language and algorithm development study. Volume 1, phase 2: Design considerations for a scheduling and resource allocation system

    NASA Technical Reports Server (NTRS)

    Morrell, R. A.; Odoherty, R. J.; Ramsey, H. R.; Reynolds, C. C.; Willoughby, J. K.; Working, R. D.

    1975-01-01

    Data and analyses related to a variety of algorithms for solving typical large-scale scheduling and resource allocation problems are presented. The capabilities and deficiencies of various alternative problem solving strategies are discussed from the viewpoint of computer system design.

  6. A Review of Enhanced Sampling Approaches for Accelerated Molecular Dynamics

    NASA Astrophysics Data System (ADS)

    Tiwary, Pratyush; van de Walle, Axel

    Molecular dynamics (MD) simulations have become a tool of immense use and popularity for simulating a variety of systems. With the advent of massively parallel computer resources, one now routinely sees applications of MD to systems as large as hundreds of thousands to even several million atoms, which is almost the size of most nanomaterials. However, it is not yet possible to reach laboratory timescales of milliseconds and beyond with MD simulations. Due to the essentially sequential nature of time, parallel computers have been of limited use in solving this so-called timescale problem. Instead, over the years a large range of statistical mechanics based enhanced sampling approaches have been proposed for accelerating molecular dynamics, and accessing timescales that are well beyond the reach of the fastest computers. In this review we provide an overview of these approaches, including the underlying theory, typical applications, and publicly available software resources to implement them.

  7. Megatux

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    2012-09-25

    The Megatux platform enables the emulation of large scale (multi-million node) distributed systems. In particular, it allows for the emulation of large-scale networks interconnecting a very large number of emulated computer systems. It does this by leveraging virtualization and associated technologies to allow hundreds of virtual computers to be hosted on a single moderately sized server or workstation. Virtualization technology provided by modern processors allows for multiple guest OSs to run at the same time, sharing the hardware resources. The Megatux platform can be deployed on a single PC, a small cluster of a few boxes or a large cluster of computers. With a modest cluster, the Megatux platform can emulate complex organizational networks. By using virtualization, we emulate the hardware, but run actual software, enabling large scale without sacrificing fidelity.

  8. Squid - a simple bioinformatics grid.

    PubMed

    Carvalho, Paulo C; Glória, Rafael V; de Miranda, Antonio B; Degrave, Wim M

    2005-08-03

    BLAST is a widely used genetic research tool for analysis of similarity between nucleotide and protein sequences. This paper presents a software application entitled "Squid" that makes use of grid technology. The current version, as an example, is configured for BLAST applications, but adaptation for other computing intensive repetitive tasks can be easily accomplished in the open source version. This enables the allocation of remote resources to perform distributed computing, making large BLAST queries viable without the need of high-end computers. Most distributed computing/grid solutions have complex installation procedures requiring a computer specialist, or have limitations regarding operating systems. Squid is a multi-platform, open-source program designed to "keep things simple" while offering high-end computing power for large scale applications. Squid also has an efficient fault tolerance and crash recovery system against data loss, being able to re-route jobs upon node failure and recover even if the master machine fails. Our results show that a Squid application, working with N nodes and proper network resources, can process BLAST queries almost N times faster than if working with only one computer. Squid offers high-end computing, even for the non-specialist, and is freely available at the project web site. Its open-source and binary Windows distributions contain detailed instructions and a "plug-n-play" installation containing a pre-configured example.
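    The core pattern behind grid-enabled BLAST tools of this kind is to split a large multi-FASTA query into independent chunks that worker nodes process in parallel and whose results are concatenated afterwards. The sketch below illustrates only that generic splitting step, under assumed file names and chunk sizes; it is not Squid's actual code.

```python
# Minimal sketch of the split-and-distribute pattern behind grid BLAST tools:
# a large multi-FASTA query is split into chunks that independent worker nodes
# can process in parallel. File names and chunk size are illustrative.
from pathlib import Path

def split_fasta(path, records_per_chunk=100, out_dir="chunks"):
    """Split a multi-FASTA file into smaller files of at most
    `records_per_chunk` sequences each and return the chunk paths."""
    Path(out_dir).mkdir(exist_ok=True)
    chunk_paths, buffer, count, idx = [], [], 0, 0

    def flush():
        nonlocal buffer, idx
        if buffer:
            p = Path(out_dir) / f"query_{idx:04d}.fasta"
            p.write_text("".join(buffer))
            chunk_paths.append(p)
            buffer, idx = [], idx + 1

    with open(path) as fh:
        for line in fh:
            if line.startswith(">"):
                if count and count % records_per_chunk == 0:
                    flush()
                count += 1
            buffer.append(line)
    flush()
    return chunk_paths

# Each chunk could then be shipped to a worker that runs, for example,
#   blastn -query query_0000.fasta -db nt -out query_0000.out
# and the per-chunk outputs are merged once all workers have finished.
```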

  9. ATLAS and LHC computing on CRAY

    NASA Astrophysics Data System (ADS)

    Sciacca, F. G.; Haug, S.; ATLAS Collaboration

    2017-10-01

    Access and exploitation of large scale computing resources, such as those offered by general purpose HPC centres, is one important measure for ATLAS and the other Large Hadron Collider experiments in order to meet the challenge posed by the full exploitation of the future data within the constraints of flat budgets. We report on the effort of moving the Swiss WLCG T2 computing, serving ATLAS, CMS and LHCb, from a dedicated cluster to the large Cray systems at the Swiss National Supercomputing Centre CSCS. These systems not only offer very efficient hardware, cooling and highly competent operators, but also large backfill potential due to their size and multidisciplinary usage, as well as potential gains from economies of scale. Technical solutions, performance, expected return and future plans are discussed.

  10. Overview of ASC Capability Computing System Governance Model

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Doebling, Scott W.

    This document contains a description of the Advanced Simulation and Computing Program's Capability Computing System Governance Model. Objectives of the Governance Model are to ensure that the capability system resources are allocated on a priority-driven basis according to the Program requirements; and to utilize ASC Capability Systems for the large capability jobs for which they were designed and procured.

  11. AGIS: Evolution of Distributed Computing information system for ATLAS

    NASA Astrophysics Data System (ADS)

    Anisenkov, A.; Di Girolamo, A.; Alandes, M.; Karavakis, E.

    2015-12-01

    ATLAS, a particle physics experiment at the Large Hadron Collider at CERN, produces petabytes of data annually through simulation production and tens of petabytes of data per year from the detector itself. The ATLAS computing model embraces the Grid paradigm and a high degree of decentralization of computing resources in order to meet the ATLAS requirements of petabyte-scale data operations. It has evolved after the first period of LHC data taking (Run-1) in order to cope with the new challenges of the upcoming Run-2. In this paper we describe the evolution and recent developments of the ATLAS Grid Information System (AGIS), developed in order to integrate configuration and status information about resources, services and topology of the computing infrastructure used by the ATLAS Distributed Computing applications and services.

  12. A multiresolution approach to iterative reconstruction algorithms in X-ray computed tomography.

    PubMed

    De Witte, Yoni; Vlassenbroeck, Jelle; Van Hoorebeke, Luc

    2010-09-01

    In computed tomography, the application of iterative reconstruction methods in practical situations is impeded by their high computational demands. Especially in high resolution X-ray computed tomography, where reconstruction volumes contain a very large number of volume elements (several gigavoxels), this computational burden has prevented their widespread adoption. Besides the large number of calculations, iterative algorithms require the entire volume to be kept in memory during reconstruction, which quickly becomes cumbersome for large data sets. To overcome this obstacle, we present a novel multiresolution reconstruction, which greatly reduces the required amount of memory without significantly affecting the reconstructed image quality. It is shown that, combined with an efficient implementation on a graphical processing unit, the multiresolution approach enables the application of iterative algorithms in the reconstruction of large volumes at an acceptable speed using only limited resources.

  13. Challenges and opportunities of cloud computing for atmospheric sciences

    NASA Astrophysics Data System (ADS)

    Pérez Montes, Diego A.; Añel, Juan A.; Pena, Tomás F.; Wallom, David C. H.

    2016-04-01

    Cloud computing is an emerging technological solution widely used in many fields. Initially developed as a flexible way of managing peak demand, it has begun to make its way into scientific research. One of the greatest advantages of cloud computing for scientific research is independence from access to a large cyberinfrastructure in order to fund or perform a research project. Cloud computing can avoid maintenance expenses for large supercomputers and has the potential to 'democratize' the access to high-performance computing, giving flexibility to funding bodies for allocating budgets for the computational costs associated with a project. Two of the most challenging problems in atmospheric sciences are computational cost and uncertainty in meteorological forecasting and climate projections. Both problems are closely related. Usually uncertainty can be reduced with the availability of computational resources to better reproduce a phenomenon or to perform a larger number of experiments. Here we present results of applying cloud computing resources to climate modeling, using the cloud infrastructures of three major vendors and two climate models. We show how the cloud infrastructure compares in performance to traditional supercomputers and how it provides the capability to complete experiments in shorter periods of time. The associated monetary cost is also analyzed. Finally we discuss the future potential of this technology for meteorological and climatological applications, both from the point of view of operational use and research.

  14. MOLNs: A CLOUD PLATFORM FOR INTERACTIVE, REPRODUCIBLE, AND SCALABLE SPATIAL STOCHASTIC COMPUTATIONAL EXPERIMENTS IN SYSTEMS BIOLOGY USING PyURDME

    PubMed Central

    Drawert, Brian; Trogdon, Michael; Toor, Salman; Petzold, Linda; Hellander, Andreas

    2017-01-01

    Computational experiments using spatial stochastic simulations have led to important new biological insights, but they require specialized tools and a complex software stack, as well as large and scalable compute and data analysis resources due to the large computational cost associated with Monte Carlo computational workflows. The complexity of setting up and managing a large-scale distributed computation environment to support productive and reproducible modeling can be prohibitive for practitioners in systems biology. This results in a barrier to the adoption of spatial stochastic simulation tools, effectively limiting the type of biological questions addressed by quantitative modeling. In this paper, we present PyURDME, a new, user-friendly spatial modeling and simulation package, and MOLNs, a cloud computing appliance for distributed simulation of stochastic reaction-diffusion models. MOLNs is based on IPython and provides an interactive programming platform for development of sharable and reproducible distributed parallel computational experiments. PMID:28190948

  15. Encapsulating model complexity and landscape-scale analyses of state-and-transition simulation models: an application of ecoinformatics and juniper encroachment in sagebrush steppe ecosystems

    USGS Publications Warehouse

    O'Donnell, Michael

    2015-01-01

    State-and-transition simulation modeling relies on knowledge of vegetation composition and structure (states) that describe community conditions, mechanistic feedbacks such as fire that can affect vegetation establishment, and ecological processes that drive community conditions as well as the transitions between these states. However, as the need for modeling larger and more complex landscapes increases, a more advanced awareness of computing resources becomes essential. The objectives of this study include identifying challenges of executing state-and-transition simulation models, identifying common bottlenecks of computing resources, developing a workflow and software that enable parallel processing of Monte Carlo simulations, and identifying the advantages and disadvantages of different computing resources. To address these objectives, this study used the ApexRMS® SyncroSim software and embarrassingly parallel tasks of Monte Carlo simulations on a single multicore computer and on distributed computing systems. The results demonstrated that state-and-transition simulation models scale best in distributed computing environments, such as high-throughput and high-performance computing, because these environments disseminate the workloads across many compute nodes, thereby supporting analysis of larger landscapes, higher spatial resolution vegetation products, and more complex models. Using a case study and five different computing environments, the top result (high-throughput computing versus serial computations) indicated an approximate 96.6% decrease in computing time. With a single, multicore compute node (bottom result), the computing time indicated an 81.8% decrease relative to using serial computations. These results provide insight into the tradeoffs of using different computing resources when research necessitates advanced integration of ecoinformatics incorporating large and complicated data inputs and models.
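    Monte Carlo replicates of a state-and-transition simulation are independent of one another, which is what makes them "embarrassingly parallel". The sketch below runs illustrative replicates across the cores of a single machine with Python's multiprocessing; in a high-throughput computing setting each replicate would instead be submitted as a separate job. The states and transition probabilities are invented for illustration and are not from the study.

```python
# Sketch of the embarrassingly parallel pattern used for Monte Carlo replicates
# of a state-and-transition simulation: each replicate is independent, so they
# can be farmed out to local cores (shown here) or to HTCondor/cluster jobs.
import numpy as np
from multiprocessing import Pool

STATES = ["sagebrush", "juniper_encroached", "juniper_woodland"]
TRANSITIONS = np.array([        # annual transition probabilities (illustrative)
    [0.95, 0.05, 0.00],
    [0.02, 0.93, 0.05],
    [0.00, 0.01, 0.99],
])

def run_replicate(seed, n_cells=10_000, n_years=50):
    rng = np.random.default_rng(seed)
    state = np.zeros(n_cells, dtype=int)             # all cells start as sagebrush
    for _ in range(n_years):
        r = rng.random(n_cells)
        cum = TRANSITIONS[state].cumsum(axis=1)
        # inverse-CDF sampling of the next state for every cell at once
        state = np.minimum((r[:, None] > cum).sum(axis=1), len(STATES) - 1)
    return np.bincount(state, minlength=len(STATES)) / n_cells

if __name__ == "__main__":
    with Pool() as pool:
        results = pool.map(run_replicate, range(16))  # 16 Monte Carlo replicates
    print("mean final state distribution:", np.mean(results, axis=0).round(3))
```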

  16. Large-scale educational telecommunications systems for the US: An analysis of educational needs and technological opportunities

    NASA Technical Reports Server (NTRS)

    Morgan, R. P.; Singh, J. P.; Rothenberg, D.; Robinson, B. E.

    1975-01-01

    The needs to be served, the subsectors in which the system might be used, the technology employed, and the prospects for future utilization of an educational telecommunications delivery system are described and analyzed. Educational subsectors are analyzed with emphasis on the current status and trends within each subsector. Issues which affect future development, and prospects for future use of media, technology, and large-scale electronic delivery within each subsector are included. Information on technology utilization is presented. Educational telecommunications services are identified and grouped into categories: public television and radio, instructional television, computer aided instruction, computer resource sharing, and information resource sharing. Technology based services, their current utilization, and factors which affect future development are stressed. The role of communications satellites in providing these services is discussed. Efforts to analyze and estimate future utilization of large-scale educational telecommunications are summarized. Factors which affect future utilization are identified. Conclusions are presented.

  17. Cloud computing in medical imaging.

    PubMed

    Kagadis, George C; Kloukinas, Christos; Moore, Kevin; Philbin, Jim; Papadimitroulas, Panagiotis; Alexakos, Christos; Nagy, Paul G; Visvikis, Dimitris; Hendee, William R

    2013-07-01

    Over the past century technology has played a decisive role in defining, driving, and reinventing procedures, devices, and pharmaceuticals in healthcare. Cloud computing has been introduced only recently but is already one of the major topics of discussion in research and clinical settings. The provision of extensive, easily accessible, and reconfigurable resources such as virtual systems, platforms, and applications with low service cost has caught the attention of many researchers and clinicians. Healthcare researchers are moving their efforts to the cloud, because they need adequate resources to process, store, exchange, and use large quantities of medical data. This Vision 20/20 paper addresses major questions related to the applicability of advanced cloud computing in medical imaging. The paper also considers security and ethical issues that accompany cloud computing.

  18. Job Superscheduler Architecture and Performance in Computational Grid Environments

    NASA Technical Reports Server (NTRS)

    Shan, Hongzhang; Oliker, Leonid; Biswas, Rupak

    2003-01-01

    Computational grids hold great promise in utilizing geographically separated heterogeneous resources to solve large-scale complex scientific problems. However, a number of major technical hurdles, including distributed resource management and effective job scheduling, stand in the way of realizing these gains. In this paper, we propose a novel grid superscheduler architecture and three distributed job migration algorithms. We also model the critical interaction between the superscheduler and autonomous local schedulers. Extensive performance comparisons with ideal, central, and local schemes using real workloads from leading computational centers are conducted in a simulation environment. Additionally, synthetic workloads are used to perform a detailed sensitivity analysis of our superscheduler. Several key metrics demonstrate that substantial performance gains can be achieved via smart superscheduling in distributed computational grids.
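    As a point of reference for how a superscheduler places work, the sketch below implements a deliberately simple greedy heuristic: each incoming job is routed to the site with the smallest estimated start time. It is only a baseline illustration, not one of the migration algorithms proposed in the paper, and all site capacities and job sizes are assumed.

```python
# Toy grid "superscheduler" placement heuristic: route each incoming job to the
# site whose estimated start time (queued work divided by capacity) is smallest.
from dataclasses import dataclass

@dataclass
class Site:
    name: str
    cores: int
    queued_core_hours: float = 0.0       # work already waiting at this site

    def estimated_start(self) -> float:
        return self.queued_core_hours / self.cores

@dataclass
class Job:
    job_id: str
    core_hours: float

def place(job, sites):
    """Send the job to the site with the smallest estimated start time."""
    best = min(sites, key=lambda s: s.estimated_start())
    best.queued_core_hours += job.core_hours
    return best

sites = [Site("siteA", 1024), Site("siteB", 4096), Site("siteC", 512)]
for i, hours in enumerate([500, 2000, 100, 8000, 300]):
    chosen = place(Job(f"job{i}", hours), sites)
    print(f"job{i} ({hours} core-hours) -> {chosen.name}")
```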

  19. Optimize Resources and Help Reduce Cost of Ownership with Dell[TM] Systems Management

    ERIC Educational Resources Information Center

    Technology & Learning, 2008

    2008-01-01

    Maintaining secure, convenient administration of the PC system environment can be a significant drain on resources. Deskside visits can greatly increase the cost of supporting a large number of computers. Even simple tasks, such as tracking inventory or updating software, quickly become expensive when they require physically visiting every…

  20. Towards a Scalable and Adaptive Application Support Platform for Large-Scale Distributed E-Sciences in High-Performance Network Environments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wu, Chase Qishi; Zhu, Michelle Mengxia

    The advent of large-scale collaborative scientific applications has demonstrated the potential for broad scientific communities to pool globally distributed resources to produce unprecedented data acquisition, movement, and analysis. System resources including supercomputers, data repositories, computing facilities, network infrastructures, storage systems, and display devices have been increasingly deployed at national laboratories and academic institutes. These resources are typically shared by large communities of users over Internet or dedicated networks and hence exhibit an inherent dynamic nature in their availability, accessibility, capacity, and stability. Scientific applications using either experimental facilities or computation-based simulations with various physical, chemical, climatic, and biological models feature diverse scientific workflows as simple as linear pipelines or as complex as directed acyclic graphs, which must be executed and supported over wide-area networks with massively distributed resources. Application users oftentimes need to manually configure their computing tasks over networks in an ad hoc manner, hence significantly limiting the productivity of scientists and constraining the utilization of resources. The success of these large-scale distributed applications requires a highly adaptive and massively scalable workflow platform that provides automated and optimized computing and networking services. This project is to design and develop a generic Scientific Workflow Automation and Management Platform (SWAMP), which contains a web-based user interface specially tailored for a target application, a set of user libraries, and several easy-to-use computing and networking toolkits for application scientists to conveniently assemble, execute, monitor, and control complex computing workflows in heterogeneous high-performance network environments. SWAMP will enable the automation and management of the entire process of scientific workflows with the convenience of a few mouse clicks while hiding the implementation and technical details from end users. Particularly, we will consider two types of applications with distinct performance requirements: data-centric and service-centric applications. For data-centric applications, the main workflow task involves large-volume data generation, catalog, storage, and movement typically from supercomputers or experimental facilities to a team of geographically distributed users; while for service-centric applications, the main focus of workflow is on data archiving, preprocessing, filtering, synthesis, visualization, and other application-specific analysis. We will conduct a comprehensive comparison of existing workflow systems and choose the best suited one with open-source code, a flexible system structure, and a large user base as the starting point for our development. Based on the chosen system, we will develop and integrate new components including a black box design of computing modules, performance monitoring and prediction, and workflow optimization and reconfiguration, which are missing from existing workflow systems. A modular design for separating specification, execution, and monitoring aspects will be adopted to establish a common generic infrastructure suited for a wide spectrum of science applications.
We will further design and develop efficient workflow mapping and scheduling algorithms to optimize the workflow performance in terms of minimum end-to-end delay, maximum frame rate, and highest reliability. We will develop and demonstrate the SWAMP system in a local environment, the grid network, and the 100 Gbps Advanced Network Initiative (ANI) testbed. The demonstration will target scientific applications in climate modeling and high energy physics and the functions to be demonstrated include workflow deployment, execution, steering, and reconfiguration. Throughout the project period, we will work closely with the science communities in the fields of climate modeling and high energy physics including Spallation Neutron Source (SNS) and Large Hadron Collider (LHC) projects to mature the system for production use.
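    Workflow mapping of the kind described above can be illustrated with a simplified list-scheduling heuristic: tasks are visited in topological order and each is placed on the resource that yields the earliest finish time, which minimizes the end-to-end delay greedily. The DAG, runtimes and resource names below are assumptions for illustration; SWAMP's actual mapping and scheduling algorithms are more sophisticated.

```python
# Simplified list-scheduling sketch for mapping a workflow DAG onto
# heterogeneous resources. Runtimes and the example DAG are illustrative.
from collections import defaultdict

# task -> list of predecessor tasks
dag = {"ingest": [], "filter": ["ingest"], "simulate": ["ingest"],
       "analyze": ["filter", "simulate"], "visualize": ["analyze"]}

# runtime[task][resource] in minutes (illustrative)
runtime = {
    "ingest":    {"cluster": 10, "hpc": 8},
    "filter":    {"cluster": 20, "hpc": 12},
    "simulate":  {"cluster": 90, "hpc": 30},
    "analyze":   {"cluster": 25, "hpc": 15},
    "visualize": {"cluster": 5,  "hpc": 6},
}

def topological_order(dag):
    order, seen = [], set()
    def visit(t):
        if t in seen:
            return
        for p in dag[t]:
            visit(p)
        seen.add(t)
        order.append(t)
    for t in dag:
        visit(t)
    return order

resource_free = defaultdict(float)    # when each resource becomes free
finish = {}                           # finish time of each scheduled task
placement = {}

for task in topological_order(dag):
    ready = max((finish[p] for p in dag[task]), default=0.0)
    # choose the resource giving the earliest finish time for this task
    best = min(runtime[task],
               key=lambda r: max(ready, resource_free[r]) + runtime[task][r])
    start = max(ready, resource_free[best])
    finish[task] = start + runtime[task][best]
    resource_free[best] = finish[task]
    placement[task] = best

print(placement)
print("end-to-end delay (makespan):", max(finish.values()), "minutes")
```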

  1. Template Interfaces for Agile Parallel Data-Intensive Science

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ramakrishnan, Lavanya; Gunter, Daniel; Pastorello, Gilerto Z.

    Tigres provides a programming library to compose and execute large-scale data-intensive scientific workflows from desktops to supercomputers. DOE User Facilities and large science collaborations are increasingly generating large enough data sets that it is no longer practical to download them to a desktop to operate on them. They are instead stored at centralized compute and storage resources such as high performance computing (HPC) centers. Analysis of this data requires an ability to run on these facilities, but with current technologies, scaling an analysis to an HPC center and to a large data set is difficult even for experts. Tigres is addressing the challenge of enabling collaborative analysis of DOE Science data through a new concept of reusable "templates" that enable scientists to easily compose, run and manage collaborative computational tasks. These templates define common computation patterns used in analyzing a data set.

  2. The Influence of Large-Scale Computing on Aircraft Structural Design.

    DTIC Science & Technology

    1986-04-01

    the customer in the most cost-effective manner. Computer facility organizations became computer resource power brokers. A good data processing...capabilities generated on other processors can be easily used. This approach is easily implementable and provides a good strategy for using existing...assistance to member nations for the purpose of increasing their scientific and technical potential; - Recommending effective ways for the member nations to

  3. Merging the Machines of Modern Science

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wolf, Laura; Collins, Jim

    Two recent projects have harnessed supercomputing resources at the US Department of Energy’s Argonne National Laboratory in a novel way to support major fusion science and particle collider experiments. Using leadership computing resources, one team ran fine-grid analysis of real-time data to make near-real-time adjustments to an ongoing experiment, while a second team is working to integrate Argonne’s supercomputers into the Large Hadron Collider/ATLAS workflow. Together these efforts represent a new paradigm of the high-performance computing center as a partner in experimental science.

  4. Performance Modeling in CUDA Streams - A Means for High-Throughput Data Processing.

    PubMed

    Li, Hao; Yu, Di; Kumar, Anand; Tu, Yi-Cheng

    2014-10-01

    Push-based database management system (DBMS) is a new type of data processing software that streams large volume of data to concurrent query operators. The high data rate of such systems requires large computing power provided by the query engine. In our previous work, we built a push-based DBMS named G-SDMS to harness the unrivaled computational capabilities of modern GPUs. A major design goal of G-SDMS is to support concurrent processing of heterogeneous query processing operations and enable resource allocation among such operations. Understanding the performance of operations as a result of resource consumption is thus a premise in the design of G-SDMS. With NVIDIA's CUDA framework as the system implementation platform, we present our recent work on performance modeling of CUDA kernels running concurrently under a runtime mechanism named CUDA stream. Specifically, we explore the connection between performance and resource occupancy of compute-bound kernels and develop a model that can predict the performance of such kernels. Furthermore, we provide an in-depth anatomy of the CUDA stream mechanism and summarize the main kernel scheduling disciplines in it. Our models and derived scheduling disciplines are verified by extensive experiments using synthetic and real-world CUDA kernels.
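    A performance model of this kind typically starts from resource occupancy: how many blocks can be resident on a streaming multiprocessor given their thread, register and shared-memory demands. The sketch below computes such an occupancy figure and a crude saturating throughput estimate; the hardware limits and the saturation point are illustrative assumptions, not the paper's model or any specific GPU's real limits.

```python
# Sketch of a resource-occupancy calculation of the kind such performance
# models take as input. All hardware limits below are illustrative.
def resident_blocks(threads_per_block, regs_per_thread, smem_per_block,
                    max_threads_sm=2048, regs_sm=65536, smem_sm=96 * 1024,
                    max_blocks_sm=32):
    """Blocks resident per SM, limited by threads, registers and shared memory."""
    by_threads = max_threads_sm // threads_per_block
    by_regs = regs_sm // (regs_per_thread * threads_per_block)
    by_smem = smem_sm // smem_per_block if smem_per_block else max_blocks_sm
    return min(by_threads, by_regs, by_smem, max_blocks_sm)

def occupancy(threads_per_block, regs_per_thread, smem_per_block,
              max_threads_sm=2048):
    blocks = resident_blocks(threads_per_block, regs_per_thread, smem_per_block)
    return blocks * threads_per_block / max_threads_sm

def relative_throughput(threads_per_block, regs_per_thread, smem_per_block,
                        saturation=0.5):
    """Crude model: a compute-bound kernel's throughput grows with occupancy
    until enough warps are resident to hide latency, then flattens out."""
    occ = occupancy(threads_per_block, regs_per_thread, smem_per_block)
    return min(occ / saturation, 1.0)

for tpb, regs, smem in [(128, 32, 0), (256, 64, 48 * 1024), (1024, 64, 0)]:
    print(f"threads/block={tpb:4d} regs/thread={regs:3d} smem/block={smem:6d} "
          f"occupancy={occupancy(tpb, regs, smem):.2f} "
          f"rel. throughput={relative_throughput(tpb, regs, smem):.2f}")
```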

  5. Cloud-based crowd sensing: a framework for location-based crowd analyzer and advisor

    NASA Astrophysics Data System (ADS)

    Aishwarya, K. C.; Nambi, A.; Hudson, S.; Nadesh, R. K.

    2017-11-01

    Cloud computing is an emerging field of computer science that integrates large, powerful computing and storage systems for personal as well as enterprise requirements. Mobile cloud computing extends this concept to mobile hand-held devices. Crowdsensing, or more precisely mobile crowdsensing, is the process by which a group of mobile handheld devices shares resources such as data, memory and bandwidth to perform a single task for a collective purpose. In this paper, we propose a crowdsensing framework that analyzes how crowded a location is and advises the user on whether or not to go there. This is ongoing research in a new direction toward which cloud computing has shifted, and it is open to further expansion in the near future.

  6. Designing for Ab Initio Blended Learning Environments: Identifying Systemic Contradictions

    ERIC Educational Resources Information Center

    Ó Doinn, Oisín

    2017-01-01

    In recent years, Computer Assisted Language Learning (CALL) has become more accessible than ever before. This is largely due to the proliferation of mobile computing devices and the growth of open online language-learning resources. Additionally, since the beginning of the millennium there has been massive growth in the number of students studying…

  7. Balancing Computer Resources with Institutional Needs. AIR Forum Paper 1978.

    ERIC Educational Resources Information Center

    McLaughlin, Gerald W.; And Others

    To estimate computer needs at a higher education institution, the major types of users and their future needs should be determined. In a large or complex university, three major groups of users are typically instructional, research, and administrative. After collecting information on the needs of these users, the needs can be translated into…

  8. A Pipeline for Large Data Processing Using Regular Sampling for Unstructured Grids

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Berres, Anne Sabine; Adhinarayanan, Vignesh; Turton, Terece

    2017-05-12

    Large simulation data requires a lot of time and computational resources to compute, store, analyze, visualize, and run user studies. Today, the largest cost of a supercomputer is not hardware but maintenance, in particular energy consumption. Our goal is to balance energy consumption and cognitive value of visualizations of resulting data. This requires us to go through the entire processing pipeline, from simulation to user studies. To reduce the amount of resources, data can be sampled or compressed. While this adds more computation time, the computational overhead is negligible compared to the simulation time. We built a processing pipeline using regular sampling as an example. The reasons for this choice are two-fold: using a simple example reduces unnecessary complexity as we know what to expect from the results. Furthermore, it provides a good baseline for future, more elaborate sampling methods. We measured time and energy for each test we did, and we conducted user studies in Amazon Mechanical Turk (AMT) for a range of different results we produced through sampling.

  9. A Stream Tilling Approach to Surface Area Estimation for Large Scale Spatial Data in a Shared Memory System

    NASA Astrophysics Data System (ADS)

    Liu, Jiping; Kang, Xiaochen; Dong, Chun; Xu, Shenghua

    2017-12-01

    Surface area estimation is a widely used tool for resource evaluation in the physical world. When processing large scale spatial data, the input/output (I/O) can easily become the bottleneck in parallelizing the algorithm due to the limited physical memory resources and the very slow disk transfer rate. In this paper, we proposed a stream tilling approach to surface area estimation that first decomposed a spatial data set into tiles with topological expansions. With these tiles, the one-to-one mapping relationship between the input and the computing process was broken. Then, we realized a streaming framework for scheduling the I/O processes and computing units. Herein, each computing unit encapsulated the same copy of the estimation algorithm, and multiple asynchronous computing units could work individually in parallel. Finally, the experiments demonstrated that our stream tilling estimation can efficiently alleviate the heavy pressure from I/O-bound work, and the optimized version greatly outperformed directly parallelized versions in shared memory systems with multi-core processors.
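    The streaming idea can be sketched with a bounded queue that decouples tile I/O from computation: one thread produces tiles while several workers consume them and estimate each tile's surface area independently. The sketch below fabricates random elevation tiles and uses a simple gradient-based area approximation; it illustrates the scheduling pattern rather than the paper's estimator.

```python
# Streaming, tiled surface-area estimation sketch: an I/O thread pushes tiles
# into a bounded queue while worker threads compute each tile's area.
import queue
import threading
import numpy as np

def tile_surface_area(z, cell=30.0):
    """Approximate surface area of a height-field tile:
    integral of sqrt(1 + (dz/dx)^2 + (dz/dy)^2) over the tile."""
    gy, gx = np.gradient(z, cell)
    return float(np.sqrt(1.0 + gx**2 + gy**2).sum() * cell * cell)

def reader(q, n_tiles=32, size=512, seed=0):
    rng = np.random.default_rng(seed)
    for _ in range(n_tiles):
        q.put(rng.normal(scale=50.0, size=(size, size)))  # stand-in for disk I/O
    q.put(None)                                            # end-of-stream marker

def worker(q, results):
    while True:
        tile = q.get()
        if tile is None:
            q.put(None)               # propagate the marker to the other workers
            break
        results.append(tile_surface_area(tile))

q = queue.Queue(maxsize=8)            # bounded queue decouples I/O from compute
results = []
threads = [threading.Thread(target=reader, args=(q,))]
threads += [threading.Thread(target=worker, args=(q, results)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"total surface area: {sum(results):.3e} m^2")
```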

  10. Parallel Optimization of 3D Cardiac Electrophysiological Model Using GPU

    PubMed Central

    Xia, Yong; Zhang, Henggui

    2015-01-01

    Large-scale 3D virtual heart model simulations are highly demanding in computational resources. This imposes a big challenge to the traditional computation resources based on CPU environment, which already cannot meet the requirement of the whole computation demands or are not easily available due to expensive costs. GPU as a parallel computing environment therefore provides an alternative to solve the large-scale computational problems of whole heart modeling. In this study, using a 3D sheep atrial model as a test bed, we developed a GPU-based simulation algorithm to simulate the conduction of electrical excitation waves in the 3D atria. In the GPU algorithm, a multicellular tissue model was split into two components: one is the single cell model (ordinary differential equation) and the other is the diffusion term of the monodomain model (partial differential equation). Such a decoupling enabled realization of the GPU parallel algorithm. Furthermore, several optimization strategies were proposed based on the features of the virtual heart model, which enabled a 200-fold speedup as compared to a CPU implementation. In conclusion, an optimized GPU algorithm has been developed that provides an economic and powerful platform for 3D whole heart simulations. PMID:26581957
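    The ODE/PDE decoupling described above is an operator-splitting scheme: at each time step the single-cell model is advanced without spatial coupling, and then the diffusion term of the monodomain equation is applied. The NumPy sketch below shows the same splitting on a 1D cable, with the simple FitzHugh-Nagumo model standing in for the detailed sheep atrial cell model; on a GPU each of the two steps would map to its own kernel over all cells. All parameters are illustrative.

```python
# Operator-splitting sketch for a monodomain-style simulation on a 1D cable:
# step 1 advances the cell model (ODE), step 2 applies diffusion (PDE).
import numpy as np

n, dx, dt = 400, 0.025, 0.01          # 1D cable, illustrative units
D = 0.005                             # diffusion coefficient
a, b, eps = 0.7, 0.8, 0.08            # FitzHugh-Nagumo parameters

v = np.full(n, -1.2)                  # membrane variable (resting state)
w = np.full(n, -0.6)                  # recovery variable
v[:20] = 1.5                          # stimulate one end of the cable
activated = np.full(n, np.inf)        # first time each cell depolarizes

for step in range(40_000):
    # 1) single-cell ODE step (no spatial coupling) -- one GPU kernel per voxel
    dv = v - v**3 / 3.0 - w
    dw = eps * (v + a - b * w)
    v += dt * dv
    w += dt * dw
    # 2) diffusion PDE step (explicit finite differences, no-flux boundaries)
    lap = np.empty_like(v)
    lap[1:-1] = v[:-2] - 2.0 * v[1:-1] + v[2:]
    lap[0] = v[1] - v[0]
    lap[-1] = v[-2] - v[-1]
    v += dt * D * lap / dx**2
    # record activation times to measure conduction along the cable
    newly = (v > 0.0) & np.isinf(activated)
    activated[newly] = step * dt

print("activation time at the far end of the cable:", activated[-1])
```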

  11. Parallel Optimization of 3D Cardiac Electrophysiological Model Using GPU.

    PubMed

    Xia, Yong; Wang, Kuanquan; Zhang, Henggui

    2015-01-01

    Large-scale 3D virtual heart model simulations are highly demanding in computational resources. This imposes a big challenge to the traditional computation resources based on CPU environment, which already cannot meet the requirement of the whole computation demands or are not easily available due to expensive costs. GPU as a parallel computing environment therefore provides an alternative to solve the large-scale computational problems of whole heart modeling. In this study, using a 3D sheep atrial model as a test bed, we developed a GPU-based simulation algorithm to simulate the conduction of electrical excitation waves in the 3D atria. In the GPU algorithm, a multicellular tissue model was split into two components: one is the single cell model (ordinary differential equation) and the other is the diffusion term of the monodomain model (partial differential equation). Such a decoupling enabled realization of the GPU parallel algorithm. Furthermore, several optimization strategies were proposed based on the features of the virtual heart model, which enabled a 200-fold speedup as compared to a CPU implementation. In conclusion, an optimized GPU algorithm has been developed that provides an economic and powerful platform for 3D whole heart simulations.

  12. The application of the large particles method of numerical modeling of the process of carbonic nanostructures synthesis in plasma

    NASA Astrophysics Data System (ADS)

    Abramov, G. V.; Gavrilov, A. N.

    2018-03-01

    The article deals with the numerical solution of a mathematical model of particle motion and interaction in a multicomponent plasma, using the electric arc synthesis of carbon nanostructures as an example. The large number of particles and of their interactions requires significant machine resources and computation time. Applying the large particles method makes it possible to reduce the amount of computation and the hardware requirements without affecting the accuracy of the numerical calculations. GPGPU parallel computing with the Nvidia CUDA technology allows the general-purpose computation to be carried out on the graphics card's processor. A comparative analysis of different approaches to parallelizing the computations was carried out, and an algorithm using shared memory was selected to preserve the accuracy of the solution. A numerical study of the influence of the particle density within a macro-particle on the motion parameters and the total number of particle collisions in the plasma was carried out for different synthesis modes. A rational range for the coherence coefficient of particles in a macro-particle is computed.
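    The large particles (macro-particle) method replaces groups of k physical particles by a single computational particle whose charge and mass are scaled by k, so that the quadratic cost of pairwise interactions drops by roughly a factor of k². The sketch below illustrates only that bookkeeping with a softened 2D force; the paper's plasma model and collision handling are far more detailed, and all numbers here are illustrative.

```python
# Macro-particle sketch: k physical particles are represented by one
# computational particle with charge and mass scaled by k.
import numpy as np

def pairwise_forces(pos, charge, eps=1e-3):
    """Brute-force pairwise Coulomb-like forces in 2D (softened)."""
    diff = pos[:, None, :] - pos[None, :, :]              # (N, N, 2)
    dist2 = (diff**2).sum(-1) + eps
    np.fill_diagonal(dist2, np.inf)                       # no self-interaction
    dist = np.sqrt(dist2)
    f = (charge[:, None] * charge[None, :] / dist**3)[..., None] * diff
    return f.sum(axis=1)

rng = np.random.default_rng(1)
n_phys, k = 20_000, 100                                   # coherence factor k
n_macro = n_phys // k

pos = rng.random((n_macro, 2))
charge = np.full(n_macro, k * 1.0)                        # k elementary charges
mass = np.full(n_macro, k * 1.0)

forces = pairwise_forces(pos, charge)
accel = forces / mass[:, None]                            # ready for a time step
print(f"{n_macro} macro-particles stand in for {n_phys} physical particles;")
print(f"pair interactions reduced from {n_phys*(n_phys-1)//2:,} "
      f"to {n_macro*(n_macro-1)//2:,}")
```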

  13. Aggregating Data for Computational Toxicology Applications: The U.S. Environmental Protection Agency (EPA) Aggregated Computational Toxicology Resource (ACToR) System

    PubMed Central

    Judson, Richard S.; Martin, Matthew T.; Egeghy, Peter; Gangwal, Sumit; Reif, David M.; Kothiya, Parth; Wolf, Maritja; Cathey, Tommy; Transue, Thomas; Smith, Doris; Vail, James; Frame, Alicia; Mosher, Shad; Cohen Hubal, Elaine A.; Richard, Ann M.

    2012-01-01

    Computational toxicology combines data from high-throughput test methods, chemical structure analyses and other biological domains (e.g., genes, proteins, cells, tissues) with the goals of predicting and understanding the underlying mechanistic causes of chemical toxicity and for predicting toxicity of new chemicals and products. A key feature of such approaches is their reliance on knowledge extracted from large collections of data and data sets in computable formats. The U.S. Environmental Protection Agency (EPA) has developed a large data resource called ACToR (Aggregated Computational Toxicology Resource) to support these data-intensive efforts. ACToR comprises four main repositories: core ACToR (chemical identifiers and structures, and summary data on hazard, exposure, use, and other domains), ToxRefDB (Toxicity Reference Database, a compilation of detailed in vivo toxicity data from guideline studies), ExpoCastDB (detailed human exposure data from observational studies of selected chemicals), and ToxCastDB (data from high-throughput screening programs, including links to underlying biological information related to genes and pathways). The EPA DSSTox (Distributed Structure-Searchable Toxicity) program provides expert-reviewed chemical structures and associated information for these and other high-interest public inventories. Overall, the ACToR system contains information on about 400,000 chemicals from 1100 different sources. The entire system is built using open source tools and is freely available to download. This review describes the organization of the data repository and provides selected examples of use cases. PMID:22408426

  14. Aggregating data for computational toxicology applications: The U.S. Environmental Protection Agency (EPA) Aggregated Computational Toxicology Resource (ACToR) System.

    PubMed

    Judson, Richard S; Martin, Matthew T; Egeghy, Peter; Gangwal, Sumit; Reif, David M; Kothiya, Parth; Wolf, Maritja; Cathey, Tommy; Transue, Thomas; Smith, Doris; Vail, James; Frame, Alicia; Mosher, Shad; Cohen Hubal, Elaine A; Richard, Ann M

    2012-01-01

    Computational toxicology combines data from high-throughput test methods, chemical structure analyses and other biological domains (e.g., genes, proteins, cells, tissues) with the goals of predicting and understanding the underlying mechanistic causes of chemical toxicity and for predicting toxicity of new chemicals and products. A key feature of such approaches is their reliance on knowledge extracted from large collections of data and data sets in computable formats. The U.S. Environmental Protection Agency (EPA) has developed a large data resource called ACToR (Aggregated Computational Toxicology Resource) to support these data-intensive efforts. ACToR comprises four main repositories: core ACToR (chemical identifiers and structures, and summary data on hazard, exposure, use, and other domains), ToxRefDB (Toxicity Reference Database, a compilation of detailed in vivo toxicity data from guideline studies), ExpoCastDB (detailed human exposure data from observational studies of selected chemicals), and ToxCastDB (data from high-throughput screening programs, including links to underlying biological information related to genes and pathways). The EPA DSSTox (Distributed Structure-Searchable Toxicity) program provides expert-reviewed chemical structures and associated information for these and other high-interest public inventories. Overall, the ACToR system contains information on about 400,000 chemicals from 1100 different sources. The entire system is built using open source tools and is freely available to download. This review describes the organization of the data repository and provides selected examples of use cases.

  15. The performance of low-cost commercial cloud computing as an alternative in computational chemistry.

    PubMed

    Thackston, Russell; Fortenberry, Ryan C

    2015-05-05

    The growth of commercial cloud computing (CCC) as a viable means of computational infrastructure is largely unexplored for the purposes of quantum chemistry. In this work, the PSI4 suite of computational chemistry programs is installed on five different types of Amazon Web Services CCC platforms. The performance for a set of electronically excited state single-point energies is compared between these CCC platforms and typical, "in-house" physical machines. Further considerations are made for the number of cores or virtual CPUs (vCPUs, for the CCC platforms), but no considerations are made for full parallelization of the program (even though parallelization of the BLAS library is implemented), complete high-performance computing cluster utilization, or steal time. Even with this most pessimistic view of the computations, CCC resources are shown to be more cost effective for significant numbers of typical quantum chemistry computations. Large numbers of large computations are still best utilized by more traditional means, but smaller-scale research may be more effectively undertaken through CCC services. © 2015 Wiley Periodicals, Inc.
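    The cost argument is essentially a per-job amortization: an in-house machine's purchase price and power draw are spread over the hours it is actually busy, and the result is compared with the hourly price of an on-demand instance. The sketch below works through that arithmetic with invented numbers; none of the prices are the paper's figures.

```python
# Back-of-envelope per-job cost comparison: on-demand cloud instance versus an
# in-house workstation amortized over its lifetime and utilization.
# Every number below is an illustrative assumption.
def cloud_cost_per_job(hourly_rate=0.50, job_hours=3.0):
    return hourly_rate * job_hours

def inhouse_cost_per_job(purchase=8000.0, lifetime_years=4, power_kw=0.6,
                         kwh_price=0.12, utilization=0.30, job_hours=3.0):
    lifetime_hours = lifetime_years * 365 * 24
    busy_hours = lifetime_hours * utilization
    total_cost = purchase + power_kw * kwh_price * busy_hours
    return total_cost / busy_hours * job_hours

for util in (0.10, 0.30, 0.80):
    print(f"utilization {util:.0%}: in-house "
          f"${inhouse_cost_per_job(utilization=util):.2f} per job vs "
          f"cloud ${cloud_cost_per_job():.2f} per job")
```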

  16. Unified, Cross-Platform, Open-Source Library Package for High-Performance Computing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kozacik, Stephen

    Compute power is continually increasing, but this increased performance is largely found in sophisticated computing devices and supercomputer resources that are difficult to use, resulting in under-utilization. We developed a unified set of programming tools that will allow users to take full advantage of the new technology by allowing them to work at a level abstracted away from the platform specifics, encouraging the use of modern computing systems, including government-funded supercomputer facilities.

  17. The Next Frontier in Computing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sarrao, John

    2016-11-16

    Exascale computing refers to computing systems capable of at least one exaflop, or a billion billion calculations per second (10^18). That is 50 times faster than the most powerful supercomputers being used today and represents a thousand-fold increase over the first petascale computer that came into operation in 2008. How we use these large-scale simulation resources is the key to solving some of today’s most pressing problems, including clean energy production, nuclear reactor lifetime extension and nuclear stockpile aging.

  18. High Performance Geostatistical Modeling of Biospheric Resources

    NASA Astrophysics Data System (ADS)

    Pedelty, J. A.; Morisette, J. T.; Smith, J. A.; Schnase, J. L.; Crosier, C. S.; Stohlgren, T. J.

    2004-12-01

    We are using parallel geostatistical codes to study spatial relationships among biospheric resources in several study areas. For example, spatial statistical models based on large- and small-scale variability have been used to predict species richness of both native and exotic plants (hot spots of diversity) and patterns of exotic plant invasion. However, broader use of geostatistics in natural resource modeling, especially at regional and national scales, has been limited due to the large computing requirements of these applications. To address this problem, we implemented parallel versions of the kriging spatial interpolation algorithm. The first uses the Message Passing Interface (MPI) in a master/slave paradigm on an open source Linux Beowulf cluster, while the second is implemented with the new proprietary Xgrid distributed processing system on an Xserve G5 cluster from Apple Computer, Inc. These techniques are proving effective and provide the basis for a national decision support capability for invasive species management that is being jointly developed by NASA and the US Geological Survey.
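    The parallelization described above is data-parallel over the prediction grid: the grid is cut into tiles and each worker interpolates its tile independently against the shared set of observations. In the sketch below, Python's multiprocessing stands in for the MPI master/slave and Xgrid implementations, and a tiny simple-kriging routine with an assumed exponential covariance stands in for the full geostatistical model.

```python
# Tiled, worker-parallel kriging sketch: the prediction grid is split into
# tiles and each worker process interpolates its tile independently.
import numpy as np
from multiprocessing import Pool

rng = np.random.default_rng(3)
obs_xy = rng.random((200, 2)) * 100.0                  # observation locations
obs_z = np.sin(obs_xy[:, 0] / 15.0) + rng.normal(0, 0.1, 200)

def cov(d, sill=1.0, corr_range=20.0):
    return sill * np.exp(-d / corr_range)              # exponential covariance

C = cov(np.linalg.norm(obs_xy[:, None] - obs_xy[None, :], axis=-1))
C_inv = np.linalg.inv(C + 1e-6 * np.eye(len(obs_z)))   # regularized inverse
mean = obs_z.mean()

def krige_tile(tile):
    """Simple-kriging prediction for every grid node in one tile."""
    out = np.empty(len(tile))
    for i, p in enumerate(tile):
        c0 = cov(np.linalg.norm(obs_xy - p, axis=1))
        weights = C_inv @ c0
        out[i] = mean + weights @ (obs_z - mean)
    return out

if __name__ == "__main__":
    # Build a prediction grid and split it into row-block tiles for the workers.
    gx, gy = np.meshgrid(np.linspace(0, 100, 200), np.linspace(0, 100, 200))
    grid = np.column_stack([gx.ravel(), gy.ravel()])
    tiles = np.array_split(grid, 16)
    with Pool() as pool:
        prediction = np.concatenate(pool.map(krige_tile, tiles))
    print("predicted surface shape:", prediction.reshape(200, 200).shape)
```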

  19. Exploiting volatile opportunistic computing resources with Lobster

    NASA Astrophysics Data System (ADS)

    Woodard, Anna; Wolf, Matthias; Mueller, Charles; Tovar, Ben; Donnelly, Patrick; Hurtado Anampa, Kenyi; Brenner, Paul; Lannon, Kevin; Hildreth, Mike; Thain, Douglas

    2015-12-01

    Analysis of high energy physics experiments using the Compact Muon Solenoid (CMS) at the Large Hadron Collider (LHC) can be limited by availability of computing resources. As a joint effort involving computer scientists and CMS physicists at Notre Dame, we have developed an opportunistic workflow management tool, Lobster, to harvest available cycles from university campus computing pools. Lobster consists of a management server, file server, and worker processes which can be submitted to any available computing resource without requiring root access. Lobster makes use of the Work Queue system to perform task management, while the CMS specific software environment is provided via CVMFS and Parrot. Data is handled via Chirp and Hadoop for local data storage and XrootD for access to the CMS wide-area data federation. An extensive set of monitoring and diagnostic tools have been developed to facilitate system optimisation. We have tested Lobster using the 20 000-core cluster at Notre Dame, achieving approximately 8-10k tasks running simultaneously, sustaining approximately 9 Gbit/s of input data and 340 Mbit/s of output data.

  20. An Adaptive Priority Tuning System for Optimized Local CPU Scheduling using BOINC Clients

    NASA Astrophysics Data System (ADS)

    Mnaouer, Adel B.; Ragoonath, Colin

    2010-11-01

    Volunteer Computing (VC) is a Distributed Computing model which utilizes idle CPU cycles from computing resources donated by volunteers who are connected through the Internet to form a very large-scale, loosely coupled High Performance Computing environment. Distributed volunteer computing environments such as the BOINC framework are concerned mainly with the efficient scheduling of the available resources to the applications which require them. The BOINC framework thus contains a number of scheduling policies/algorithms both on the server-side and on the client which work together to maximize the available resources and to provide a degree of QoS in an environment which is highly volatile. This paper focuses on the BOINC client and introduces an adaptive priority tuning client-side middleware application which improves the execution times of Work Units (WUs) while maintaining an acceptable Maximum Response Time (MRT) for the end user. We have conducted extensive experiments with the proposed system, and the results show a clear speedup of BOINC applications using our optimized middleware compared with the original BOINC client.
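    The adaptive idea can be sketched as a small loop that measures how busy the host is and renices the volunteer-computing worker accordingly, so that work units progress faster when the machine is idle without hurting interactive response time. The sketch below is Unix-only, uses a hypothetical worker PID and thresholds, and is not the middleware described in the paper; note also that lowering a niceness value back down generally requires elevated privileges.

```python
# Adaptive priority sketch: renice a worker process based on host load.
# Unix-only; the PID and thresholds are illustrative assumptions.
import os
import time

WORKER_PID = 12345            # hypothetical PID of the volunteer-computing worker

def choose_niceness(load_per_core):
    if load_per_core < 0.25:  # machine mostly idle: compete for CPU normally
        return 5
    if load_per_core < 0.75:  # moderate interactive load
        return 12
    return 19                 # busy: run only in otherwise-idle cycles

def tune_forever(interval=30):
    cores = os.cpu_count() or 1
    while True:
        load_per_core = os.getloadavg()[0] / cores
        nice = choose_niceness(load_per_core)
        try:
            # Decreasing niceness again may require elevated privileges.
            os.setpriority(os.PRIO_PROCESS, WORKER_PID, nice)
        except ProcessLookupError:
            break             # worker exited
        time.sleep(interval)

# tune_forever()   # would run as a small daemon alongside the worker process
```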

  1. Design Tools for Evaluating Multiprocessor Programs

    DTIC Science & Technology

    1976-07-01

    than large uniprocessing machines, and 2. economies of scale in manufacturing. Perhaps the most compelling reason (possibly a consequence of the...speed, redundancy, (in)efficiency, resource utilization, and economies of the components. [Browne 73, Lehman 66] 6. How can the system be scheduled...measures are interesting about the computation? Some may be: speed, redundancy, (in)efficiency, resource utilization, and economies of the components

  2. Allocation model for firefighting resources ... a progress report

    Treesearch

    Frederick W. Bratten

    1970-01-01

    A study is underway at the Pacific Southwest Forest and Range Experiment Station to develop computer techniques for planning suppression efforts in large wildfires. A mathematical model for allocation of firefighting resources in a going fire has been developed. Explicit definitions are given for strategic and tactical planning functions. How the model might be used is...

  3. WebGIS based on semantic grid model and web services

    NASA Astrophysics Data System (ADS)

    Zhang, WangFei; Yue, CaiRong; Gao, JianGuo

    2009-10-01

    As the meeting point of network technology and GIS technology, WebGIS has developed rapidly in recent years. Constrained by the Web and by the characteristics of GIS, traditional WebGIS has some prominent problems: for example, it cannot achieve interoperability between heterogeneous spatial databases, nor cross-platform data access. With the appearance of Web Services and Grid technology, the WebGIS field has changed greatly. A Web Service provides an interface that gives sites the ability to share data and communicate with one another. The goal of Grid technology is to turn the Internet into one large super computer with which computing resources, storage resources, data resources, information resources, knowledge resources and expert resources can be shared efficiently. For WebGIS, however, such a physical connection of data and information is far from enough. Because of different understandings of the world, different professional regulations, different policies and different habits, experts in different fields reach different conclusions when observing the same geographic phenomenon, and semantic heterogeneity arises: the same concept can differ substantially between fields. If we use WebGIS without considering this semantic heterogeneity, we will answer users' questions incorrectly or not at all. To solve this problem, this paper puts forward and tests an effective method of combining the semantic grid and Web Services technology to develop WebGIS. We studied how to construct an ontology and how to combine Grid technology with Web Services, and, with a detailed analysis of the computing characteristics and the application model for distributed data, we designed a WebGIS query system driven by ontology and based on Grid technology and Web Services.

  4. Managing the CMS Data and Monte Carlo Processing during LHC Run 2

    NASA Astrophysics Data System (ADS)

    Wissing, C.; CMS Collaboration

    2017-10-01

    In order to cope with the challenges expected during LHC Run 2, CMS introduced a number of enhancements into its main software packages and the tools used for centrally managed processing. In the presentation we will highlight these improvements that allow CMS to deal with the increased trigger output rate, the increased pileup and the evolution in computing technology. The overall system aims at high operational flexibility and largely automated procedures. The tight coupling of workflow classes to types of sites has been drastically relaxed. Reliable and high-performing networking between most of the computing sites and the successful deployment of a data federation allow the execution of workflows using remote data access. That required the development of a largely automated system to assign workflows and to handle the necessary pre-staging of data. Another step towards flexibility has been the introduction of one large global HTCondor pool for all types of processing workflows and analysis jobs. Besides classical Grid resources, some opportunistic resources as well as Cloud resources have been integrated into that pool, which gives access to more than 200k CPU cores.

  5. Rich client data exploration and research prototyping for NOAA

    NASA Astrophysics Data System (ADS)

    Grossberg, Michael; Gladkova, Irina; Guch, Ingrid; Alabi, Paul; Shahriar, Fazlul; Bonev, George; Aizenman, Hannah

    2009-08-01

    Data from satellites and model simulations is increasing exponentially as observations and model computing power improve rapidly. Not only is technology producing more data, but it often comes from sources all over the world. Researchers and scientists who must collaborate are also located globally. This work presents a software design and technologies which will make it possible for groups of researchers to explore large data sets visually together without the need to download these data sets locally. The design will also make it possible to exploit high performance computing remotely and transparently to analyze and explore large data sets. Computer power, high quality sensing, and data storage capacity have improved at a rate that outstrips our ability to develop software applications that exploit these resources. It is impractical for NOAA scientists to download all of the satellite and model data that may be relevant to a given problem and the computing environments available to a given researcher range from supercomputers to only a web browser. The size and volume of satellite and model data are increasing exponentially. There are at least 50 multisensor satellite platforms collecting Earth science data. On the ground and in the sea there are sensor networks, as well as networks of ground based radar stations, producing a rich real-time stream of data. This new wealth of data would have limited use were it not for the arrival of large-scale high-performance computation provided by parallel computers, clusters, grids, and clouds. With these computational resources and vast archives available, it is now possible to analyze subtle relationships which are global, multi-modal and cut across many data sources. Researchers, educators, and even the general public, need tools to access, discover, and use vast data center archives and high performance computing through a simple yet flexible interface.

  6. HAlign-II: efficient ultra-large multiple sequence alignment and phylogenetic tree reconstruction with distributed and parallel computing.

    PubMed

    Wan, Shixiang; Zou, Quan

    2017-01-01

    Multiple sequence alignment (MSA) plays a key role in biological sequence analyses, especially in phylogenetic tree construction. The extreme increase in next-generation sequencing data has resulted in a shortage of efficient approaches for aligning ultra-large sets of biological sequences of different types. Distributed and parallel computing represents a crucial technique for accelerating ultra-large (e.g. files of more than 1 GB) sequence analyses. Based on HAlign and the Spark distributed computing system, we implemented the highly cost- and time-efficient HAlign-II tool to address ultra-large multiple biological sequence alignment and phylogenetic tree construction. Experiments on large-scale DNA and protein data sets of more than 1 GB showed that HAlign-II saves time and space and outperforms current software tools. HAlign-II can efficiently carry out MSA and construct phylogenetic trees with ultra-large numbers of biological sequences. HAlign-II shows extremely high memory efficiency and scales well with increases in computing resources. HAlign-II provides a user-friendly web server based on our distributed computing infrastructure. HAlign-II, with open-source code and datasets, is available at http://lab.malab.cn/soft/halign.

  7. The Next Frontier in Computing

    ScienceCinema

    Sarrao, John

    2018-06-13

    Exascale computing refers to computing systems capable of at least one exaflop, or a billion billion calculations per second (10^18). That is 50 times faster than the most powerful supercomputers being used today and represents a thousand-fold increase over the first petascale computer that came into operation in 2008. How we use these large-scale simulation resources is the key to solving some of today’s most pressing problems, including clean energy production, nuclear reactor lifetime extension and nuclear stockpile aging.

  8. Conceptual Framework for Using Computers to Enhance Employee Engagement in Large Offices

    ERIC Educational Resources Information Center

    Gill, Rob

    2010-01-01

    Using computers to engage with staff members on their organization's Employer of Choice (EOC) program as part of a human resource development (HRD) framework can add real value to that organization's reputation. EOC is an evolving principle for Australian business. It reflects the value and importance organizations place on their key stakeholders,…

  9. Translational bioinformatics in the cloud: an affordable alternative

    PubMed Central

    2010-01-01

    With the continued exponential expansion of publicly available genomic data and access to low-cost, high-throughput molecular technologies for profiling patient populations, computational technologies and informatics are becoming vital considerations in genomic medicine. Although cloud computing technology is being heralded as a key enabling technology for the future of genomic research, available case studies are limited to applications in the domain of high-throughput sequence data analysis. The goal of this study was to evaluate the computational and economic characteristics of cloud computing in performing a large-scale data integration and analysis representative of research problems in genomic medicine. We find that the cloud-based analysis compares favorably in both performance and cost in comparison to a local computational cluster, suggesting that cloud computing technologies might be a viable resource for facilitating large-scale translational research in genomic medicine. PMID:20691073

  10. RGCA: A Reliable GPU Cluster Architecture for Large-Scale Internet of Things Computing Based on Effective Performance-Energy Optimization

    PubMed Central

    Chen, Qingkui; Zhao, Deyu; Wang, Jingjuan

    2017-01-01

    This paper aims to develop a low-cost, high-performance and high-reliability computing system to process large-scale data using common data mining algorithms in the Internet of Things (IoT) computing environment. Considering the characteristics of IoT data processing, similar to mainstream high performance computing, we use a GPU (Graphics Processing Unit) cluster to achieve better IoT services. Firstly, we present an energy consumption calculation method (ECCM) based on WSNs. Then, using the CUDA (Compute Unified Device Architecture) Programming model, we propose a Two-level Parallel Optimization Model (TLPOM) which exploits reasonable resource planning and common compiler optimization techniques to obtain the best blocks and threads configuration considering the resource constraints of each node. The key to this part is dynamic coupling Thread-Level Parallelism (TLP) and Instruction-Level Parallelism (ILP) to improve the performance of the algorithms without additional energy consumption. Finally, combining the ECCM and the TLPOM, we use the Reliable GPU Cluster Architecture (RGCA) to obtain a high-reliability computing system considering the nodes’ diversity, algorithm characteristics, etc. The results show that the performance of the algorithms significantly increased by 34.1%, 33.96% and 24.07% for Fermi, Kepler and Maxwell on average with TLPOM and the RGCA ensures that our IoT computing system provides low-cost and high-reliability services. PMID:28777325

  11. RGCA: A Reliable GPU Cluster Architecture for Large-Scale Internet of Things Computing Based on Effective Performance-Energy Optimization.

    PubMed

    Fang, Yuling; Chen, Qingkui; Xiong, Neal N; Zhao, Deyu; Wang, Jingjuan

    2017-08-04

    This paper aims to develop a low-cost, high-performance and high-reliability computing system to process large-scale data using common data mining algorithms in the Internet of Things (IoT) computing environment. Considering the characteristics of IoT data processing, similar to mainstream high performance computing, we use a GPU (Graphics Processing Unit) cluster to achieve better IoT services. Firstly, we present an energy consumption calculation method (ECCM) based on WSNs. Then, using the CUDA (Compute Unified Device Architecture) Programming model, we propose a Two-level Parallel Optimization Model (TLPOM) which exploits reasonable resource planning and common compiler optimization techniques to obtain the best blocks and threads configuration considering the resource constraints of each node. The key to this part is dynamic coupling Thread-Level Parallelism (TLP) and Instruction-Level Parallelism (ILP) to improve the performance of the algorithms without additional energy consumption. Finally, combining the ECCM and the TLPOM, we use the Reliable GPU Cluster Architecture (RGCA) to obtain a high-reliability computing system considering the nodes' diversity, algorithm characteristics, etc. The results show that the performance of the algorithms significantly increased by 34.1%, 33.96% and 24.07% for Fermi, Kepler and Maxwell on average with TLPOM and the RGCA ensures that our IoT computing system provides low-cost and high-reliability services.

  12. Large-scale virtual screening on public cloud resources with Apache Spark.

    PubMed

    Capuccini, Marco; Ahmed, Laeeq; Schaal, Wesley; Laure, Erwin; Spjuth, Ola

    2017-01-01

    Structure-based virtual screening is an in-silico method to screen a target receptor against a virtual molecular library. Applying docking-based screening to large molecular libraries can be computationally expensive; however, it constitutes a trivially parallelizable task. Most of the available parallel implementations are based on the Message Passing Interface (MPI), relying on low-failure-rate hardware and fast network connections. Google's MapReduce revolutionized large-scale analysis, enabling the processing of massive datasets on commodity hardware and cloud resources, providing transparent scalability and fault tolerance at the software level. Open source implementations of MapReduce include Apache Hadoop and the more recent Apache Spark. We developed a method to run existing docking-based screening software on distributed cloud resources, utilizing the MapReduce approach. We benchmarked our method, which is implemented in Apache Spark, docking a publicly available target receptor against approximately 2.2 million compounds. The performance experiments show a good parallel efficiency (87%) when running in a public cloud environment. Our method enables parallel structure-based virtual screening on public cloud resources or commodity computer clusters. The degree of scalability that we achieve allows for trying out our method on relatively small libraries first and then scaling to larger libraries. Our implementation is named Spark-VS and it is freely available as open source from GitHub (https://github.com/mcapuccini/spark-vs).
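    Because each compound is docked independently, the workload maps naturally onto a single Spark map step. The sketch below is a minimal illustration of that pattern, not the actual Spark-VS pipeline; the input file name, the placeholder docking command, and the dummy score are hypothetical.

```python
# A minimal sketch (not the Spark-VS implementation) of expressing docking of a compound
# library as a map step in Apache Spark. The docking call and score parsing are placeholders.

import subprocess
from pyspark import SparkContext

def dock_one(smiles):
    """Dock a single compound (identified here by a SMILES string) and return a score."""
    # Placeholder: invoke an external docking binary; in practice the receptor file,
    # docking engine, and output parsing depend on the tool being wrapped.
    result = subprocess.run(["echo", smiles], capture_output=True, text=True)
    return (smiles, float(len(result.stdout)))  # dummy "score"

if __name__ == "__main__":
    sc = SparkContext(appName="docking-sketch")
    library = sc.textFile("compounds.smi")      # one compound per line (hypothetical file)
    scores = library.map(dock_one)              # trivially parallel docking
    top = scores.takeOrdered(10, key=lambda kv: -kv[1])
    print(top)
    sc.stop()
```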

  13. CLIMB (the Cloud Infrastructure for Microbial Bioinformatics): an online resource for the medical microbiology community

    PubMed Central

    Smith, Andy; Southgate, Joel; Poplawski, Radoslaw; Bull, Matthew J.; Richardson, Emily; Ismail, Matthew; Thompson, Simon Elwood-; Kitchen, Christine; Guest, Martyn; Bakke, Marius

    2016-01-01

    The increasing availability and decreasing cost of high-throughput sequencing has transformed academic medical microbiology, delivering an explosion in available genomes while also driving advances in bioinformatics. However, many microbiologists are unable to exploit the resulting large genomics datasets because they do not have access to relevant computational resources and to an appropriate bioinformatics infrastructure. Here, we present the Cloud Infrastructure for Microbial Bioinformatics (CLIMB) facility, a shared computing infrastructure that has been designed from the ground up to provide an environment where microbiologists can share and reuse methods and data. PMID:28785418

  14. CLIMB (the Cloud Infrastructure for Microbial Bioinformatics): an online resource for the medical microbiology community.

    PubMed

    Connor, Thomas R; Loman, Nicholas J; Thompson, Simon; Smith, Andy; Southgate, Joel; Poplawski, Radoslaw; Bull, Matthew J; Richardson, Emily; Ismail, Matthew; Thompson, Simon Elwood-; Kitchen, Christine; Guest, Martyn; Bakke, Marius; Sheppard, Samuel K; Pallen, Mark J

    2016-09-01

    The increasing availability and decreasing cost of high-throughput sequencing has transformed academic medical microbiology, delivering an explosion in available genomes while also driving advances in bioinformatics. However, many microbiologists are unable to exploit the resulting large genomics datasets because they do not have access to relevant computational resources and to an appropriate bioinformatics infrastructure. Here, we present the Cloud Infrastructure for Microbial Bioinformatics (CLIMB) facility, a shared computing infrastructure that has been designed from the ground up to provide an environment where microbiologists can share and reuse methods and data.

  15. The Application of Large-Scale Hypermedia Information Systems to Training.

    ERIC Educational Resources Information Center

    Crowder, Richard; And Others

    1995-01-01

    Discusses the use of hypermedia in electronic information systems that support maintenance operations in large-scale industrial plants. Findings show that after establishing an information system, the same resource base can be used to train personnel how to use the computer system and how to perform operational and maintenance tasks. (Author/JMV)

  16. Modeling of the Global Water Cycle - Analytical Models

    Treesearch

    Yongqiang Liu; Roni Avissar

    2005-01-01

    Both numerical and analytical models of coupled atmosphere and its underlying ground components (land, ocean, ice) are useful tools for modeling the global and regional water cycle. Unlike complex three-dimensional climate models, which need very large computing resources and involve a large number of complicated interactions often difficult to interpret, analytical...

  17. Accessing and visualizing scientific spatiotemporal data

    NASA Technical Reports Server (NTRS)

    Katz, Daniel S.; Bergou, Attila; Berriman, G. Bruce; Block, Gary L.; Collier, Jim; Curkendall, David W.; Good, John; Husman, Laura; Jacob, Joseph C.; Laity, Anastasia

    2004-01-01

    This paper discusses work done by JPL's Parallel Applications Technologies Group in helping scientists access and visualize very large data sets through the use of multiple computing resources, such as parallel supercomputers, clusters, and grids.

  18. Quantum Search in Hilbert Space

    NASA Technical Reports Server (NTRS)

    Zak, Michail

    2003-01-01

    A proposed quantum-computing algorithm would perform a search for an item of information in a database stored in a Hilbert-space memory structure. The algorithm is intended to make it possible to search relatively quickly through a large database under conditions in which available computing resources would otherwise be considered inadequate to perform such a task. The algorithm would apply, more specifically, to a relational database in which information would be stored in a set of N complex orthonormal vectors, each of N dimensions (where N can be exponentially large). Each vector would constitute one row of a unitary matrix, from which one would derive the Hamiltonian operator (and hence the evolutionary operator) of a quantum system. In other words, all the stored information would be mapped onto a unitary operator acting on a quantum state that would represent the item of information to be retrieved. Then one could exploit quantum parallelism: one could pose all search queries simultaneously by performing a quantum measurement on the system. In so doing, one would effectively solve the search problem in one computational step. One could exploit the direct- and inner-product decomposability of the unitary matrix to make the dimensionality of the memory space exponentially large by use of only linear resources. However, inasmuch as the necessary preprocessing (the mapping of the stored information into a Hilbert space) could be exponentially expensive, the proposed algorithm would likely be most beneficial in applications in which the resources available for preprocessing were much greater than those available for searching.

  19. HPC on Competitive Cloud Resources

    NASA Astrophysics Data System (ADS)

    Bientinesi, Paolo; Iakymchuk, Roman; Napper, Jeff

    Computing as a utility has reached the mainstream. Scientists can now easily rent time on large commercial clusters that can be expanded and reduced on demand in real time. However, current commercial cloud computing performance falls short of systems specifically designed for scientific applications. Scientific computing needs are quite different from those of the web applications that have been the focus of cloud computing vendors. In this chapter we demonstrate through empirical evaluation the computational efficiency of high-performance numerical applications in a commercial cloud environment when resources are shared under high contention. Using the Linpack benchmark as a case study, we show that cache utilization becomes highly unpredictable and that computation time is similarly affected. For some problems, not only is it more efficient to underutilize resources, but the solution can also be reached sooner in real time (wall time). We also show that the smallest, cheapest (64-bit) instance in the studied environment offers the best price-to-performance ratio. In light of the high contention we witness, we believe that alternative definitions of efficiency should be introduced for commercial cloud environments where strong performance guarantees do not exist. Concepts such as average and expected performance, expected execution time, expected cost to completion, and variance measures, traditionally ignored in the high-performance computing context, should now complement or even substitute the standard definitions of efficiency.
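    The alternative efficiency measures suggested above can be computed directly from repeated runs. The toy sketch below derives expected wall time, its variance, and expected cost to completion from hypothetical per-run measurements and an assumed hourly price.

```python
# Illustrative only: with highly variable runtimes under contention, point estimates of
# "efficiency" are less informative than expected values and spread. All numbers below
# are hypothetical.

import statistics

runtimes_hours = [1.8, 2.4, 3.1, 1.9, 2.7, 4.0]   # hypothetical measured wall times
price_per_hour = 0.34                              # hypothetical on-demand price (USD)

expected_time = statistics.mean(runtimes_hours)
time_variance = statistics.pvariance(runtimes_hours)
expected_cost = expected_time * price_per_hour

print(f"expected wall time : {expected_time:.2f} h (variance {time_variance:.2f})")
print(f"expected cost      : ${expected_cost:.2f} per run")
```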

  20. Exploiting short-term memory in soft body dynamics as a computational resource

    PubMed Central

    Nakajima, K.; Li, T.; Hauser, H.; Pfeifer, R.

    2014-01-01

    Soft materials are not only highly deformable, but they also possess rich and diverse body dynamics. Soft body dynamics exhibit a variety of properties, including nonlinearity, elasticity and potentially infinitely many degrees of freedom. Here, we demonstrate that such soft body dynamics can be employed to conduct certain types of computation. Using body dynamics generated from a soft silicone arm, we show that they can be exploited to emulate functions that require memory and to embed robust closed-loop control into the arm. Our results suggest that soft body dynamics have a short-term memory and can serve as a computational resource. This finding paves the way towards exploiting passive body dynamics for control of a large class of underactuated systems. PMID:25185579
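    The underlying idea, treating a nonlinear, history-dependent physical system as a reservoir and training only a linear readout, can be illustrated with a small numerical stand-in. The sketch below uses a random recurrent map in place of the silicone arm's sensor readings and fits the readout by least squares; the dimensions and target task are illustrative, not the paper's setup.

```python
# Toy sketch of physical reservoir computing: nonlinear states with fading memory serve
# as features, and only a linear readout is trained. All parameters are illustrative.

import numpy as np

rng = np.random.default_rng(0)
T, n_states = 2000, 50
W_in = rng.normal(0, 0.5, n_states)
W = rng.normal(0, 1, (n_states, n_states)) * 0.1

u = rng.uniform(-1, 1, T)            # input stream
x = np.zeros((T, n_states))          # "body" states with fading memory
for t in range(1, T):
    x[t] = np.tanh(W @ x[t - 1] + W_in * u[t])

target = u[:-5] * u[1:-4]            # a task requiring short-term memory (delayed product)
X, y = x[5:], target

W_out, *_ = np.linalg.lstsq(X, y, rcond=None)   # linear readout trained by least squares
pred = X @ W_out
print("training NMSE:", np.mean((pred - y) ** 2) / np.var(y))
```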

  1. Climate simulations and services on HPC, Cloud and Grid infrastructures

    NASA Astrophysics Data System (ADS)

    Cofino, Antonio S.; Blanco, Carlos; Minondo Tshuma, Antonio

    2017-04-01

    Cloud, Grid and High Performance Computing have changed the accessibility and availability of computing resources for Earth Science research communities, especially for the climate community. These paradigms are modifying the way climate applications are executed. By using these technologies, the number, variety and complexity of experiments and resources are increasing substantially. But although computational capacity is increasing, the traditional applications and tools used by the community are not sufficient to manage this large volume and variety of experiments and computing resources. In this contribution, we evaluate the challenges of running climate simulations and services on Grid, Cloud and HPC infrastructures and how to tackle them. The Grid and Cloud infrastructures provided by EGI's VOs (esr, earth.vo.ibergrid and fedcloud.egi.eu) will be evaluated, as well as HPC resources from the PRACE infrastructure and institutional clusters. To address those challenges, solutions using the DRM4G framework will be shown. DRM4G provides a good framework to manage a large volume and variety of computing resources for climate experiments. This work has been supported by the Spanish National R&D Plan under projects WRF4G (CGL2011-28864), INSIGNIA (CGL2016-79210-R) and MULTI-SDM (CGL2015-66583-R); the IS-ENES2 project from the 7FP of the European Commission (grant agreement no. 312979); the European Regional Development Fund (ERDF); and the Programa de Personal Investigador en Formación Predoctoral from Universidad de Cantabria and Government of Cantabria.

  2. A Parallel Sliding Region Algorithm to Make Agent-Based Modeling Possible for a Large-Scale Simulation: Modeling Hepatitis C Epidemics in Canada.

    PubMed

    Wong, William W L; Feng, Zeny Z; Thein, Hla-Hla

    2016-11-01

    Agent-based models (ABMs) are computer simulation models that define interactions among agents and simulate emergent behaviors that arise from the ensemble of local decisions. ABMs have been increasingly used to examine trends in infectious disease epidemiology. However, the main limitation of ABMs is the high computational cost of large-scale simulations. To improve the computational efficiency of large-scale ABM simulations, we built a parallelizable sliding region algorithm (SRA) for ABMs and compared it to a nonparallelizable ABM. We developed a complex agent network and performed two simulations to model hepatitis C epidemics based on real demographic data from Saskatchewan, Canada. The first simulation used the SRA, which processed each postal-code subregion in turn; the second processed the entire population simultaneously. We conclude that the parallelizable SRA achieved substantial savings in computation time with comparable results in a province-wide simulation. Using the same method, the SRA can be generalized to a country-wide simulation. This parallel algorithm thus makes it possible to use ABMs for large-scale simulation with limited computational resources.
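    The parallel structure exploited by such an approach can be sketched in a few lines: partition the agents by subregion and step each partition in its own worker process. The toy epidemic update, region count, and infection probability below are hypothetical stand-ins for the far richer agent logic of the actual model.

```python
# Schematic sketch (not the authors' code) of stepping subregions of an agent-based
# epidemic model in parallel worker processes.

from multiprocessing import Pool
import random

def simulate_subregion(args):
    """Advance one subregion's agents by one time step; return the new infection count."""
    region_id, agents, beta = args
    new_infections = sum(1 for a in agents if a == "S" and random.random() < beta)
    return region_id, new_infections

if __name__ == "__main__":
    # Hypothetical toy population: 4 subregions of susceptible ("S") / infected ("I") agents.
    regions = {r: random.choices(["S", "I"], weights=[0.95, 0.05], k=10_000) for r in range(4)}
    with Pool() as pool:
        results = pool.map(simulate_subregion, [(r, a, 0.01) for r, a in regions.items()])
    for region_id, n in results:
        print(f"region {region_id}: {n} new infections this step")
```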

  3. The SGI/CRAY T3E: Experiences and Insights

    NASA Technical Reports Server (NTRS)

    Bernard, Lisa Hamet

    1999-01-01

    The focus of the HPCC Earth and Space Sciences (ESS) Project is capability computing - pushing highly scalable computing testbeds to their performance limits. The drivers of this focus are the Grand Challenge problems in Earth and space science: those that could not be addressed in a capacity computing environment where large jobs must continually compete for resources. These Grand Challenge codes require a high degree of communication, large memory, and very large I/O (throughout the duration of the processing, not just in loading initial conditions and saving final results). This set of parameters led to the selection of an SGI/Cray T3E as the current ESS Computing Testbed. The T3E at the Goddard Space Flight Center is a unique computational resource within NASA. As such, it must be managed to effectively support the diverse research efforts across the NASA research community yet still enable the ESS Grand Challenge Investigator teams to achieve their performance milestones, for which the system was intended. To date, all Grand Challenge Investigator teams have achieved the 10 GFLOPS milestone, eight of nine have achieved the 50 GFLOPS milestone, and three have achieved the 100 GFLOPS milestone. In addition, many technical papers have been published highlighting results achieved on the NASA T3E, including some at this Workshop. The successes enabled by the NASA T3E computing environment are best illustrated by the 512 PE upgrade funded by the NASA Earth Science Enterprise earlier this year. Never before has an HPCC computing testbed been so well received by the general NASA science community that it was deemed critical to the success of a core NASA science effort. NASA looks forward to many more success stories before the conclusion of the NASA-SGI/Cray cooperative agreement in June 1999.

  4. Performance Modeling in CUDA Streams - A Means for High-Throughput Data Processing

    PubMed Central

    Li, Hao; Yu, Di; Kumar, Anand; Tu, Yi-Cheng

    2015-01-01

    A push-based database management system (DBMS) is a new type of data processing software that streams large volumes of data to concurrent query operators. The high data rate of such systems requires large computing power provided by the query engine. In our previous work, we built a push-based DBMS named G-SDMS to harness the unrivaled computational capabilities of modern GPUs. A major design goal of G-SDMS is to support concurrent processing of heterogeneous query processing operations and enable resource allocation among such operations. Understanding the performance of operations as a result of resource consumption is thus a premise in the design of G-SDMS. With NVIDIA’s CUDA framework as the system implementation platform, we present our recent work on performance modeling of CUDA kernels running concurrently under a runtime mechanism named CUDA stream. Specifically, we explore the connection between performance and resource occupancy of compute-bound kernels and develop a model that can predict the performance of such kernels. Furthermore, we provide an in-depth anatomy of the CUDA stream mechanism and summarize its main kernel scheduling disciplines. Our models and derived scheduling disciplines are verified by extensive experiments using synthetic and real-world CUDA kernels. PMID:26566545

  5. Why build a virtual brain? Large-scale neural simulations as jump start for cognitive computing

    NASA Astrophysics Data System (ADS)

    Colombo, Matteo

    2017-03-01

    Despite the impressive amount of financial resources recently invested in carrying out large-scale brain simulations, it is controversial what the pay-offs are of pursuing this project. One idea is that from designing, building, and running a large-scale neural simulation, scientists acquire knowledge about the computational performance of the simulating system, rather than about the neurobiological system represented in the simulation. It has been claimed that this knowledge may usher in a new era of neuromorphic, cognitive computing systems. This study elucidates this claim and argues that the main challenge this era is facing is not the lack of biological realism. The challenge lies in identifying general neurocomputational principles for the design of artificial systems, which could display the robust flexibility characteristic of biological intelligence.

  6. Cost-effective cloud computing: a case study using the comparative genomics tool, roundup.

    PubMed

    Kudtarkar, Parul; Deluca, Todd F; Fusaro, Vincent A; Tonellato, Peter J; Wall, Dennis P

    2010-12-22

    Comparative genomics resources, such as ortholog detection tools and repositories, are rapidly increasing in scale and complexity. Cloud computing is an emerging technological paradigm that enables researchers to dynamically build a dedicated virtual cluster and may represent a valuable alternative for large computational tools in bioinformatics. In the present manuscript, we optimize the computation of a large-scale comparative genomics resource, Roundup, using cloud computing, describe the operating principles required to achieve computational efficiency on the cloud, and detail important procedures for improving cost-effectiveness to ensure maximal computation at minimal cost. Utilizing the comparative genomics tool Roundup as a case study, we computed orthologs among 902 fully sequenced genomes on Amazon's Elastic Compute Cloud. For managing the ortholog processes, we designed a strategy to deploy the web service Elastic MapReduce and maximize the use of the cloud while simultaneously minimizing costs. Specifically, we created a model to estimate cloud runtime based on the size and complexity of the genomes being compared, which determines in advance the optimal order of the jobs to be submitted. We computed orthologous relationships for 245,323 genome-to-genome comparisons on Amazon's computing cloud, a computation that required just over 200 hours and cost $8,000 USD, at least 40% less than expected under a strategy in which genome comparisons were submitted to the cloud randomly with respect to runtime. Our cost-savings projections were based on a model that not only demonstrates the optimal strategy for deploying RSD to the cloud, but also finds the optimal cluster size to minimize waste and maximize usage. Our cost-reduction model is readily adaptable for other comparative genomics tools and potentially of significant benefit to labs seeking to take advantage of the cloud as an alternative to local computing infrastructure.
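    The scheduling idea, predicting each job's runtime from its inputs and fixing the submission order in advance, can be illustrated with a toy model. The runtime formula, genome sizes, and longest-first heuristic below are illustrative assumptions, not the paper's fitted model.

```python
# Illustrative sketch of ordering cloud jobs by predicted runtime so that long comparisons
# start first and idle tail time is reduced. The runtime model is a hypothetical stand-in.

def predicted_runtime(genome_a_size, genome_b_size, k=1e-7):
    """Hypothetical runtime estimate (hours) for one genome-to-genome comparison."""
    return k * genome_a_size * genome_b_size

# Hypothetical job list: (name, genome sizes in number of proteins).
jobs = [("human-vs-mouse", 20000, 22000), ("ecoli-vs-yeast", 4300, 6000),
        ("human-vs-yeast", 20000, 6000)]

# Submit longest-first: a common heuristic for reducing makespan on a fixed-size cluster.
ordered = sorted(jobs, key=lambda j: predicted_runtime(j[1], j[2]), reverse=True)
for name, a, b in ordered:
    print(f"{name}: ~{predicted_runtime(a, b):.1f} h (submit next)")
```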

  7. Using Computing and Data Grids for Large-Scale Science and Engineering

    NASA Technical Reports Server (NTRS)

    Johnston, William E.

    2001-01-01

    We use the term "Grid" to refer to a software system that provides uniform and location independent access to geographically and organizationally dispersed, heterogeneous resources that are persistent and supported. These emerging data and computing Grids promise to provide a highly capable and scalable environment for addressing large-scale science problems. We describe the requirements for science Grids, the resulting services and architecture of NASA's Information Power Grid (IPG) and DOE's Science Grid, and some of the scaling issues that have come up in their implementation.

  8. From Three-Photon Greenberger-Horne-Zeilinger States to Ballistic Universal Quantum Computation.

    PubMed

    Gimeno-Segovia, Mercedes; Shadbolt, Pete; Browne, Dan E; Rudolph, Terry

    2015-07-10

    Single photons, manipulated using integrated linear optics, constitute a promising platform for universal quantum computation. A series of increasingly efficient proposals have shown linear-optical quantum computing to be formally scalable. However, existing schemes typically require extensive adaptive switching, which is experimentally challenging and noisy, thousands of photon sources per renormalized qubit, and/or large quantum memories for repeat-until-success strategies. Our work overcomes all these problems. We present a scheme to construct a cluster state universal for quantum computation, which uses no adaptive switching, no large memories, and which is at least an order of magnitude more resource efficient than previous passive schemes. Unlike previous proposals, it is constructed entirely from loss-detecting gates and offers a robustness to photon loss. Even without the use of an active loss-tolerant encoding, our scheme naturally tolerates a total loss rate ∼1.6% in the photons detected in the gates. This scheme uses only 3 Greenberger-Horne-Zeilinger states as a resource, together with a passive linear-optical network. We fully describe and model the iterative process of cluster generation, including photon loss and gate failure. This demonstrates that building a linear-optical quantum computer needs to be less challenging than previously thought.

  9. Systems Analysis, Machineable Circulation Data and Library Users and Non-Users.

    ERIC Educational Resources Information Center

    Lubans, John, Jr.

    A study to be made with computer-based circulation data of the non-use and use of a large academic library is discussed. A search of the literature reveals that computer-based circulation systems can be, but have not been, utilized to provide data bases for systematic analyses of library users and resources. The data gathered in the circulation…

  10. The Czech National Grid Infrastructure

    NASA Astrophysics Data System (ADS)

    Chudoba, J.; Křenková, I.; Mulač, M.; Ruda, M.; Sitera, J.

    2017-10-01

    The Czech National Grid Infrastructure is operated by MetaCentrum, a CESNET department responsible for coordinating and managing activities related to distributed computing. CESNET, as the Czech National Research and Education Network (NREN), provides many e-infrastructure services, which are used by 94% of the scientific and research community in the Czech Republic. Computing and storage resources owned by different organizations are connected by a sufficiently fast network to provide transparent access to all resources. We describe in more detail the computing infrastructure, which is based on several different technologies and covers grid, cloud and map-reduce environments. While the largest share of CPUs is still accessible via distributed TORQUE servers, providing an environment for long batch jobs, part of the infrastructure is available via standard EGI tools, a subset of NGI resources is provided to the EGI FedCloud environment with a cloud interface, and a Hadoop cluster is provided by the same e-infrastructure. A broad spectrum of computing servers is offered; users can choose from standard 2-CPU servers to large SMP machines with up to 6 TB of RAM or servers with GPU cards. Different groups have different priorities on various resources, and resource owners can even have exclusive access. The software is distributed via AFS. Storage servers offering up to tens of terabytes of disk space to individual users are connected via NFS4 on top of GPFS, and access to long-term HSM storage with petabyte capacity is also provided. An overview of available resources and recent usage statistics will be given.

  11. The application of LANDSAT remote sensing technology to natural resources management. Section 1: Introduction to VICAR - Image classification module. Section 2: Forest resource assessment of Humboldt County.

    NASA Technical Reports Server (NTRS)

    Fox, L., III (Principal Investigator); Mayer, K. E.

    1980-01-01

    A teaching module on image classification procedures using the VICAR computer software package was developed to optimize the training benefits for users of the VICAR programs. The field test of the module is discussed. An intensive forest land inventory strategy was developed for Humboldt County. The results indicate that LANDSAT data can be computer-classified to yield site-specific forest resource information with high accuracy (82%). The "Douglas-fir 80%" category was found to cover approximately 21% of the county and "Mixed Conifer 80%" about 13%. The "Redwood 80%" resource category, which represented dense old-growth trees as well as large second growth, comprised 4.0% of the total vegetation mosaic. Furthermore, the "Brush" and "Brush-Regeneration" categories were found to be a significant part of the vegetative community, with area estimates of 9.4 and 10.0%.

  12. The HEPCloud Facility: elastic computing for High Energy Physics - The NOvA Use Case

    NASA Astrophysics Data System (ADS)

    Fuess, S.; Garzoglio, G.; Holzman, B.; Kennedy, R.; Norman, A.; Timm, S.; Tiradani, A.

    2017-10-01

    The need for computing in the HEP community follows cycles of peaks and valleys mainly driven by conference dates, accelerator shutdowns, holiday schedules, and other factors. Because of this, the classical method of provisioning these resources at providing facilities has drawbacks such as potential overprovisioning. As the appetite for computing increases, however, so does the need to maximize cost efficiency by developing a model for dynamically provisioning resources only when needed. To address this issue, the HEPCloud project was launched by the Fermilab Scientific Computing Division in June 2015. Its goal is to develop a facility that provides a common interface to a variety of resources, including local clusters, grids, high performance computers, and community and commercial Clouds. Initially targeted experiments include CMS and NOvA, as well as other Fermilab stakeholders. In its first phase, the project has demonstrated the use of the “elastic” provisioning model offered by commercial clouds, such as Amazon Web Services. In this model, resources are rented and provisioned automatically over the Internet upon request. In January 2016, the project demonstrated the ability to increase the total amount of global CMS resources by 58,000 cores, from 150,000 cores, a 38 percent increase, in preparation for the Rencontres de Moriond. In March 2016, the NOvA experiment also demonstrated resource burst capabilities with an additional 7,300 cores, achieving a scale almost four times as large as the locally allocated resources and utilizing the local AWS S3 storage to optimize data handling operations and costs. NOvA used the same familiar services as for local computations, such as data handling and job submission, in preparation for the Neutrino 2016 conference. In both cases, the cost was contained by the use of the Amazon Spot Instance Market and the Decision Engine, a HEPCloud component that aims at minimizing cost and job interruption. This paper describes the Fermilab HEPCloud Facility and the challenges overcome for the CMS and NOvA communities.

  13. Computational modelling of oxygenation processes in enzymes and biomimetic model complexes.

    PubMed

    de Visser, Sam P; Quesne, Matthew G; Martin, Bodo; Comba, Peter; Ryde, Ulf

    2014-01-11

    With computational resources becoming more efficient, more powerful and at the same time cheaper, computational methods have become more and more popular for studies on biochemical and biomimetic systems. Although large efforts from the scientific community have gone into exploring the possibilities of computational methods for studies on large biochemical systems, such studies are not without pitfalls and often cannot be done routinely but require expert execution. In this review we summarize and highlight advances in computational methodology and its application to enzymatic and biomimetic model complexes. In particular, we emphasize topical and state-of-the-art methodologies that are able either to reproduce experimental findings, e.g., spectroscopic parameters and rate constants, accurately or to make predictions of short-lived intermediates and fast reaction processes in nature. Moreover, we give examples of processes where certain computational methods dramatically fail.

  14. An FPGA computing demo core for space charge simulation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wu, Jinyuan; Huang, Yifei; /Fermilab

    2009-01-01

    In accelerator physics, space charge simulation requires a large amount of computing power. In a particle system, each calculation requires time- and resource-consuming operations such as multiplications, divisions, and square roots. Because of the flexibility of field programmable gate arrays (FPGAs), we implemented this task with efficient use of the available computing resources and completely eliminated non-calculating operations that are indispensable in regular micro-processors (e.g. instruction fetch, instruction decoding, etc.). We designed and tested a 16-bit demo core for computing Coulomb's force in an Altera Cyclone II FPGA device. To save resources, the inverse square-root cube operation in our design is computed using a memory look-up table addressed with the nine to ten most significant non-zero bits. At a 200 MHz internal clock, our demo core reaches a throughput of 200 M pairs/s/core, faster than a typical 2 GHz micro-processor by about a factor of 10. Temperature and power consumption of FPGAs were also lower than those of micro-processors. Fast and convenient, FPGAs can serve as alternatives to time-consuming micro-processors for space charge simulation.
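    The look-up-table trick described above can be mimicked in software to see how the approximation behaves: normalize the operand, index a table with its leading mantissa bits, and rescale by the exponent. The table size and bit widths below are illustrative, not the exact widths used in the 16-bit core.

```python
# Software sketch of approximating r2**-1.5 (i.e. 1/r**3, as used in Coulomb force kernels)
# with a table addressed by the leading bits of the operand. Sizes are illustrative.

import math

ADDR_BITS = 10
TABLE_SIZE = 1 << ADDR_BITS
# Each entry holds m**-1.5 for the midpoint of a mantissa bucket, m in [0.5, 1.0).
TABLE = [(0.5 + (i + 0.5) * 0.5 / TABLE_SIZE) ** -1.5 for i in range(TABLE_SIZE)]

def inv_r3(r2):
    """Approximate r2**-1.5 via a table indexed by the leading mantissa bits of r2."""
    m, e = math.frexp(r2)                         # r2 = m * 2**e with m in [0.5, 1)
    index = int((m - 0.5) * 2 * TABLE_SIZE)       # top ADDR_BITS bits of the mantissa
    return TABLE[min(index, TABLE_SIZE - 1)] * 2.0 ** (-1.5 * e)

if __name__ == "__main__":
    for r2 in (0.3, 1.7, 42.0):
        exact, approx = r2 ** -1.5, inv_r3(r2)
        print(f"r2={r2:6.2f}  exact={exact:.6f}  lookup={approx:.6f}  "
              f"rel.err={abs(approx - exact) / exact:.2e}")
```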

  15. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gross, D.; Eisert, J.; Schuch, N.

    We introduce schemes for quantum computing based on local measurements on entangled resource states. This work elaborates on the framework established in Gross and Eisert [Phys. Rev. Lett. 98, 220503 (2007); quant-ph/0609149]. Our method makes use of tools from many-body physics--matrix product states, finitely correlated states, or projected entangled pairs states--to show how measurements on entangled states can be viewed as processing quantum information. This work hence constitutes an instance where a quantum information problem--how to realize quantum computation--was approached using tools from many-body theory and not vice versa. We give a more detailed description of the setting and present a large number of examples. We find computational schemes, which differ from the original one-way computer, for example, in the way the randomness of measurement outcomes is handled. Also, schemes are presented where the logical qubits are no longer strictly localized on the resource state. Notably, we find a great flexibility in the properties of the universal resource states: They may, for example, exhibit nonvanishing long-range correlation functions or be locally arbitrarily close to a pure state. We discuss variants of Kitaev's toric code states as universal resources, and contrast this with situations where they can be efficiently classically simulated. This framework opens up a way of thinking of tailoring resource states to specific physical systems, such as cold atoms in optical lattices or linear optical systems.

  16. National Laboratory for Advanced Scientific Visualization at UNAM - Mexico

    NASA Astrophysics Data System (ADS)

    Manea, Marina; Constantin Manea, Vlad; Varela, Alfredo

    2016-04-01

    In 2015, the National Autonomous University of Mexico (UNAM) joined the family of universities and research centers where advanced visualization and computing play a key role in promoting and advancing missions in research, education, community outreach, and business-oriented consulting. This initiative provides access to a great variety of advanced hardware and software resources and offers a range of consulting services that spans areas including neuroanatomy, embryonic development, genome-related studies, geosciences, geography, and physics- and mathematics-related disciplines. The National Laboratory for Advanced Scientific Visualization delivers services through three main infrastructure environments: the fully immersive 3D display system (the Cave), the high-resolution parallel visualization system (the Powerwall), and the high-resolution spherical display (the Earth Simulator). The entire visualization infrastructure is interconnected to a high-performance computing cluster (HPCC) called ADA, in honor of Ada Lovelace, considered to be the first computer programmer. The Cave is an extra-large, 3.6 m wide room with images projected onto the front, left and right walls as well as the floor. Specialized CrystalEyes LCD shutter glasses provide strong stereo depth perception, and a variety of tracking devices allow software to track the position of a user's hand, head and wand. The Powerwall is designed to bring large amounts of complex data together through parallel computing for team interaction and collaboration. This system is composed of 24 (6x4) high-resolution, ultra-thin (2 mm) bezel monitors connected to a high-performance GPU cluster. The Earth Simulator is a large (60") high-resolution spherical display used for global-scale data visualization, such as geophysical, meteorological, climate and ecology data. The HPCC ADA is a system of more than 1000 computing cores, offering parallel computing resources to applications that require large amounts of memory as well as large, fast parallel storage. The temperature of the entire system is controlled by an energy- and space-efficient cooling solution based on large rear-door liquid-cooled heat exchangers. This state-of-the-art infrastructure will boost research activities in the region, offer a powerful scientific tool for teaching at undergraduate and graduate levels, and enhance association and cooperation with business-oriented organizations.

  17. PanDA: Exascale Federation of Resources for the ATLAS Experiment at the LHC

    NASA Astrophysics Data System (ADS)

    Barreiro Megino, Fernando; Caballero Bejar, Jose; De, Kaushik; Hover, John; Klimentov, Alexei; Maeno, Tadashi; Nilsson, Paul; Oleynik, Danila; Padolski, Siarhei; Panitkin, Sergey; Petrosyan, Artem; Wenaus, Torre

    2016-02-01

    After a scheduled maintenance and upgrade period, the world's largest and most powerful machine - the Large Hadron Collider (LHC) - is about to enter its second run at unprecedented energies. In order to exploit the scientific potential of the machine, the experiments at the LHC face computational challenges with enormous data volumes that need to be analysed by thousands of physics users and compared to simulated data. Given diverse funding constraints, the computational resources for the LHC have been deployed in a worldwide mesh of data centres, connected to each other through Grid technologies. The PanDA (Production and Distributed Analysis) system was developed in 2005 for the ATLAS experiment on top of this heterogeneous infrastructure to seamlessly integrate the computational resources and give the users the feeling of a unique system. Since its origins, PanDA has evolved together with upcoming computing paradigms in and outside HEP, such as changes in the networking model, Cloud Computing and HPC. It is currently running steadily on up to 200 thousand simultaneous cores (limited by the available resources for ATLAS), with up to two million aggregated jobs per day, and processes over an exabyte of data per year. The success of PanDA in ATLAS is triggering widespread adoption and testing by other experiments. In this contribution we will give an overview of the PanDA components and focus on the new features and upcoming challenges that are relevant to the next decade of distributed computing workload management using PanDA.

  18. Access control and privacy in large distributed systems

    NASA Technical Reports Server (NTRS)

    Leiner, B. M.; Bishop, M.

    1986-01-01

    Large-scale distributed systems consist of workstations, mainframe computers, supercomputers and other types of servers, all connected by a computer network. These systems are being used in a variety of applications, including the support of collaborative scientific research. In such an environment, issues of access control and privacy arise. Access control is required for several reasons, including the protection of sensitive resources and cost control. Privacy is also required for similar reasons, including the protection of a researcher's proprietary results. A possible architecture for integrating available computer and communications security technologies into a system that meets these requirements is described. This architecture is meant as a starting point for discussion, rather than the final answer.

  19. Adapting the serial Alpgen parton-interaction generator to simulate LHC collisions on millions of parallel threads

    NASA Astrophysics Data System (ADS)

    Childers, J. T.; Uram, T. D.; LeCompte, T. J.; Papka, M. E.; Benjamin, D. P.

    2017-01-01

    As the LHC moves to higher energies and luminosity, the demand for computing resources increases accordingly and will soon outpace the growth of the Worldwide LHC Computing Grid. To meet this greater demand, event-generation Monte Carlo was targeted for adaptation to run on Mira, the supercomputer at the Argonne Leadership Computing Facility. Alpgen is a Monte Carlo event generation application that is used by LHC experiments in the simulation of collisions that take place in the Large Hadron Collider. This paper details the process by which Alpgen was adapted from a single-processor serial application to a large-scale parallel application and the performance that was achieved.

  20. Experimental quantum computing without entanglement.

    PubMed

    Lanyon, B P; Barbieri, M; Almeida, M P; White, A G

    2008-11-14

    Deterministic quantum computation with one pure qubit (DQC1) is an efficient model of computation that uses highly mixed states. Unlike pure-state models, its power is not derived from the generation of a large amount of entanglement. Instead it has been proposed that other nonclassical correlations are responsible for the computational speedup, and that these can be captured by the quantum discord. In this Letter we implement DQC1 in an all-optical architecture, and experimentally observe the generated correlations. We find no entanglement, but large amounts of quantum discord, except in three cases where an efficient classical simulation is always possible. Our results show that even fully separable, highly mixed states can contain intrinsically quantum mechanical correlations and that these could offer a valuable resource for quantum information technologies.

  1. Job Scheduling in a Heterogeneous Grid Environment

    NASA Technical Reports Server (NTRS)

    Shan, Hong-Zhang; Smith, Warren; Oliker, Leonid; Biswas, Rupak

    2004-01-01

    Computational grids have the potential for solving large-scale scientific problems using heterogeneous and geographically distributed resources. However, a number of major technical hurdles must be overcome before this potential can be realized. One problem that is critical to effective utilization of computational grids is the efficient scheduling of jobs. This work addresses this problem by describing and evaluating a grid scheduling architecture and three job migration algorithms. The architecture is scalable and does not assume control of local site resources. The job migration policies use the availability and performance of computer systems, the network bandwidth available between systems, and the volume of input and output data associated with each job. An extensive performance comparison is presented using real workloads from leading computational centers. The results, based on several key metrics, demonstrate that the performance of our distributed migration algorithms is significantly greater than that of a local scheduling framework and comparable to a non-scalable global scheduling approach.
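    A migration policy of the kind described above boils down to estimating a job's turnaround time at each candidate site from queue wait, relative CPU speed, and data transfer over the available bandwidth, then picking the cheapest site. The cost model and numbers in the sketch below are hypothetical, not the paper's exact algorithms.

```python
# Minimal sketch of a migration cost function: estimate completion time per site from
# queue wait, relative CPU speed, and input/output transfer time. All values hypothetical.

def estimated_turnaround(job, site):
    transfer_h = (job["input_gb"] + job["output_gb"]) / site["bandwidth_gb_per_h"]
    compute_h = job["cpu_hours"] / site["relative_speed"]
    return site["queue_wait_h"] + transfer_h + compute_h

job = {"cpu_hours": 120.0, "input_gb": 50.0, "output_gb": 20.0}
sites = {
    "local": {"queue_wait_h": 30.0, "relative_speed": 1.0, "bandwidth_gb_per_h": 1e9},
    "siteA": {"queue_wait_h": 2.0,  "relative_speed": 1.5, "bandwidth_gb_per_h": 40.0},
    "siteB": {"queue_wait_h": 0.5,  "relative_speed": 0.8, "bandwidth_gb_per_h": 10.0},
}

best = min(sites, key=lambda name: estimated_turnaround(job, sites[name]))
for name, s in sites.items():
    print(f"{name}: {estimated_turnaround(job, s):.1f} h")
print("migrate to:", best)
```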

  2. ACToR-Aggregated Computational Resource | Science ...

    EPA Pesticide Factsheets

    ACToR (Aggregated Computational Toxicology Resource) is a database and set of software applications that bring into one central location many types and sources of data on environmental chemicals. Currently, the ACToR chemical database contains information on chemical structure, in vitro bioassays and in vivo toxicology assays derived from more than 150 sources including the U.S. Environmental Protection Agency (EPA), Centers for Disease Control (CDC), U.S. Food & Drug Administration (FDA), National Institutes of Health (NIH), state agencies, corresponding government agencies in Canada, Europe and Japan, universities, the World Health Organization (WHO) and non-governmental organizations (NGOs). At the EPA National Center for Computational Toxicology, ACToR helps manage large data sets being used in a high throughput environmental chemical screening and prioritization program called ToxCast(TM).

  3. ACToR - Aggregated Computational Toxicology Resource

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Judson, Richard; Richard, Ann; Dix, David

    2008-11-15

    ACToR (Aggregated Computational Toxicology Resource) is a database and set of software applications that bring into one central location many types and sources of data on environmental chemicals. Currently, the ACToR chemical database contains information on chemical structure, in vitro bioassays and in vivo toxicology assays derived from more than 150 sources including the U.S. Environmental Protection Agency (EPA), Centers for Disease Control (CDC), U.S. Food and Drug Administration (FDA), National Institutes of Health (NIH), state agencies, corresponding government agencies in Canada, Europe and Japan, universities, the World Health Organization (WHO) and non-governmental organizations (NGOs). At the EPA National Center for Computational Toxicology, ACToR helps manage large data sets being used in a high-throughput environmental chemical screening and prioritization program called ToxCast(TM).

  4. Cyber-workstation for computational neuroscience.

    PubMed

    Digiovanna, Jack; Rattanatamrong, Prapaporn; Zhao, Ming; Mahmoudi, Babak; Hermer, Linda; Figueiredo, Renato; Principe, Jose C; Fortes, Jose; Sanchez, Justin C

    2010-01-01

    A Cyber-Workstation (CW) to study in vivo, real-time interactions between computational models and large-scale brain subsystems during behavioral experiments has been designed and implemented. The design philosophy seeks to directly link the in vivo neurophysiology laboratory with scalable computing resources to enable more sophisticated computational neuroscience investigation. The architecture designed here allows scientists to develop new models and integrate them with existing models (e.g. recursive least-squares regressor) by specifying appropriate connections in a block-diagram. Then, adaptive middleware transparently implements these user specifications using the full power of remote grid-computing hardware. In effect, the middleware deploys an on-demand and flexible neuroscience research test-bed to provide the neurophysiology laboratory extensive computational power from an outside source. The CW consolidates distributed software and hardware resources to support time-critical and/or resource-demanding computing during data collection from behaving animals. This power and flexibility is important as experimental and theoretical neuroscience evolves based on insights gained from data-intensive experiments, new technologies and engineering methodologies. This paper describes briefly the computational infrastructure and its most relevant components. Each component is discussed within a systematic process of setting up an in vivo, neuroscience experiment. Furthermore, a co-adaptive brain machine interface is implemented on the CW to illustrate how this integrated computational and experimental platform can be used to study systems neurophysiology and learning in a behavior task. We believe this implementation is also the first remote execution and adaptation of a brain-machine interface.

  5. Cyber-Workstation for Computational Neuroscience

    PubMed Central

    DiGiovanna, Jack; Rattanatamrong, Prapaporn; Zhao, Ming; Mahmoudi, Babak; Hermer, Linda; Figueiredo, Renato; Principe, Jose C.; Fortes, Jose; Sanchez, Justin C.

    2009-01-01

    A Cyber-Workstation (CW) to study in vivo, real-time interactions between computational models and large-scale brain subsystems during behavioral experiments has been designed and implemented. The design philosophy seeks to directly link the in vivo neurophysiology laboratory with scalable computing resources to enable more sophisticated computational neuroscience investigation. The architecture designed here allows scientists to develop new models and integrate them with existing models (e.g. recursive least-squares regressor) by specifying appropriate connections in a block-diagram. Then, adaptive middleware transparently implements these user specifications using the full power of remote grid-computing hardware. In effect, the middleware deploys an on-demand and flexible neuroscience research test-bed to provide the neurophysiology laboratory extensive computational power from an outside source. The CW consolidates distributed software and hardware resources to support time-critical and/or resource-demanding computing during data collection from behaving animals. This power and flexibility is important as experimental and theoretical neuroscience evolves based on insights gained from data-intensive experiments, new technologies and engineering methodologies. This paper describes briefly the computational infrastructure and its most relevant components. Each component is discussed within a systematic process of setting up an in vivo, neuroscience experiment. Furthermore, a co-adaptive brain machine interface is implemented on the CW to illustrate how this integrated computational and experimental platform can be used to study systems neurophysiology and learning in a behavior task. We believe this implementation is also the first remote execution and adaptation of a brain-machine interface. PMID:20126436

  6. QMC Goes BOINC: Using Public Resource Computing to Perform Quantum Monte Carlo Calculations

    NASA Astrophysics Data System (ADS)

    Rainey, Cameron; Engelhardt, Larry; Schröder, Christian; Hilbig, Thomas

    2008-10-01

    Theoretical modeling of magnetic molecules traditionally involves the diagonalization of quantum Hamiltonian matrices. However, as the complexity of these molecules increases, the matrices become so large that this process becomes unusable. An additional challenge to this modeling is that many repetitive calculations must be performed, further increasing the need for computing power. Both of these obstacles can be overcome by using a quantum Monte Carlo (QMC) method and a distributed computing project. We have recently implemented a QMC method within the Spinhenge@home project, which is a Public Resource Computing (PRC) project where private citizens allow part-time usage of their PCs for scientific computing. The use of PRC for scientific computing will be described in detail, as well as how you can contribute to the project. See, e.g., L. Engelhardt et al., Angew. Chem. Int. Ed. 47, 924 (2008); C. Schröder, in Distributed & Grid Computing - Science Made Transparent for Everyone. Principles, Applications and Supporting Communities (Weber, M.H.W., ed., 2008). Project URL: http://spin.fh-bielefeld.de
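    What makes this workload a good fit for public resource computing is that the repetitive calculations are independent: each parameter point (for example, a temperature) can be shipped to a different volunteer machine as its own work unit. The toy Metropolis simulation below, using a small Ising ring as a stand-in for a magnetic molecule, illustrates that structure; it is not the Spinhenge@home code.

```python
# Toy illustration of independent Monte Carlo work units: each temperature point below
# could be farmed out to a separate volunteer host. All model parameters are illustrative.

import math, random

def mc_magnetization(n_spins, temperature, sweeps=10_000):
    """Metropolis estimate of |magnetization| per spin for a 1-D Ising ring."""
    spins = [1] * n_spins
    m_acc = 0.0
    for _ in range(sweeps):
        for i in range(n_spins):
            dE = 2 * spins[i] * (spins[(i - 1) % n_spins] + spins[(i + 1) % n_spins])
            if dE <= 0 or random.random() < math.exp(-dE / temperature):
                spins[i] = -spins[i]
        m_acc += abs(sum(spins)) / n_spins
    return m_acc / sweeps

if __name__ == "__main__":
    for T in (0.5, 1.0, 2.0, 4.0):   # each temperature is an independent work unit
        print(f"T={T}: <|m|> ~ {mc_magnetization(32, T):.3f}")
```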

  7. Key Lessons in Building "Data Commons": The Open Science Data Cloud Ecosystem

    NASA Astrophysics Data System (ADS)

    Patterson, M.; Grossman, R.; Heath, A.; Murphy, M.; Wells, W.

    2015-12-01

    Cloud computing technology has created a shift around data and data analysis by allowing researchers to push computation to data as opposed to having to pull data to an individual researcher's computer. Consequently, cloud-based resources can provide unique opportunities to capture computing environments used both to access raw data in its original form and also to create analysis products which may be the source of data for tables and figures presented in research publications. Since 2008, the Open Cloud Consortium (OCC) has operated the Open Science Data Cloud (OSDC), which provides scientific researchers with computational resources for storing, sharing, and analyzing large (terabyte and petabyte-scale) scientific datasets. OSDC has provided compute and storage services to over 750 researchers in a wide variety of data intensive disciplines. Recently, internal users have logged about 2 million core hours each month. The OSDC also serves the research community by colocating these resources with access to nearly a petabyte of public scientific datasets in a variety of fields also accessible for download externally by the public. In our experience operating these resources, researchers are well served by "data commons," meaning cyberinfrastructure that colocates data archives, computing, and storage infrastructure and supports essential tools and services for working with scientific data. In addition to the OSDC public data commons, the OCC operates a data commons in collaboration with NASA and is developing a data commons for NOAA datasets. As cloud-based infrastructures for distributing and computing over data become more pervasive, we ask, "What does it mean to publish data in a data commons?" Here we present the OSDC perspective and discuss several services that are key in architecting data commons, including digital identifier services.

  8. Law of Large Numbers: the Theory, Applications and Technology-based Education

    PubMed Central

    Dinov, Ivo D.; Christou, Nicolas; Gould, Robert

    2011-01-01

    Modern approaches for technology-based blended education utilize a variety of recently developed novel pedagogical, computational and network resources. Such attempts employ technology to deliver integrated, dynamically-linked, interactive-content and heterogeneous learning environments, which may improve student comprehension and information retention. In this paper, we describe one such innovative effort of using technological tools to expose students in probability and statistics courses to the theory, practice and usability of the Law of Large Numbers (LLN). We base our approach on integrating pedagogical instruments with the computational libraries developed by the Statistics Online Computational Resource (www.SOCR.ucla.edu). To achieve this merger we designed a new interactive Java applet and a corresponding demonstration activity that illustrate the concept and the applications of the LLN. The LLN applet and activity have common goals – to provide graphical representation of the LLN principle, build lasting student intuition and present the common misconceptions about the law of large numbers. Both the SOCR LLN applet and activity are freely available online to the community to test, validate and extend (Applet: http://socr.ucla.edu/htmls/exp/Coin_Toss_LLN_Experiment.html, and Activity: http://wiki.stat.ucla.edu/socr/index.php/SOCR_EduMaterials_Activities_LLN). PMID:21603584
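    The behavior that the applet and activity visualize can also be reproduced in a few lines of code: the running proportion of heads drifts toward the true probability as the number of tosses grows, while short runs may deviate noticeably. The simulation below is a simple stand-in for the SOCR coin-toss experiment, not its actual implementation.

```python
# Small LLN demonstration: the running proportion of heads converges to p as the number
# of tosses grows. Checkpoints and seed are arbitrary choices for illustration.

import random

def running_proportion(n_tosses, p=0.5, seed=42):
    rng = random.Random(seed)
    heads = 0
    checkpoints = {}
    for i in range(1, n_tosses + 1):
        heads += rng.random() < p
        if i in (10, 100, 1_000, 10_000, 100_000):
            checkpoints[i] = heads / i
    return checkpoints

for n, prop in running_proportion(100_000).items():
    print(f"after {n:>6} tosses: proportion of heads = {prop:.4f}")
```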

  9. Mira: Argonne's 10-petaflops supercomputer

    ScienceCinema

    Papka, Michael; Coghlan, Susan; Isaacs, Eric; Peters, Mark; Messina, Paul

    2018-02-13

    Mira, Argonne's petascale IBM Blue Gene/Q system, ushers in a new era of scientific supercomputing at the Argonne Leadership Computing Facility. An engineering marvel, the 10-petaflops supercomputer is capable of carrying out 10 quadrillion calculations per second. As a machine for open science, any researcher with a question that requires large-scale computing resources can submit a proposal for time on Mira, typically in allocations of millions of core-hours, to run programs for their experiments. This adds up to billions of hours of computing time per year.

  10. Mira: Argonne's 10-petaflops supercomputer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Papka, Michael; Coghlan, Susan; Isaacs, Eric

    2013-07-03

    Mira, Argonne's petascale IBM Blue Gene/Q system, ushers in a new era of scientific supercomputing at the Argonne Leadership Computing Facility. An engineering marvel, the 10-petaflops supercomputer is capable of carrying out 10 quadrillion calculations per second. As a machine for open science, any researcher with a question that requires large-scale computing resources can submit a proposal for time on Mira, typically in allocations of millions of core-hours, to run programs for their experiments. This adds up to billions of hours of computing time per year.

  11. Applications of the pipeline environment for visual informatics and genomics computations

    PubMed Central

    2011-01-01

    Background: Contemporary informatics and genomics research require efficient, flexible and robust management of large heterogeneous data, advanced computational tools, powerful visualization, reliable hardware infrastructure, interoperability of computational resources, and detailed data and analysis-protocol provenance. The Pipeline is a client-server distributed computational environment that facilitates the visual graphical construction, execution, monitoring, validation and dissemination of advanced data analysis protocols. Results: This paper reports on the applications of the LONI Pipeline environment to address two informatics challenges - graphical management of diverse genomics tools, and the interoperability of informatics software. Specifically, this manuscript presents the concrete details of deploying general informatics suites and individual software tools to new hardware infrastructures, the design, validation and execution of new visual analysis protocols via the Pipeline graphical interface, and integration of diverse informatics tools via the Pipeline eXtensible Markup Language syntax. We demonstrate each of these processes using several established informatics packages (e.g., miBLAST, EMBOSS, mrFAST, GWASS, MAQ, SAMtools, Bowtie) for basic local sequence alignment and search, molecular biology data analysis, and genome-wide association studies. These examples demonstrate the power of the Pipeline graphical workflow environment to enable integration of bioinformatics resources which provide a well-defined syntax for dynamic specification of the input/output parameters and the run-time execution controls. Conclusions: The LONI Pipeline environment http://pipeline.loni.ucla.edu provides a flexible graphical infrastructure for efficient biomedical computing and distributed informatics research. The interactive Pipeline resource manager enables the utilization and interoperability of diverse types of informatics resources. The Pipeline client-server model provides computational power to a broad spectrum of informatics investigators - experienced developers and novice users, users with or without access to advanced computational resources (e.g., Grid, data), as well as basic and translational scientists. The open development, validation and dissemination of computational networks (pipeline workflows) facilitates the sharing of knowledge, tools, protocols and best practices, and enables the unbiased validation and replication of scientific findings by the entire community. PMID:21791102

  12. Data Intensive Scientific Workflows on a Federated Cloud: CRADA Final Report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Garzoglio, Gabriele

    The Fermilab Scientific Computing Division and the KISTI Global Science Experimental Data Hub Center have built a prototypical large-scale infrastructure to handle scientific workflows of stakeholders to run on multiple cloud resources. The demonstrations have been in the areas of (a) Data-Intensive Scientific Workflows on Federated Clouds, (b) Interoperability and Federation of Cloud Resources, and (c) Virtual Infrastructure Automation to enable On-Demand Services.

  13. Integration and Exposure of Large Scale Computational Resources Across the Earth System Grid Federation (ESGF)

    NASA Astrophysics Data System (ADS)

    Duffy, D.; Maxwell, T. P.; Doutriaux, C.; Williams, D. N.; Chaudhary, A.; Ames, S.

    2015-12-01

    As the size of remote sensing observations and model output data grows, the volume of the data has become overwhelming, even to many scientific experts. As societies are forced to better understand, mitigate, and adapt to climate changes, the combination of Earth observation data and global climate model projections is crucial not only to scientists but also to policy makers, downstream applications, and even the public. Scientific progress on understanding climate is critically dependent on the availability of a reliable infrastructure that promotes data access, management, and provenance. The Earth System Grid Federation (ESGF) has created such an environment for the Intergovernmental Panel on Climate Change (IPCC). ESGF provides a federated global cyber infrastructure for data access and management of model outputs generated for the IPCC Assessment Reports (AR). The current generation of the ESGF federated grid allows consumers of the data to find and download data with limited capabilities for server-side processing. Since the amount of data for future ARs is expected to grow dramatically, ESGF is working on integrating server-side analytics throughout the federation. The ESGF Compute Working Team (CWT) has created a Web Processing Service (WPS) Application Programming Interface (API) to enable access to scalable computational resources. The API is the exposure point to high performance computing resources across the federation. Specifically, the API allows users to execute simple operations, such as maximum, minimum, average, and anomalies, on ESGF data without having to download the data. These operations are executed at the ESGF data node site with access to large amounts of parallel computing capabilities. This presentation will highlight the WPS API and its capabilities, provide implementation details, and discuss future developments.
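
    As a rough illustration of the pattern described above, the sketch below issues a WPS "Execute" request to compute a server-side average without downloading the underlying dataset. The endpoint URL, the operation identifier, and the datainputs encoding are hypothetical placeholders rather than the actual ESGF CWT API; a real deployment defines its own identifiers.

      # Minimal sketch of a WPS Execute request against a hypothetical ESGF
      # compute endpoint; the URL, identifier, and inputs are placeholders.
      import requests

      WPS_ENDPOINT = "https://esgf-node.example.org/wps"   # hypothetical endpoint

      params = {
          "service": "WPS",
          "version": "1.0.0",
          "request": "Execute",
          "identifier": "average",   # hypothetical server-side operation name
          "datainputs": "variable=tas;domain=global;dataset=example_cmip_dataset",
      }

      response = requests.get(WPS_ENDPOINT, params=params, timeout=60)
      response.raise_for_status()
      print(response.text)   # WPS replies with an XML status/result document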

  14. Accelerating Large Scale Image Analyses on Parallel, CPU-GPU Equipped Systems

    PubMed Central

    Teodoro, George; Kurc, Tahsin M.; Pan, Tony; Cooper, Lee A.D.; Kong, Jun; Widener, Patrick; Saltz, Joel H.

    2014-01-01

    The past decade has witnessed a major paradigm shift in high performance computing with the introduction of accelerators as general purpose processors. These computing devices make available very high parallel computing power at low cost and power consumption, transforming current high performance platforms into heterogeneous CPU-GPU equipped systems. Although the theoretical performance achieved by these hybrid systems is impressive, taking practical advantage of this computing power remains a very challenging problem. Most applications are still deployed to either GPU or CPU, leaving the other resource under- or un-utilized. In this paper, we propose, implement, and evaluate a performance aware scheduling technique along with optimizations to make efficient collaborative use of CPUs and GPUs on a parallel system. In the context of feature computations in large scale image analysis applications, our evaluations show that intelligently co-scheduling CPUs and GPUs can significantly improve performance over GPU-only or multi-core CPU-only approaches. PMID:25419545
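
    As a rough sketch of the co-scheduling idea (not the scheduler implemented in the paper), the snippet below assigns tasks, each annotated with an estimated CPU and GPU runtime, to whichever device would finish them earliest; the task names and runtime estimates are invented for illustration.

      # Toy performance-aware co-scheduler: longest tasks are handed out first,
      # and each task goes to the device (CPU core or GPU) that finishes it soonest.
      def co_schedule(tasks, n_cpus=4, n_gpus=2):
          # each device is tracked as [time_when_free, kind]
          devices = [[0.0, "cpu"] for _ in range(n_cpus)] + [[0.0, "gpu"] for _ in range(n_gpus)]
          schedule = []
          for name, cpu_t, gpu_t in sorted(tasks, key=lambda t: -min(t[1], t[2])):
              # pick the device that would finish this task earliest
              best = min(devices, key=lambda d: d[0] + (gpu_t if d[1] == "gpu" else cpu_t))
              runtime = gpu_t if best[1] == "gpu" else cpu_t
              start, finish = best[0], best[0] + runtime
              best[0] = finish
              schedule.append((name, best[1], start, finish))
          return schedule

      # invented example: (task name, estimated CPU seconds, estimated GPU seconds)
      tasks = [("feature_A", 10.0, 2.0), ("feature_B", 4.0, 3.5), ("feature_C", 6.0, 1.0)]
      for name, device, start, end in co_schedule(tasks):
          print(f"{name} -> {device} over [{start:.1f}, {end:.1f}]")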

  15. Distributed Computation of the knn Graph for Large High-Dimensional Point Sets

    PubMed Central

    Plaku, Erion; Kavraki, Lydia E.

    2009-01-01

    High-dimensional problems arising from robot motion planning, biology, data mining, and geographic information systems often require the computation of k nearest neighbor (knn) graphs. The knn graph of a data set is obtained by connecting each point to its k closest points. As the research in the above-mentioned fields progressively addresses problems of unprecedented complexity, the demand for computing knn graphs based on arbitrary distance metrics and large high-dimensional data sets increases, exceeding resources available to a single machine. In this work we efficiently distribute the computation of knn graphs for clusters of processors with message passing. Extensions to our distributed framework include the computation of graphs based on other proximity queries, such as approximate knn or range queries. Our experiments show nearly linear speedup with over one hundred processors and indicate that similar speedup can be obtained with several hundred processors. PMID:19847318
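
    The essence of the approach can be illustrated with a simplified sketch: the point set is shared with every worker, the queries are partitioned, each worker computes the k nearest neighbours of its own chunk by brute force, and the pieces are concatenated into the knn graph. The snippet below uses local processes rather than message passing across a cluster and is only an illustration of the partitioning idea, not the paper's framework.

      # Partition the queries, compute each chunk's k nearest neighbours in a
      # separate worker, and stack the partial results into the knn graph.
      import numpy as np
      from multiprocessing import Pool

      def knn_chunk(args):
          chunk, points, k = args
          # pairwise squared distances between this chunk of queries and all points
          d2 = ((chunk[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
          # skip column 0 of the sort because each point is its own nearest neighbour
          return np.argsort(d2, axis=1)[:, 1:k + 1]

      def knn_graph(points, k=5, n_workers=4):
          chunks = np.array_split(points, n_workers)
          with Pool(n_workers) as pool:
              parts = pool.map(knn_chunk, [(c, points, k) for c in chunks])
          return np.vstack(parts)   # row i lists the indices of point i's k neighbours

      if __name__ == "__main__":
          pts = np.random.rand(1000, 8)
          print(knn_graph(pts, k=5).shape)   # (1000, 5)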

  16. A Science Cloud: OneSpaceNet

    NASA Astrophysics Data System (ADS)

    Morikawa, Y.; Murata, K. T.; Watari, S.; Kato, H.; Yamamoto, K.; Inoue, S.; Tsubouchi, K.; Fukazawa, K.; Kimura, E.; Tatebe, O.; Shimojo, S.

    2010-12-01

    The main methodologies of Solar-Terrestrial Physics (STP) to date have been theoretical, experimental and observational, and computer simulation approaches. Recently, "informatics" has emerged as a new (fourth) approach to STP studies. Informatics is a methodology for analyzing large-scale data (observational data and computer simulation data) to obtain new findings using a variety of data processing techniques. At NICT (National Institute of Information and Communications Technology, Japan) we are now developing a new research environment named "OneSpaceNet". OneSpaceNet is a cloud-computing environment specialized for scientific work, which connects many researchers over a high-speed network (JGN: Japan Gigabit Network). The JGN is a wide-area backbone network operated by NICT; it provides a 10G network and many access points (APs) across Japan. OneSpaceNet also provides rich computer resources for research, such as supercomputers, a large-scale data storage area, licensed applications, visualization devices (such as a tiled display wall: TDW), databases/DBMSs, cluster computers (4-8 nodes) for data processing, and communication devices. A notable advantage of the science cloud is that a user needs only a terminal (a low-cost PC). Once the PC is connected to JGN2plus, the user can make full use of the rich resources of the science cloud. Using communication devices such as video-conference systems, streaming and reflector servers, and media players, users on OneSpaceNet can collaborate as if they belonged to the same laboratory: they are members of a virtual laboratory. The computer resources on OneSpaceNet are as follows. The data storage developed so far is almost 1 PB in size, and the number of data files managed on the cloud storage continues to grow and now exceeds 40,000,000. Notably, the disks forming this large-scale storage are distributed across 5 data centers over Japan, yet the storage system behaves as a single disk. Three supercomputers are allocated on the cloud, one in Tokyo, one in Osaka, and one in Nagoya. A user's simulation job data from any of the supercomputers are saved to the same directory on the cloud data storage; it is a kind of virtual computing environment. The tiled display wall has 36 panels acting as one display with a resolution of 18000x4300 pixels. This size is sufficient to preview or analyze large-scale computer simulation data, and it allows many researchers to view multiple images (e.g., 100 pictures) on one screen together. In our talk we also present a brief report of initial results from using OneSpaceNet for Global MHD simulations as an example of successful use of our science cloud: (i) ultra-high time resolution visualization of Global MHD simulations using the large-scale storage and parallel processing system on the cloud, (ii) a database of real-time Global MHD simulations and statistical analyses of the data, and (iii) a 3D Web service for Global MHD simulations.

  17. Advancing Cyberinfrastructure to support high resolution water resources modeling

    NASA Astrophysics Data System (ADS)

    Tarboton, D. G.; Ogden, F. L.; Jones, N.; Horsburgh, J. S.

    2012-12-01

    Addressing the problem of how the availability and quality of water resources at large scales are sensitive to climate variability, watershed alterations and management activities requires computational resources that combine data from multiple sources and support integrated modeling. Related cyberinfrastructure challenges include: 1) how can we best structure data and computer models to address this scientific problem through the use of high-performance and data-intensive computing, and 2) how can we do this in a way that discipline scientists without extensive computational and algorithmic knowledge and experience can take advantage of advances in cyberinfrastructure? This presentation will describe a new system called CI-WATER that is being developed to address these challenges and advance high resolution water resources modeling in the Western U.S. We are building on existing tools that enable collaboration to develop model and data interfaces that link integrated system models running within an HPC environment to multiple data sources. Our goal is to enhance the use of computational simulation and data-intensive modeling to better understand water resources. Addressing water resource problems in the Western U.S. requires simulation of natural and engineered systems, as well as representation of legal (water rights) and institutional constraints alongside the representation of physical processes. We are establishing data services to represent the engineered infrastructure and legal and institutional systems in a way that they can be used with multi-physics watershed modeling at high spatial resolution. These services will enable incorporation of location-specific information on water management infrastructure and systems into the assessment of regional water availability in the face of growing demands, uncertain future meteorological forcings, and existing prior-appropriations water rights. This presentation will discuss the informatics challenges in data management and easy-to-use access to high performance computing that are being tackled in this project.

  18. Massive Cloud-Based Big Data Processing for Ocean Sensor Networks and Remote Sensing

    NASA Astrophysics Data System (ADS)

    Schwehr, K. D.

    2017-12-01

    Until recently, the work required to integrate and analyze data for global-scale environmental issues was prohibitive both in cost and availability. Traditional desktop processing systems are not able to effectively store and process all the data, and supercomputer solutions are financially out of the reach of most people. The availability of large-scale cloud computing has created tools that are usable by small groups and individuals regardless of financial resources or locally available computational resources. These systems give scientists and policymakers the ability to see how critical resources are being used across the globe with little or no barrier to entry. Google Earth Engine has the Moderate Resolution Imaging Spectroradiometer (MODIS) Terra, MODIS Aqua, and Global Land Data Assimilation Systems (GLDAS) data catalogs available live online. Here we use these data to calculate the correlation between lagged chlorophyll and rainfall to identify areas of eutrophication, matching these events to ocean currents from datasets such as the HYbrid Coordinate Ocean Model (HYCOM) to check whether there are constraints from oceanographic configurations. The system can provide additional ground truth with observations from sensor networks like the International Comprehensive Ocean-Atmosphere Data Set / Voluntary Observing Ship (ICOADS/VOS) and Argo floats. This presentation is intended to introduce users to the datasets, programming idioms, and functionality of Earth Engine for large-scale, data-driven oceanography.
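
    To make the lagged-correlation analysis concrete, the sketch below correlates a rainfall series with a chlorophyll series shifted by several lags, using synthetic monthly data; it stands in for the Earth Engine workflow and does not use the Earth Engine API or the MODIS/GLDAS catalogs.

      # Lagged correlation on synthetic monthly series: the chlorophyll series is
      # built to respond to rainfall two months earlier, and the loop recovers it.
      import numpy as np
      import pandas as pd

      rng = np.random.default_rng(0)
      months = pd.date_range("2015-01-01", periods=60, freq="MS")
      rain = pd.Series(rng.gamma(2.0, 50.0, size=60), index=months, name="rain_mm")
      chl = 0.02 * rain.shift(2) + rng.normal(0, 0.5, size=60)   # lagged response

      for lag in range(5):
          r = rain.corr(chl.shift(-lag))   # correlate rain(t) with chlorophyll(t + lag)
          print(f"lag {lag} months: correlation {r:.2f}")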

  19. Virtual screening of integrase inhibitors by large scale binding free energy calculations: the SAMPL4 challenge

    PubMed Central

    Gallicchio, Emilio; Deng, Nanjie; He, Peng; Wickstrom, Lauren; Perryman, Alexander L.; Santiago, Daniel N.; Forli, Stefano; Olson, Arthur J.; Levy, Ronald M.

    2014-01-01

    As part of the SAMPL4 blind challenge, filtered AutoDock Vina ligand docking predictions and large-scale binding energy distribution analysis method binding free energy calculations have been applied to the virtual screening of a focused library of candidate binders to the LEDGF site of the HIV integrase protein. The computational protocol leveraged docking and high-level atomistic models to improve enrichment. The enrichment factor of our blind predictions ranked best among all of the computational submissions, and second best overall. This work represents, to our knowledge, the first example of the application of an all-atom physics-based binding free energy model to large-scale virtual screening. A total of 285 parallel Hamiltonian replica exchange molecular dynamics absolute protein-ligand binding free energy simulations were conducted starting from docked poses. The setup of the simulations was fully automated, the calculations were distributed on multiple computing resources, and they were completed in a 6-week period. The accuracy of the docked poses and the inclusion of intramolecular strain and entropic losses in the binding free energy estimates were the major factors behind the success of the method. Lack of sufficient time and computing resources to investigate additional protonation states of the ligands was a major cause of mispredictions. The experiment demonstrated the applicability of binding free energy modeling to improve hit rates in challenging virtual screening of focused ligand libraries during lead optimization. PMID:24504704

  20. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hussain, Hameed; Malik, Saif Ur Rehman; Hameed, Abdul

    Efficient resource allocation is a fundamental requirement in high performance computing (HPC) systems. Many projects dedicated to large-scale distributed computing systems have designed and developed resource allocation mechanisms with a variety of architectures and services. In our study, we report a comprehensive survey describing resource allocation in various HPC systems. The aim of the work is to aggregate, under a joint framework, the existing solutions for HPC and to provide a thorough analysis and characterization of the resource management and allocation strategies. Resource allocation mechanisms and strategies play a vital role in the performance improvement of all the HPC classifications. Therefore, a comprehensive discussion of widely used resource allocation strategies deployed in HPC environments is required, which is one of the motivations of this survey. Moreover, we have classified the HPC systems into three broad categories, namely (a) cluster, (b) grid, and (c) cloud systems, and define the characteristics of each class by extracting sets of common attributes. All of the aforementioned systems are cataloged into pure software and hybrid/hardware solutions. The system classification is used to identify approaches followed by the implementation of existing resource allocation strategies that are widely presented in the literature.

  1. Healthcare4VideoStorm: Making Smart Decisions Based on Storm Metrics.

    PubMed

    Zhang, Weishan; Duan, Pengcheng; Chen, Xiufeng; Lu, Qinghua

    2016-04-23

    Storm-based stream processing is widely used for real-time large-scale distributed processing. Knowing the run-time status and ensuring performance is critical to providing the expected dependability for some applications, e.g., continuous video processing for security surveillance. The granularity of existing scheduling strategies is too coarse to achieve good performance, and they mainly consider network resources while ignoring computing resources when scheduling. In this paper, we propose Healthcare4Storm, a framework that derives Storm insights from Storm metrics to gain knowledge of the health status of an application, ultimately producing smart scheduling decisions. It takes into account both network and computing resources and conducts scheduling at a fine-grained level using tuples instead of topologies. A comprehensive evaluation shows that the proposed framework has good performance and can improve the dependability of Storm-based applications.

  2. Computer assisted audit techniques for UNIX (UNIX-CAATS)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Polk, W.T.

    1991-12-31

    Federal and DOE regulations impose specific requirements for internal controls of computer systems. These controls include adequate separation of duties and sufficient controls for access of system and data. The DOE Inspector General's Office has the responsibility to examine internal controls, as well as efficient use of computer system resources. As a result, DOE supported NIST development of computer assisted audit techniques to examine BSD UNIX computers (UNIX-CAATS). These systems were selected due to the increasing number of UNIX workstations in use within DOE. This paper describes the design and development of these techniques, as well as the results of testing at NIST and the first audit at a DOE site. UNIX-CAATS consists of tools which examine security of passwords, file systems, and network access. In addition, a tool was developed to examine efficiency of disk utilization. Test results at NIST indicated inadequate password management, as well as weak network resource controls. File system security was considered adequate. Audit results at a DOE site indicated weak password management and inefficient disk utilization. During the audit, we also found improvements to UNIX-CAATS were needed when applied to large systems. NIST plans to enhance the techniques developed for DOE/IG in future work. This future work would leverage currently available tools, along with needed enhancements. These enhancements would enable DOE/IG to audit large systems, such as supercomputers.

  3. Computer assisted audit techniques for UNIX (UNIX-CAATS)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Polk, W.T.

    1991-01-01

    Federal and DOE regulations impose specific requirements for internal controls of computer systems. These controls include adequate separation of duties and sufficient controls for access of system and data. The DOE Inspector General's Office has the responsibility to examine internal controls, as well as efficient use of computer system resources. As a result, DOE supported NIST development of computer assisted audit techniques to examine BSD UNIX computers (UNIX-CAATS). These systems were selected due to the increasing number of UNIX workstations in use within DOE. This paper describes the design and development of these techniques, as well as the results of testing at NIST and the first audit at a DOE site. UNIX-CAATS consists of tools which examine security of passwords, file systems, and network access. In addition, a tool was developed to examine efficiency of disk utilization. Test results at NIST indicated inadequate password management, as well as weak network resource controls. File system security was considered adequate. Audit results at a DOE site indicated weak password management and inefficient disk utilization. During the audit, we also found improvements to UNIX-CAATS were needed when applied to large systems. NIST plans to enhance the techniques developed for DOE/IG in future work. This future work would leverage currently available tools, along with needed enhancements. These enhancements would enable DOE/IG to audit large systems, such as supercomputers.

  4. Adapting the serial Alpgen parton-interaction generator to simulate LHC collisions on millions of parallel threads

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Childers, J. T.; Uram, T. D.; LeCompte, T. J.

    As the LHC moves to higher energies and luminosity, the demand for computing resources increases accordingly and will soon outpace the growth of the Worldwide LHC Computing Grid. To meet this greater demand, event generation Monte Carlo was targeted for adaptation to run on Mira, the supercomputer at the Argonne Leadership Computing Facility. Alpgen is a Monte Carlo event generation application that is used by LHC experiments in the simulation of collisions that take place in the Large Hadron Collider. This paper details the process by which Alpgen was adapted from a single-processor serial-application to a large-scale parallel-application and the performance that was achieved.

  5. Adapting the serial Alpgen parton-interaction generator to simulate LHC collisions on millions of parallel threads

    DOE PAGES

    Childers, J. T.; Uram, T. D.; LeCompte, T. J.; ...

    2016-09-29

    As the LHC moves to higher energies and luminosity, the demand for computing resources increases accordingly and will soon outpace the growth of the Worldwide LHC Computing Grid. To meet this greater demand, event generation Monte Carlo was targeted for adaptation to run on Mira, the supercomputer at the Argonne Leadership Computing Facility. Alpgen is a Monte Carlo event generation application that is used by LHC experiments in the simulation of collisions that take place in the Large Hadron Collider. Finally, this paper details the process by which Alpgen was adapted from a single-processor serial-application to a large-scale parallel-application and the performance that was achieved.

  6. Adapting the serial Alpgen parton-interaction generator to simulate LHC collisions on millions of parallel threads

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Childers, J. T.; Uram, T. D.; LeCompte, T. J.

    As the LHC moves to higher energies and luminosity, the demand for computing resources increases accordingly and will soon outpace the growth of the Worldwide LHC Computing Grid. To meet this greater demand, event generation Monte Carlo was targeted for adaptation to run on Mira, the supercomputer at the Argonne Leadership Computing Facility. Alpgen is a Monte Carlo event generation application that is used by LHC experiments in the simulation of collisions that take place in the Large Hadron Collider. Finally, this paper details the process by which Alpgen was adapted from a single-processor serial-application to a large-scale parallel-application and the performance that was achieved.

  7. RSTensorFlow: GPU Enabled TensorFlow for Deep Learning on Commodity Android Devices

    PubMed Central

    Alzantot, Moustafa; Wang, Yingnan; Ren, Zhengshuang; Srivastava, Mani B.

    2018-01-01

    Mobile devices have become an essential part of our daily lives. By virtue of both their increasing computing power and the recent progress made in AI, mobile devices have evolved to act as intelligent assistants in many tasks rather than a mere way of making phone calls. However, popular and commonly used tools and frameworks for machine intelligence are still lacking the ability to make proper use of the available heterogeneous computing resources on mobile devices. In this paper, we study the benefits of utilizing the heterogeneous (CPU and GPU) computing resources available on commodity Android devices while running deep learning models. We leveraged the heterogeneous computing framework RenderScript to accelerate the execution of deep learning models on commodity Android devices. Our system is implemented as an extension to the popular open-source framework TensorFlow. By integrating our acceleration framework tightly into TensorFlow, machine learning engineers can now easily benefit from the heterogeneous computing resources on mobile devices without the need for any extra tools. We evaluate our system on different Android phone models to study the trade-offs of running different neural network operations on the GPU. We also compare the performance of running different model architectures, such as convolutional and recurrent neural networks, on the CPU only versus using heterogeneous computing resources. Our results show that GPUs on these phones are capable of offering substantial performance gains in matrix multiplication on mobile devices; therefore, models that involve multiplication of large matrices can run much faster (approximately 3 times faster in our experiments) with GPU support. PMID:29629431

  8. RSTensorFlow: GPU Enabled TensorFlow for Deep Learning on Commodity Android Devices.

    PubMed

    Alzantot, Moustafa; Wang, Yingnan; Ren, Zhengshuang; Srivastava, Mani B

    2017-06-01

    Mobile devices have become an essential part of our daily lives. By virtue of both their increasing computing power and the recent progress made in AI, mobile devices have evolved to act as intelligent assistants in many tasks rather than a mere way of making phone calls. However, popular and commonly used tools and frameworks for machine intelligence are still lacking the ability to make proper use of the available heterogeneous computing resources on mobile devices. In this paper, we study the benefits of utilizing the heterogeneous (CPU and GPU) computing resources available on commodity Android devices while running deep learning models. We leveraged the heterogeneous computing framework RenderScript to accelerate the execution of deep learning models on commodity Android devices. Our system is implemented as an extension to the popular open-source framework TensorFlow. By integrating our acceleration framework tightly into TensorFlow, machine learning engineers can now easily benefit from the heterogeneous computing resources on mobile devices without the need for any extra tools. We evaluate our system on different Android phone models to study the trade-offs of running different neural network operations on the GPU. We also compare the performance of running different model architectures, such as convolutional and recurrent neural networks, on the CPU only versus using heterogeneous computing resources. Our results show that GPUs on these phones are capable of offering substantial performance gains in matrix multiplication on mobile devices; therefore, models that involve multiplication of large matrices can run much faster (approximately 3 times faster in our experiments) with GPU support.

  9. Information Power Grid: Distributed High-Performance Computing and Large-Scale Data Management for Science and Engineering

    NASA Technical Reports Server (NTRS)

    Johnston, William E.; Gannon, Dennis; Nitzberg, Bill; Feiereisen, William (Technical Monitor)

    2000-01-01

    The term "Grid" refers to distributed, high performance computing and data handling infrastructure that incorporates geographically and organizationally dispersed, heterogeneous resources that are persistent and supported. The vision for NASN's Information Power Grid - a computing and data Grid - is that it will provide significant new capabilities to scientists and engineers by facilitating routine construction of information based problem solving environments / frameworks that will knit together widely distributed computing, data, instrument, and human resources into just-in-time systems that can address complex and large-scale computing and data analysis problems. IPG development and deployment is addressing requirements obtained by analyzing a number of different application areas, in particular from the NASA Aero-Space Technology Enterprise. This analysis has focussed primarily on two types of users: The scientist / design engineer whose primary interest is problem solving (e.g., determining wing aerodynamic characteristics in many different operating environments), and whose primary interface to IPG will be through various sorts of problem solving frameworks. The second type of user if the tool designer: The computational scientists who convert physics and mathematics into code that can simulate the physical world. These are the two primary users of IPG, and they have rather different requirements. This paper describes the current state of IPG (the operational testbed), the set of capabilities being put into place for the operational prototype IPG, as well as some of the longer term R&D tasks.

  10. Northwest Trajectory Analysis Capability: A Platform for Enhancing Computational Biophysics Analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Peterson, Elena S.; Stephan, Eric G.; Corrigan, Abigail L.

    2008-07-30

    As computational resources continue to increase, the ability of computational simulations to effectively complement, and in some cases replace, experimentation in scientific exploration also increases. Today, large-scale simulations are recognized as an effective tool for scientific exploration in many disciplines including chemistry and biology. A natural side effect of this trend has been the need for an increasingly complex analytical environment. In this paper, we describe Northwest Trajectory Analysis Capability (NTRAC), an analytical software suite developed to enhance the efficiency of computational biophysics analyses. Our strategy is to layer higher-level services and introduce improved tools within the user’s familiar environment without preventing researchers from using traditional tools and methods. Our desire is to share these experiences to serve as an example for effectively analyzing data-intensive, large-scale simulation data.

  11. Verifiable fault tolerance in measurement-based quantum computation

    NASA Astrophysics Data System (ADS)

    Fujii, Keisuke; Hayashi, Masahito

    2017-09-01

    Quantum systems, in general, cannot be simulated efficiently by a classical computer, and hence are useful for solving certain mathematical problems and simulating quantum many-body systems. This also implies, unfortunately, that verification of the output of quantum systems is not trivial, since predicting the output is exponentially hard. As another problem, quantum systems are very sensitive to noise and thus need error correction. Here, we propose a framework for verification of the output of fault-tolerant quantum computation in a measurement-based model. In contrast to existing analyses of fault tolerance, we do not assume any noise model on the resource state; instead, an arbitrary resource state is tested by using only single-qubit measurements to verify whether or not the output of measurement-based quantum computation on it is correct. Verifiability is achieved by a constant-time repetition of the original measurement-based quantum computation in appropriate measurement bases. Since full characterization of quantum noise is exponentially hard for large-scale quantum computing systems, our framework provides an efficient way to practically verify experimental quantum error correction.

  12. Aggregating Data for Computational Toxicology Applications ...

    EPA Pesticide Factsheets

    Computational toxicology combines data from high-throughput test methods, chemical structure analyses and other biological domains (e.g., genes, proteins, cells, tissues) with the goals of predicting and understanding the underlying mechanistic causes of chemical toxicity and for predicting toxicity of new chemicals and products. A key feature of such approaches is their reliance on knowledge extracted from large collections of data and data sets in computable formats. The U.S. Environmental Protection Agency (EPA) has developed a large data resource called ACToR (Aggregated Computational Toxicology Resource) to support these data-intensive efforts. ACToR comprises four main repositories: core ACToR (chemical identifiers and structures, and summary data on hazard, exposure, use, and other domains), ToxRefDB (Toxicity Reference Database, a compilation of detailed in vivo toxicity data from guideline studies), ExpoCastDB (detailed human exposure data from observational studies of selected chemicals), and ToxCastDB (data from high-throughput screening programs, including links to underlying biological information related to genes and pathways). The EPA DSSTox (Distributed Structure-Searchable Toxicity) program provides expert-reviewed chemical structures and associated information for these and other high-interest public inventories. Overall, the ACToR system contains information on about 400,000 chemicals from 1100 different sources. The entire system is built usi

  13. Sign: large-scale gene network estimation environment for high performance computing.

    PubMed

    Tamada, Yoshinori; Shimamura, Teppei; Yamaguchi, Rui; Imoto, Seiya; Nagasaki, Masao; Miyano, Satoru

    2011-01-01

    Our research group is currently developing software for estimating large-scale gene networks from gene expression data. The software, called SiGN, is specifically designed for the Japanese flagship supercomputer "K computer" which is planned to achieve 10 petaflops in 2012, and other high performance computing environments including Human Genome Center (HGC) supercomputer system. SiGN is a collection of gene network estimation software with three different sub-programs: SiGN-BN, SiGN-SSM and SiGN-L1. In these three programs, five different models are available: static and dynamic nonparametric Bayesian networks, state space models, graphical Gaussian models, and vector autoregressive models. All these models require a huge amount of computational resources for estimating large-scale gene networks and therefore are designed to be able to exploit the speed of 10 petaflops. The software will be available freely for "K computer" and HGC supercomputer system users. The estimated networks can be viewed and analyzed by Cell Illustrator Online and SBiP (Systems Biology integrative Pipeline). The software project web site is available at http://sign.hgc.jp/ .

  14. Signal and image processing algorithm performance in a virtual and elastic computing environment

    NASA Astrophysics Data System (ADS)

    Bennett, Kelly W.; Robertson, James

    2013-05-01

    The U.S. Army Research Laboratory (ARL) supports the development of classification, detection, tracking, and localization algorithms using multiple sensing modalities including acoustic, seismic, E-field, magnetic field, PIR, and visual and IR imaging. Multimodal sensors collect large amounts of data in support of algorithm development. The resulting volume of data, and its associated high-performance computing needs, strains existing computing infrastructures. Purchasing computer power as a commodity using a Cloud service offers low-cost, pay-as-you-go pricing models, scalability, and elasticity that may provide solutions to develop and optimize algorithms without having to procure additional hardware and resources. This paper provides a detailed look at using a commercial cloud service provider, such as Amazon Web Services (AWS), to develop and deploy simple signal and image processing algorithms in a cloud and run the algorithms on a large set of data archived in the ARL Multimodal Signatures Database (MMSDB). Analytical results will provide performance comparisons with existing infrastructure. A discussion of using cloud computing with government data covers best security practices available within cloud services such as AWS.
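
    The basic cloud pattern described here, pulling an archived signature from object storage and running a processing step on a cloud instance, can be sketched as below. The bucket and key names are hypothetical and do not refer to the actual MMSDB interface; the moving-average filter merely stands in for a real detection algorithm.

      # Fetch an archived signal from S3 and run a simple processing step on it.
      # Bucket and key are hypothetical placeholders, not a real ARL resource.
      import io
      import boto3
      import numpy as np

      s3 = boto3.client("s3")
      obj = s3.get_object(Bucket="example-signatures", Key="acoustic/run042.npy")
      signal = np.load(io.BytesIO(obj["Body"].read()))

      window = 32   # crude moving-average smoothing as a stand-in processing step
      smoothed = np.convolve(signal, np.ones(window) / window, mode="same")
      print(signal.shape, smoothed.shape)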

  15. Quo vadis: Hydrologic inverse analyses using high-performance computing and a D-Wave quantum annealer

    NASA Astrophysics Data System (ADS)

    O'Malley, D.; Vesselinov, V. V.

    2017-12-01

    Classical microprocessors have had a dramatic impact on hydrology for decades, due largely to the exponential growth in computing power predicted by Moore's law. However, this growth is not expected to continue indefinitely and has already begun to slow. Quantum computing is an emerging alternative to classical microprocessors. Here, we demonstrated cutting edge inverse model analyses utilizing some of the best available resources in both worlds: high-performance classical computing and a D-Wave quantum annealer. The classical high-performance computing resources are utilized to build an advanced numerical model that assimilates data from O(10^5) observations, including water levels, drawdowns, and contaminant concentrations. The developed model accurately reproduces the hydrologic conditions at a Los Alamos National Laboratory contamination site, and can be leveraged to inform decision-making about site remediation. We demonstrate the use of a D-Wave 2X quantum annealer to solve hydrologic inverse problems. This work can be seen as an early step in quantum-computational hydrology. We compare and contrast our results with an early inverse approach in classical-computational hydrology that is comparable to the approach we use with quantum annealing. Our results show that quantum annealing can be useful for identifying regions of high and low permeability within an aquifer. While the problems we consider are small-scale compared to the problems that can be solved with modern classical computers, they are large compared to the problems that could be solved with early classical CPUs. Further, the binary nature of the high/low permeability problem makes it well-suited to quantum annealing, but challenging for classical computers.
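
    One way to see why a binary high/low permeability inversion suits an annealer is to write it as a QUBO. The sketch below uses a synthetic linear forward model d = G k, parameterizes each cell as k_i = k_lo + (k_hi - k_lo) x_i with x_i in {0, 1}, expands the data misfit into a quadratic form in x, and solves a tiny instance by brute force; the forward model and values are invented for illustration and are not the site model described above.

      # Cast ||d - G k||^2 with binary high/low permeabilities as a QUBO and
      # minimize a tiny instance by enumeration (an annealer would do this step).
      import itertools
      import numpy as np

      rng = np.random.default_rng(1)
      n = 6                                    # number of grid cells (kept tiny)
      k_lo, k_hi = 1.0, 10.0
      G = rng.normal(size=(12, n))             # synthetic forward operator
      x_true = rng.integers(0, 2, size=n)
      d = G @ (k_lo + (k_hi - k_lo) * x_true)  # noiseless synthetic observations

      # d - G k = r - A x  with  r = d - k_lo * G @ 1  and  A = (k_hi - k_lo) * G
      r = d - k_lo * G.sum(axis=1)
      A = (k_hi - k_lo) * G
      Q = A.T @ A                              # quadratic QUBO couplings
      np.fill_diagonal(Q, np.diag(Q) - 2 * A.T @ r)   # fold linear terms, x_i^2 = x_i

      best = min((np.array(x) for x in itertools.product([0, 1], repeat=n)),
                 key=lambda x: x @ Q @ x)
      print("true :", x_true)
      print("found:", best)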

  16. Facilitating NASA Earth Science Data Processing Using Nebula Cloud Computing

    NASA Technical Reports Server (NTRS)

    Pham, Long; Chen, Aijun; Kempler, Steven; Lynnes, Christopher; Theobald, Michael; Asghar, Esfandiari; Campino, Jane; Vollmer, Bruce

    2011-01-01

    Cloud Computing has been implemented in several commercial arenas. The NASA Nebula Cloud Computing platform is an Infrastructure as a Service (IaaS) built in 2008 at NASA Ames Research Center and in 2010 at GSFC. Nebula is an open source Cloud platform intended to: a) Make NASA realize significant cost savings through efficient resource utilization, reduced energy consumption, and reduced labor costs. b) Provide an easier way for NASA scientists and researchers to efficiently explore and share large and complex data sets. c) Allow customers to provision, manage, and decommission computing capabilities on an as-needed basis.

  17. Design and Implement of Astronomical Cloud Computing Environment In China-VO

    NASA Astrophysics Data System (ADS)

    Li, Changhua; Cui, Chenzhou; Mi, Linying; He, Boliang; Fan, Dongwei; Li, Shanshan; Yang, Sisi; Xu, Yunfei; Han, Jun; Chen, Junyi; Zhang, Hailong; Yu, Ce; Xiao, Jian; Wang, Chuanjun; Cao, Zihuang; Fan, Yufeng; Liu, Liang; Chen, Xiao; Song, Wenming; Du, Kangyu

    2017-06-01

    The astronomy cloud computing environment is a cyberinfrastructure for astronomy research initiated by the Chinese Virtual Observatory (China-VO) under funding support from the NDRC (National Development and Reform Commission) and CAS (Chinese Academy of Sciences). Based on virtualization technology, the astronomy cloud computing environment was designed and implemented by the China-VO team. It consists of five distributed nodes across the mainland of China. Astronomers can obtain computing and storage resources in this cloud computing environment. Through this environment, astronomers can easily search and analyze astronomical data collected by different telescopes and data centers, and avoid large-scale dataset transfers.

  18. Implementation of DFT application on ternary optical computer

    NASA Astrophysics Data System (ADS)

    Junjie, Peng; Youyi, Fu; Xiaofeng, Zhang; Shuai, Kong; Xinyu, Wei

    2018-03-01

    Owing to its characteristics of a huge number of data bits and low energy consumption, optical computing may be used in applications such as the DFT, which need a large amount of computation and can be implemented in parallel. Accordingly, fully parallel as well as partially parallel DFT implementation methods are presented. Based on the resources of a ternary optical computer (TOC), extensive experiments were carried out. Experimental results show that the proposed schemes are correct and feasible. They provide a foundation for further exploration of applications on the TOC that require a large amount of calculation and can be processed in parallel.
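
    The reason the DFT maps well onto highly parallel hardware is that each output bin X[m] = sum_n x[n] exp(-2*pi*i*m*n/N) is an independent sum, so different bins can be assigned to different processing elements. The NumPy sketch below (not specific to the TOC) computes the bins independently and checks the result against a library FFT.

      # Each DFT bin is an independent sum and could be computed on its own
      # processing element; here the bins are computed one by one and verified.
      import numpy as np

      def dft_bin(x, m):
          N = len(x)
          n = np.arange(N)
          return np.sum(x * np.exp(-2j * np.pi * m * n / N))

      x = np.random.rand(64)
      X = np.array([dft_bin(x, m) for m in range(len(x))])   # bins are independent
      print("matches numpy.fft.fft:", np.allclose(X, np.fft.fft(x)))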

  19. Fault-tolerant linear optical quantum computing with small-amplitude coherent states.

    PubMed

    Lund, A P; Ralph, T C; Haselgrove, H L

    2008-01-25

    Quantum computing using two coherent states as a qubit basis is a proposed alternative architecture with lower overheads, but it has been questioned as a practical way of performing quantum computing due to the fragility of diagonal states with large coherent amplitudes. We show that, with error correction, only small amplitudes (alpha > 1.2) are required for fault-tolerant quantum computing. We study fault tolerance under the effects of small amplitudes and loss using a Monte Carlo simulation. The resources required at the first encoding level are orders of magnitude lower than for the best single-photon scheme.

  20. Advanced Optical Burst Switched Network Concepts

    NASA Astrophysics Data System (ADS)

    Nejabati, Reza; Aracil, Javier; Castoldi, Piero; de Leenheer, Marc; Simeonidou, Dimitra; Valcarenghi, Luca; Zervas, Georgios; Wu, Jian

    In recent years, as the bandwidth and the speed of networks have increased significantly, a new generation of network-based applications using the concept of distributed computing and collaborative services is emerging (e.g., Grid computing applications). The use of the available fiber and DWDM infrastructure for these applications is a logical choice offering huge amounts of cheap bandwidth and ensuring global reach of computing resources [230]. Currently, there is a great deal of interest in deploying optical circuit (wavelength) switched network infrastructure for distributed computing applications that require long-lived wavelength paths and address the specific needs of a small number of well-known users. Typical users are particle physicists who, due to their international collaborations and experiments, generate enormous amounts of data (Petabytes per year). These users require a network infrastructure that can support processing and analysis of large datasets through globally distributed computing resources [230]. However, providing wavelength granularity bandwidth services is not an efficient and scalable solution for applications and services that address a wider base of user communities with different traffic profiles and connectivity requirements. Examples of such applications may be: scientific collaboration on a smaller scale (e.g., bioinformatics, environmental research), distributed virtual laboratories (e.g., remote instrumentation), e-health, national security and defense, personalized learning environments and digital libraries, evolving broadband user services (e.g., high-resolution home video editing, real-time rendering, high-definition interactive TV). As a specific example, in e-health services, and in particular mammography applications, the size and quantity of images produced by remote mammography impose stringent network requirements. Initial calculations have shown that for 100 patients to be screened remotely, the network would have to securely transport 1.2 GB of data every 30 s [230]. According to the above, it is clear that these types of applications need a new network infrastructure and transport technology that makes large amounts of bandwidth at subwavelength granularity, along with storage, computation, and visualization resources, potentially available to a wide user base for specified time durations. As these types of collaborative and network-based applications evolve to address a wide range and large number of users, it is infeasible to build dedicated networks for each application type or category. Consequently, there should be an adaptive network infrastructure able to support all application types, each with their own access, network, and resource usage patterns. This infrastructure should offer flexible and intelligent network elements and control mechanisms able to deploy new applications quickly and efficiently.

  1. e-Infrastructures for Astronomy: An Integrated View

    NASA Astrophysics Data System (ADS)

    Pasian, F.; Longo, G.

    2010-12-01

    As for other disciplines, the capability of performing “Big Science” in astrophysics requires the availability of large facilities. In the field of ICT, computational resources (e.g. HPC) are important, but are far from being enough for the community: as a matter of fact, the whole set of e-infrastructures (network, computing nodes, data repositories, applications) needs to work in an interoperable way. This implies the development of common (or at least compatible) user interfaces to computing resources, transparent access to observations and numerical simulations through the Virtual Observatory, integrated data processing pipelines, data mining and semantic web applications. Achieving this interoperability goal is a must for building a real “Knowledge Infrastructure” in the astrophysical domain. Also, the emergence of new professional profiles (e.g. the “astro-informatician”) is necessary to allow this conceptual schema to be properly defined and implemented.

  2. Exploiting short-term memory in soft body dynamics as a computational resource.

    PubMed

    Nakajima, K; Li, T; Hauser, H; Pfeifer, R

    2014-11-06

    Soft materials are not only highly deformable, but they also possess rich and diverse body dynamics. Soft body dynamics exhibit a variety of properties, including nonlinearity, elasticity and potentially infinitely many degrees of freedom. Here, we demonstrate that such soft body dynamics can be employed to conduct certain types of computation. Using body dynamics generated from a soft silicone arm, we show that they can be exploited to emulate functions that require memory and to embed robust closed-loop control into the arm. Our results suggest that soft body dynamics have a short-term memory and can serve as a computational resource. This finding paves the way towards exploiting passive body dynamics for control of a large class of underactuated systems. © 2014 The Author(s) Published by the Royal Society. All rights reserved.

  3. NeuroManager: a workflow analysis based simulation management engine for computational neuroscience

    PubMed Central

    Stockton, David B.; Santamaria, Fidel

    2015-01-01

    We developed NeuroManager, an object-oriented simulation management software engine for computational neuroscience. NeuroManager automates the workflow of simulation job submissions when using heterogeneous computational resources, simulators, and simulation tasks. The object-oriented approach (1) provides flexibility to adapt to a variety of neuroscience simulators, (2) simplifies the use of heterogeneous computational resources, from desktops to super computer clusters, and (3) improves tracking of simulator/simulation evolution. We implemented NeuroManager in MATLAB, a widely used engineering and scientific language, for its signal and image processing tools, prevalence in electrophysiology analysis, and increasing use in college Biology education. To design and develop NeuroManager we analyzed the workflow of simulation submission for a variety of simulators, operating systems, and computational resources, including the handling of input parameters, data, models, results, and analyses. This resulted in 22 stages of simulation submission workflow. The software incorporates progress notification, automatic organization, labeling, and time-stamping of data and results, and integrated access to MATLAB's analysis and visualization tools. NeuroManager provides users with the tools to automate daily tasks, and assists principal investigators in tracking and recreating the evolution of research projects performed by multiple people. Overall, NeuroManager provides the infrastructure needed to improve workflow, manage multiple simultaneous simulations, and maintain provenance of the potentially large amounts of data produced during the course of a research project. PMID:26528175

  4. NeuroManager: a workflow analysis based simulation management engine for computational neuroscience.

    PubMed

    Stockton, David B; Santamaria, Fidel

    2015-01-01

    We developed NeuroManager, an object-oriented simulation management software engine for computational neuroscience. NeuroManager automates the workflow of simulation job submissions when using heterogeneous computational resources, simulators, and simulation tasks. The object-oriented approach (1) provides flexibility to adapt to a variety of neuroscience simulators, (2) simplifies the use of heterogeneous computational resources, from desktops to super computer clusters, and (3) improves tracking of simulator/simulation evolution. We implemented NeuroManager in MATLAB, a widely used engineering and scientific language, for its signal and image processing tools, prevalence in electrophysiology analysis, and increasing use in college Biology education. To design and develop NeuroManager we analyzed the workflow of simulation submission for a variety of simulators, operating systems, and computational resources, including the handling of input parameters, data, models, results, and analyses. This resulted in 22 stages of simulation submission workflow. The software incorporates progress notification, automatic organization, labeling, and time-stamping of data and results, and integrated access to MATLAB's analysis and visualization tools. NeuroManager provides users with the tools to automate daily tasks, and assists principal investigators in tracking and recreating the evolution of research projects performed by multiple people. Overall, NeuroManager provides the infrastructure needed to improve workflow, manage multiple simultaneous simulations, and maintain provenance of the potentially large amounts of data produced during the course of a research project.

  5. KITTEN Lightweight Kernel 0.1 Beta

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pedretti, Kevin; Levenhagen, Michael; Kelly, Suzanne

    2007-12-12

    The Kitten Lightweight Kernel is a simplified OS (operating system) kernel that is intended to manage a compute node's hardware resources. It provides a set of mechanisms to user-level applications for utilizing hardware resources (e.g., allocating memory, creating processes, accessing the network). Kitten is much simpler than general-purpose OS kernels, such as Linux or Windows, but includes all of the essential functionality needed to support HPC (high-performance computing) MPI, PGAS and OpenMP applications. Kitten provides unique capabilities such as physically contiguous application memory, transparent large page support, and noise-free tick-less operation, which enable HPC applications to obtain greater efficiency and scalability than with general purpose OS kernels.

  6. FPGA-Based Stochastic Echo State Networks for Time-Series Forecasting.

    PubMed

    Alomar, Miquel L; Canals, Vincent; Perez-Mora, Nicolas; Martínez-Moll, Víctor; Rosselló, Josep L

    2016-01-01

    Hardware implementation of artificial neural networks (ANNs) allows exploiting the inherent parallelism of these systems. Nevertheless, they require a large amount of resources in terms of area and power dissipation. Recently, Reservoir Computing (RC) has arisen as a strategic technique to design recurrent neural networks (RNNs) with simple learning capabilities. In this work, we show a new approach to implement RC systems with digital gates. The proposed method is based on the use of probabilistic computing concepts to reduce the hardware required to implement different arithmetic operations. The result is the development of a highly functional system with low hardware resources. The presented methodology is applied to chaotic time-series forecasting.
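
    For readers unfamiliar with reservoir computing, the snippet below is a minimal floating-point echo state network for one-step-ahead forecasting: a fixed random reservoir is driven by the input and only a linear readout is trained, by ridge regression. It illustrates the RC idea that the paper maps onto stochastic FPGA logic; it is not the hardware design itself, and the signal and sizes are arbitrary.

      # Minimal echo state network: random fixed reservoir, trained linear readout.
      import numpy as np

      rng = np.random.default_rng(0)
      N, washout = 200, 100                     # reservoir size, discarded transient

      t = np.arange(2000)
      u = np.sin(0.2 * t) + 0.1 * rng.standard_normal(t.size)   # toy signal

      W_in = rng.uniform(-0.5, 0.5, size=N)
      W = rng.uniform(-0.5, 0.5, size=(N, N))
      W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # spectral radius below 1

      states = np.zeros((t.size, N))
      x = np.zeros(N)
      for k in range(t.size - 1):
          x = np.tanh(W_in * u[k] + W @ x)      # reservoir state update
          states[k] = x

      # ridge-regression readout trained to predict u[k+1] from the state at step k
      X, y = states[washout:-1], u[washout + 1:]
      W_out = np.linalg.solve(X.T @ X + 1e-6 * np.eye(N), X.T @ y)

      pred = states[:-1] @ W_out
      rmse = np.sqrt(np.mean((pred[washout:] - u[washout + 1:]) ** 2))
      print(f"one-step-ahead RMSE: {rmse:.3f}")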

  7. FPGA-Based Stochastic Echo State Networks for Time-Series Forecasting

    PubMed Central

    Alomar, Miquel L.; Canals, Vincent; Perez-Mora, Nicolas; Martínez-Moll, Víctor; Rosselló, Josep L.

    2016-01-01

    Hardware implementation of artificial neural networks (ANNs) allows exploiting the inherent parallelism of these systems. Nevertheless, they require a large amount of resources in terms of area and power dissipation. Recently, Reservoir Computing (RC) has arisen as a strategic technique to design recurrent neural networks (RNNs) with simple learning capabilities. In this work, we show a new approach to implement RC systems with digital gates. The proposed method is based on the use of probabilistic computing concepts to reduce the hardware required to implement different arithmetic operations. The result is the development of a highly functional system with low hardware resources. The presented methodology is applied to chaotic time-series forecasting. PMID:26880876

  8. GPU-computing in econophysics and statistical physics

    NASA Astrophysics Data System (ADS)

    Preis, T.

    2011-03-01

    A recent trend in computer science and related fields is general purpose computing on graphics processing units (GPUs), which can yield impressive performance. With multiple cores connected by high memory bandwidth, today's GPUs offer resources for non-graphics parallel processing. This article provides a brief introduction to the field of GPU computing and includes examples. In particular, computationally expensive analyses employed in the financial market context are coded on a graphics card architecture, which leads to a significant reduction of computing time. In order to demonstrate the wide range of possible applications, a standard model in statistical physics - the Ising model - is ported to a graphics card architecture as well, resulting in large speedup values.
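
    As a point of reference for the statistical-physics example mentioned above, the snippet below is a plain-CPU Metropolis simulation of the 2D Ising model; the per-spin update it performs is the operation that a GPU port applies to many lattice sites in parallel. Parameters are illustrative only.

      # Plain-CPU Metropolis sweep over a 2D Ising lattice with periodic boundaries.
      import numpy as np

      rng = np.random.default_rng(0)
      L, T, sweeps = 32, 2.27, 200               # lattice size, temperature, sweeps
      spins = rng.choice([-1, 1], size=(L, L))

      for _ in range(sweeps):
          for _ in range(L * L):                 # one sweep = L*L attempted flips
              i, j = rng.integers(0, L, size=2)
              nb = (spins[(i + 1) % L, j] + spins[(i - 1) % L, j]
                    + spins[i, (j + 1) % L] + spins[i, (j - 1) % L])
              dE = 2 * spins[i, j] * nb          # energy change of flipping spin (i, j)
              if dE <= 0 or rng.random() < np.exp(-dE / T):
                  spins[i, j] *= -1

      print("magnetization per spin:", spins.mean())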

  9. Computational Models of Consumer Confidence from Large-Scale Online Attention Data: Crowd-Sourcing Econometrics

    PubMed Central

    2015-01-01

    Economies are instances of complex socio-technical systems that are shaped by the interactions of large numbers of individuals. The individual behavior and decision-making of consumer agents is determined by complex psychological dynamics that include their own assessment of present and future economic conditions as well as those of others, potentially leading to feedback loops that affect the macroscopic state of the economic system. We propose that the large-scale interactions of a nation's citizens with its online resources can reveal the complex dynamics of their collective psychology, including their assessment of future system states. Here we introduce a behavioral index of Chinese Consumer Confidence (C3I) that computationally relates large-scale online search behavior recorded by Google Trends data to the macroscopic variable of consumer confidence. Our results indicate that such computational indices may reveal the components and complex dynamics of consumer psychology as a collective socio-economic phenomenon, potentially leading to improved and more refined economic forecasting. PMID:25826692
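
    The general recipe can be sketched as follows: normalize several search-volume series for confidence-related query terms, average them into a behavioral index, and compare the index with a survey-based confidence series. The snippet below does this on synthetic data; it does not query Google Trends, and the term names are placeholders.

      # Build a behavioral index from synthetic query-volume series and compare it
      # with a synthetic survey-based confidence series.
      import numpy as np
      import pandas as pd

      rng = np.random.default_rng(3)
      months = pd.date_range("2010-01-01", periods=72, freq="MS")
      survey = pd.Series(100 + np.cumsum(rng.normal(0, 1, 72)), index=months)

      # three placeholder query-volume series that loosely track the survey index
      queries = pd.DataFrame(
          {f"term_{k}": survey + rng.normal(0, 3, 72) for k in range(3)}, index=months
      )

      # z-score each series, then average them into the behavioral index
      index = ((queries - queries.mean()) / queries.std()).mean(axis=1)
      print("correlation with survey index:", round(index.corr(survey), 2))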

  10. Computational models of consumer confidence from large-scale online attention data: crowd-sourcing econometrics.

    PubMed

    Dong, Xianlei; Bollen, Johan

    2015-01-01

    Economies are instances of complex socio-technical systems that are shaped by the interactions of large numbers of individuals. The individual behavior and decision-making of consumer agents is determined by complex psychological dynamics that include their own assessment of present and future economic conditions as well as those of others, potentially leading to feedback loops that affect the macroscopic state of the economic system. We propose that the large-scale interactions of a nation's citizens with its online resources can reveal the complex dynamics of their collective psychology, including their assessment of future system states. Here we introduce a behavioral index of Chinese Consumer Confidence (C3I) that computationally relates large-scale online search behavior recorded by Google Trends data to the macroscopic variable of consumer confidence. Our results indicate that such computational indices may reveal the components and complex dynamics of consumer psychology as a collective socio-economic phenomenon, potentially leading to improved and more refined economic forecasting.

  11. Low Cost, Scalable Proteomics Data Analysis Using Amazon's Cloud Computing Services and Open Source Search Algorithms

    PubMed Central

    Halligan, Brian D.; Geiger, Joey F.; Vallejos, Andrew K.; Greene, Andrew S.; Twigger, Simon N.

    2009-01-01

    One of the major difficulties for many laboratories setting up proteomics programs has been obtaining and maintaining the computational infrastructure required for the analysis of the large flow of proteomics data. We describe a system that combines distributed cloud computing and open source software to allow laboratories to set up scalable virtual proteomics analysis clusters without the investment in computational hardware or software licensing fees. Additionally, the pricing structure of distributed computing providers, such as Amazon Web Services, allows laboratories or even individuals to have large-scale computational resources at their disposal at a very low cost per run. We provide detailed step-by-step instructions on how to implement the virtual proteomics analysis clusters as well as a list of currently available preconfigured Amazon machine images containing the OMSSA and X!Tandem search algorithms and sequence databases on the Medical College of Wisconsin Proteomics Center website (http://proteomics.mcw.edu/vipdac). PMID:19358578

  12. Integration of a neuroimaging processing pipeline into a pan-canadian computing grid

    NASA Astrophysics Data System (ADS)

    Lavoie-Courchesne, S.; Rioux, P.; Chouinard-Decorte, F.; Sherif, T.; Rousseau, M.-E.; Das, S.; Adalat, R.; Doyon, J.; Craddock, C.; Margulies, D.; Chu, C.; Lyttelton, O.; Evans, A. C.; Bellec, P.

    2012-02-01

    The ethos of the neuroimaging field is quickly moving towards the open sharing of resources, including both imaging databases and processing tools. As a neuroimaging database represents a large volume of datasets and as neuroimaging processing pipelines are composed of heterogeneous, computationally intensive tools, such open sharing raises specific computational challenges. This motivates the design of novel dedicated computing infrastructures. This paper describes an interface between PSOM, a code-oriented pipeline development framework, and CBRAIN, a web-oriented platform for grid computing. This interface was used to integrate a PSOM-compliant pipeline for preprocessing of structural and functional magnetic resonance imaging into CBRAIN. We further tested the capacity of our infrastructure to handle a real large-scale project. A neuroimaging database including close to 1000 subjects was preprocessed using our interface and publicly released to help the participants of the ADHD-200 international competition. This successful experiment demonstrated that our integrated grid-computing platform is a powerful solution for high-throughput pipeline analysis in the field of neuroimaging.

  13. Low cost, scalable proteomics data analysis using Amazon's cloud computing services and open source search algorithms.

    PubMed

    Halligan, Brian D; Geiger, Joey F; Vallejos, Andrew K; Greene, Andrew S; Twigger, Simon N

    2009-06-01

    One of the major difficulties for many laboratories setting up proteomics programs has been obtaining and maintaining the computational infrastructure required for the analysis of the large flow of proteomics data. We describe a system that combines distributed cloud computing and open source software to allow laboratories to set up scalable virtual proteomics analysis clusters without the investment in computational hardware or software licensing fees. Additionally, the pricing structure of distributed computing providers, such as Amazon Web Services, allows laboratories or even individuals to have large-scale computational resources at their disposal at a very low cost per run. We provide detailed step-by-step instructions on how to implement the virtual proteomics analysis clusters as well as a list of currently available preconfigured Amazon machine images containing the OMSSA and X!Tandem search algorithms and sequence databases on the Medical College of Wisconsin Proteomics Center Web site (http://proteomics.mcw.edu/vipdac).

  14. The HEPCloud Facility: elastic computing for High Energy Physics – The NOvA Use Case

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fuess, S.; Garzoglio, G.; Holzman, B.

    The need for computing in the HEP community follows cycles of peaks and valleys mainly driven by conference dates, accelerator shutdowns, holiday schedules, and other factors. Because of this, the classical method of provisioning these resources at providing facilities has drawbacks such as potential overprovisioning. As the appetite for computing increases, however, so does the need to maximize cost efficiency by developing a model for dynamically provisioning resources only when needed. To address this issue, the HEPCloud project was launched by the Fermilab Scientific Computing Division in June 2015. Its goal is to develop a facility that provides a common interface to a variety of resources, including local clusters, grids, high performance computers, and community and commercial Clouds. Initially targeted experiments include CMS and NOvA, as well as other Fermilab stakeholders. In its first phase, the project has demonstrated the use of the “elastic” provisioning model offered by commercial clouds, such as Amazon Web Services. In this model, resources are rented and provisioned automatically over the Internet upon request. In January 2016, the project demonstrated the ability to increase the total amount of global CMS resources by 58,000 cores from 150,000 cores - a 25 percent increase - in preparation for the Rencontres de Moriond. In March 2016, the NOvA experiment also demonstrated resource burst capabilities with an additional 7,300 cores, achieving a scale almost four times as large as the locally allocated resources and utilizing the local AWS S3 storage to optimize data handling operations and costs. NOvA was using the same familiar services used for local computations, such as data handling and job submission, in preparation for the Neutrino 2016 conference. In both cases, the cost was contained by the use of the Amazon Spot Instance Market and the Decision Engine, a HEPCloud component that aims at minimizing cost and job interruption. This paper describes the Fermilab HEPCloud Facility and the challenges overcome for the CMS and NOvA communities.
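
    The elastic provisioning described above ultimately comes down to querying spot prices and requesting capacity where it is cheapest. The fragment below is a heavily simplified sketch of that step using boto3, not the HEPCloud Decision Engine itself; the region, candidate instance types and AMI identifier are placeholders, valid AWS credentials are assumed, and the final call would launch (and bill for) real instances.

      import boto3
      from datetime import datetime, timedelta

      ec2 = boto3.client("ec2", region_name="us-east-1")        # placeholder region

      # look up recent spot prices for a few candidate instance types
      history = ec2.describe_spot_price_history(
          InstanceTypes=["m5.2xlarge", "c5.2xlarge", "r5.2xlarge"],
          ProductDescriptions=["Linux/UNIX"],
          StartTime=datetime.utcnow() - timedelta(hours=1),
      )
      cheapest = min(history["SpotPriceHistory"], key=lambda p: float(p["SpotPrice"]))
      print("cheapest:", cheapest["InstanceType"], cheapest["SpotPrice"], cheapest["AvailabilityZone"])

      # request a small burst of spot capacity of the cheapest type
      ec2.run_instances(
          ImageId="ami-0123456789abcdef0",                      # placeholder, not a real experiment image
          InstanceType=cheapest["InstanceType"],
          MinCount=1,
          MaxCount=10,
          InstanceMarketOptions={"MarketType": "spot"},
      )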

  15. Findings and Challenges in Fine-Resolution Large-Scale Hydrological Modeling

    NASA Astrophysics Data System (ADS)

    Her, Y. G.

    2017-12-01

    Fine-resolution large-scale (FL) modeling can provide the overall picture of the hydrological cycle and transport while taking into account unique local conditions in the simulation. It can also help develop water resources management plans consistent across spatial scales by describing the spatial consequences of decisions and hydrological events extensively. FL modeling is expected to be common in the near future as global-scale remotely sensed data are emerging and computing resources have advanced rapidly. There are several spatially distributed models available for hydrological analyses. Some of them rely on numerical methods such as finite difference/element methods (FDM/FEM), which require excessive computing resources (implicit scheme) to manipulate large matrices or small simulation time intervals (explicit scheme) to maintain the stability of the solution, to describe two-dimensional overland processes. Others make unrealistic assumptions such as constant overland flow velocity to reduce the computational loads of the simulation. Thus, simulation efficiency often comes at the expense of precision and reliability in FL modeling. Here, we introduce a new FL continuous hydrological model and its application to four watersheds of different landscapes and sizes, from 3.5 km2 to 2,800 km2, at the spatial resolution of 30 m on an hourly basis. The model provided acceptable accuracy statistics in reproducing hydrological observations made in the watersheds. The modeling outputs, including maps of simulated travel time, runoff depth, soil water content, and groundwater recharge, were animated, visualizing the dynamics of hydrological processes occurring in the watersheds during and between storm events. Findings and challenges were discussed in the context of modeling efficiency, accuracy, and reproducibility, which we found can be improved by employing advanced computing techniques and hydrological understanding, by using remotely sensed hydrological observations such as soil moisture and radar rainfall depth, and by sharing the model and its code in the public domain, respectively.

  16. Price schedules coordination for electricity pool markets

    NASA Astrophysics Data System (ADS)

    Legbedji, Alexis Motto

    2002-04-01

    We consider the optimal coordination of a class of mathematical programs with equilibrium constraints, which is formally interpreted as a resource-allocation problem. Many decomposition techniques have been proposed to circumvent the difficulty of solving large systems with limited computer resources. The considerable improvement in computer architecture has allowed the solution of large-scale problems with increasing speed. Consequently, interest in decomposition techniques has waned. Nonetheless, there is an important class of applications for which decomposition techniques will still be relevant, among others, distributed systems---the Internet, perhaps, being the most conspicuous example---and competitive economic systems. Conceptually, a competitive economic system is a collection of agents that have similar or different objectives while sharing the same system resources. In theory, such systems of agents can be optimized by constructing a large-scale mathematical program and solving it centrally using currently available computing power. In practice, however, because agents are self-interested and not willing to reveal some sensitive corporate data, one cannot solve these kinds of coordination problems by simply maximizing the sum of the agents' objective functions with respect to their constraints. An iterative price decomposition or Lagrangian dual method is considered best suited because it can operate with limited information. A price-directed strategy, however, can only work successfully when coordinating or equilibrium prices exist, which is not generally the case when a weak duality is unavoidable. Showing when such prices exist and how to compute them is the main subject of this thesis. Among our results, we show that, if the Lagrangian function of a primal program is additively separable, price schedules coordination may be attained. The prices are Lagrange multipliers, and are also the decision variables of a dual program. In addition, we propose a new form of augmented or nonlinear pricing, which is an example of the use of penalty functions in mathematical programming. Applications are drawn from mathematical programming problems of the form arising in electric power system scheduling under competition.
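
    The price-directed coordination discussed above can be illustrated with a small subgradient iteration on the Lagrangian dual: each agent responds to a posted price by solving its own subproblem, and the coordinator adjusts the price according to the aggregate resource imbalance. The logarithmic utilities, capacity and step size below are illustrative assumptions, not taken from the thesis.

      import numpy as np

      # each agent i maximises a_i * log(1 + x_i) - p * x_i over its own allocation x_i,
      # while the coordinator enforces the shared capacity sum(x_i) <= C through the price p
      a = np.array([4.0, 2.0, 1.0])      # illustrative agent utility coefficients
      C = 5.0                            # shared resource capacity
      p, step = 1.0, 0.05                # initial price and subgradient step size

      for _ in range(500):
          x = np.maximum(0.0, a / p - 1.0)           # each agent's best response to price p
          p = max(1e-6, p + step * (x.sum() - C))    # raise price if over-subscribed, lower otherwise

      print("coordinating price:", round(p, 3))
      print("allocations:", np.round(x, 3), "total:", round(float(x.sum()), 3))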

  17. A Modeling Framework for Optimal Computational Resource Allocation Estimation: Considering the Trade-offs between Physical Resolutions, Uncertainty and Computational Costs

    NASA Astrophysics Data System (ADS)

    Moslehi, M.; de Barros, F.; Rajagopal, R.

    2014-12-01

    Hydrogeological models that represent flow and transport in subsurface domains are usually large-scale with excessive computational complexity and uncertain characteristics. Uncertainty quantification for predicting flow and transport in heterogeneous formations often entails utilizing a numerical Monte Carlo framework, which repeatedly simulates the model according to a random field representing hydrogeological characteristics of the field. The physical resolution (e.g. grid resolution associated with the physical space) for the simulation is customarily chosen based on recommendations in the literature, independent of the number of Monte Carlo realizations. This practice may lead to either excessive computational burden or inaccurate solutions. We propose an optimization-based methodology that considers the trade-off between the following conflicting objectives: time associated with computational costs, statistical convergence of the model predictions and physical errors corresponding to numerical grid resolution. In this research, we optimally allocate computational resources by developing a modeling framework for the overall error based on a joint statistical and numerical analysis and optimizing the error model subject to a given computational constraint. The derived expression for the overall error explicitly takes into account the joint dependence between the discretization error of the physical space and the statistical error associated with Monte Carlo realizations. The accuracy of the proposed framework is verified in this study by applying it to several computationally intensive examples. Having this framework at hand helps hydrogeologists achieve the optimum physical and statistical resolutions that minimize the error for a given computational budget. Moreover, the influence of the available computational resources and the geometric properties of the contaminant source zone on the optimum resolutions is investigated. We conclude that the computational cost associated with optimal allocation can be substantially reduced compared with prevalent recommendations in the literature.
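
    A toy version of the trade-off described above can be written down directly: take an overall error that sums a discretization term growing with the grid size h and a Monte Carlo term decaying like 1/sqrt(N), couple N to the per-realization cost through a fixed budget, and search for the best h. The error constants, exponents and budget below are illustrative assumptions, not the expressions derived in the study.

      import numpy as np

      def total_error(h, N, C1=1.0, p=2.0, C2=5.0):
          """Illustrative overall error: discretization term plus Monte Carlo statistical term."""
          return C1 * h**p + C2 / np.sqrt(N)

      def affordable_realizations(h, budget=1e6, cost_per_cell=1.0, dim=2):
          """Monte Carlo realizations affordable at grid size h (cost ~ h**-dim per run)."""
          per_run = cost_per_cell * h**(-dim)
          return max(1, int(budget / per_run))

      grid_sizes = np.linspace(0.01, 0.5, 200)
      errors = [total_error(h, affordable_realizations(h)) for h in grid_sizes]
      best = int(np.argmin(errors))
      h_opt = grid_sizes[best]
      print(f"optimal grid size h ~ {h_opt:.3f}, "
            f"N = {affordable_realizations(h_opt)}, error = {errors[best]:.3f}")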

  18. Development and implementation of a PACS network and resource manager

    NASA Astrophysics Data System (ADS)

    Stewart, Brent K.; Taira, Ricky K.; Dwyer, Samuel J., III; Huang, H. K.

    1992-07-01

    Clinical acceptance of PACS is predicated upon maximum uptime. Upon component failure, detection, diagnosis, reconfiguration and repair must occur immediately. Our current PACS network is large, heterogeneous, complex and geographically widespread. The overwhelming number of network devices, computers and software processes involved in a departmental or inter-institutional PACS makes development of tools for network and resource management critical. The authors have developed and implemented a comprehensive solution (PACS Network-Resource Manager) using the OSI Network Management Framework with network element agents that respond to queries and commands from network management stations. Managed resources include: communication protocol layers for Ethernet, FDDI and UltraNet; network devices; computer and operating system resources; and application, database and network services. The Network-Resource Manager is currently being used for warning, fault, security violation and configuration modification event notification. Analysis, automation and control applications have been added so that PACS resources can be dynamically reconfigured and so that users are notified when active involvement is required. Custom data and error logging have been implemented that allow statistics for each PACS subsystem to be charted for performance data. The Network-Resource Manager allows our departmental PACS system to be monitored continuously and thoroughly, with a minimal amount of personal involvement and time.

  19. Comparison of numerical weather prediction based deterministic and probabilistic wind resource assessment methods

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, Jie; Draxl, Caroline; Hopson, Thomas

    Numerical weather prediction (NWP) models have been widely used for wind resource assessment. Model runs with higher spatial resolution are generally more accurate, yet extremely computationally expensive. An alternative approach is to use data generated by a low resolution NWP model, in conjunction with statistical methods. In order to analyze the accuracy and computational efficiency of different types of NWP-based wind resource assessment methods, this paper performs a comparison of three deterministic and probabilistic NWP-based wind resource assessment methodologies: (i) a coarse resolution (0.5 degrees x 0.67 degrees) global reanalysis data set, the Modern-Era Retrospective Analysis for Research and Applications (MERRA); (ii) an analog ensemble methodology based on the MERRA, which provides both deterministic and probabilistic predictions; and (iii) a fine resolution (2-km) NWP data set, the Wind Integration National Dataset (WIND) Toolkit, based on the Weather Research and Forecasting model. Results show that: (i) as expected, the analog ensemble and WIND Toolkit perform significantly better than MERRA, confirming their ability to downscale coarse estimates; (ii) the analog ensemble provides the best estimate of the multi-year wind distribution at seven of the nine sites, while the WIND Toolkit is the best at one site; (iii) the WIND Toolkit is more accurate in estimating the distribution of hourly wind speed differences, which characterizes the wind variability, at five of the available sites, with the analog ensemble being best at the remaining four locations; and (iv) the analog ensemble computational cost is negligible, whereas the WIND Toolkit requires large computational resources. Future efforts could focus on the combination of the analog ensemble with intermediate resolution (e.g., 10-15 km) NWP estimates, to considerably reduce the computational burden, while providing accurate deterministic estimates and reliable probabilistic assessments.
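
    The analog ensemble compared above can be sketched in a few lines: for each new coarse-model forecast, the k most similar historical coarse forecasts are located, and the observations matched to those analogs form the probabilistic estimate (their mean serving as the deterministic one). The synthetic training data and the choice k = 20 below are assumptions made purely for illustration.

      import numpy as np

      def analog_ensemble(train_fc, train_obs, new_fc, k=20):
          """Return the k observations whose historical coarse forecasts are closest
          to the new coarse forecast (Euclidean distance in forecast-feature space)."""
          d = np.linalg.norm(train_fc - new_fc, axis=1)
          return train_obs[np.argsort(d)[:k]]

      # synthetic stand-ins: two coarse forecast features and matched site observations
      rng = np.random.default_rng(2)
      train_fc = rng.uniform(0, 20, size=(5000, 2))
      train_obs = 0.9 * train_fc[:, 0] + rng.normal(0, 1.0, 5000)

      members = analog_ensemble(train_fc, train_obs, new_fc=np.array([12.0, 7.0]))
      print("deterministic estimate:", round(float(members.mean()), 2))
      print("10th-90th percentile band:", np.round(np.percentile(members, [10, 90]), 2))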

  20. Overview of the LINCS architecture

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fletcher, J.G.; Watson, R.W.

    1982-01-13

    Computing at the Lawrence Livermore National Laboratory (LLNL) has evolved over the past 15 years into a computer-network-based resource-sharing environment. The increasing use of low cost and high performance micro, mini and midi computers and commercially available local networking systems will accelerate this trend. Further, even the large scale computer systems, on which much of the LLNL scientific computing depends, are evolving into multiprocessor systems. It is our belief that the most cost effective use of this environment will depend on the development of application systems structured into cooperating concurrent program modules (processes) distributed appropriately over different nodes of the environment. A node is defined as one or more processors with a local (shared) high speed memory. Given the latter view, the environment can be characterized as consisting of: multiple nodes communicating over noisy channels with arbitrary delays and throughput, heterogeneous base resources and information encodings, no single administration controlling all resources, distributed system state, and no uniform time base. The system design problem is how to turn the heterogeneous base hardware/firmware/software resources of this environment into a coherent set of resources that facilitate development of cost effective, reliable, and human engineered applications. We believe the answer lies in developing a layered, communication oriented distributed system architecture; layered and modular to support ease of understanding, reconfiguration, extensibility, and hiding of implementation or nonessential local details; communication oriented because that is a central feature of the environment. The Livermore Interactive Network Communication System (LINCS) is a hierarchical architecture designed to meet the above needs. While having characteristics in common with other architectures, it differs in several respects.

  1. An efficient two-stage approach for image-based FSI analysis of atherosclerotic arteries

    PubMed Central

    Rayz, Vitaliy L.; Mofrad, Mohammad R. K.; Saloner, David

    2010-01-01

    Patient-specific biomechanical modeling of atherosclerotic arteries has the potential to aid clinicians in characterizing lesions and determining optimal treatment plans. To attain high levels of accuracy, recent models use medical imaging data to determine plaque component boundaries in three dimensions, and fluid–structure interaction is used to capture mechanical loading of the diseased vessel. As the plaque components and vessel wall are often highly complex in shape, constructing a suitable structured computational mesh is very challenging and can require a great deal of time. Models based on unstructured computational meshes require relatively less time to construct and are capable of accurately representing plaque components in three dimensions. These models unfortunately require additional computational resources and computing time for accurate and meaningful results. A two-stage modeling strategy based on unstructured computational meshes is proposed to achieve a reasonable balance between meshing difficulty and computational resource and time demand. In this method, a coarse-grained simulation of the full arterial domain is used to guide and constrain a fine-scale simulation of a smaller region of interest within the full domain. Results for a patient-specific carotid bifurcation model demonstrate that the two-stage approach can afford a large savings in both time for mesh generation and time and resources needed for computation. The effects of solid and fluid domain truncation were explored, and were shown to minimally affect accuracy of the stress fields predicted with the two-stage approach. PMID:19756798

  2. Implementing Scientific Simulation Codes Highly Tailored for Vector Architectures Using Custom Configurable Computing Machines

    NASA Technical Reports Server (NTRS)

    Rutishauser, David

    2006-01-01

    The motivation for this work comes from an observation that amidst the push for Massively Parallel (MP) solutions to high-end computing problems such as numerical physical simulations, large amounts of legacy code exist that are highly optimized for vector supercomputers. Because re-hosting legacy code often requires a complete re-write of the original code, which can be a very long and expensive effort, this work examines the potential to exploit reconfigurable computing machines in place of a vector supercomputer to implement an essentially unmodified legacy source code. Custom and reconfigurable computing resources could be used to emulate an original application's target platform to the extent required to achieve high performance. To arrive at an architecture that delivers the desired performance subject to limited resources involves solving a multi-variable optimization problem with constraints. Prior research in the area of reconfigurable computing has demonstrated that designing an optimum hardware implementation of a given application under hardware resource constraints is an NP-complete problem. The premise of the approach is that the general issue of applying reconfigurable computing resources to the implementation of an application, maximizing the performance of the computation subject to physical resource constraints, can be made a tractable problem by assuming a computational paradigm, such as vector processing. This research contributes a formulation of the problem and a methodology to design a reconfigurable vector processing implementation of a given application that satisfies a performance metric. A generic, parametric, architectural framework for vector processing implemented in reconfigurable logic is developed as a target for a scheduling/mapping algorithm that maps an input computation to a given instance of the architecture. This algorithm is integrated with an optimization framework to arrive at a specification of the architecture parameters that attempts to minimize execution time, while staying within resource constraints. The flexibility of using a custom reconfigurable implementation is exploited in a unique manner to leverage the lessons learned in vector supercomputer development. The vector processing framework is tailored to the application, with variable parameters that are fixed in traditional vector processing. Benchmark data that demonstrates the functionality and utility of the approach is presented. The benchmark data includes an identified bottleneck in a real case study example vector code, the NASA Langley Terminal Area Simulation System (TASS) application.

  3. High Performance Semantic Factoring of Giga-Scale Semantic Graph Databases

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Joslyn, Cliff A.; Adolf, Robert D.; Al-Saffar, Sinan

    2010-10-04

    As semantic graph database technology grows to address components ranging from extant large triple stores to SPARQL endpoints over SQL-structured relational databases, it will become increasingly important to be able to bring high performance computational resources to bear on their analysis, interpretation, and visualization, especially with respect to their innate semantic structure. Our research group built a novel high performance hybrid system comprising computational capability for semantic graph database processing utilizing the large multithreaded architecture of the Cray XMT platform, conventional clusters, and large data stores. In this paper we describe that architecture and present the results of deploying it for the analysis of the Billion Triple dataset with respect to its semantic factors.

  4. GeoBrain Computational Cyber-laboratory for Earth Science Studies

    NASA Astrophysics Data System (ADS)

    Deng, M.; di, L.

    2009-12-01

    Computational approaches (e.g., computer-based data visualization, analysis and modeling) are critical for conducting increasingly data-intensive Earth science (ES) studies to understand functions and changes of the Earth system. However, currently Earth scientists, educators, and students face two major barriers that prevent them from effectively using computational approaches in their learning, research and application activities. The two barriers are: 1) difficulties in finding, obtaining, and using multi-source ES data; and 2) lack of analytic functions and computing resources (e.g., analysis software, computing models, and high performance computing systems) to analyze the data. Taking advantage of recent advances in cyberinfrastructure, Web service, and geospatial interoperability technologies, GeoBrain, a project funded by NASA, has developed a prototype computational cyber-laboratory to effectively remove the two barriers. The cyber-laboratory makes ES data and computational resources at large organizations in distributed locations available to and easily usable by the Earth science community through 1) enabling seamless discovery, access and retrieval of distributed data, 2) federating and enhancing data discovery with a catalogue federation service and a semantically-augmented catalogue service, 3) customizing data access and retrieval at user request with interoperable, personalized, and on-demand data access and services, 4) automating or semi-automating multi-source geospatial data integration, 5) developing a large number of analytic functions as value-added, interoperable, and dynamically chainable geospatial Web services and deploying them in high-performance computing facilities, 6) enabling the online geospatial process modeling and execution, and 7) building a user-friendly extensible web portal for users to access the cyber-laboratory resources. Users can interactively discover the needed data and perform on-demand data analysis and modeling through the web portal. The GeoBrain cyber-laboratory provides solutions to meet common needs of ES research and education, such as distributed data access and analysis services, easy access to and use of ES data, and enhanced geoprocessing and geospatial modeling capability. It greatly facilitates ES research, education, and applications. The development of the cyber-laboratory provides insights, lessons learned, and technology readiness to build more capable computing infrastructure for ES studies, which can meet the wide-ranging needs of current and future generations of scientists, researchers, educators, and students for their formal or informal educational training, research projects, career development, and lifelong learning.

  5. Integrating information about location and value of resources by white-faced saki monkeys (Pithecia pithecia).

    PubMed

    Cunningham, Elena; Janson, Charles

    2007-07-01

    Most studies of spatial memory in primates focus on species that inhabit large home ranges and have dispersed, patchy resources. Researchers assume that primates use memory to minimize distances traveled between resources. We investigated the use of spatial memory in a group of six white-faced sakis (Pithecia pithecia) on 12.8-ha Round Island, Guri Lake, Venezuela, during a period of fruit abundance. The sakis' movements were analyzed with logistic regressions, a predictive computer model and a computer model that simulates movements. We considered all the resources available to the sakis and compared observed distances to predicted distances from a computer model for foragers who know nothing about the location of resources. Surprisingly, the observed distances were four times greater than the predicted distances, suggesting that the sakis passed by a majority of the available fruit trees without feeding. The odds of visiting a food tree, however, were significantly increased if the tree had been visited in the previous 3 days and had more than 100 fruit. The sakis' preferred resources were highly productive fruit trees, Capparis trees, and trees with water holes. They traveled efficiently to these sites. The sakis' choice of feeding sites indicates that they combined knowledge acquired by repeatedly traveling through their home range with 'what' and 'where' information gained from individual visits to resources. Although the sakis' foraging choices increased the distance they traveled overall, choosing more valued sites allowed the group to minimize intra-group feeding competition, maintain intergroup dominance over important resources, and monitor the state of resources throughout their home range. The sakis' foraging decisions appear to have used spatial memory, elements of episodic-like memory, and social and nutritional considerations.

  6. A Computationally Efficient Parallel Levenberg-Marquardt Algorithm for Large-Scale Big-Data Inversion

    NASA Astrophysics Data System (ADS)

    Lin, Y.; O'Malley, D.; Vesselinov, V. V.

    2015-12-01

    Inverse modeling seeks model parameters given a set of observed state variables. However, for many practical problems, because the observed data sets are often large and the model parameters are often numerous, conventional methods for solving the inverse problem can be computationally expensive. We have developed a new, computationally-efficient Levenberg-Marquardt method for large-scale inverse modeling. Levenberg-Marquardt methods require the solution of a dense linear system of equations which can be prohibitively expensive to compute for large-scale inverse problems. Our novel method projects the original large-scale linear problem down to a Krylov subspace, such that the dimensionality of the measurements can be significantly reduced. Furthermore, instead of solving the linear system for every Levenberg-Marquardt damping parameter, we store the Krylov subspace computed when solving the first damping parameter and recycle it for all the following damping parameters. The efficiency of our new inverse modeling algorithm is significantly improved by using these computational techniques. We apply this new inverse modeling method to invert for a random transmissivity field. Our algorithm is fast enough to solve for the distributed model parameters (transmissivity) at each computational node in the model domain. The inversion is also aided by the use of regularization techniques. The algorithm is coded in Julia and implemented in the MADS computational framework (http://mads.lanl.gov). Julia is an advanced high-level scientific programming language that allows for efficient memory management and utilization of high-performance computational resources. By comparing with a Levenberg-Marquardt method using standard linear inversion techniques, our Levenberg-Marquardt method yields a speed-up ratio of 15 in a multi-core computational environment and a speed-up ratio of 45 in a single-core computational environment. Therefore, our new inverse modeling method is a powerful tool for large-scale applications.
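
    At the heart of each Levenberg-Marquardt iteration of the kind described above is a damped linear least-squares solve, which a Krylov method can perform without forming the dense normal equations. The sketch below fits a two-parameter exponential with SciPy's LSQR standing in for the Krylov solver; it omits the subspace-recycling trick from the paper, and the model, synthetic data and damping schedule are illustrative assumptions.

      import numpy as np
      from scipy.sparse.linalg import lsqr

      def model(theta, t):                       # y = A * exp(-k * t)
          A, k = theta
          return A * np.exp(-k * t)

      def jacobian(theta, t):
          A, k = theta
          e = np.exp(-k * t)
          return np.column_stack([e, -A * t * e])

      rng = np.random.default_rng(3)
      t = np.linspace(0, 5, 200)
      y = model([2.0, 0.7], t) + rng.normal(0, 0.02, t.size)   # synthetic observations

      theta, lam = np.array([1.0, 0.1]), 1.0     # initial guess and damping parameter
      for _ in range(50):
          r = model(theta, t) - y
          J = jacobian(theta, t)
          # damped least squares min ||J d + r||^2 + lam ||d||^2, solved by a Krylov method
          d = lsqr(J, -r, damp=np.sqrt(lam))[0]
          if np.sum((model(theta + d, t) - y) ** 2) < np.sum(r ** 2):
              theta, lam = theta + d, lam * 0.5  # accept the step, trust the model more
          else:
              lam *= 2.0                         # reject the step, increase damping

      print("estimated parameters:", np.round(theta, 3))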

  7. bioNerDS: exploring bioinformatics’ database and software use through literature mining

    PubMed Central

    2013-01-01

    Background Biology-focused databases and software define bioinformatics and their use is central to computational biology. In such a complex and dynamic field, it is of interest to understand what resources are available, which are used, how much they are used, and for what they are used. While scholarly literature surveys can provide some insights, large-scale computer-based approaches to identify mentions of bioinformatics databases and software from primary literature would automate systematic cataloguing, facilitate the monitoring of usage, and provide the foundations for the recovery of computational methods for analysing biological data, with the long-term aim of identifying best/common practice in different areas of biology. Results We have developed bioNerDS, a named entity recogniser for the recovery of bioinformatics databases and software from primary literature. We identify such entities with an F-measure ranging from 63% to 91% at the mention level and 63-78% at the document level, depending on corpus. Not attaining a higher F-measure is mostly due to high ambiguity in resource naming, which is compounded by the on-going introduction of new resources. To demonstrate the software, we applied bioNerDS to full-text articles from BMC Bioinformatics and Genome Biology. General mention patterns reflect the remit of these journals, highlighting BMC Bioinformatics’s emphasis on new tools and Genome Biology’s greater emphasis on data analysis. The data also illustrates some shifts in resource usage: for example, the past decade has seen R and the Gene Ontology join BLAST and GenBank as the main components in bioinformatics processing. Conclusions We demonstrate the feasibility of automatically identifying resource names on a large-scale from the scientific literature and show that the generated data can be used for exploration of bioinformatics database and software usage. For example, our results help to investigate the rate of change in resource usage and corroborate the suspicion that a vast majority of resources are created, but rarely (if ever) used thereafter. bioNerDS is available at http://bionerds.sourceforge.net/. PMID:23768135
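
    The simplest possible version of the recognition step is a dictionary matcher, sketched below. The real bioNerDS combines dictionaries with rules and scoring to cope with the naming ambiguity noted above, so this is an illustrative stand-in rather than its actual pipeline; the resource list and example sentence are made up.

      import re
      from collections import Counter

      RESOURCE_NAMES = ["BLAST", "GenBank", "Gene Ontology", "R", "X!Tandem", "OMSSA"]

      def count_resource_mentions(text):
          """Count case-sensitive, word-bounded mentions of known resource names."""
          counts = Counter()
          for name in RESOURCE_NAMES:
              pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
              counts[name] = len(re.findall(pattern, text))
          return counts

      doc = ("Sequences were searched with BLAST against GenBank; enrichment used "
             "the Gene Ontology and statistics were computed in R.")
      print(count_resource_mentions(doc))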

  8. Streaming support for data intensive cloud-based sequence analysis.

    PubMed

    Issa, Shadi A; Kienzler, Romeo; El-Kalioby, Mohamed; Tonellato, Peter J; Wall, Dennis; Bruggmann, Rémy; Abouelhoda, Mohamed

    2013-01-01

    Cloud computing provides a promising solution to the genomics data deluge problem resulting from the advent of next-generation sequencing (NGS) technology. Based on the concepts of "resources-on-demand" and "pay-as-you-go", scientists with no or limited infrastructure can have access to scalable and cost-effective computational resources. However, the large size of NGS data causes a significant data transfer latency from the client's site to the cloud, which presents a bottleneck for using cloud computing services. In this paper, we provide a streaming-based scheme to overcome this problem, where the NGS data is processed while being transferred to the cloud. Our scheme targets the wide class of NGS data analysis tasks, where the NGS sequences can be processed independently from one another. We also provide the elastream package that supports the use of this scheme with individual analysis programs or with workflow systems. Experiments presented in this paper show that our solution mitigates the effect of data transfer latency and saves both time and cost of computation.
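
    The streaming idea, processing records as they arrive rather than after the full transfer completes, can be sketched with a chunked HTTP download. The URL and the per-record work below are placeholders, and elastream itself wraps this pattern around real NGS analysis tools rather than a simple read counter.

      import requests   # assumes the requests package is installed

      def count_fastq_reads_while_downloading(url):
          """Stream a (possibly huge) FASTQ file over HTTP and count reads on the fly,
          without holding the whole file in memory or waiting for the download to finish."""
          reads, line_no = 0, 0
          with requests.get(url, stream=True) as resp:
              resp.raise_for_status()
              for line in resp.iter_lines():
                  line_no += 1
                  if line_no % 4 == 1 and line.startswith(b"@"):
                      reads += 1        # FASTQ records span 4 lines; headers start with '@'
          return reads

      # placeholder URL standing in for an NGS data set exposed over HTTP
      print(count_fastq_reads_while_downloading("https://example.org/sample.fastq"))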

  9. Law of Large Numbers: The Theory, Applications and Technology-Based Education

    ERIC Educational Resources Information Center

    Dinov, Ivo D.; Christou, Nicolas; Gould, Robert

    2009-01-01

    Modern approaches for technology-based blended education utilize a variety of recently developed novel pedagogical, computational and network resources. Such attempts employ technology to deliver integrated, dynamically-linked, interactive-content and heterogeneous learning environments, which may improve student comprehension and information…
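
    A one-screen simulation of the law itself, of the kind such blended-learning resources typically embed as an interactive demonstration, is sketched below with an ordinary die; the sample sizes are arbitrary choices.

      import numpy as np

      # running average of fair die rolls converging to the expected value 3.5
      rng = np.random.default_rng(4)
      rolls = rng.integers(1, 7, size=100_000)
      running_mean = np.cumsum(rolls) / np.arange(1, rolls.size + 1)

      for n in (10, 100, 1_000, 10_000, 100_000):
          print(f"mean after {n:>6} rolls: {running_mean[n - 1]:.4f}")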

  10. Networking Foundations for Collaborative Computing at Internet Scope

    DTIC Science & Technology

    2006-01-01

    network-supported synchronous multimedia groupwork at Internet scope and for large user groups. Contributions entail a novel classification for...multimedia resources in interactive groupwork, generalized to the domain of CSCW from the “right to speak” [26]. A floor control protocol mediates access to

  11. ASIC-based architecture for the real-time computation of 2D convolution with large kernel size

    NASA Astrophysics Data System (ADS)

    Shao, Rui; Zhong, Sheng; Yan, Luxin

    2015-12-01

    Bidimensional convolution is a low-level processing algorithm of interest in many areas, but its high computational cost constrains the size of the kernels, especially in real-time embedded systems. This paper presents a hardware architecture for the ASIC-based implementation of 2-D convolution with medium-large kernels. Aiming to improve the efficiency of on-chip storage resources and to reduce the required off-chip bandwidth, the design introduces a data-cache reuse scheme: multi-block SPRAM cross-caches the image, and an on-chip ping-pong operation takes full advantage of data reuse in the convolution calculation, leading to a new ASIC data scheduling scheme and overall architecture. Experimental results show that the structure can perform real-time convolution with kernels up to 40 x 32 in size while improving the utilization of on-chip memory bandwidth and on-chip memory resources; the results also show that the structure maximizes data throughput and reduces the need for off-chip memory bandwidth.
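
    The computational load that motivates such architectures is easy to make concrete: a direct 2-D convolution costs roughly H x W x kh x kw multiply-accumulate operations, so a 40 x 32 kernel over even a modest image reaches hundreds of millions of operations per frame. The sketch below uses SciPy's reference CPU implementation purely to illustrate that arithmetic; it is not a model of the ASIC design, and the image size is an arbitrary choice.

      import numpy as np
      from scipy.signal import convolve2d

      H, W, kh, kw = 512, 512, 40, 32
      image = np.random.random((H, W)).astype(np.float32)
      kernel = np.random.random((kh, kw)).astype(np.float32)

      macs = H * W * kh * kw
      print(f"direct 2-D convolution: ~{macs / 1e9:.2f} GMAC per frame")

      result = convolve2d(image, kernel, mode="same")   # slow reference implementation
      print(result.shape)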

  12. Faster than Real-Time Dynamic Simulation for Large-Size Power System with Detailed Dynamic Models using High-Performance Computing Platform

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Huang, Renke; Jin, Shuangshuang; Chen, Yousu

    This paper presents a faster-than-real-time dynamic simulation software package that is designed for large-size power system dynamic simulation. It was developed on the GridPACKTM high-performance computing (HPC) framework. The key features of the developed software package include (1) faster-than-real-time dynamic simulation for a WECC system (17,000 buses) with different types of detailed generator, controller, and relay dynamic models, (2) a decoupled parallel dynamic simulation algorithm with optimized computation architecture to better leverage HPC resources and technologies, (3) options for HPC-based linear and iterative solvers, (4) hidden HPC details, such as data communication and distribution, to enable development centered on mathematical models and algorithms rather than on computational details for power system researchers, and (5) easy integration of new dynamic models and related algorithms into the software package.

  13. The semantic measures library and toolkit: fast computation of semantic similarity and relatedness using biomedical ontologies.

    PubMed

    Harispe, Sébastien; Ranwez, Sylvie; Janaqi, Stefan; Montmain, Jacky

    2014-03-01

    The semantic measures library and toolkit are robust open-source and easy to use software solutions dedicated to semantic measures. They can be used for large-scale computations and analyses of semantic similarities between terms/concepts defined in terminologies and ontologies. The comparison of entities (e.g. genes) annotated by concepts is also supported. A large collection of measures is available. Not limited to a specific application context, the library and the toolkit can be used with various controlled vocabularies and ontology specifications (e.g. Open Biomedical Ontology, Resource Description Framework). The project targets both designers and practitioners of semantic measures providing a JAVA library, as well as a command-line tool that can be used on personal computers or computer clusters. Downloads, documentation, tutorials, evaluation and support are available at http://www.semantic-measures-library.org.
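
    The flavour of the measures the library computes can be shown on a toy ontology: given an information content (IC) per concept, the Resnik similarity of two concepts is the IC of their most informative common ancestor, and Lin's measure normalises it by the concepts' own IC. The tiny is-a DAG and probabilities below are made up for illustration; the library itself works on full OBO/RDF ontologies and offers many more measures.

      import math

      # toy is-a ontology: concept -> list of parents
      PARENTS = {
          "signal transduction": ["cellular process"],
          "cell communication": ["cellular process"],
          "cellular process": ["biological process"],
          "biological process": [],
      }
      # made-up annotation probabilities; information content IC = -log p
      IC = {t: -math.log(p) for t, p in {
          "biological process": 1.0, "cellular process": 0.4,
          "cell communication": 0.1, "signal transduction": 0.05}.items()}

      def ancestors(term):
          """The term itself plus all of its ancestors in the DAG."""
          seen, stack = set(), [term]
          while stack:
              t = stack.pop()
              if t not in seen:
                  seen.add(t)
                  stack.extend(PARENTS.get(t, []))
          return seen

      def resnik(a, b):
          return max(IC[t] for t in ancestors(a) & ancestors(b))

      def lin(a, b):
          return 2 * resnik(a, b) / (IC[a] + IC[b])

      print(round(resnik("signal transduction", "cell communication"), 3))  # IC of "cellular process"
      print(round(lin("signal transduction", "cell communication"), 3))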

  14. Users matter : multi-agent systems model of high performance computing cluster users.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    North, M. J.; Hood, C. S.; Decision and Information Sciences

    2005-01-01

    High performance computing clusters have been a critical resource for computational science for over a decade and have more recently become integral to large-scale industrial analysis. Despite their well-specified components, the aggregate behavior of clusters is poorly understood. The difficulties arise from complicated interactions between cluster components during operation. These interactions have been studied by many researchers, some of whom have identified the need for holistic multi-scale modeling that simultaneously includes network level, operating system level, process level, and user level behaviors. Each of these levels presents its own modeling challenges, but the user level is the most complex due to the adaptability of human beings. In this vein, there are several major user modeling goals, namely descriptive modeling, predictive modeling and automated weakness discovery. This study shows how multi-agent techniques were used to simulate a large-scale computing cluster at each of these levels.

  15. Automation of the CFD Process on Distributed Computing Systems

    NASA Technical Reports Server (NTRS)

    Tejnil, Ed; Gee, Ken; Rizk, Yehia M.

    2000-01-01

    A script system was developed to automate and streamline portions of the CFD process. The system was designed to facilitate the use of CFD flow solvers on supercomputer and workstation platforms within a parametric design event. Integrating solver pre- and postprocessing phases, the fully automated ADTT script system marshalled the required input data, submitted the jobs to available computational resources, and processed the resulting output data. A number of codes were incorporated into the script system, which itself was part of a larger integrated design environment software package. The IDE and scripts were used in a design event involving a wind tunnel test. This experience highlighted the need for efficient data and resource management in all parts of the CFD process. To facilitate the use of CFD methods to perform parametric design studies, the script system was developed using UNIX shell and Perl languages. The goal of the work was to minimize the user interaction required to generate the data necessary to fill a parametric design space. The scripts wrote out the required input files for the user-specified flow solver, transferred all necessary input files to the computational resource, submitted and tracked the jobs using the resource queuing structure, and retrieved and post-processed the resulting dataset. For computational resources that did not run queueing software, the script system established its own simple first-in-first-out queueing structure to manage the workload. A variety of flow solvers were incorporated in the script system, including INS2D, PMARC, TIGER and GASP. Adapting the script system to a new flow solver was made easier through the use of object-oriented programming methods. The script system was incorporated into an ADTT integrated design environment and evaluated as part of a wind tunnel experiment. The system successfully generated the data required to fill the desired parametric design space. This stressed the computational resources required to compute and store the information. The scripts were continually modified to improve the utilization of the computational resources and reduce the likelihood of data loss due to failures. An ad-hoc file server was created to manage the large amount of data being generated as part of the design event. Files were stored and retrieved as needed to create new jobs and analyze the results. Additional information is contained in the original.
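
    A stripped-down version of the ad-hoc first-in-first-out queue mentioned above is sketched here: jobs are held in arrival order and launched as worker slots free up. The placeholder commands and the concurrency limit are assumptions for illustration; the actual script system also handled input generation, file staging and post-processing for the flow solvers.

      import subprocess
      import time
      from collections import deque

      def run_fifo(commands, max_concurrent=4, poll_seconds=1.0):
          """Launch shell commands in first-in-first-out order, never exceeding
          max_concurrent simultaneous jobs (a minimal stand-in for a batch queue)."""
          waiting, running = deque(commands), []
          while waiting or running:
              running = [p for p in running if p.poll() is None]   # drop finished jobs
              while waiting and len(running) < max_concurrent:
                  cmd = waiting.popleft()
                  print("starting:", cmd)
                  running.append(subprocess.Popen(cmd, shell=True))
              time.sleep(poll_seconds)

      # placeholder jobs standing in for flow-solver runs over a parametric design space
      jobs = [f"echo 'case {i}' && sleep 2" for i in range(10)]
      run_fifo(jobs)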

  16. FermiGrid—experience and future plans

    NASA Astrophysics Data System (ADS)

    Chadwick, K.; Berman, E.; Canal, P.; Hesselroth, T.; Garzoglio, G.; Levshina, T.; Sergeev, V.; Sfiligoi, I.; Sharma, N.; Timm, S.; Yocum, D. R.

    2008-07-01

    Fermilab supports a scientific program that includes experiments and scientists located across the globe. In order to better serve this community, Fermilab has placed its production computer resources in a Campus Grid infrastructure called 'FermiGrid'. The FermiGrid infrastructure allows the large experiments at Fermilab to have priority access to their own resources, enables sharing of these resources in an opportunistic fashion, and movement of work (jobs, data) between the Campus Grid and National Grids such as Open Science Grid (OSG) and the Worldwide LHC Computing Grid Collaboration (WLCG). FermiGrid resources support multiple Virtual Organizations (VOs), including VOs from the OSG, EGEE, and the WLCG. Fermilab also makes leading contributions to the Open Science Grid in the areas of accounting, batch computing, grid security, job management, resource selection, site infrastructure, storage management, and VO services. Through the FermiGrid interfaces, authenticated and authorized VOs and individuals may access our core grid services, the 10,000+ Fermilab resident CPUs, near-petabyte (including CMS) online disk pools and the multi-petabyte Fermilab Mass Storage System. These core grid services include a site wide Globus gatekeeper, VO management services for several VOs, Fermilab site authorization services, grid user mapping services, as well as job accounting and monitoring, resource selection and data movement services. Access to these services is via standard and well-supported grid interfaces. We will report on the user experience of using the FermiGrid campus infrastructure interfaced to a national cyberinfrastructure - the successes and the problems.

  17. FermiGrid - experience and future plans

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chadwick, K.; Berman, E.; Canal, P.

    2007-09-01

    Fermilab supports a scientific program that includes experiments and scientists located across the globe. In order to better serve this community, Fermilab has placed its production computer resources in a Campus Grid infrastructure called 'FermiGrid'. The FermiGrid infrastructure allows the large experiments at Fermilab to have priority access to their own resources, enables sharing of these resources in an opportunistic fashion, and movement of work (jobs, data) between the Campus Grid and National Grids such as Open Science Grid and the WLCG. FermiGrid resources support multiple Virtual Organizations (VOs), including VOs from the Open Science Grid (OSG), EGEE and the Worldwide LHC Computing Grid Collaboration (WLCG). Fermilab also makes leading contributions to the Open Science Grid in the areas of accounting, batch computing, grid security, job management, resource selection, site infrastructure, storage management, and VO services. Through the FermiGrid interfaces, authenticated and authorized VOs and individuals may access our core grid services, the 10,000+ Fermilab resident CPUs, near-petabyte (including CMS) online disk pools and the multi-petabyte Fermilab Mass Storage System. These core grid services include a site wide Globus gatekeeper, VO management services for several VOs, Fermilab site authorization services, grid user mapping services, as well as job accounting and monitoring, resource selection and data movement services. Access to these services is via standard and well-supported grid interfaces. We will report on the user experience of using the FermiGrid campus infrastructure interfaced to a national cyberinfrastructure--the successes and the problems.

  18. Accounting and Accountability for Distributed and Grid Systems

    NASA Technical Reports Server (NTRS)

    Thigpen, William; McGinnis, Laura F.; Hacker, Thomas J.

    2001-01-01

    While the advent of distributed and grid computing systems will open new opportunities for scientific exploration, the reality of such implementations could prove to be a system administrator's nightmare. A lot of effort is being spent on identifying and resolving the obvious problems of security, scheduling, authentication and authorization. Lurking in the background, though, are the largely unaddressed issues of accountability and usage accounting: (1) mapping resource usage to resource users; (2) defining usage economies or methods for resource exchange; (3) describing implementation standards that minimize and compartmentalize the tasks required for a site to participate in a grid.

  19. Grid and Cloud for Developing Countries

    NASA Astrophysics Data System (ADS)

    Petitdidier, Monique

    2014-05-01

    The European Grid e-infrastructure has shown the capacity to connect geographically distributed, heterogeneous compute resources in a secure way, taking advantage of a robust and fast REN (Research and Education Network). In many countries, as in Africa, the first step has been to implement a REN and regional organizations such as Ubuntunet, WACREN or ASREN to coordinate the development and improvement of the network and its interconnection. Internet connectivity in those countries is still expanding rapidly. The second step has been to meet the computing needs of the scientists. Even though many of them have their own laptops, multi-core or not, this is not enough for a growing number of applications, because they face intensive computing due to the large amount of data to be processed and/or complex codes. So far, one solution has been to go abroad, to Europe or America, to run large applications, or simply not to participate in international communities. The Grid is very attractive because it connects geographically distributed heterogeneous resources, aggregates new ones, and creates new sites on the REN with secure access. All users have access to the same services even if they have no resources in their own institute. With faster and more robust Internet connections they will be able to take advantage of the European Grid. There are different initiatives to provide resources and training, such as the UNESCO/HP Brain Gain initiative and EUMEDGrid. Nowadays Cloud computing is becoming very attractive, and clouds are starting to be developed in some countries. In this talk, the challenges for those countries to implement such e-infrastructures and to develop, in parallel, scientific and technical research and education in the new technologies will be presented and illustrated by examples.

  20. BigData and computing challenges in high energy and nuclear physics

    NASA Astrophysics Data System (ADS)

    Klimentov, A.; Grigorieva, M.; Kiryanov, A.; Zarochentsev, A.

    2017-06-01

    In this contribution we discuss the various aspects of the computing resource needs of experiments in High Energy and Nuclear Physics, in particular at the Large Hadron Collider. These needs will evolve when moving from the LHC to the HL-LHC in ten years from now, when the already exascale levels of data we are processing could increase by a further order of magnitude. The distributed computing environment has been a great success, and the inclusion of new super-computing facilities, cloud computing and volunteer computing for the future is a big challenge, which we are successfully mastering with a considerable contribution from many super-computing centres around the world and from academic and commercial cloud providers. We also discuss R&D computing projects started recently at the National Research Center "Kurchatov Institute".

  1. Cloud-Based Numerical Weather Prediction for Near Real-Time Forecasting and Disaster Response

    NASA Technical Reports Server (NTRS)

    Molthan, Andrew; Case, Jonathan; Venners, Jason; Schroeder, Richard; Checchi, Milton; Zavodsky, Bradley; Limaye, Ashutosh; O'Brien, Raymond

    2015-01-01

    The use of cloud computing resources continues to grow within the public and private sector components of the weather enterprise as users become more familiar with cloud-computing concepts, and competition among service providers continues to reduce costs and other barriers to entry. Cloud resources can also provide capabilities similar to high-performance computing environments, supporting multi-node systems required for near real-time, regional weather predictions. Referred to as "Infrastructure as a Service", or IaaS, the use of cloud-based computing hardware in an on-demand payment system allows for rapid deployment of a modeling system in environments lacking access to a large, supercomputing infrastructure. Use of IaaS capabilities to support regional weather prediction may be of particular interest to developing countries that have not yet established large supercomputing resources, but would otherwise benefit from a regional weather forecasting capability. Recently, collaborators from NASA Marshall Space Flight Center and Ames Research Center have developed a scripted, on-demand capability for launching the NOAA/NWS Science and Training Resource Center (STRC) Environmental Modeling System (EMS), which includes pre-compiled binaries of the latest version of the Weather Research and Forecasting (WRF) model. The WRF-EMS provides scripting for downloading appropriate initial and boundary conditions from global models, along with higher-resolution vegetation, land surface, and sea surface temperature data sets provided by the NASA Short-term Prediction Research and Transition (SPoRT) Center. This presentation will provide an overview of the modeling system capabilities and benchmarks performed on the Amazon Elastic Compute Cloud (EC2) environment. In addition, the presentation will discuss future opportunities to deploy the system in support of weather prediction in developing countries supported by NASA's SERVIR Project, which provides capacity building activities in environmental monitoring and prediction across a growing number of regional hubs throughout the world. Capacity-building applications that extend numerical weather prediction to developing countries are intended to provide near real-time applications to benefit public health, safety, and economic interests, but may have a greater impact during disaster events by providing a source for local predictions of weather-related hazards, or impacts that local weather events may have during the recovery phase.

  2. Development of a SaaS application probe to the physical properties of the Earth's interior: An attempt at moving HPC to the cloud

    NASA Astrophysics Data System (ADS)

    Huang, Qian

    2014-09-01

    Scientific computing often requires the availability of a massive number of computers for performing large-scale simulations, and computing in mineral physics is no exception. In order to investigate physical properties of minerals at extreme conditions in computational mineral physics, parallel computing technology is used to speed up the performance by utilizing multiple computer resources to process a computational task simultaneously, thereby greatly reducing computation time. Traditionally, parallel computing has been addressed by using High Performance Computing (HPC) solutions and installed facilities such as clusters and super computers. Today, it has been seen that there is a tremendous growth in cloud computing. Infrastructure as a Service (IaaS), the on-demand and pay-as-you-go model, creates a flexible and cost-effective means to access computing resources. In this paper, a feasibility report of HPC on a cloud infrastructure is presented. It is found that current cloud services in the IaaS layer still need to improve performance to be useful to research projects. On the other hand, Software as a Service (SaaS), another type of cloud computing, is introduced into an HPC system for computing in mineral physics, and an application of it is developed. In this paper, an overall description of this SaaS application is presented. This contribution can promote cloud application development in computational mineral physics, and cross-disciplinary studies.

  3. Generic Divide and Conquer Internet-Based Computing

    NASA Technical Reports Server (NTRS)

    Follen, Gregory J. (Technical Monitor); Radenski, Atanas

    2003-01-01

    The growth of Internet-based applications and the proliferation of networking technologies have been transforming traditional commercial application areas as well as computer and computational sciences and engineering. This growth stimulates the exploration of Peer-to-Peer (P2P) software technologies that can open new research and application opportunities not only for the commercial world, but also for the scientific and high-performance computing applications community. The general goal of this project is to achieve better understanding of the transition to Internet-based high-performance computing and to develop solutions for some of the technical challenges of this transition. In particular, we are interested in creating long-term motivation for end users to provide their idle processor time to support computationally intensive tasks. We believe that a practical P2P architecture should provide useful service to both clients with high-performance computing needs and contributors of lower-end computing resources. To achieve this, we are designing a dual-service architecture for P2P high-performance divide-and-conquer computing; we are also experimenting with a prototype implementation. Our proposed architecture incorporates a master server, utilizes dual satellite servers, and operates on the Internet in a dynamically changing large configuration of lower-end nodes provided by volunteer contributors. A dual satellite server comprises a high-performance computing engine and a lower-end contributor service engine. The computing engine provides generic support for divide-and-conquer computations. The service engine is intended to provide free, useful HTTP-based services to contributors of lower-end computing resources. Our proposed architecture is complementary to and accessible from computational grids, such as Globus, Legion, and Condor. Grids provide remote access to existing higher-end computing resources; in contrast, our goal is to utilize idle processor time of lower-end Internet nodes. Our project is focused on a generic divide-and-conquer paradigm and on mobile applications of this paradigm that can operate on a loose and ever-changing pool of lower-end Internet nodes.
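
    The divide-and-conquer paradigm referred to above can be expressed as a small generic template. The sketch below is illustrative only and is not the project's satellite-server code; it shows the split/solve/combine structure that such an architecture would distribute across contributed nodes, using local recursion in place of remote scheduling.

    # Generic divide-and-conquer template: split a problem, solve the pieces,
    # combine partial results. In the architecture described above the recursive
    # calls would be shipped to contributor nodes instead of executed locally.
    from typing import Any, Callable, List

    def divide_and_conquer(problem: Any,
                           is_base: Callable[[Any], bool],
                           solve_base: Callable[[Any], Any],
                           split: Callable[[Any], List[Any]],
                           combine: Callable[[List[Any]], Any]) -> Any:
        if is_base(problem):
            return solve_base(problem)
        subproblems = split(problem)
        partial = [divide_and_conquer(p, is_base, solve_base, split, combine)
                   for p in subproblems]
        return combine(partial)

    # Usage example: summing a large list by recursive halving.
    data = list(range(1_000_000))
    total = divide_and_conquer(
        data,
        is_base=lambda xs: len(xs) <= 1024,
        solve_base=sum,
        split=lambda xs: [xs[:len(xs) // 2], xs[len(xs) // 2:]],
        combine=sum,
    )
    print(total == sum(data))  # True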

  4. Orchestrating Distributed Resource Ensembles for Petascale Science

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Baldin, Ilya; Mandal, Anirban; Ruth, Paul

    2014-04-24

    Distributed, data-intensive computational science applications of interest to DOE scientific communities move large amounts of data for experiment data management, distributed analysis steps, remote visualization, and accessing scientific instruments. These applications need to orchestrate ensembles of resources from multiple resource pools and interconnect them with high-capacity multi-layered networks across multiple domains. It is highly desirable that mechanisms are designed that provide this type of resource provisioning capability to a broad class of applications. It is also important to have coherent monitoring capabilities for such complex distributed environments. In this project, we addressed these problems by designing an abstract API, enabled by novel semantic resource descriptions, for provisioning complex and heterogeneous resources from multiple providers using their native provisioning mechanisms and control planes: computational, storage, and multi-layered high-speed network domains. We used an extensible resource representation based on semantic web technologies to afford maximum flexibility to applications in specifying their needs. We evaluated the effectiveness of provisioning using representative data-intensive applications. We also developed mechanisms for providing feedback about resource performance to the application, to enable closed-loop feedback control and dynamic adjustments to resource allocations (elasticity). This was enabled through development of a novel persistent query framework that consumes disparate sources of monitoring data, including perfSONAR, and provides scalable distribution of asynchronous notifications.

  5. Program on application of communications satellites to educational development

    NASA Technical Reports Server (NTRS)

    Morgan, R. P.; Singh, J. P.

    1971-01-01

    Interdisciplinary research in needs analysis, communications technology studies, and systems synthesis is reported. Existing and planned educational telecommunications services are studied and library utilization of telecommunications is described. Preliminary estimates are presented of ranges of utilization of educational telecommunications services for 1975 and 1985: instructional and public television, computer-aided instruction, computing resources, and information resource sharing for various educational levels and purposes. Communications technology studies include transmission schemes for still-picture television, use of Gunn effect devices, and TV receiver front ends for direct satellite reception at 12 GHz. Two major studies in the systems synthesis project concern (1) organizational and administrative aspects of a large-scale instructional satellite system to be used with schools and (2) an analysis of future development of instructional television, with emphasis on the use of video tape recorders and cable television. A communications satellite system synthesis program developed for NASA is now operational on the university's IBM 360-50 computer.

  6. Perspectives on the Future of CFD

    NASA Technical Reports Server (NTRS)

    Kwak, Dochan

    2000-01-01

    This viewgraph presentation gives an overview of the future of computational fluid dynamics (CFD), which in the past has pioneered the field of flow simulation. Over time CFD has progressed along with computing power, and numerical methods have advanced as CPU and memory capacity have increased. Complex configurations are routinely computed now, and direct numerical simulations (DNS) and large eddy simulations (LES) are used to study turbulence. As computing resources shifted to parallel and distributed platforms, computer science aspects such as scalability (algorithmic and implementation), portability, and transparent coding have advanced. Examples of potential future (or current) challenges include risk assessment, limitations of the heuristic model, and the development of CFD and information technology (IT) tools.

  7. Application of computational aero-acoustics to real world problems

    NASA Technical Reports Server (NTRS)

    Hardin, Jay C.

    1996-01-01

    The application of computational aeroacoustics (CAA) to real-world problems is discussed in relation to analyses performed with the aim of assessing the applicability of the various techniques. It is considered that the applications are limited by the inability of computational resources to resolve the large range of scales involved in high Reynolds number flows. Possible simplifications are discussed. It is considered that problems remain to be solved in relation to the efficient use of the power of parallel computers and in the development of turbulence modeling schemes. The goal of CAA is stated as being the implementation of acoustic design studies on a computer terminal with reasonable run times.

  8. The relationship of race to women's use of health information resources.

    PubMed

    Nicholson, Wanda K; Grason, Holly A; Powe, Neil R

    2003-02-01

    The purpose of this study was to examine, among the general public, the independent effect of race on women's use of health information resources. A population-based random-digit dialing survey of adult women, aged 18 to 64 years, was conducted between October 1999 and January 2000. Subjects included 509 women (341 white women, 135 black women, and 33 women of other races). The response rate was 66%. The main outcome variable was the use of health information resources (print health or news media, broadcast media, computer resources [Internet], health organizations, organized health events). Logistic regression was used to determine the independent effect of race/ethnicity on the use of different information resources, with adjustment for age, income, education, and marital status. After adjustment for socioeconomic factors, black women had approximately 50% lower odds of using print news media (odds ratio, 0.5; 95% CI, 0.4-0.8), 60% lower odds of using computer-based resources (odds ratio, 0.4; 95% CI, 0.2-0.6), and 70% lower odds of using health policy organizations (odds ratio, 0.3; 95% CI, 0.2-0.7), compared with white women. There is a large racial disparity in women's use of health information resources. Traditional sources that are used to provide patient information may not be effective in certain populations.

  9. National Fusion Collaboratory: Grid Computing for Simulations and Experiments

    NASA Astrophysics Data System (ADS)

    Greenwald, Martin

    2004-05-01

    The National Fusion Collaboratory Project is creating a computational grid designed to advance scientific understanding and innovation in magnetic fusion research by facilitating collaborations, enabling more effective integration of experiments, theory and modeling and allowing more efficient use of experimental facilities. The philosophy of FusionGrid is that data, codes, analysis routines, visualization tools, and communication tools should be thought of as network available services, easily used by the fusion scientist. In such an environment, access to services is stressed rather than portability. By building on a foundation of established computer science toolkits, deployment time can be minimized. These services all share the same basic infrastructure that allows for secure authentication and resource authorization which allows stakeholders to control their own resources such as computers, data and experiments. Code developers can control intellectual property, and fair use of shared resources can be demonstrated and controlled. A key goal is to shield scientific users from the implementation details such that transparency and ease-of-use are maximized. The first FusionGrid service deployed was the TRANSP code, a widely used tool for transport analysis. Tools for run preparation, submission, monitoring and management have been developed and shared among a wide user base. This approach saves user sites from the laborious effort of maintaining such a large and complex code while at the same time reducing the burden on the development team by avoiding the need to support a large number of heterogeneous installations. Shared visualization and A/V tools are being developed and deployed to enhance long-distance collaborations. These include desktop versions of the Access Grid, a highly capable multi-point remote conferencing tool and capabilities for sharing displays and analysis tools over local and wide-area networks.

  10. Perspectives on Sharing Models and Related Resources in Computational Biomechanics Research.

    PubMed

    Erdemir, Ahmet; Hunter, Peter J; Holzapfel, Gerhard A; Loew, Leslie M; Middleton, John; Jacobs, Christopher R; Nithiarasu, Perumal; Löhner, Rainald; Wei, Guowei; Winkelstein, Beth A; Barocas, Victor H; Guilak, Farshid; Ku, Joy P; Hicks, Jennifer L; Delp, Scott L; Sacks, Michael; Weiss, Jeffrey A; Ateshian, Gerard A; Maas, Steve A; McCulloch, Andrew D; Peng, Grace C Y

    2018-02-01

    The role of computational modeling for biomechanics research and related clinical care will be increasingly prominent. The biomechanics community has been developing computational models routinely for exploration of the mechanics and mechanobiology of diverse biological structures. As a result, a large array of models, data, and discipline-specific simulation software has emerged to support endeavors in computational biomechanics. Sharing computational models and related data and simulation software first became a utilitarian interest and is now a necessity. Exchange of models, in support of knowledge exchange provided by scholarly publishing, has important implications. Specifically, model sharing can facilitate assessment of reproducibility in computational biomechanics and can provide an opportunity for repurposing and reuse, and a venue for medical training. The community's desire to investigate biological and biomechanical phenomena crossing multiple systems, scales, and physical domains also motivates sharing of modeling resources, as blending of models developed by domain experts will be a required step for comprehensive simulation studies as well as for enhancing their rigor and reproducibility. The goal of this paper is to understand current perspectives in the biomechanics community on the sharing of computational models and related resources. Opinions on opportunities, challenges, and pathways to model sharing, particularly as part of the scholarly publishing workflow, were sought. A group of journal editors and a handful of investigators active in computational biomechanics were approached to collect short opinion pieces as a part of a larger effort of the IEEE EMBS Computational Biology and the Physiome Technical Committee to address model reproducibility through publications. A synthesis of these opinion pieces indicates that the community recognizes the necessity and usefulness of model sharing. There is a strong will to facilitate model sharing, and there are corresponding initiatives by the scientific journals. Outside the publishing enterprise, infrastructure to facilitate model sharing in biomechanics exists, and simulation software developers are interested in accommodating the community's needs for sharing of modeling resources. Encouragement for the use of standardized markups, concerns related to quality assurance, acknowledgement of increased burden, and the importance of stewardship of resources are noted. In the short term, it is advisable that the community builds upon recent strategies and experiments with new pathways for continued demonstration of model sharing, its promotion, and its utility. Nonetheless, the need for a long-term strategy to unify approaches in sharing computational models and related resources is acknowledged. Development of a sustainable platform supported by a culture of open model sharing will likely evolve through continued and inclusive discussions bringing all stakeholders to the table, e.g., by possibly establishing a consortium.

  11. Controlling user access to electronic resources without password

    DOEpatents

    Smith, Fred Hewitt

    2015-06-16

    Described herein are devices and techniques for remotely controlling user access to a restricted computer resource. The process includes pre-determining an association of the restricted computer resource and computer-resource-proximal environmental information. Indicia of user-proximal environmental information are received from a user requesting access to the restricted computer resource. Received indicia of user-proximal environmental information are compared to associated computer-resource-proximal environmental information. User access to the restricted computer resource is selectively granted responsive to a favorable comparison in which the user-proximal environmental information is sufficiently similar to the computer-resource proximal environmental information. In at least some embodiments, the process further includes comparing user-supplied biometric measure and comparing it with a predetermined association of at least one biometric measure of an authorized user. Access to the restricted computer resource is granted in response to a favorable comparison.
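
    As a rough, hypothetical illustration of the comparison step described in the abstract (the patent's actual matching criteria and the biometric comparison are not reproduced here), the sketch below grants access only when user-proximal environmental readings fall within a tolerance of the readings pre-associated with the resource. All field names and thresholds are assumptions.

    # Hypothetical sketch of environment-based access control: compare indicia of
    # the user's environment against values pre-associated with the resource and
    # grant access only on a sufficiently close match. Thresholds are illustrative.
    def environments_match(resource_env: dict, user_env: dict,
                           tolerances: dict) -> bool:
        for key, expected in resource_env.items():
            observed = user_env.get(key)
            if observed is None:
                return False
            if abs(observed - expected) > tolerances.get(key, 0.0):
                return False
        return True

    resource_env = {"temperature_c": 21.0, "ambient_noise_db": 38.0}
    tolerances   = {"temperature_c": 1.5,  "ambient_noise_db": 5.0}

    user_env = {"temperature_c": 21.8, "ambient_noise_db": 40.0}
    print("access granted" if environments_match(resource_env, user_env, tolerances)
          else "access denied")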

  12. A bioinformatics knowledge discovery in text application for grid computing

    PubMed Central

    Castellano, Marcello; Mastronardi, Giuseppe; Bellotti, Roberto; Tarricone, Gianfranco

    2009-01-01

    Background A fundamental activity in biomedical research is Knowledge Discovery, which involves searching through large amounts of biomedical information such as documents and data. High performance computational infrastructures, such as Grid technologies, are emerging as a possible infrastructure to tackle the intensive use of Information and Communication Technology resources in life science. The goal of this work was to develop a software middleware solution that exploits knowledge discovery applications on scalable and distributed computing systems to achieve intensive use of ICT resources. Methods The development of a grid application for Knowledge Discovery in Text using a methodology based on a middleware solution is presented. The system must be able to implement a user application model and process jobs, creating many parallel jobs to distribute across the computational nodes. Finally, the system must be aware of the computational resources available and their status, and must be able to monitor the execution of parallel jobs. These operating requirements led to the design of a middleware that can be specialized using user application modules. It includes a graphical user interface for access to a node search system, a load-balancing system, and a transfer optimizer to reduce communication costs. Results A middleware solution prototype and its performance evaluation in terms of the speed-up factor are shown. It was written in Java on Globus Toolkit 4 to build the grid infrastructure based on GNU/Linux computer grid nodes. A test was carried out and the results are shown for the named entity recognition search of symptoms and pathologies. The search was applied to a collection of 5,000 scientific documents taken from PubMed. Conclusion In this paper we discuss the development of a grid application based on a middleware solution. It has been tested on a knowledge discovery in text process to extract new and useful information about symptoms and pathologies from a large collection of unstructured scientific documents. As an example, a Knowledge Discovery in Databases computation was applied to the output produced by the KDT user module to extract new knowledge about symptom and pathology bio-entities. PMID:19534749

  13. A bioinformatics knowledge discovery in text application for grid computing.

    PubMed

    Castellano, Marcello; Mastronardi, Giuseppe; Bellotti, Roberto; Tarricone, Gianfranco

    2009-06-16

    A fundamental activity in biomedical research is Knowledge Discovery, which involves searching through large amounts of biomedical information such as documents and data. High performance computational infrastructures, such as Grid technologies, are emerging as a possible infrastructure to tackle the intensive use of Information and Communication Technology resources in life science. The goal of this work was to develop a software middleware solution that exploits knowledge discovery applications on scalable and distributed computing systems to achieve intensive use of ICT resources. The development of a grid application for Knowledge Discovery in Text using a methodology based on a middleware solution is presented. The system must be able to implement a user application model and process jobs, creating many parallel jobs to distribute across the computational nodes. Finally, the system must be aware of the computational resources available and their status, and must be able to monitor the execution of parallel jobs. These operating requirements led to the design of a middleware that can be specialized using user application modules. It includes a graphical user interface for access to a node search system, a load-balancing system, and a transfer optimizer to reduce communication costs. A middleware solution prototype and its performance evaluation in terms of the speed-up factor are shown. It was written in Java on Globus Toolkit 4 to build the grid infrastructure based on GNU/Linux computer grid nodes. A test was carried out and the results are shown for the named entity recognition search of symptoms and pathologies. The search was applied to a collection of 5,000 scientific documents taken from PubMed. In this paper we discuss the development of a grid application based on a middleware solution. It has been tested on a knowledge discovery in text process to extract new and useful information about symptoms and pathologies from a large collection of unstructured scientific documents. As an example, a Knowledge Discovery in Databases computation was applied to the output produced by the KDT user module to extract new knowledge about symptom and pathology bio-entities.

  14. USMC Installations Command Information Environment: Opportunities and Analysis for Integration of First Responder Communications

    DTIC Science & Technology

    2014-09-01

    becoming a more and more prevalent technology in the business world today. According to Syal and Goswami (2012), cloud technology is seen as a...use of computing resources, applications, and personal files without reliance on a single computer or system (Syal & Goswami, 2012). By operating in...cloud services largely being web-based, which can be retrieved through most systems with access to the Internet (Syal & Goswami, 2012). The end user can

  15. Multi-discipline resource inventory of soils, vegetation and geology

    NASA Technical Reports Server (NTRS)

    Simonson, G. H. (Principal Investigator); Paine, D. P.; Lawrence, R. D.; Norgren, J. A.; Pyott, W. Y.; Herzog, J. H.; Murray, R. J.; Rogers, R.

    1973-01-01

    The author has identified the following significant results. Computer classification of natural vegetation, in the vicinity of Big Summit Prairie, Crook County, Oregon was carried out using MSS digital data. Impure training sets, representing eleven vegetation types plus water, were selected from within the area to be classified. Close correlations were visually observed between vegetation types mapped from the large scale photographs and the computer classification of the ERTS data (Frame 1021-18151, 13 August 1972).

  16. Laboratory Computing Resource Center

    Science.gov Websites

    Website navigation for the Laboratory Computing Resource Center, covering computing and data resources, purchasing resources, future plans, getting started for users, software, best practices and policies, and getting help and support, along with recent announcements (April 27, 2018).

  17. Lost in Cloud

    NASA Technical Reports Server (NTRS)

    Maluf, David A.; Shetye, Sandeep D.; Chilukuri, Sri; Sturken, Ian

    2012-01-01

    Cloud computing can reduce cost significantly because businesses can share computing resources. In recent years Small and Medium Businesses (SMB) have used the Cloud effectively for cost saving and for sharing IT expenses. With the success of SMBs, many perceive that larger enterprises ought to move into the Cloud environment as well. Government agencies' stove-piped environments are being considered as candidates for potential use of the Cloud, either as an enterprise entity or as pockets of small communities. Cloud Computing is the delivery of computing as a service rather than as a product, whereby shared resources, software, and information are provided to computers and other devices as a utility over a network. Underneath the offered services there exists a modern infrastructure, the cost of which is often spread across its services or its investors. As NASA is considered an Enterprise-class organization, like other enterprises, a shift has been occurring in perceiving its IT services as candidates for Cloud services. This paper discusses market trends in cloud computing from an enterprise angle and then addresses the topic of Cloud Computing for NASA in two possible forms. First, in the form of a public Cloud to support it as an enterprise, as well as to share it with the commercial and public sectors at large. Second, as a private Cloud wherein the infrastructure is operated solely for NASA, whether managed internally or by a third party and hosted internally or externally. The paper addresses the strengths and weaknesses of both paradigms of public and private Clouds, in both internally and externally operated settings. The content of the paper is from a NASA perspective but is applicable to any large enterprise with thousands of employees and contractors.

  18. Integral Images: Efficient Algorithms for Their Computation and Storage in Resource-Constrained Embedded Vision Systems

    PubMed Central

    Ehsan, Shoaib; Clark, Adrian F.; ur Rehman, Naveed; McDonald-Maier, Klaus D.

    2015-01-01

    The integral image, an intermediate image representation, has found extensive use in multi-scale local feature detection algorithms, such as Speeded-Up Robust Features (SURF), allowing fast computation of rectangular features at constant speed, independent of filter size. For resource-constrained real-time embedded vision systems, computation and storage of the integral image present several design challenges due to strict timing and hardware limitations. Although calculation of the integral image only consists of simple addition operations, the total number of operations is large owing to the generally large size of image data. Recursive equations allow a substantial decrease in the number of operations but require calculation in a serial fashion. This paper presents two new hardware algorithms that are based on the decomposition of these recursive equations, allowing calculation of up to four integral image values in a row-parallel way without significantly increasing the number of operations. An efficient design strategy is also proposed for a parallel integral image computation unit to reduce the size of the required internal memory (nearly 35% for common HD video). Addressing the storage problem of the integral image in embedded vision systems, the paper presents two algorithms which allow a substantial decrease (at least 44.44%) in the memory requirements. Finally, the paper provides a case study that highlights the utility of the proposed architectures in embedded vision systems. PMID:26184211
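
    The recursive equations mentioned in the abstract are the standard single-pass integral-image recurrence; a serial reference version (not the paper's row-parallel hardware decomposition) is sketched below. Once the integral image is built, the sum over any rectangle can be read off with four lookups.

    # Serial reference computation of an integral image using the recurrence
    # ii(x, y) = i(x, y) + ii(x-1, y) + ii(x, y-1) - ii(x-1, y-1).
    # The row-parallel hardware algorithms in the paper decompose this recurrence;
    # this sketch only shows the baseline serial form.
    import numpy as np

    def integral_image(img: np.ndarray) -> np.ndarray:
        h, w = img.shape
        ii = np.zeros((h, w), dtype=np.int64)
        for y in range(h):
            for x in range(w):
                ii[y, x] = (int(img[y, x])
                            + (ii[y - 1, x] if y > 0 else 0)
                            + (ii[y, x - 1] if x > 0 else 0)
                            - (ii[y - 1, x - 1] if (x > 0 and y > 0) else 0))
        return ii

    # Sanity check against the equivalent double cumulative sum.
    img = np.arange(12, dtype=np.uint8).reshape(3, 4)
    assert np.array_equal(integral_image(img),
                          img.astype(np.int64).cumsum(0).cumsum(1))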

  19. Integral Images: Efficient Algorithms for Their Computation and Storage in Resource-Constrained Embedded Vision Systems.

    PubMed

    Ehsan, Shoaib; Clark, Adrian F; Naveed ur Rehman; McDonald-Maier, Klaus D

    2015-07-10

    The integral image, an intermediate image representation, has found extensive use in multi-scale local feature detection algorithms, such as Speeded-Up Robust Features (SURF), allowing fast computation of rectangular features at constant speed, independent of filter size. For resource-constrained real-time embedded vision systems, computation and storage of the integral image present several design challenges due to strict timing and hardware limitations. Although calculation of the integral image only consists of simple addition operations, the total number of operations is large owing to the generally large size of image data. Recursive equations allow a substantial decrease in the number of operations but require calculation in a serial fashion. This paper presents two new hardware algorithms that are based on the decomposition of these recursive equations, allowing calculation of up to four integral image values in a row-parallel way without significantly increasing the number of operations. An efficient design strategy is also proposed for a parallel integral image computation unit to reduce the size of the required internal memory (nearly 35% for common HD video). Addressing the storage problem of the integral image in embedded vision systems, the paper presents two algorithms which allow a substantial decrease (at least 44.44%) in the memory requirements. Finally, the paper provides a case study that highlights the utility of the proposed architectures in embedded vision systems.

  20. An efficient dynamic load balancing algorithm

    NASA Astrophysics Data System (ADS)

    Lagaros, Nikos D.

    2014-01-01

    In engineering problems, randomness and uncertainties are inherent. Robust design procedures, formulated in the framework of multi-objective optimization, have been proposed in order to take into account sources of randomness and uncertainty. These design procedures require orders of magnitude more computational effort than conventional analysis or optimum design processes, since a very large number of finite element analyses must be performed. There is therefore an imperative need to exploit the capabilities of computing resources in order to deal with this kind of problem. In particular, parallel computing can be implemented at the level of metaheuristic optimization, by exploiting the physical parallelization feature of the nondominated sorting evolution strategies method, as well as at the level of the repeated structural analyses required for assessing the behavioural constraints and for calculating the objective functions. In this study an efficient dynamic load balancing algorithm for optimum exploitation of available computing resources is proposed and, without loss of generality, is applied to computing the desired Pareto front. In such problems the computation of the complete Pareto front with feasible designs only constitutes a very challenging task. The proposed algorithm achieves linear speedup factors and almost 100% speedup factor values with reference to the sequential procedure.
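
    A minimal sketch of dynamic load balancing in the master/worker style is given below; it is illustrative only and much simpler than the paper's algorithm, which additionally balances across the metaheuristic and structural-analysis levels. Idle workers pull the next pending analysis from a shared queue, so faster workers naturally receive more tasks than a static partition would give them.

    # Minimal dynamic load balancing sketch: a shared task queue from which idle
    # workers pull the next finite-element analysis, so work assignment adapts to
    # the actual speed of each processor instead of a static partition.
    import queue
    import random
    import threading
    import time

    tasks = queue.Queue()
    for design_id in range(40):          # pretend each task is one FE analysis
        tasks.put(design_id)

    results = []
    lock = threading.Lock()

    def worker(name: str) -> None:
        while True:
            try:
                design_id = tasks.get_nowait()
            except queue.Empty:
                return
            time.sleep(random.uniform(0.01, 0.05))   # stand-in for an analysis
            with lock:
                results.append((name, design_id))
            tasks.task_done()

    threads = [threading.Thread(target=worker, args=(f"w{i}",)) for i in range(4)]
    for t in threads: t.start()
    for t in threads: t.join()
    print(len(results), "analyses completed")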

  1. High performance semantic factoring of giga-scale semantic graph databases.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    al-Saffar, Sinan; Adolf, Bob; Haglin, David

    2010-10-01

    As semantic graph database technology grows to address components ranging from extant large triple stores to SPARQL endpoints over SQL-structured relational databases, it will become increasingly important to be able to bring high performance computational resources to bear on their analysis, interpretation, and visualization, especially with respect to their innate semantic structure. Our research group built a novel high performance hybrid system comprising computational capability for semantic graph database processing utilizing the large multithreaded architecture of the Cray XMT platform, conventional clusters, and large data stores. In this paper we describe that architecture and present the results of deploying it for the analysis of the Billion Triple dataset with respect to its semantic factors, including basic properties, connected components, namespace interaction, and typed paths.

  2. Resource Management In Peer-To-Peer Networks: A Nadse Approach

    NASA Astrophysics Data System (ADS)

    Patel, R. B.; Garg, Vishal

    2011-12-01

    This article presents a common solution to Peer-to-Peer (P2P) network problems and distributed computing with the help of the "Neighbor Assisted Distributed and Scalable Environment" (NADSE). NADSE supports both device and code mobility. In this article we focus mainly on the NADSE-based resource management technique and on how information dissemination and searching are sped up when using the NADSE service provider node in a large network. Results show that the performance of the NADSE network is better than that of Gnutella and Freenet.

  3. Research on elastic resource management for multi-queue under cloud computing environment

    NASA Astrophysics Data System (ADS)

    CHENG, Zhenjing; LI, Haibo; HUANG, Qiulan; Cheng, Yaodong; CHEN, Gang

    2017-10-01

    As a new approach to managing computing resources, virtualization technology is more and more widely applied in the high-energy physics field. A virtual computing cluster based on OpenStack was built at IHEP, using HTCondor as the job queue management system. In a traditional static cluster, a fixed number of virtual machines are pre-allocated to the job queues of different experiments. However, this method cannot adapt well to the volatility of computing resource requirements. To solve this problem, an elastic computing resource management system for a cloud computing environment has been designed. This system performs unified management of virtual computing nodes on the basis of the job queues in HTCondor, using dual resource thresholds as well as a quota service. A two-stage pool is designed to improve the efficiency of resource pool expansion. This paper presents several use cases of the elastic resource management system in IHEPCloud. In practical runs, virtual computing resources dynamically expanded or shrank as computing requirements changed. Additionally, the CPU utilization ratio of computing resources was significantly increased when compared with traditional resource management. The system also performs well when there are multiple HTCondor schedulers and multiple job queues.
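
    The dual-threshold idea described above can be illustrated with a simple control loop. This is a hypothetical sketch, not the IHEPCloud implementation: when the backlog of queued jobs per node exceeds an upper threshold the pool grows, and when utilization falls below a lower threshold idle nodes are released, subject to a per-queue quota. All threshold values are assumptions.

    # Hypothetical sketch of dual-threshold elastic scaling for a job queue:
    # expand the virtual-machine pool when the backlog per node is high, shrink it
    # when utilization is low, and never exceed the queue's quota.
    def scaling_decision(queued_jobs: int, running_jobs: int, nodes: int,
                         slots_per_node: int, quota_nodes: int,
                         upper: float = 2.0, lower: float = 0.3) -> int:
        """Return the change in node count (positive = expand, negative = shrink)."""
        if nodes == 0:
            return min(1, quota_nodes)               # bootstrap the pool
        backlog_per_node = queued_jobs / nodes
        utilization = running_jobs / (nodes * slots_per_node)
        if backlog_per_node > upper and nodes < quota_nodes:
            return min(quota_nodes - nodes, max(1, queued_jobs // slots_per_node))
        if utilization < lower and queued_jobs == 0 and nodes > 0:
            return -1                                # release one idle node per pass
        return 0

    print(scaling_decision(queued_jobs=120, running_jobs=64, nodes=8,
                           slots_per_node=8, quota_nodes=20))   # positive: expand
    print(scaling_decision(queued_jobs=0, running_jobs=5, nodes=8,
                           slots_per_node=8, quota_nodes=20))   # -1: shrink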

  4. Science Information System in Japan. NIER Occasional Paper 02/83.

    ERIC Educational Resources Information Center

    Matsumura, Tamiko

    This paper describes the development of a proposed Japanese Science Information System (SIS), a nationwide network of research and academic libraries, large-scale computer centers, national research institutes, and other organizations, to be formed for the purpose of sharing information and resources in the natural sciences, technology, the…

  5. An Aggregate IRT Procedure for Exploratory Factor Analysis

    ERIC Educational Resources Information Center

    Camilli, Gregory; Fox, Jean-Paul

    2015-01-01

    An aggregation strategy is proposed to potentially address practical limitations related to computing resources for two-level multidimensional item response theory (MIRT) models with large data sets. The aggregate model is derived by integration of the normal ogive model, and an adaptation of the stochastic approximation expectation maximization…

  6. A Rich Metadata Filesystem for Scientific Data

    ERIC Educational Resources Information Center

    Bui, Hoang

    2012-01-01

    As scientific research becomes more data intensive, there is an increasing need for scalable, reliable, and high performance storage systems. Such data repositories must provide both data archival services and rich metadata, and cleanly integrate with large scale computing resources. ROARS is a hybrid approach to distributed storage that provides…

  7. In vitro data and in silico models for computational toxicology (Teratology Society ILSI HESI workshop)

    EPA Science Inventory

    The challenge of assessing the potential developmental health risks for the tens of thousands of environmental chemicals is beyond the capacity for resource-intensive animal protocols. Large data streams coming from high-throughput (HTS) and high-content (HCS) profiling of biolog...

  8. A Case for Data Commons

    PubMed Central

    Grossman, Robert L.; Heath, Allison; Murphy, Mark; Patterson, Maria; Wells, Walt

    2017-01-01

    Data commons collocate data, storage, and computing infrastructure with core services and commonly used tools and applications for managing, analyzing, and sharing data to create an interoperable resource for the research community. An architecture for data commons is described, as well as some lessons learned from operating several large-scale data commons. PMID:29033693

  9. SCIMITAR: Scalable Stream-Processing for Sensor Information Brokering

    DTIC Science & Technology

    2013-11-01

    IaaS) cloud frameworks including Amazon Web Services and Eucalyptus. For load testing, we used The Grinder [9], a Java load testing framework that...internal Eucalyptus cluster which we could not scale as large as the Amazon environment due to a lack of computation resources. We recreated our

  10. Investigation into Cloud Computing for More Robust Automated Bulk Image Geoprocessing

    NASA Technical Reports Server (NTRS)

    Brown, Richard B.; Smoot, James C.; Underwood, Lauren; Armstrong, C. Duane

    2012-01-01

    Geospatial resource assessments frequently require timely geospatial data processing that involves large multivariate remote sensing data sets. In particular, disaster response requires rapid access to large data volumes, substantial storage space, and high performance processing capability. The processing and distribution of this data into usable information products requires a processing pipeline that can efficiently manage the required storage, computing utilities, and data handling requirements. In recent years, with the availability of cloud computing technology, cloud processing platforms have made available a powerful new computing infrastructure resource that can meet this need. To assess the utility of this resource, this project investigates cloud computing platforms for bulk, automated geoprocessing capabilities with respect to data handling and application development requirements. This presentation covers work conducted by the Applied Sciences Program Office at NASA Stennis Space Center. A prototypical set of image manipulation and transformation processes that incorporate sample Unmanned Airborne System data was developed to create value-added products and tested for implementation on the "cloud". This project outlines the steps involved in creating and testing open source process code developed on a local prototype platform, and then transitioning this code with associated environment requirements into an analogous, but memory and processor enhanced, cloud platform. A data processing cloud was used to store both standard digital camera panchromatic and multi-band image data, which were subsequently subjected to standard image processing functions such as NDVI (Normalized Difference Vegetation Index), NDMI (Normalized Difference Moisture Index), band stacking, reprojection, and other similar data processes. Cloud infrastructure service providers were evaluated by taking these locally tested processing functions and applying them to a given cloud-enabled infrastructure to assess and compare environment setup options and enabled technologies. This project reviews findings that were observed when cloud platforms were evaluated for bulk geoprocessing capabilities based on data handling and application development requirements.
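
    The index computations named above are simple band arithmetic: NDVI = (NIR - Red)/(NIR + Red) and NDMI = (NIR - SWIR)/(NIR + SWIR). The sketch below uses placeholder arrays in place of reflectance rasters read from imagery; it illustrates the arithmetic only, not the project's processing pipeline.

    # Band-arithmetic sketch of the vegetation and moisture indices named above.
    # The band arrays are placeholders for reflectance rasters read from imagery.
    import numpy as np

    def ndvi(nir: np.ndarray, red: np.ndarray) -> np.ndarray:
        return (nir - red) / np.clip(nir + red, 1e-6, None)

    def ndmi(nir: np.ndarray, swir: np.ndarray) -> np.ndarray:
        return (nir - swir) / np.clip(nir + swir, 1e-6, None)

    nir  = np.random.rand(4, 4).astype(np.float32)
    red  = np.random.rand(4, 4).astype(np.float32)
    swir = np.random.rand(4, 4).astype(np.float32)
    print(ndvi(nir, red).shape, ndmi(nir, swir).shape)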

  11. Software Manages Documentation in a Large Test Facility

    NASA Technical Reports Server (NTRS)

    Gurneck, Joseph M.

    2001-01-01

    The 3MCS computer program assists an instrumentation engineer in performing the three essential functions of design, documentation, and configuration management of measurement and control systems in a large test facility. Services provided by 3MCS are acceptance of input from multiple engineers and technicians working at multiple locations; standardization of drawings; automated cross-referencing; identification of errors; listing of components and resources; downloading of test settings; and provision of information to customers.

  12. GapMap: Enabling Comprehensive Autism Resource Epidemiology

    PubMed Central

    Albert, Nikhila; Schwartz, Jessey; Du, Michael

    2017-01-01

    Background For individuals with autism spectrum disorder (ASD), finding resources can be a lengthy and difficult process. The difficulty in obtaining global, fine-grained autism epidemiological data hinders researchers from quickly and efficiently studying large-scale correlations among ASD, environmental factors, and geographical and cultural factors. Objective The objective of this study was to define resource load and resource availability for families affected by autism and subsequently create a platform to enable a more accurate representation of prevalence rates and resource epidemiology. Methods We created a mobile application, GapMap, to collect locational, diagnostic, and resource use information from individuals with autism to compute accurate prevalence rates and better understand autism resource epidemiology. GapMap is hosted on AWS S3, running on a React and Redux front-end framework. The backend framework is comprised of an AWS API Gateway and Lambda Function setup, with secure and scalable end points for retrieving prevalence and resource data, and for submitting participant data. Measures of autism resource scarcity, including resource load, resource availability, and resource gaps were defined and preliminarily computed using simulated or scraped data. Results The average distance from an individual in the United States to the nearest diagnostic center is approximately 182 km (50 miles), with a standard deviation of 235 km (146 miles). The average distance from an individual with ASD to the nearest diagnostic center, however, is only 32 km (20 miles), suggesting that individuals who live closer to diagnostic services are more likely to be diagnosed. Conclusions This study confirmed that individuals closer to diagnostic services are more likely to be diagnosed and proposes GapMap, a means to measure and enable the alleviation of increasingly overburdened diagnostic centers and resource-poor areas where parents are unable to diagnose their children as quickly and easily as needed. GapMap will collect information that will provide more accurate data for computing resource loads and availability, uncovering the impact of resource epidemiology on age and likelihood of diagnosis, and gathering localized autism prevalence rates. PMID:28473303

  13. GapMap: Enabling Comprehensive Autism Resource Epidemiology.

    PubMed

    Albert, Nikhila; Daniels, Jena; Schwartz, Jessey; Du, Michael; Wall, Dennis P

    2017-05-04

    For individuals with autism spectrum disorder (ASD), finding resources can be a lengthy and difficult process. The difficulty in obtaining global, fine-grained autism epidemiological data hinders researchers from quickly and efficiently studying large-scale correlations among ASD, environmental factors, and geographical and cultural factors. The objective of this study was to define resource load and resource availability for families affected by autism and subsequently create a platform to enable a more accurate representation of prevalence rates and resource epidemiology. We created a mobile application, GapMap, to collect locational, diagnostic, and resource use information from individuals with autism to compute accurate prevalence rates and better understand autism resource epidemiology. GapMap is hosted on AWS S3, running on a React and Redux front-end framework. The backend framework is comprised of an AWS API Gateway and Lambda Function setup, with secure and scalable end points for retrieving prevalence and resource data, and for submitting participant data. Measures of autism resource scarcity, including resource load, resource availability, and resource gaps were defined and preliminarily computed using simulated or scraped data. The average distance from an individual in the United States to the nearest diagnostic center is approximately 182 km (50 miles), with a standard deviation of 235 km (146 miles). The average distance from an individual with ASD to the nearest diagnostic center, however, is only 32 km (20 miles), suggesting that individuals who live closer to diagnostic services are more likely to be diagnosed. This study confirmed that individuals closer to diagnostic services are more likely to be diagnosed and proposes GapMap, a means to measure and enable the alleviation of increasingly overburdened diagnostic centers and resource-poor areas where parents are unable to diagnose their children as quickly and easily as needed. GapMap will collect information that will provide more accurate data for computing resource loads and availability, uncovering the impact of resource epidemiology on age and likelihood of diagnosis, and gathering localized autism prevalence rates. ©Nikhila Albert, Jena Daniels, Jessey Schwartz, Michael Du, Dennis P Wall. Originally published in JMIR Public Health and Surveillance (http://publichealth.jmir.org), 04.05.2017.

  14. A Computational framework for telemedicine.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Foster, I.; von Laszewski, G.; Thiruvathukal, G. K.

    1998-07-01

    Emerging telemedicine applications require the ability to exploit diverse and geographically distributed resources. High-speed networks are used to integrate advanced visualization devices, sophisticated instruments, large databases, archival storage devices, PCs, workstations, and supercomputers. This form of telemedical environment is similar to networked virtual supercomputers, also known as metacomputers. Metacomputers are already being used in many scientific application areas. In this article, we analyze the requirements necessary for a telemedical computing infrastructure and compare them with requirements found in a typical metacomputing environment. We will show that metacomputing environments can be used to enable a more powerful and unified computational infrastructure for telemedicine. The Globus metacomputing toolkit can provide the necessary low-level mechanisms to enable a large-scale telemedical infrastructure. The Globus toolkit components are designed in a modular fashion and can be extended to support the specific requirements of telemedicine.

  15. High Performance Descriptive Semantic Analysis of Semantic Graph Databases

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Joslyn, Cliff A.; Adolf, Robert D.; al-Saffar, Sinan

    As semantic graph database technology grows to address components ranging from extant large triple stores to SPARQL endpoints over SQL-structured relational databases, it will become increasingly important to be able to understand their inherent semantic structure, whether codified in explicit ontologies or not. Our group is researching novel methods for what we call descriptive semantic analysis of RDF triplestores, to serve purposes of analysis, interpretation, visualization, and optimization. But data size and computational complexity make it increasingly necessary to bring high performance computational resources to bear on this task. Our research group built a novel high performance hybrid system comprising computational capability for semantic graph database processing utilizing the large multi-threaded architecture of the Cray XMT platform, conventional servers, and large data stores. In this paper we describe that architecture and our methods, and present the results of our analyses of basic properties, connected components, namespace interaction, and typed paths for the Billion Triple Challenge 2010 dataset.

  16. Hydra: a scalable proteomic search engine which utilizes the Hadoop distributed computing framework

    PubMed Central

    2012-01-01

    Background For shotgun mass spectrometry based proteomics the most computationally expensive step is in matching the spectra against an increasingly large database of sequences and their post-translational modifications with known masses. Each mass spectrometer can generate data at an astonishingly high rate, and the scope of what is searched for is continually increasing. Therefore solutions for improving our ability to perform these searches are needed. Results We present a sequence database search engine that is specifically designed to run efficiently on the Hadoop MapReduce distributed computing framework. The search engine implements the K-score algorithm, generating comparable output for the same input files as the original implementation. The scalability of the system is shown, and the architecture required for the development of such distributed processing is discussed. Conclusion The software is scalable in its ability to handle a large peptide database, numerous modifications and large numbers of spectra. Performance scales with the number of processors in the cluster, allowing throughput to expand with the available resources. PMID:23216909
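
    The MapReduce decomposition described above can be sketched in miniature; the following is illustrative only and is not Hydra's K-score implementation. The map step scores one spectrum against candidate peptides, and the reduce step keeps the best-scoring match per spectrum; a framework such as Hadoop would shard the map calls across a cluster.

    # Miniature map/reduce sketch of distributed spectrum-to-peptide matching.
    # score() is a placeholder stand-in for a real scoring function such as K-score.
    from collections import defaultdict

    def score(spectrum: dict, peptide: str) -> float:
        # placeholder score: overlap between observed peak indices and peptide positions
        return len(set(spectrum["peaks"]) & set(range(len(peptide)))) / (len(peptide) + 1)

    def map_phase(spectrum: dict, peptides: list) -> list:
        return [(spectrum["id"], (peptide, score(spectrum, peptide)))
                for peptide in peptides]

    def reduce_phase(pairs: list) -> dict:
        best = defaultdict(lambda: ("", -1.0))
        for spec_id, (peptide, s) in pairs:
            if s > best[spec_id][1]:
                best[spec_id] = (peptide, s)
        return dict(best)

    peptides = ["PEPTIDE", "PROTEIN", "SEQ"]
    spectra = [{"id": 1, "peaks": [0, 2, 4]}, {"id": 2, "peaks": [1, 3]}]
    pairs = [kv for sp in spectra for kv in map_phase(sp, peptides)]
    print(reduce_phase(pairs))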

  17. Hydra: a scalable proteomic search engine which utilizes the Hadoop distributed computing framework.

    PubMed

    Lewis, Steven; Csordas, Attila; Killcoyne, Sarah; Hermjakob, Henning; Hoopmann, Michael R; Moritz, Robert L; Deutsch, Eric W; Boyle, John

    2012-12-05

    For shotgun mass spectrometry based proteomics the most computationally expensive step is in matching the spectra against an increasingly large database of sequences and their post-translational modifications with known masses. Each mass spectrometer can generate data at an astonishingly high rate, and the scope of what is searched for is continually increasing. Therefore solutions for improving our ability to perform these searches are needed. We present a sequence database search engine that is specifically designed to run efficiently on the Hadoop MapReduce distributed computing framework. The search engine implements the K-score algorithm, generating comparable output for the same input files as the original implementation. The scalability of the system is shown, and the architecture required for the development of such distributed processing is discussed. The software is scalable in its ability to handle a large peptide database, numerous modifications and large numbers of spectra. Performance scales with the number of processors in the cluster, allowing throughput to expand with the available resources.

  18. Assessment of time-dependent density functional theory with the restricted excitation space approximation for excited state calculations of large systems

    NASA Astrophysics Data System (ADS)

    Hanson-Heine, Magnus W. D.; George, Michael W.; Besley, Nicholas A.

    2018-06-01

    The restricted excitation subspace approximation is explored as a basis to reduce the memory storage required in linear response time-dependent density functional theory (TDDFT) calculations within the Tamm-Dancoff approximation. It is shown that excluding the core orbitals and up to 70% of the virtual orbitals in the construction of the excitation subspace does not result in significant changes in computed UV/vis spectra for large molecules. The reduced size of the excitation subspace greatly reduces the size of the subspace vectors that need to be stored when using the Davidson procedure to determine the eigenvalues of the TDDFT equations. Furthermore, additional screening of the two-electron integrals in combination with a reduction in the size of the numerical integration grid used in the TDDFT calculation leads to significant computational savings. The use of these approximations represents a simple approach to extend TDDFT to the study of large systems and make the calculations increasingly tractable using modest computing resources.
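
    The memory saving from restricting the excitation subspace can be illustrated with a simple dimension count (the orbital numbers below are assumptions, not taken from the paper): in the Tamm-Dancoff approximation there is one amplitude per occupied-virtual orbital pair, so dropping core orbitals and a fraction of the virtual orbitals shrinks the stored subspace vectors proportionally.

    # Illustrative count of the single-excitation space dimension in TDDFT/TDA:
    # one amplitude per (occupied, virtual) pair, so excluding core orbitals and a
    # fraction of the virtual orbitals shrinks stored subspace vectors directly.
    def excitation_space_dim(n_occ: int, n_virt: int,
                             n_core_dropped: int = 0,
                             virt_fraction_kept: float = 1.0) -> int:
        occ_kept = n_occ - n_core_dropped
        virt_kept = int(round(n_virt * virt_fraction_kept))
        return occ_kept * virt_kept

    full    = excitation_space_dim(n_occ=150, n_virt=1850)
    reduced = excitation_space_dim(n_occ=150, n_virt=1850,
                                   n_core_dropped=30, virt_fraction_kept=0.30)
    print(full, reduced, f"{reduced / full:.2%} of the full dimension")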

  19. Are Cloud Environments Ready for Scientific Applications?

    NASA Astrophysics Data System (ADS)

    Mehrotra, P.; Shackleford, K.

    2011-12-01

    Cloud computing environments are becoming widely available both in the commercial and government sectors. They provide flexibility to rapidly provision resources in order to meet dynamic and changing computational needs without the customers incurring capital expenses and/or requiring technical expertise. Clouds also provide reliable access to resources even though the end-user may not have in-house expertise for acquiring or operating such resources. Consolidation and pooling in a cloud environment allow organizations to achieve economies of scale in provisioning or procuring computing resources and services. Because of these and other benefits, many businesses and organizations are migrating their business applications (e.g., websites, social media, and business processes) to cloud environments-evidenced by the commercial success of offerings such as the Amazon EC2. In this paper, we focus on the feasibility of utilizing cloud environments for scientific workloads and workflows particularly of interest to NASA scientists and engineers. There is a wide spectrum of such technical computations. These applications range from small workstation-level computations to mid-range computing requiring small clusters to high-performance simulations requiring supercomputing systems with high bandwidth/low latency interconnects. Data-centric applications manage and manipulate large data sets such as satellite observational data and/or data previously produced by high-fidelity modeling and simulation computations. Most of the applications are run in batch mode with static resource requirements. However, there do exist situations that have dynamic demands, particularly ones with public-facing interfaces providing information to the general public, collaborators and partners, as well as to internal NASA users. In the last few months we have been studying the suitability of cloud environments for NASA's technical and scientific workloads. We have ported several applications to multiple cloud environments including NASA's Nebula environment, Amazon's EC2, Magellan at NERSC, and SGI's Cyclone system. We critically examined the performance of the applications on these systems. We also collected information on the usability of these cloud environments. In this talk we will present the results of our study focusing on the efficacy of using clouds for NASA's scientific applications.

  20. Tools for Analyzing Computing Resource Management Strategies and Algorithms for SDR Clouds

    NASA Astrophysics Data System (ADS)

    Marojevic, Vuk; Gomez-Miguelez, Ismael; Gelonch, Antoni

    2012-09-01

    Software defined radio (SDR) clouds centralize the computing resources of base stations. The computing resource pool is shared between radio operators and dynamically loads and unloads digital signal processing chains for providing wireless communications services on demand. Each new user session request requires, in particular, the allocation of computing resources for executing the corresponding SDR transceivers. The huge amount of computing resources of SDR cloud data centers and the numerous session requests at certain hours of the day require efficient computing resource management. We propose a hierarchical approach, where the data center is divided into clusters that are managed in a distributed way. This paper presents a set of computing resource management tools for analyzing computing resource management strategies and algorithms for SDR clouds. We use the tools to evaluate different strategies and algorithms. The results show that more sophisticated algorithms can achieve higher resource occupations and that a tradeoff exists between cluster size and algorithm complexity.
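
    As a rough illustration of the allocation problem described above (not the authors' algorithms), the sketch below assigns each incoming session's processing requirement to the first cluster with sufficient spare capacity, the kind of baseline against which more sophisticated mappers would be compared. Capacities and loads are arbitrary illustrative numbers.

    # Baseline first-fit allocation of SDR transceiver processing loads to data
    # center clusters. Capacities and loads are in arbitrary "processing units".
    def first_fit(session_load: float, free_capacity: list) -> int:
        """Return the index of the cluster the session is placed on, or -1 if blocked."""
        for i, free in enumerate(free_capacity):
            if free >= session_load:
                free_capacity[i] -= session_load
                return i
        return -1   # session request rejected: no cluster can host the transceiver

    free_capacity = [10.0, 10.0, 10.0]         # three clusters
    sessions = [4.0, 7.0, 5.0, 6.0, 3.0]       # incoming session requirements
    placements = [first_fit(s, free_capacity) for s in sessions]
    print(placements, free_capacity)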

  1. Optimization of coupled systems: A critical overview of approaches

    NASA Technical Reports Server (NTRS)

    Balling, R. J.; Sobieszczanski-Sobieski, J.

    1994-01-01

    A unified overview is given of problem formulation approaches for the optimization of multidisciplinary coupled systems. The overview includes six fundamental approaches upon which a large number of variations may be made. Consistent approach names and a compact approach notation are given. The approaches are formulated to apply to general nonhierarchic systems. The approaches are compared both from a computational viewpoint and a managerial viewpoint. Opportunities for parallelism of both computation and manpower resources are discussed. Recommendations regarding the need for future research are advanced.

  2. ADHydro: A Large-scale High Resolution Multi-Physics Distributed Water Resources Model for Water Resource Simulations in a Parallel Computing Environment

    NASA Astrophysics Data System (ADS)

    lai, W.; Steinke, R. C.; Ogden, F. L.

    2013-12-01

    Physics-based watershed models are useful tools for hydrologic studies, water resources management, and economic analyses in the contexts of climate, land-use, and water-use changes. This poster presents the development of a physics-based, high-resolution, distributed water resources model suitable for simulating large watersheds in a massively parallel computing environment. Developing this model is one of the objectives of the NSF EPSCoR RII Track II CI-WATER project, which is joint between Wyoming and Utah. The model, which we call ADHydro, is aimed at simulating important processes in the Rocky Mountain west, including rainfall and infiltration, snowfall and snowmelt in complex terrain, vegetation and evapotranspiration, soil heat flux and freezing, overland flow, channel flow, groundwater flow, and water management. The ADHydro model uses the explicit finite volume method to solve PDEs for 2D overland flow and 2D saturated groundwater flow coupled to 1D channel flow. The model has a quasi-3D formulation that couples 2D overland flow and 2D saturated groundwater flow using the 1D Talbot-Ogden finite water-content infiltration and redistribution model. This eliminates difficulties in solving the highly nonlinear 3D Richards equation, while the finite volume Talbot-Ogden infiltration solution is computationally efficient, guaranteed to conserve mass, and allows simulation of the effect of near-surface groundwater tables on runoff generation. The process-level components of the model are being individually tested and validated. The model as a whole will be tested on the Green River basin in Wyoming and ultimately applied to the entire Upper Colorado River basin. ADHydro development has necessitated the development of tools for large-scale watershed modeling, including open-source workflow steps to extract hydromorphological information from GIS data, to integrate hydrometeorological and water management forcing input, and to post-process and visualize large output data sets. The ADHydro model will be coupled with relevant components of the NOAH-MP land surface scheme and the WRF mesoscale meteorological model. Model objectives include well documented Application Programming Interfaces (APIs) to facilitate modifications and additions by others. We will release the model as open-source in 2014 and begin establishing a users' community.
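
    The explicit finite-volume updates referred to above amount to conservative cell-by-cell mass balances. A one-dimensional toy version is sketched below; it is illustrative only, with an assumed constant advection velocity and rainfall rate, and is far simpler than ADHydro's coupled 2D formulation.

    # Toy 1D explicit finite-volume mass balance: each cell's water depth changes
    # by the net of the fluxes across its faces plus local rainfall. This only
    # illustrates the conservative update pattern, not ADHydro's actual physics.
    import numpy as np

    def step(h: np.ndarray, dt: float, dx: float, rain: float) -> np.ndarray:
        velocity = 0.5                                # m/s, assumed constant downslope speed
        flux = velocity * h                           # flux leaving each cell to the right
        inflow = np.concatenate(([0.0], flux[:-1]))   # flux entering from the left neighbor
        return h + dt * (rain + (inflow - flux) / dx)

    h = np.zeros(10)                                  # water depth per cell (m)
    for _ in range(100):
        h = step(h, dt=1.0, dx=10.0, rain=1e-5)       # 1e-5 m/s rainfall, illustrative
    print(h.sum())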

  3. Grid computing enhances standards-compatible geospatial catalogue service

    NASA Astrophysics Data System (ADS)

    Chen, Aijun; Di, Liping; Bai, Yuqi; Wei, Yaxing; Liu, Yang

    2010-04-01

    A catalogue service facilitates sharing, discovery, retrieval, management of, and access to large volumes of distributed geospatial resources, for example data, services, applications, and their replicas on the Internet. Grid computing provides an infrastructure for effective use of computing, storage, and other resources available online. The Open Geospatial Consortium has proposed a catalogue service specification and a series of profiles for promoting the interoperability of geospatial resources. By referring to the profile of the catalogue service for Web, an innovative information model of a catalogue service is proposed to offer Grid-enabled registry, management, retrieval of and access to geospatial resources and their replicas. This information model extends the e-business registry information model by adopting several geospatial data and service metadata standards—the International Organization for Standardization (ISO)'s 19115/19119 standards and the US Federal Geographic Data Committee (FGDC) and US National Aeronautics and Space Administration (NASA) metadata standards for describing and indexing geospatial resources. In order to select the optimal geospatial resources and their replicas managed by the Grid, the Grid data management service and information service from the Globus Toolkits are closely integrated with the extended catalogue information model. Based on this new model, a catalogue service is implemented first as a Web service. Then, the catalogue service is further developed as a Grid service conforming to Grid service specifications. The catalogue service can be deployed in both the Web and Grid environments and accessed by standard Web services or authorized Grid services, respectively. The catalogue service has been implemented at the George Mason University/Center for Spatial Information Science and Systems (GMU/CSISS), managing more than 17 TB of geospatial data and geospatial Grid services. This service makes it easy to share and interoperate geospatial resources by using Grid technology and extends Grid technology into the geoscience communities.

  4. Use of DAGMan in CRAB3 to Improve the Splitting of CMS User Jobs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wolf, M.; Mascheroni, M.; Woodard, A.

    CRAB3 is a workload management tool used by CMS physicists to analyze data acquired by the Compact Muon Solenoid (CMS) detector at the CERN Large Hadron Collider (LHC). Research in high energy physics often requires the analysis of large collections of files, referred to as datasets. The task is divided into jobs that are distributed among a large collection of worker nodes throughout the Worldwide LHC Computing Grid (WLCG). Splitting a large analysis task into optimally sized jobs is critical to efficient use of distributed computing resources. Jobs that are too big will have excessive runtimes and will not distribute the work across all of the available nodes. However, splitting the project into a large number of very small jobs is also inefficient, as each job creates additional overhead which increases load on infrastructure resources. Currently this splitting is done manually, using parameters provided by the user. However, the resources needed for each job are difficult to predict because of frequent variations in the performance of the user code and the content of the input dataset. As a result, dividing a task into jobs by hand is difficult and often suboptimal. In this work we present a new feature called “automatic splitting” which removes the need for users to manually specify job splitting parameters. We discuss how HTCondor DAGMan can be used to build dynamic Directed Acyclic Graphs (DAGs) to optimize the performance of large CMS analysis jobs on the Grid. We use DAGMan to dynamically generate interconnected DAGs that estimate the processing time the user code will require to analyze each event. This is used to calculate an estimate of the total processing time per job, and a set of analysis jobs are run using this estimate as a specified time limit. Some jobs may not finish within the allotted time; they are terminated at the time limit, and the unfinished data is regrouped into smaller jobs and resubmitted.

  5. Use of DAGMan in CRAB3 to improve the splitting of CMS user jobs

    NASA Astrophysics Data System (ADS)

    Wolf, M.; Mascheroni, M.; Woodard, A.; Belforte, S.; Bockelman, B.; Hernandez, J. M.; Vaandering, E.

    2017-10-01

    CRAB3 is a workload management tool used by CMS physicists to analyze data acquired by the Compact Muon Solenoid (CMS) detector at the CERN Large Hadron Collider (LHC). Research in high energy physics often requires the analysis of large collections of files, referred to as datasets. The task is divided into jobs that are distributed among a large collection of worker nodes throughout the Worldwide LHC Computing Grid (WLCG). Splitting a large analysis task into optimally sized jobs is critical to efficient use of distributed computing resources. Jobs that are too big will have excessive runtimes and will not distribute the work across all of the available nodes. However, splitting the project into a large number of very small jobs is also inefficient, as each job creates additional overhead which increases load on infrastructure resources. Currently this splitting is done manually, using parameters provided by the user. However, the resources needed for each job are difficult to predict because of frequent variations in the performance of the user code and the content of the input dataset. As a result, dividing a task into jobs by hand is difficult and often suboptimal. In this work we present a new feature called “automatic splitting” which removes the need for users to manually specify job splitting parameters. We discuss how HTCondor DAGMan can be used to build dynamic Directed Acyclic Graphs (DAGs) to optimize the performance of large CMS analysis jobs on the Grid. We use DAGMan to dynamically generate interconnected DAGs that estimate the processing time the user code will require to analyze each event. This is used to calculate an estimate of the total processing time per job, and a set of analysis jobs are run using this estimate as a specified time limit. Some jobs may not finish within the allotted time; they are terminated at the time limit, and the unfinished data is regrouped into smaller jobs and resubmitted.
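
    The sketch below illustrates the sizing idea behind automatic splitting under stated assumptions: a per-event time estimated from short probe jobs is used to size jobs toward a target runtime, and events left unprocessed when a job hits its limit are regrouped into smaller tail jobs. Function names and numbers are hypothetical; CRAB3 implements this with DAGMan on the Grid.

    ```python
    # Illustrative sketch of runtime-targeted job splitting; not the CRAB3 code.

    def size_jobs(n_events: int, sec_per_event: float, target_runtime_s: float):
        """Split n_events into chunks whose estimated runtime is near the target."""
        events_per_job = max(1, int(target_runtime_s / sec_per_event))
        chunks = []
        while n_events > 0:
            chunk = min(events_per_job, n_events)
            chunks.append(chunk)
            n_events -= chunk
        return chunks

    def resplit_unfinished(unprocessed_events: int, sec_per_event: float,
                           target_runtime_s: float):
        """Events left when a job hits its time limit become smaller tail jobs."""
        return size_jobs(unprocessed_events, sec_per_event, target_runtime_s / 2)

    # Probe jobs suggested 0.8 s/event; aim for roughly 8-hour jobs.
    jobs = size_jobs(n_events=1_000_000, sec_per_event=0.8, target_runtime_s=8 * 3600)
    print(len(jobs), "jobs; first job covers", jobs[0], "events")
    ```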

  6. Approaches in highly parameterized inversion-PESTCommander, a graphical user interface for file and run management across networks

    USGS Publications Warehouse

    Karanovic, Marinko; Muffels, Christopher T.; Tonkin, Matthew J.; Hunt, Randall J.

    2012-01-01

    Models of environmental systems have become increasingly complex, incorporating increasingly large numbers of parameters in an effort to represent physical processes on a scale approaching that at which they occur in nature. Consequently, the inverse problem of parameter estimation (specifically, model calibration) and subsequent uncertainty analysis have become increasingly computation-intensive endeavors. Fortunately, advances in computing have made computational power equivalent to that of dozens to hundreds of desktop computers accessible through a variety of alternate means: modelers have various possibilities, ranging from traditional Local Area Networks (LANs) to cloud computing. Commonly used parameter estimation software is well suited to take advantage of the availability of such increased computing power. Unfortunately, logistical issues become increasingly important as an increasing number and variety of computers are brought to bear on the inverse problem. To facilitate efficient access to disparate computer resources, the PESTCommander program documented herein has been developed to provide a Graphical User Interface (GUI) that facilitates the management of model files ("file management") and remote launching and termination of "slave" computers across a distributed network of computers ("run management"). In version 1.0 described here, PESTCommander can access and ascertain resources across traditional Windows LANs; however, the architecture of PESTCommander has been developed with the intent that future releases will be able to access computing resources (1) via trusted domains established in Wide Area Networks (WANs) in multiple remote locations and (2) via heterogeneous networks of Windows- and Unix-based operating systems. The design of PESTCommander also makes it suitable for extension to other computational resources, such as those that are available via cloud computing. Version 1.0 of PESTCommander was developed primarily to work with the parameter estimation software PEST; the discussion presented in this report focuses on the use of the PESTCommander together with Parallel PEST. However, PESTCommander can be used with a wide variety of programs and models that require management, distribution, and cleanup of files before or after model execution. In addition to its use with the Parallel PEST program suite, discussion is also included in this report regarding the use of PESTCommander with the Global Run Manager GENIE, which was developed simultaneously with PESTCommander.

  7. A study of computer graphics technology in application of communication resource management

    NASA Astrophysics Data System (ADS)

    Li, Jing; Zhou, Liang; Yang, Fei

    2017-08-01

    With the development of computer technology, computer graphics technology has come into wide use. In particular, the success of object-oriented and multimedia technologies has promoted the development of graphics technology in computer software systems. Computer graphics theory and application technology have therefore become an important topic in computing, and graphics technology is being applied in an ever-broader range of fields. In recent years, with economic development and especially the rapid growth of information technology, traditional approaches to communication resource management can no longer meet management needs effectively. Communication resource management still relies on the original tools and methods for managing and maintaining equipment, which causes many problems: it is difficult for non-specialists to understand the equipment and the state of the resources, resource utilization is relatively low, and managers cannot quickly and accurately assess resource conditions. To address these problems, this paper proposes introducing computer graphics technology into communication resource management. Doing so not only makes communication resource management more intuitive, but also reduces management costs and improves work efficiency.

  8. Trends in life science grid: from computing grid to knowledge grid.

    PubMed

    Konagaya, Akihiko

    2006-12-18

    Grid computing has great potential to become a standard cyberinfrastructure for the life sciences, which often require high-performance computing and large-scale data handling that exceed the computing capacity of a single institution. This survey reviews the latest grid technologies from the viewpoints of the computing grid, the data grid and the knowledge grid. Computing grid technologies have matured enough to solve high-throughput, real-world life science problems. Data grid technologies are strong candidates for realizing a "resourceome" for bioinformatics. Knowledge grids should be designed not only for sharing explicit knowledge on computers but also for community formation to share tacit knowledge within a community. Extending the concept of the grid from computing grid to knowledge grid, a grid can be used not only as sharable computing resources, but also as a time and place in which people work together, create knowledge, and share knowledge and experiences in a community.

  9. Trends in life science grid: from computing grid to knowledge grid

    PubMed Central

    Konagaya, Akihiko

    2006-01-01

    Background Grid computing has great potential to become a standard cyberinfrastructure for the life sciences, which often require high-performance computing and large-scale data handling that exceed the computing capacity of a single institution. Results This survey reviews the latest grid technologies from the viewpoints of the computing grid, the data grid and the knowledge grid. Computing grid technologies have matured enough to solve high-throughput, real-world life science problems. Data grid technologies are strong candidates for realizing a "resourceome" for bioinformatics. Knowledge grids should be designed not only for sharing explicit knowledge on computers but also for community formation to share tacit knowledge within a community. Conclusion Extending the concept of the grid from computing grid to knowledge grid, a grid can be used not only as sharable computing resources, but also as a time and place in which people work together, create knowledge, and share knowledge and experiences in a community. PMID:17254294

  10. A Framework for Control and Observation in Distributed Environments

    NASA Technical Reports Server (NTRS)

    Smith, Warren

    2001-01-01

    As organizations begin to deploy large computational grids, it has become apparent that systems for observation and control of the resources, services, and applications that make up such grids are needed. Administrators must observe the operation of resources and services to ensure that they are operating correctly and they must control the resources and services to ensure that their operation meets the needs of users. Further, users need to observe the performance of their applications so that this performance can be improved and control how their applications execute in a dynamic grid environment. In this paper we describe our software framework for control and observation of resources, services, and applications that supports such uses and we provide examples of how our framework can be used.

  11. Mouse Genome Informatics (MGI) Resource: Genetic, Genomic, and Biological Knowledgebase for the Laboratory Mouse.

    PubMed

    Eppig, Janan T

    2017-07-01

    The Mouse Genome Informatics (MGI) Resource supports basic, translational, and computational research by providing high-quality, integrated data on the genetics, genomics, and biology of the laboratory mouse. MGI serves a strategic role for the scientific community in facilitating biomedical, experimental, and computational studies investigating the genetics and processes of diseases and enabling the development and testing of new disease models and therapeutic interventions. This review describes the nexus of the body of growing genetic and biological data and the advances in computer technology in the late 1980s, including the World Wide Web, that together launched the beginnings of MGI. MGI develops and maintains a gold-standard resource that reflects the current state of knowledge, provides semantic and contextual data integration that fosters hypothesis testing, continually develops new and improved tools for searching and analysis, and partners with the scientific community to assure research data needs are met. Here we describe one slice of MGI relating to the development of community-wide large-scale mutagenesis and phenotyping projects and introduce ways to access and use these MGI data. References and links to additional MGI aspects are provided. © The Author 2017. Published by Oxford University Press.

  12. Mouse Genome Informatics (MGI) Resource: Genetic, Genomic, and Biological Knowledgebase for the Laboratory Mouse

    PubMed Central

    Eppig, Janan T.

    2017-01-01

    Abstract The Mouse Genome Informatics (MGI) Resource supports basic, translational, and computational research by providing high-quality, integrated data on the genetics, genomics, and biology of the laboratory mouse. MGI serves a strategic role for the scientific community in facilitating biomedical, experimental, and computational studies investigating the genetics and processes of diseases and enabling the development and testing of new disease models and therapeutic interventions. This review describes the nexus of the body of growing genetic and biological data and the advances in computer technology in the late 1980s, including the World Wide Web, that together launched the beginnings of MGI. MGI develops and maintains a gold-standard resource that reflects the current state of knowledge, provides semantic and contextual data integration that fosters hypothesis testing, continually develops new and improved tools for searching and analysis, and partners with the scientific community to assure research data needs are met. Here we describe one slice of MGI relating to the development of community-wide large-scale mutagenesis and phenotyping projects and introduce ways to access and use these MGI data. References and links to additional MGI aspects are provided. PMID:28838066

  13. Advanced Aerospace Materials by Design

    NASA Technical Reports Server (NTRS)

    Srivastava, Deepak; Djomehri, Jahed; Wei, Chen-Yu

    2004-01-01

    The advances in the emerging field of nanophase thermal and structural composite materials; materials with embedded sensors and actuators for morphing structures; light-weight composite materials for energy and power storage; and large surface area materials for in-situ resource generation and waste recycling are expected to revolutionize the capabilities of virtually every system comprising future robotic and human Moon and Mars exploration missions. A high-performance multiscale simulation platform, including the computational capabilities and resources of Columbia, the new supercomputer, is being developed to discover, validate, and prototype the next generation of such advanced materials. This exhibit will describe the porting and scaling of multiscale physics-based core computer simulation codes for discovering and designing carbon nanotube-polymer composite materials for light-weight load-bearing structural and thermal protection applications.

  14. Quantum Computation Based on Photons with Three Degrees of Freedom

    PubMed Central

    Luo, Ming-Xing; Li, Hui-Ran; Lai, Hong; Wang, Xiaojun

    2016-01-01

    Quantum systems are important resources for quantum computation. Different from previous encoding forms using quantum systems with one degree of freedom (DoF) or two DoFs, we investigate the possibility of encoding photon systems with three DoFs consisting of the polarization DoF and two spatial DoFs. By exploring the optical circular birefringence induced by an NV center in a diamond embedded in the photonic crystal cavity, we propose several hybrid controlled-NOT (hybrid CNOT) gates operating on the two-photon or one-photon system. These hybrid CNOT gates show that three DoFs may be encoded as independent qubits without auxiliary DoFs. Our result provides a useful way to reduce quantum simulation resources by exploring complex quantum systems for quantum applications requiring large qubit systems. PMID:27174302

  15. Quantum Computation Based on Photons with Three Degrees of Freedom.

    PubMed

    Luo, Ming-Xing; Li, Hui-Ran; Lai, Hong; Wang, Xiaojun

    2016-05-13

    Quantum systems are important resources for quantum computation. Different from previous encoding forms using quantum systems with one degree of freedom (DoF) or two DoFs, we investigate the possibility of encoding photon systems with three DoFs consisting of the polarization DoF and two spatial DoFs. By exploring the optical circular birefringence induced by an NV center in a diamond embedded in the photonic crystal cavity, we propose several hybrid controlled-NOT (hybrid CNOT) gates operating on the two-photon or one-photon system. These hybrid CNOT gates show that three DoFs may be encoded as independent qubits without auxiliary DoFs. Our result provides a useful way to reduce quantum simulation resources by exploring complex quantum systems for quantum applications requiring large qubit systems.

  16. A resource management architecture based on complex network theory in cloud computing federation

    NASA Astrophysics Data System (ADS)

    Zhang, Zehua; Zhang, Xuejie

    2011-10-01

    Cloud Computing Federation is a main trend of Cloud Computing. Resource management has a significant effect on the design, realization, and efficiency of a Cloud Computing Federation. Because a Cloud Computing Federation has the typical characteristics of a complex system, we propose a resource management architecture based on complex network theory for Cloud Computing Federation (abbreviated as RMABC) in this paper, with a detailed design of the resource discovery and resource announcement mechanisms. Compared with existing resource management mechanisms in distributed computing systems, a Task Manager in RMABC can use historical information and current state data obtained from other Task Managers to evolve the complex network composed of Task Managers, and thus has advantages in resource discovery speed, fault tolerance and adaptive ability. The results of the model experiment confirm the advantage of RMABC in resource discovery performance.

  17. Complete distributed computing environment for a HEP experiment: experience with ARC-connected infrastructure for ATLAS

    NASA Astrophysics Data System (ADS)

    Read, A.; Taga, A.; O-Saada, F.; Pajchel, K.; Samset, B. H.; Cameron, D.

    2008-07-01

    Computing and storage resources connected by the Nordugrid ARC middleware in the Nordic countries, Switzerland and Slovenia are a part of the ATLAS computing Grid. This infrastructure is being commissioned with the ongoing ATLAS Monte Carlo simulation production in preparation for the commencement of data taking in 2008. The unique non-intrusive architecture of ARC, its straightforward interplay with the ATLAS Production System via the Dulcinea executor, and its performance during the commissioning exercise is described. ARC support for flexible and powerful end-user analysis within the GANGA distributed analysis framework is also shown. Whereas the storage solution for this Grid was earlier based on a large, distributed collection of GridFTP-servers, the ATLAS computing design includes a structured SRM-based system with a limited number of storage endpoints. The characteristics, integration and performance of the old and new storage solutions are presented. Although the hardware resources in this Grid are quite modest, it has provided more than double the agreed contribution to the ATLAS production with an efficiency above 95% during long periods of stable operation.

  18. Streaming Support for Data Intensive Cloud-Based Sequence Analysis

    PubMed Central

    Issa, Shadi A.; Kienzler, Romeo; El-Kalioby, Mohamed; Tonellato, Peter J.; Wall, Dennis; Bruggmann, Rémy; Abouelhoda, Mohamed

    2013-01-01

    Cloud computing provides a promising solution to the genomics data deluge problem resulting from the advent of next-generation sequencing (NGS) technology. Based on the concepts of “resources-on-demand” and “pay-as-you-go”, scientists with no or limited infrastructure can have access to scalable and cost-effective computational resources. However, the large size of NGS data causes a significant data transfer latency from the client's site to the cloud, which presents a bottleneck for using cloud computing services. In this paper, we provide a streaming-based scheme to overcome this problem, where the NGS data is processed while being transferred to the cloud. Our scheme targets the wide class of NGS data analysis tasks, where the NGS sequences can be processed independently from one another. We also provide the elastream package that supports the use of this scheme with individual analysis programs or with workflow systems. Experiments presented in this paper show that our solution mitigates the effect of data transfer latency and saves both time and cost of computation. PMID:23710461
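
    A small sketch of the general idea of overlapping transfer with computation, assuming reads can be processed independently as chunks arrive; the chunk format, timings, and the stand-in analysis are illustrative, and the elastream interface is not reproduced here.

    ```python
    # Illustrative overlap of "transfer" and "processing"; not the elastream package.
    import queue
    import threading
    import time

    def transfer(chunks, q):
        """Simulate the client-to-cloud transfer of read chunks."""
        for chunk in chunks:
            time.sleep(0.1)              # stand-in for network latency
            q.put(chunk)
        q.put(None)                      # end-of-stream marker

    def process(q, results):
        """Process each chunk independently as soon as it arrives."""
        while (chunk := q.get()) is not None:
            results.append(sum(len(read) for read in chunk))  # stand-in analysis

    chunks = [["ACGT" * 10] * 100 for _ in range(5)]
    q, results = queue.Queue(maxsize=2), []
    uploader = threading.Thread(target=transfer, args=(chunks, q))
    uploader.start()
    process(q, results)                  # runs while the upload is still going
    uploader.join()
    print(results)
    ```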

  19. PVT: An Efficient Computational Procedure to Speed up Next-generation Sequence Analysis

    PubMed Central

    2014-01-01

    Background High-throughput Next-Generation Sequencing (NGS) techniques are advancing genomics and molecular biology research. This technology generates substantially large data, which poses a major challenge to scientists seeking an efficient, cost- and time-effective way to analyse such data. Further, the different types of NGS data share certain common, challenging analysis steps. Spliced alignment is one such fundamental step in NGS data analysis, and it is extremely computationally intensive as well as time consuming. Serious problems exist even with the most widely used spliced alignment tools. TopHat is one such widely used spliced alignment tool which, although it supports multithreading, does not efficiently utilize computational resources in terms of CPU utilization and memory. Here we introduce PVT (Pipelined Version of TopHat), in which we take a modular approach by breaking TopHat’s serial execution into a pipeline of multiple stages, thereby increasing the degree of parallelization and computational resource utilization. We thus address the shortcomings of TopHat so as to analyze large NGS data efficiently. Results We analysed the SRA dataset (SRX026839 and SRX026838) consisting of single-end reads and SRA data SRR1027730 consisting of paired-end reads. We used TopHat v2.0.8 to analyse these datasets and noted the CPU usage, memory footprint and execution time during spliced alignment. With this basic information, we designed PVT, a pipelined version of TopHat that removes the redundant computational steps during ‘spliced alignment’ and breaks the job into a pipeline of multiple stages (each comprising different step(s)) to improve its resource utilization, thus reducing the execution time. Conclusions PVT provides an improvement over TopHat for spliced alignment in NGS data analysis. PVT thus resulted in a reduction of the execution time to ~23% for the single-end read dataset. Further, PVT designed for paired-end reads showed an improved performance of ~41% over TopHat (for the chosen data) with respect to execution time. Moreover, we propose PVT-Cloud, which implements the PVT pipeline in a cloud computing system. PMID:24894600

  20. PVT: an efficient computational procedure to speed up next-generation sequence analysis.

    PubMed

    Maji, Ranjan Kumar; Sarkar, Arijita; Khatua, Sunirmal; Dasgupta, Subhasis; Ghosh, Zhumur

    2014-06-04

    High-throughput Next-Generation Sequencing (NGS) techniques are advancing genomics and molecular biology research. This technology generates substantially large data, which poses a major challenge to scientists seeking an efficient, cost- and time-effective way to analyse such data. Further, the different types of NGS data share certain common, challenging analysis steps. Spliced alignment is one such fundamental step in NGS data analysis, and it is extremely computationally intensive as well as time consuming. Serious problems exist even with the most widely used spliced alignment tools. TopHat is one such widely used spliced alignment tool which, although it supports multithreading, does not efficiently utilize computational resources in terms of CPU utilization and memory. Here we introduce PVT (Pipelined Version of TopHat), in which we take a modular approach by breaking TopHat's serial execution into a pipeline of multiple stages, thereby increasing the degree of parallelization and computational resource utilization. We thus address the shortcomings of TopHat so as to analyze large NGS data efficiently. We analysed the SRA dataset (SRX026839 and SRX026838) consisting of single-end reads and SRA data SRR1027730 consisting of paired-end reads. We used TopHat v2.0.8 to analyse these datasets and noted the CPU usage, memory footprint and execution time during spliced alignment. With this basic information, we designed PVT, a pipelined version of TopHat that removes the redundant computational steps during 'spliced alignment' and breaks the job into a pipeline of multiple stages (each comprising different step(s)) to improve its resource utilization, thus reducing the execution time. PVT provides an improvement over TopHat for spliced alignment in NGS data analysis. PVT thus resulted in a reduction of the execution time to ~23% for the single-end read dataset. Further, PVT designed for paired-end reads showed an improved performance of ~41% over TopHat (for the chosen data) with respect to execution time. Moreover, we propose PVT-Cloud, which implements the PVT pipeline in a cloud computing system.

  1. Optimizing CyberShake Seismic Hazard Workflows for Large HPC Resources

    NASA Astrophysics Data System (ADS)

    Callaghan, S.; Maechling, P. J.; Juve, G.; Vahi, K.; Deelman, E.; Jordan, T. H.

    2014-12-01

    The CyberShake computational platform is a well-integrated collection of scientific software and middleware that calculates 3D simulation-based probabilistic seismic hazard curves and hazard maps for the Los Angeles region. Currently each CyberShake model comprises about 235 million synthetic seismograms from about 415,000 rupture variations computed at 286 sites. CyberShake integrates large-scale parallel and high-throughput serial seismological research codes into a processing framework in which early stages produce files used as inputs by later stages. Scientific workflow tools are used to manage the jobs, data, and metadata. The Southern California Earthquake Center (SCEC) developed the CyberShake platform using USC High Performance Computing and Communications systems and open-science NSF resources. CyberShake calculations were migrated to the NSF Track 1 system NCSA Blue Waters when it became operational in 2013, via an interdisciplinary team approach including domain scientists, computer scientists, and middleware developers. Due to the excellent performance of Blue Waters and CyberShake software optimizations, we reduced the makespan (a measure of wallclock time-to-solution) of a CyberShake study from 1467 to 342 hours. We will describe the technical enhancements behind this improvement, including judicious introduction of new GPU software, improved scientific software components, increased workflow-based automation, and Blue Waters-specific workflow optimizations. Our CyberShake performance improvements highlight the benefits of scientific workflow tools. The CyberShake workflow software stack includes the Pegasus Workflow Management System (Pegasus-WMS, which includes Condor DAGMan), HTCondor, and Globus GRAM, with Pegasus-mpi-cluster managing the high-throughput tasks on the HPC resources. The workflow tools handle data management, automatically transferring about 13 TB back to SCEC storage. We will present performance metrics from the most recent CyberShake study, executed on Blue Waters. We will compare the performance of CPU and GPU versions of our large-scale parallel wave propagation code, AWP-ODC-SGT. Finally, we will discuss how these enhancements have enabled SCEC to move forward with plans to increase the CyberShake simulation frequency to 1.0 Hz.

  2. A novel representation of groundwater dynamics in large-scale land surface modelling

    NASA Astrophysics Data System (ADS)

    Rahman, Mostaquimur; Rosolem, Rafael; Kollet, Stefan

    2017-04-01

    Land surface processes are connected to groundwater dynamics via shallow soil moisture. For example, groundwater affects evapotranspiration (by influencing the variability of soil moisture) and runoff generation mechanisms. However, contemporary Land Surface Models (LSM) generally consider isolated soil columns and a free-drainage lower boundary condition for simulating hydrology. This is mainly because incorporating detailed groundwater dynamics in LSMs usually requires considerable computing resources, especially for large-scale applications (e.g., continental to global). Yet, these simplifications undermine the potential effect of groundwater dynamics on land surface mass and energy fluxes. In this study, we present a novel approach for representing high-resolution groundwater dynamics in LSMs that is computationally efficient for large-scale applications. This new parameterization is incorporated in the Joint UK Land Environment Simulator (JULES) and tested at the continental scale.

  3. A distributed parallel storage architecture and its potential application within EOSDIS

    NASA Technical Reports Server (NTRS)

    Johnston, William E.; Tierney, Brian; Feuquay, Jay; Butzer, Tony

    1994-01-01

    We describe the architecture, implementation, and use of a scalable, high-performance, distributed-parallel data storage system developed in the ARPA-funded MAGIC gigabit testbed. A collection of wide area distributed disk servers operate in parallel to provide logical block level access to large data sets. Operated primarily as a network-based cache, the architecture supports cooperation among independently owned resources to provide fast, large-scale, on-demand storage to support data handling, simulation, and computation.

  4. Thruput Analysis of AFLC CYBER 73 Computers.

    DTIC Science & Technology

    1981-12-01

    Ref 2:14). This decision permitted a fast conversion effort with minimum programmer/analyst experience (Ref 34). Recently, as the conversion effort...converted (Ref 1:2). Moreover, many of the large data-file and machine-time-consuming systems were not included in the earlier...by LMT personnel revealed that during certain periods, i.e., 0000-0800, the machine is normally reserved for the large resource-consuming programs

  5. Computer network access to scientific information systems for minority universities

    NASA Astrophysics Data System (ADS)

    Thomas, Valerie L.; Wakim, Nagi T.

    1993-08-01

    The evolution of computer networking technology has led to the establishment of a massive networking infrastructure which interconnects various types of computing resources at many government, academic, and corporate institutions. A large segment of this infrastructure has been developed to facilitate information exchange and resource sharing within the scientific community. The National Aeronautics and Space Administration (NASA) supports both the development and the application of computer networks which provide its community with access to many valuable multi-disciplinary scientific information systems and on-line databases. Recognizing the need to extend the benefits of this advanced networking technology to the under-represented community, the National Space Science Data Center (NSSDC) in the Space Data and Computing Division at the Goddard Space Flight Center has developed the Minority University-Space Interdisciplinary Network (MU-SPIN) Program: a major networking and education initiative for Historically Black Colleges and Universities (HBCUs) and Minority Universities (MUs). In this paper, we will briefly explain the various components of the MU-SPIN Program while highlighting how, by providing access to scientific information systems and on-line data, it promotes a higher level of collaboration among faculty and students and NASA scientists.

  6. Multipartite entanglement verification resistant against dishonest parties.

    PubMed

    Pappa, Anna; Chailloux, André; Wehner, Stephanie; Diamanti, Eleni; Kerenidis, Iordanis

    2012-06-29

    Future quantum information networks will consist of quantum and classical agents, who have the ability to communicate in a variety of ways with trusted and untrusted parties and securely delegate computational tasks to untrusted large-scale quantum computing servers. Multipartite quantum entanglement is a fundamental resource for such a network and, hence, it is imperative to study the possibility of verifying a multipartite entanglement source in a way that is efficient and provides strong guarantees even in the presence of multiple dishonest parties. In this Letter, we show how an agent of a quantum network can perform a distributed verification of a source creating multipartite Greenberger-Horne-Zeilinger (GHZ) states with minimal resources, which is, nevertheless, resistant against any number of dishonest parties. Moreover, we provide a tight tradeoff between the level of security and the distance between the state produced by the source and the ideal GHZ state. Last, by adding the resource of a trusted common random source, we can further provide security guarantees for all honest parties in the quantum network simultaneously.

  7. Evaluating High School IT

    ERIC Educational Resources Information Center

    Thompson, Brett A.

    2004-01-01

    Since its inception in 1997, Cisco's curriculum has entered thousands of high schools across the U.S. and around the world for two reasons: (1) Cisco has a large portion of the computer networking market, and thus has the resources for and interest in developing high school academies; and (2) high school curriculum development teams recognize the…

  8. Attitudes toward ecosystem management in the United States, 1992-1998

    Treesearch

    David N. Bengston; George Xu; David P. Fan

    2001-01-01

    Ecosystem management has been formally adopted by a large number of state and federal agencies and by forest products firms and associations. But little research has examined people's attitudes toward this new approach to natural resources management. This study used computer methods to measure favorable and unfavorable attitudes toward ecosystem management...

  9. A malicious pattern detection engine for embedded security systems in the Internet of Things.

    PubMed

    Oh, Doohwan; Kim, Deokho; Ro, Won Woo

    2014-12-16

    With the emergence of the Internet of Things (IoT), a large number of physical objects in daily life have been aggressively connected to the Internet. As the number of objects connected to networks increases, the security systems face a critical challenge due to the global connectivity and accessibility of the IoT. However, it is difficult to adapt traditional security systems to the objects in the IoT, because of their limited computing power and memory size. In light of this, we present a lightweight security system that uses a novel malicious pattern-matching engine. We limit the memory usage of the proposed system in order to make it work on resource-constrained devices. To mitigate performance degradation due to limitations of computation power and memory, we propose two novel techniques, auxiliary shifting and early decision. Through both techniques, we can efficiently reduce the number of matching operations on resource-constrained systems. Experiments and performance analyses show that our proposed system achieves a maximum speedup of 2.14 with an IoT object and provides scalable performance for a large number of patterns.

  10. A resource-sharing model based on a repeated game in fog computing.

    PubMed

    Sun, Yan; Zhang, Nan

    2017-03-01

    With the rapid development of cloud computing techniques, the number of users is undergoing exponential growth. It is difficult for traditional data centers to perform many tasks in real time because of the limited bandwidth of resources. The concept of fog computing is proposed to support traditional cloud computing and to provide cloud services. In fog computing, the resource pool is composed of sporadic distributed resources that are more flexible and movable than a traditional data center. In this paper, we propose a fog computing structure and present a crowd-funding algorithm to integrate spare resources in the network. Furthermore, to encourage more resource owners to share their resources with the resource pool and to supervise the resource supporters as they actively perform their tasks, we propose an incentive mechanism in our algorithm. Simulation results show that our proposed incentive mechanism can effectively reduce the SLA violation rate and accelerate the completion of tasks.

  11. A review of Computer Science resources for learning and teaching with K-12 computing curricula: an Australian case study

    NASA Astrophysics Data System (ADS)

    Falkner, Katrina; Vivian, Rebecca

    2015-10-01

    To support teachers to implement Computer Science curricula into classrooms from the very first year of school, teachers, schools and organisations seek quality curriculum resources to support implementation and teacher professional development. Until now, many Computer Science resources and outreach initiatives have targeted K-12 school-age children, with the intention to engage children and increase interest, rather than to formally teach concepts and skills. What is the educational quality of existing Computer Science resources and to what extent are they suitable for classroom learning and teaching? In this paper, an assessment framework is presented to evaluate the quality of online Computer Science resources. Further, a semi-systematic review of available online Computer Science resources was conducted to evaluate resources available for classroom learning and teaching and to identify gaps in resource availability, using the Australian curriculum as a case study analysis. The findings reveal a predominance of quality resources, however, a number of critical gaps were identified. This paper provides recommendations and guidance for the development of new and supplementary resources and future research.

  12. Geocomputation over Hybrid Computer Architecture and Systems: Prior Works and On-going Initiatives at UARK

    NASA Astrophysics Data System (ADS)

    Shi, X.

    2015-12-01

    As NSF indicated, "Theory and experimentation have for centuries been regarded as two fundamental pillars of science. It is now widely recognized that computational and data-enabled science forms a critical third pillar." Geocomputation is the third pillar of GIScience and the geosciences. With the exponential growth of geodata, the challenge of scalable, high-performance computing for big data analytics becomes urgent because many research activities are constrained by software or tools that cannot complete the computation at all. Integrating and analysing heterogeneous geodata obviously magnifies the complexity and the operational time frame. Many large-scale geospatial problems may not be processable at all if the computer system does not have sufficient memory or computational power. Emerging computer architectures, such as Intel's Many Integrated Core (MIC) architecture and the Graphics Processing Unit (GPU), and advanced computing technologies provide promising ways to employ massive parallelism and hardware resources to achieve scalability and high performance for data-intensive computing over large spatiotemporal and social media data. Exploring novel algorithms and deploying solutions in massively parallel computing environments to achieve scalable data processing and analytics over large-scale, complex, and heterogeneous geodata with consistent quality and high performance has been the central theme of our research team in the Department of Geosciences at the University of Arkansas (UARK). New multi-core architectures combined with application accelerators hold the promise of achieving scalability and high performance by exploiting task- and data-level parallelism that conventional computing systems do not support. Such a parallel or distributed computing environment is particularly suitable for large-scale geocomputation over big data, as demonstrated by our prior work, yet the potential of such advanced infrastructure remains underexplored in this domain. In this presentation, our prior and on-going initiatives will be summarized to exemplify how we exploit multicore CPUs, GPUs, and MICs, and clusters of CPUs, GPUs and MICs, to accelerate geocomputation in different applications.

  13. The JASMIN Cloud: specialised and hybrid to meet the needs of the Environmental Sciences Community

    NASA Astrophysics Data System (ADS)

    Kershaw, Philip; Lawrence, Bryan; Churchill, Jonathan; Pritchard, Matt

    2014-05-01

    Cloud computing provides enormous opportunities for the research community. The large public cloud providers provide near-limitless scaling capability. However, adapting Cloud to scientific workloads is not without its problems. The commodity nature of the public cloud infrastructure can be at odds with the specialist requirements of the research community. Issues such as trust, ownership of data, WAN bandwidth and costing models pose additional barriers to more widespread adoption. Alongside the application of public cloud for scientific applications, a number of private cloud initiatives are underway in the research community of which the JASMIN Cloud is one example. Here, cloud service models are being effectively super-imposed over more established services such as data centres, compute cluster facilities and Grids. These have the potential to deliver the specialist infrastructure needed for the science community coupled with the benefits of a Cloud service model. The JASMIN facility based at the Rutherford Appleton Laboratory was established in 2012 to support the data analysis requirements of the climate and Earth Observation community. In its first year of operation, the 5PB of available storage capacity was filled and the hosted compute capability used extensively. JASMIN has modelled the concept of a centralised large-volume data analysis facility. Key characteristics have enabled success: peta-scale fast disk connected via low latency networks to compute resources and the use of virtualisation for effective management of the resources for a range of users. A second phase is now underway funded through NERC's (Natural Environment Research Council) Big Data initiative. This will see significant expansion to the resources available with a doubling of disk-based storage to 12PB and an increase of compute capacity by a factor of ten to over 3000 processing cores. This expansion is accompanied by a broadening in the scope for JASMIN, as a service available to the entire UK environmental science community. Experience with the first phase demonstrated the range of user needs. A trade-off is needed between access privileges to resources, flexibility of use and security. This has influenced the form and types of service under development for the new phase. JASMIN will deploy a specialised private cloud organised into "Managed" and "Unmanaged" components. In the Managed Cloud, users have direct access to the storage and compute resources for optimal performance but, for reasons of security, via a more restrictive PaaS (Platform-as-a-Service) interface. The Unmanaged Cloud is deployed in an isolated part of the network but co-located with the rest of the infrastructure. This enables greater liberty to tenants - full IaaS (Infrastructure-as-a-Service) capability to provision customised infrastructure - whilst at the same time protecting more sensitive parts of the system from direct access using these elevated privileges. The private cloud will be augmented with cloud-bursting capability so that it can exploit the resources available from public clouds, making it effectively a hybrid solution. A single interface will overlay the functionality of both the private cloud and external interfaces to public cloud providers giving users the flexibility to migrate resources between infrastructures as requirements dictate.

  14. An acceptable role for computers in the aircraft design process

    NASA Technical Reports Server (NTRS)

    Gregory, T. J.; Roberts, L.

    1980-01-01

    Some of the reasons why the computerization trend is not wholly accepted are explored for two typical cases: computer use in the technical specialties and computer use in aircraft synthesis. The factors that limit acceptance are traced, in part, to the large resources needed to understand the details of computer programs, the inability to include measured data as input to many of the theoretical programs, and the presentation of final results without supporting intermediate answers. Other factors are due solely to technical issues such as limited detail in aircraft synthesis and major simplifying assumptions in the technical specialties. These factors and others can be influenced by the technical specialist and aircraft designer. Some of these factors may become less significant as the computerization process evolves, but some issues, such as understanding large integrated systems, may remain issues in the future. Suggestions for improved acceptance include publishing computer programs so that they may be reviewed, edited, and read. Other mechanisms include extensive modularization of programs and ways to include measured information as part of the input to theoretical approaches.

  15. Institutional Computing Executive Group Review of Multi-programmatic & Institutional Computing, Fiscal Year 2005 and 2006

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Langer, S; Rotman, D; Schwegler, E

    The Institutional Computing Executive Group (ICEG) review of FY05-06 Multiprogrammatic and Institutional Computing (M and IC) activities is presented in the attached report. In summary, we find that the M and IC staff does an outstanding job of acquiring and supporting a wide range of institutional computing resources to meet the programmatic and scientific goals of LLNL. The responsiveness and high quality of support given to users and the programs investing in M and IC reflect the dedication and skill of the M and IC staff. M and IC has successfully managed serial capacity, parallel capacity, and capability computing resources. Serial capacity computing supports a wide range of scientific projects which require access to a few high performance processors within a shared memory computer. Parallel capacity computing supports scientific projects that require a moderate number of processors (up to roughly 1000) on a parallel computer. Capability computing supports parallel jobs that push the limits of simulation science. M and IC has worked closely with Stockpile Stewardship, and together they have made LLNL a premier institution for computational and simulation science. Such a standing is vital to the continued success of laboratory science programs and to the recruitment and retention of top scientists. This report provides recommendations to build on M and IC's accomplishments and improve simulation capabilities at LLNL. We recommend that the institution fully fund (1) operation of the atlas cluster purchased in FY06 to support a few large projects; (2) operation of the thunder and zeus clusters to enable 'mid-range' parallel capacity simulations during normal operation and a limited number of large simulations during dedicated application time; (3) operation of the new yana cluster to support a wide range of serial capacity simulations; (4) improvements to the reliability and performance of the Lustre parallel file system; (5) support for the new GDO petabyte-class storage facility on the green network for use in data intensive external collaborations; and (6) continued support for visualization and other methods for analyzing large simulations. We also recommend that M and IC begin planning in FY07 for the next upgrade of its parallel clusters. LLNL investments in M and IC have resulted in a world-class simulation capability leading to innovative science. We thank the LLNL management for its continued support and thank the M and IC staff for its vision and dedicated efforts to make it all happen.

  16. The Opportunity and Challenge of The Age of Big Data

    NASA Astrophysics Data System (ADS)

    Yunguo, Hong

    2017-11-01

    The arrival of the big data age has gradually expanded the scale of the information industry in China, creating favorable conditions for the growth of information technology and computer networks. Built on big data, computer system services are becoming more complete and data processing within these systems is becoming more efficient, which provides an important guarantee for the implementation of production plans in various industries. At the same time, the rapid development of fields such as the Internet of Things, social tools, and cloud computing, together with the widening of information channels, is increasing the amount of data and extending the reach of the big data age; we therefore need to face the opportunities and challenges of this age correctly and use data resources effectively. Based on this, this paper studies the opportunities and challenges of the era of big data.

  17. Information Power Grid (IPG) Tutorial 2003

    NASA Technical Reports Server (NTRS)

    Meyers, George

    2003-01-01

    For NASA and the general community today, Grid middleware: a) provides tools to access and use data sources (databases, instruments, ...); b) provides tools to access computing (unique and generic); and c) is an enabler of large-scale collaboration. Dynamically responding to needs is a key selling point of a grid: independent resources can be joined as appropriate to solve a problem. The grid provides tools to enable the building of frameworks for applications and offers value-added services to the NASA user base for utilizing resources on the grid in new and more efficient ways.

  18. Access and visualization using clusters and other parallel computers

    NASA Technical Reports Server (NTRS)

    Katz, Daniel S.; Bergou, Attila; Berriman, Bruce; Block, Gary; Collier, Jim; Curkendall, Dave; Good, John; Husman, Laura; Jacob, Joe; Laity, Anastasia

    2003-01-01

    JPL's Parallel Applications Technologies Group has been exploring the issues of data access and visualization of very large data sets over the past 10 or so years. This work has used a number of types of parallel computers, and today includes the use of commodity clusters. This talk will highlight some of the applications and tools we have developed, including how they use parallel computing resources, and specifically how we are using modern clusters. Our applications focus on NASA's needs; thus our data sets are usually related to Earth and Space Science, including data delivered from instruments in space, and data produced by telescopes on the ground.

  19. How Data Becomes Physics: Inside the RACF

    ScienceCinema

    Ernst, Michael; Rind, Ofer; Rajagopalan, Srini; Lauret, Jerome; Pinkenburg, Chris

    2018-06-22

    The RHIC & ATLAS Computing Facility (RACF) at the U.S. Department of Energy’s (DOE) Brookhaven National Laboratory sits at the center of a global computing network. It connects more than 2,500 researchers around the world with the data generated by millions of particle collisions taking place each second at Brookhaven Lab's Relativistic Heavy Ion Collider (RHIC, a DOE Office of Science User Facility for nuclear physics research), and the ATLAS experiment at the Large Hadron Collider in Europe. Watch this video to learn how the people and computing resources of the RACF serve these scientists to turn petabytes of raw data into physics discoveries.

  20. Low-cost space-varying FIR filter architecture for computational imaging systems

    NASA Astrophysics Data System (ADS)

    Feng, Guotong; Shoaib, Mohammed; Schwartz, Edward L.; Dirk Robinson, M.

    2010-01-01

    Recent research demonstrates the advantage of designing electro-optical imaging systems by jointly optimizing the optical and digital subsystems. The optical systems designed using this joint approach intentionally introduce large and often space-varying optical aberrations that produce blurry optical images. Digital sharpening restores reduced contrast due to these intentional optical aberrations. Computational imaging systems designed in this fashion have several advantages including extended depth-of-field, lower system costs, and improved low-light performance. Currently, most consumer imaging systems lack the necessary computational resources to compensate for these optical systems with large aberrations in the digital processor. Hence, the exploitation of the advantages of the jointly designed computational imaging system requires low-complexity algorithms enabling space-varying sharpening. In this paper, we describe a low-cost algorithmic framework and associated hardware enabling the space-varying finite impulse response (FIR) sharpening required to restore largely aberrated optical images. Our framework leverages the space-varying properties of optical images formed using rotationally-symmetric optical lens elements. First, we describe an approach to leverage the rotational symmetry of the point spread function (PSF) about the optical axis allowing computational savings. Second, we employ a specially designed bank of sharpening filters tuned to the specific radial variation common to optical aberrations. We evaluate the computational efficiency and image quality achieved by using this low-cost space-varying FIR filter architecture.
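
    A hedged sketch of the space-varying sharpening idea: because the optics are rotationally symmetric, pixels at similar radii from the optical axis can share a filter, so a small bank of radially indexed FIR kernels suffices. The kernels and zone count below are illustrative assumptions, not the paper's designed filters; numpy and scipy are assumed available.

    ```python
    # Illustrative radially indexed FIR sharpening; not the paper's filter designs.
    import numpy as np
    from scipy.ndimage import convolve

    def radial_bank_sharpen(image: np.ndarray, bank: list) -> np.ndarray:
        """Apply, per pixel, the kernel from `bank` selected by normalized radius."""
        h, w = image.shape
        cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
        yy, xx = np.mgrid[0:h, 0:w]
        r = np.hypot(yy - cy, xx - cx) / np.hypot(cy, cx)   # 0 at centre, 1 at corner
        zone = np.minimum((r * len(bank)).astype(int), len(bank) - 1)
        # Filter the whole image once per zone, then gather the per-pixel choice.
        filtered = np.stack([convolve(image, k, mode="nearest") for k in bank])
        return np.take_along_axis(filtered, zone[None, ...], axis=0)[0]

    # Two unity-gain kernels: mild sharpening near the axis, stronger at the edge.
    mild = np.array([[0, -0.5, 0], [-0.5, 3.0, -0.5], [0, -0.5, 0]])
    strong = np.array([[0, -1.0, 0], [-1.0, 5.0, -1.0], [0, -1.0, 0]])
    out = radial_bank_sharpen(np.random.rand(64, 64), [mild, strong])
    print(out.shape)
    ```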

  1. An Assessment of Artificial Compressibility and Pressure Projection Methods for Incompressible Flow Simulations

    NASA Technical Reports Server (NTRS)

    Kwak, Dochan; Kiris, C.; Smith, Charles A. (Technical Monitor)

    1998-01-01

    The performance of two commonly used numerical procedures, one based on the artificial compressibility method and the other on the pressure projection method, is compared. These formulations are selected primarily because they are designed for three-dimensional applications. The computational procedures are compared by obtaining steady-state solutions of a wake vortex and unsteady solutions of a curved duct flow. For steady computations, the artificial compressibility method was very efficient in terms of computing time and robustness. For an unsteady flow that requires a small physical time step, the pressure projection method was found to be computationally more efficient than the artificial compressibility method. This comparison is intended to give some basis for selecting a method or a flow solution code for large three-dimensional applications where computing resources become a critical issue.
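
    For reference, the two formulations being compared are commonly written as follows; these are the standard textbook forms, with notation assumed here rather than taken from the report (pseudo-time tau and compressibility parameter beta for the first, a fractional-step projection for the second).

      % Artificial compressibility (pseudo-time \tau, parameter \beta):
      \frac{\partial p}{\partial \tau} + \beta \, \nabla \cdot \mathbf{u} = 0, \qquad
      \frac{\partial \mathbf{u}}{\partial \tau} + (\mathbf{u}\cdot\nabla)\mathbf{u}
        = -\nabla p + \nu \nabla^{2} \mathbf{u}

      % Pressure projection: advance an intermediate velocity, then project it
      % onto the divergence-free space via a pressure Poisson equation:
      \frac{\mathbf{u}^{*} - \mathbf{u}^{n}}{\Delta t}
        = -(\mathbf{u}^{n}\cdot\nabla)\mathbf{u}^{n} + \nu \nabla^{2}\mathbf{u}^{n}, \qquad
      \nabla^{2} p^{\,n+1} = \frac{\nabla\cdot\mathbf{u}^{*}}{\Delta t}, \qquad
      \mathbf{u}^{n+1} = \mathbf{u}^{*} - \Delta t \, \nabla p^{\,n+1}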

  2. Hybrid Quantum-Classical Approach to Quantum Optimal Control.

    PubMed

    Li, Jun; Yang, Xiaodong; Peng, Xinhua; Sun, Chang-Pu

    2017-04-14

    A central challenge in quantum computing is to identify more computational problems for which utilization of quantum resources can offer significant speedup. Here, we propose a hybrid quantum-classical scheme to tackle the quantum optimal control problem. We show that the most computationally demanding part of gradient-based algorithms, namely, computing the fitness function and its gradient for a control input, can be accomplished by the process of evolution and measurement on a quantum simulator. By posing queries to and receiving answers from the quantum simulator, classical computing devices update the control parameters until an optimal control solution is found. To demonstrate the quantum-classical scheme in experiment, we use a seven-qubit nuclear magnetic resonance system, on which we have succeeded in optimizing state preparation without involving classical computation of the large Hilbert space evolution.
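
    The division of labor described above can be sketched as a classical gradient-ascent loop wrapped around a quantum evaluation step. In the Python sketch below, evaluate_fitness_and_gradient is a hypothetical stand-in for the simulator's evolve-and-measure routine; the toy quadratic in the usage example only checks that the loop behaves sensibly.

      import numpy as np

      def hybrid_optimal_control(evaluate_fitness_and_gradient, theta0,
                                 learning_rate=0.1, tol=1e-4, max_iters=200):
          """Classical gradient-ascent loop around a quantum fitness evaluation.

          `evaluate_fitness_and_gradient(theta)` is assumed to submit the control
          parameters `theta` to a quantum simulator, run the controlled evolution,
          and return (fitness, gradient) estimated from measurements.
          """
          theta = np.asarray(theta0, dtype=float)
          for _ in range(max_iters):
              fitness, grad = evaluate_fitness_and_gradient(theta)
              if fitness > 1.0 - tol:               # e.g. state-preparation fidelity target
                  break
              theta = theta + learning_rate * grad  # classical parameter update
          return theta, fitness

      # Toy stand-in for the quantum simulator: maximize a quadratic "fidelity".
      target = np.array([0.3, -0.7, 1.2])
      toy_eval = lambda th: (1.0 - np.sum((th - target) ** 2), -2.0 * (th - target))
      theta_opt, f_opt = hybrid_optimal_control(toy_eval, np.zeros(3))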

  3. Provider-Independent Use of the Cloud

    NASA Astrophysics Data System (ADS)

    Harmer, Terence; Wright, Peter; Cunningham, Christina; Perrott, Ron

    Utility computing offers researchers and businesses the potential of significant cost savings, making it possible for them to match the cost of their computing and storage to their demand for such resources. A utility compute provider enables the purchase of compute infrastructures on demand; when a user requires computing resources, a provider will provision a resource for them and charge them only for their period of use of that resource. There has been significant growth in the number of cloud computing resource providers, and each has a different resource usage model, application process and application programming interface (API); developing generic multi-provider applications is thus difficult and time consuming. We have developed an abstraction layer that provides a single resource usage model, user authentication model and API for compute providers, enabling cloud-provider-neutral applications to be developed. In this paper we outline the issues in using external resource providers, give examples of using a number of the most popular cloud providers and provide examples of developing provider-neutral applications. In addition, we discuss the development of the API to create a generic provisioning model based on a common architecture for cloud computing providers.
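
    A minimal sketch of what such an abstraction layer might look like is given below; the class and method names are illustrative assumptions, not the authors' API. Applications code against the provider-neutral interface, and each vendor is wrapped by a thin adapter.

      from abc import ABC, abstractmethod
      from dataclasses import dataclass

      @dataclass
      class ResourceSpec:
          cores: int
          memory_gb: int
          image: str          # machine image / template identifier

      class CloudProvider(ABC):
          """Minimal provider-neutral interface; concrete adapters wrap each vendor API."""

          @abstractmethod
          def authenticate(self, credentials: dict) -> None: ...

          @abstractmethod
          def provision(self, spec: ResourceSpec) -> str:
              """Start a resource and return a provider-independent handle."""

          @abstractmethod
          def terminate(self, handle: str) -> None: ...

      class FakeProvider(CloudProvider):
          """Stand-in adapter used here only to show how an application stays provider-neutral."""
          def authenticate(self, credentials): self._token = credentials.get("key", "demo")
          def provision(self, spec): return f"fake-{spec.cores}c-{spec.memory_gb}g"
          def terminate(self, handle): print(f"released {handle}")

      def run_job(provider: CloudProvider, spec: ResourceSpec):
          provider.authenticate({"key": "example"})
          handle = provider.provision(spec)
          # ... submit work against `handle` ...
          provider.terminate(handle)

      run_job(FakeProvider(), ResourceSpec(cores=4, memory_gb=8, image="ubuntu-22.04"))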

  4. dV/dt - Accelerating the Rate of Progress towards Extreme Scale Collaborative Science

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Livny, Miron

    This report introduces publications that report the results of a project that aimed to design a computational framework enabling computational experimentation at scale while supporting the model of “submit locally, compute globally”. The project focuses on estimating application resource needs, finding the appropriate computing resources, acquiring those resources, deploying the applications and data on the resources, and managing applications and resources during execution.

  5. Cloud-based MOTIFSIM: Detecting Similarity in Large DNA Motif Data Sets.

    PubMed

    Tran, Ngoc Tam L; Huang, Chun-Hsi

    2017-05-01

    We developed the cloud-based MOTIFSIM on the Amazon Web Services (AWS) cloud. The tool is an extended version of our web-based tool, version 2.0, which was developed based on a novel algorithm for detecting similarity in multiple DNA motif data sets. This cloud-based version further allows researchers to exploit the computing resources available from AWS to detect similarity in multiple large-scale DNA motif data sets resulting from next-generation sequencing technology. The tool is highly scalable, taking advantage of the expandable computing resources of AWS.

  6. The application of cloud computing to scientific workflows: a study of cost and performance.

    PubMed

    Berriman, G Bruce; Deelman, Ewa; Juve, Gideon; Rynge, Mats; Vöckler, Jens-S

    2013-01-28

    The current model of transferring data from data centres to desktops for analysis will soon be rendered impractical by the accelerating growth in the volume of science datasets. Processing will instead often take place on high-performance servers co-located with data. Evaluations of how new technologies such as cloud computing would support such a new distributed computing model are urgently needed. Cloud computing is a new way of purchasing computing and storage resources on demand through virtualization technologies. We report here the results of investigations of the applicability of commercial cloud computing to scientific computing, with an emphasis on astronomy, including investigations of what types of applications can be run cheaply and efficiently on the cloud, and an example of an application well suited to the cloud: processing a large dataset to create a new science product.

  7. Scientific Services on the Cloud

    NASA Astrophysics Data System (ADS)

    Chapman, David; Joshi, Karuna P.; Yesha, Yelena; Halem, Milt; Yesha, Yaacov; Nguyen, Phuong

    Scientific computing was one of the first applications for parallel and distributed computation. To this day, scientific applications remain some of the most compute intensive and have inspired the creation of petaflop compute infrastructure such as the Oak Ridge Jaguar and Los Alamos RoadRunner. Large dedicated hardware infrastructure has become both a blessing and a curse to the scientific community. Scientists are interested in cloud computing for much the same reasons as businesses and other professionals. The hardware is provided, maintained, and administrated by a third party. Software abstraction and virtualization provide reliability and fault tolerance. Graduated fees allow for multi-scale prototyping and execution. Cloud computing resources are only a few clicks away and are by far the easiest high-performance distributed platform to gain access to. There may still be dedicated infrastructure for ultra-scale science, but the cloud can easily play a major part in the scientific computing initiative.

  8. Machine-learned and codified synthesis parameters of oxide materials

    NASA Astrophysics Data System (ADS)

    Kim, Edward; Huang, Kevin; Tomala, Alex; Matthews, Sara; Strubell, Emma; Saunders, Adam; McCallum, Andrew; Olivetti, Elsa

    2017-09-01

    Predictive materials design has rapidly accelerated in recent years with the advent of large-scale resources, such as materials structure and property databases generated by ab initio computations. In the absence of analogous ab initio frameworks for materials synthesis, high-throughput and machine learning techniques have recently been harnessed to generate synthesis strategies for select materials of interest. Still, a community-accessible, autonomously-compiled synthesis planning resource which spans across materials systems has not yet been developed. In this work, we present a collection of aggregated synthesis parameters computed using the text contained within over 640,000 journal articles using state-of-the-art natural language processing and machine learning techniques. We provide a dataset of synthesis parameters, compiled autonomously across 30 different oxide systems, in a format optimized for planning novel syntheses of materials.

  9. Parallel Multivariate Spatio-Temporal Clustering of Large Ecological Datasets on Hybrid Supercomputers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sreepathi, Sarat; Kumar, Jitendra; Mills, Richard T.

    A proliferation of data from vast networks of remote sensing platforms (satellites, unmanned aircraft systems (UAS), airborne etc.), observational facilities (meteorological, eddy covariance etc.), state-of-the-art sensors, and simulation models offers unprecedented opportunities for scientific discovery. Unsupervised classification is a widely applied data mining approach to derive insights from such data. However, classification of very large data sets is a complex computational problem that requires efficient numerical algorithms and implementations on high performance computing (HPC) platforms. Additionally, increasing power, space, cooling and efficiency requirements have led to the deployment of hybrid supercomputing platforms with complex architectures and memory hierarchies, like the Titan system at Oak Ridge National Laboratory. The advent of such accelerated computing architectures offers new challenges and opportunities for big data analytics in general and, in our case, for large-scale cluster analysis specifically. Although there is an existing body of work on parallel cluster analysis, those approaches do not fully meet the needs imposed by the nature and size of our large data sets. Moreover, they had scaling limitations and were mostly limited to traditional distributed memory computing platforms. We present a parallel Multivariate Spatio-Temporal Clustering (MSTC) technique based on k-means cluster analysis that can target hybrid supercomputers like Titan. We developed a hybrid MPI, CUDA and OpenACC implementation that can utilize both CPU and GPU resources on computational nodes. We describe performance results on Titan that demonstrate the scalability and efficacy of our approach in processing large ecological data sets.
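
    The core distributed k-means update at the heart of such a clustering can be sketched in a few lines. The example below is a simplified, CPU-only MPI sketch using mpi4py (assumed available), not the authors' MPI/CUDA/OpenACC implementation: each rank holds a shard of the observations and the centroid update is formed with an allreduce.

      # A minimal distributed k-means iteration with mpi4py: each rank holds a shard
      # of the observations; cluster sums are combined with an allreduce.
      import numpy as np
      from mpi4py import MPI

      def distributed_kmeans(local_data, k, iters=20, seed=0):
          comm = MPI.COMM_WORLD
          rng = np.random.default_rng(seed)
          dim = local_data.shape[1]
          # Rank 0 picks initial centroids and broadcasts them to everyone.
          centroids = rng.standard_normal((k, dim)) if comm.rank == 0 else None
          centroids = comm.bcast(centroids, root=0)
          for _ in range(iters):
              # Assign each local point to its nearest centroid.
              dists = np.linalg.norm(local_data[:, None, :] - centroids[None, :, :], axis=2)
              labels = dists.argmin(axis=1)
              # Local partial sums and counts, reduced across all ranks.
              sums = np.zeros((k, dim)); counts = np.zeros(k)
              for c in range(k):
                  mask = labels == c
                  sums[c] = local_data[mask].sum(axis=0)
                  counts[c] = mask.sum()
              global_sums = comm.allreduce(sums, op=MPI.SUM)
              global_counts = comm.allreduce(counts, op=MPI.SUM)
              nonempty = global_counts > 0
              centroids[nonempty] = global_sums[nonempty] / global_counts[nonempty, None]
          return centroids, labels

      # Run with e.g.: mpiexec -n 4 python mstc_kmeans_sketch.py
      if __name__ == "__main__":
          shard = np.random.default_rng(MPI.COMM_WORLD.rank).random((1000, 5))
          distributed_kmeans(shard, k=8)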

  10. Airloads on Bluff Bodies, with Application to the Rotor-Induced Downloads on Tilt-Rotor Aircraft.

    DTIC Science & Technology

    1983-09-01

    [Fragmented abstract excerpt] The study examines the two-dimensional section characteristics of a wing in the wake of a rotor in relation to rotor-induced download on hover performance (Ref. 11). The vortex computation requires substantial resources for large numbers of vortices; a typical case requires 10-15 min of CPU time on the Ames Cray 1S computer, and Figure 6 shows a typical result. Converging to a steady solution for the upper (windward) surface requires considerable CPU time per case on a Prime 550 computer, equivalent to one or two seconds on the Cray.

  11. The economics of time shared computing: Congestion, user costs and capacity

    NASA Technical Reports Server (NTRS)

    Agnew, C. E.

    1982-01-01

    Time shared systems permit the fixed costs of computing resources to be spread over large numbers of users. However, bottleneck results in the theory of closed queueing networks can be used to show that this economy of scale will be offset by the increased congestion that results as more users are added to the system. If one considers the total costs, including the congestion cost, there is an optimal number of users for a system which equals the saturation value usually used to define system capacity.
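
    The saturation value referred to here is the standard asymptotic-bound result for closed queueing networks; the notation below is assumed for illustration rather than taken from the paper (think time Z, per-device service demands D_i, bottleneck demand D_max).

      % Closed network with N users, think time Z, service demands D_i, and
      % bottleneck demand D_{\max} = \max_i D_i.  The asymptotic bounds give
      X(N) \le \min\!\left(\frac{N}{Z + \sum_i D_i},\; \frac{1}{D_{\max}}\right),
      \qquad
      N^{*} = \frac{Z + \sum_i D_i}{D_{\max}}
      % N^{*} is the saturation point: beyond it, added users mainly add queueing
      % delay, so congestion cost grows while the per-user share of fixed cost falls.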

  12. Putting Order Into the Cloud: Object-oriented UML-based Rule Enforcement for Document and Application Organization

    DTIC Science & Technology

    2010-09-01

    Cloud computing describes a new distributed computing paradigm for IT data and services that involves over-the-Internet provision of dynamically scalable and often virtualized resources. While cost reduction and flexibility in storage, services, and maintenance are important considerations when deciding on whether or how to migrate data and applications to the cloud, large organizations like the Department of Defense need to consider the organization and structure of data on the cloud and the operations on such data in order to reap the full benefit of the cloud.

  13. An Investigation of the Determinants of Employees' Decisions to Use Organizational Computing Resources for Non-Work Purposes

    ERIC Educational Resources Information Center

    Campbell, Stephen Matthew

    2010-01-01

    Internet access in the workplace has become ubiquitous in many organizations. Often, employees need this access to perform their duties. However, many studies report a large percentage of employees use their work Internet access for non-work-related activities. These activities can result in reduced efficiency, increased vulnerability to cyber…

  14. What Are They Thinking? Automated Analysis of Student Writing about Acid-Base Chemistry in Introductory Biology

    ERIC Educational Resources Information Center

    Haudek, Kevin C.; Prevost, Luanna B.; Moscarella, Rosa A.; Merrill, John; Urban-Lurain, Mark

    2012-01-01

    Students' writing can provide better insight into their thinking than can multiple-choice questions. However, resource constraints often prevent faculty from using writing assessments in large undergraduate science courses. We investigated the use of computer software to analyze student writing and to uncover student ideas about chemistry in an…

  15. Recommendations for open data science.

    PubMed

    Gymrek, Melissa; Farjoun, Yossi

    2016-01-01

    Life science research increasingly relies on large-scale computational analyses. However, the code and data used for these analyses are often lacking in publications. To maximize scientific impact, reproducibility, and reuse, it is crucial that these resources are made publicly available and are fully transparent. We provide recommendations for improving the openness of data-driven studies in life sciences.

  16. Beginning School Math Competence: Minority and Majority Comparisons. Report No. 34.

    ERIC Educational Resources Information Center

    Entwisle, Doris R.; Alexander, Karl L.

    This paper uses a structural model with a large random sample of urban children to explain children's competence in math concepts and computation at the time they begin first grade. These two aspects of math ability respond differently to environmental resources, with math concepts much more responsive to family factors before formal schooling…

  17. A Concept for the One Degree Imager (ODI) Data Reduction Pipeline and Archiving System

    NASA Astrophysics Data System (ADS)

    Knezek, Patricia; Stobie, B.; Michael, S.; Valdes, F.; Marru, S.; Henschel, R.; Pierce, M.

    2010-05-01

    The One Degree Imager (ODI), currently being built by the WIYN Observatory, will provide tremendous possibilities for conducting diverse scientific programs. ODI will be a complex instrument, using non-conventional Orthogonal Transfer Array (OTA) detectors. Due to its large field of view, small pixel size, use of OTA technology, and expected frequent use, ODI will produce vast amounts of astronomical data. If ODI is to achieve its full potential, a data reduction pipeline must be developed. Long-term archiving must also be incorporated into the pipeline system to ensure the continued value of ODI data. This paper presents a concept for an ODI data reduction pipeline and archiving system. To limit costs and development time, our plan leverages existing software and hardware, including existing pipeline software, Science Gateways, Computational Grid & Cloud Technology, Indiana University's Data Capacitor and Massive Data Storage System, and TeraGrid compute resources. Existing pipeline software will be augmented to add functionality required to meet challenges specific to ODI, enhance end-user control, and enable the execution of the pipeline on grid resources including national grid resources such as the TeraGrid and Open Science Grid. The planned system offers consistent standard reductions and end-user flexibility when working with images beyond the initial instrument signature removal. It also gives end-users access to computational and storage resources far beyond what are typically available at most institutions. Overall, the proposed system provides a wide array of software tools and the necessary hardware resources to use them effectively.

  18. Computation of scattering matrix elements of large and complex shaped absorbing particles with multilevel fast multipole algorithm

    NASA Astrophysics Data System (ADS)

    Wu, Yueqian; Yang, Minglin; Sheng, Xinqing; Ren, Kuan Fang

    2015-05-01

    Light scattering properties of absorbing particles, such as mineral dusts, attract wide attention due to their importance in geophysical and environmental research. Because of the absorbing effect, the light scattering properties of particles with absorption differ from those without absorption. Simple-shaped absorbing particles such as spheres and spheroids have been well studied with different methods, but little work on large, complex-shaped particles has been reported. In this paper, the surface integral equation (SIE) method with the multilevel fast multipole algorithm (MLFMA) is applied to study the scattering properties of large non-spherical absorbing particles. The SIEs are carefully discretized with piecewise linear basis functions on triangle patches to model the whole surface of the particle; hence computational resource needs increase much more slowly with the particle size parameter than for volume-discretized methods. To further improve its capability, the MLFMA is parallelized with the Message Passing Interface (MPI) on a distributed-memory computer platform. Without loss of generality, we choose the computation of scattering matrix elements of absorbing dust particles as an example. The comparison of the scattering matrix elements computed by our method and by the discrete dipole approximation (DDA) method for an ellipsoidal dust particle shows that the precision of our method is very good. The scattering matrix elements of large ellipsoidal dusts with different aspect ratios and size parameters are computed. To show the capability of the presented algorithm for complex-shaped particles, scattering by an asymmetric Chebyshev particle with size parameter larger than 600, complex refractive index m = 1.555 + 0.004i, and different orientations is studied.

  19. Expanding the user base beyond HEP for the Ganga distributed analysis user interface

    NASA Astrophysics Data System (ADS)

    Currie, R.; Egede, U.; Richards, A.; Slater, M.; Williams, M.

    2017-10-01

    This document presents the results of recent developments within the Ganga[1] project to support users from new communities outside of HEP. In particular, I examine the case of users from the Large Synoptic Survey Telescope (LSST) group looking to use resources provided by the UK-based GridPP[2][3] DIRAC[4][5] instance. An example use case is work performed with users from the LSST Virtual Organisation (VO) to distribute the workflow used for galaxy shape identification analyses. This work highlighted some LSST-specific challenges which could be well solved by common tools within the HEP community. As a result of this work, the LSST community was able to take advantage of GridPP[2][3] resources to perform large computing tasks within the UK.

  20. Fast and Scalable Computation of the Forward and Inverse Discrete Periodic Radon Transform.

    PubMed

    Carranza, Cesar; Llamocca, Daniel; Pattichis, Marios

    2016-01-01

    The discrete periodic Radon transform (DPRT) has been used extensively in applications that involve image reconstruction from projections. Beyond classic applications, the DPRT can also be used to compute fast convolutions that avoid the floating-point arithmetic associated with the fast Fourier transform. Unfortunately, the use of the DPRT has been limited by the need to compute a large number of additions and a large number of memory accesses. This paper introduces a fast and scalable approach for computing the forward and inverse DPRT that is based on the use of: a parallel array of fixed-point adder trees; circular shift registers to remove the need for accessing external memory components when selecting the input data for the adder trees; an image block-based approach to DPRT computation that can fit the proposed architecture to available resources; and fast transpositions that are computed in one or a few clock cycles that do not depend on the size of the input image. As a result, for an N × N image (N prime), the proposed approach can compute up to N^2 additions per clock cycle. Compared with previous approaches, the scalable approach provides the fastest known implementations for different amounts of computational resources. For example, for a 251×251 image, with approximately 25% fewer flip-flops than required for a systolic implementation, the scalable DPRT is computed 36 times faster. For the fastest case, we introduce optimized architectures that can compute the DPRT and its inverse in just 2N + ⌈log2 N⌉ + 1 and 2N + 3⌈log2 N⌉ + B + 2 cycles, respectively, where B is the number of bits used to represent each input pixel. On the other hand, the scalable DPRT approach requires more 1-bit additions than the systolic implementation and thus provides a tradeoff between speed and additional 1-bit additions. All of the proposed DPRT architectures were implemented in VHSIC Hardware Description Language (VHDL) and validated using a Field-Programmable Gate Array (FPGA) implementation.
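
    For readers unfamiliar with the transform, a plain reference implementation of the forward DPRT is short; the Python sketch below uses the common textbook definition for prime N (index conventions vary between papers) and is purely illustrative, not the paper's hardware-oriented formulation.

      import numpy as np

      def forward_dprt(image):
          """Forward discrete periodic Radon transform of an N x N image, N prime.

          For direction m < N the projection sums pixels along the wrapped line
          j = (d + m*i) mod N, and the extra direction m = N sums along rows.
          """
          n = image.shape[0]
          assert image.shape == (n, n), "image must be square"
          dprt = np.zeros((n + 1, n), dtype=image.dtype)
          for m in range(n):
              for d in range(n):
                  i = np.arange(n)
                  dprt[m, d] = image[i, (d + m * i) % n].sum()
          dprt[n, :] = image.sum(axis=1)          # the extra 'horizontal' projection
          return dprt

      # Small sanity check on a 5 x 5 image (5 is prime): every projection
      # direction preserves the total image sum.
      img = np.arange(25.0).reshape(5, 5)
      proj = forward_dprt(img)
      assert np.allclose(proj.sum(axis=1), img.sum())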

  1. Parallel Simulation of Unsteady Turbulent Flames

    NASA Technical Reports Server (NTRS)

    Menon, Suresh

    1996-01-01

    Time-accurate simulation of turbulent flames in high Reynolds number flows is a challenging task since both fluid dynamics and combustion must be modeled accurately. To numerically simulate this phenomenon, very large computer resources (both time and memory) are required. Although current vector supercomputers are capable of providing adequate resources for simulations of this nature, their high cost and limited availability make practical use of such machines less than satisfactory. At the same time, the explicit time integration algorithms used in unsteady flow simulations often possess a very high degree of parallelism, making them very amenable to efficient implementation on large-scale parallel computers. Under these circumstances, distributed memory parallel computers offer an excellent near-term solution for greatly increased computational speed and memory, at a cost that may render the unsteady simulations of the type discussed above more feasible and affordable. This paper discusses the study of unsteady turbulent flames using a simulation algorithm that is capable of retaining high parallel efficiency on distributed memory parallel architectures. Numerical studies are carried out using large-eddy simulation (LES). In LES, the scales larger than the grid are computed using a time- and space-accurate scheme, while the unresolved small scales are modeled using eddy viscosity based subgrid models. This is acceptable for the moment/energy closure since the small scales primarily provide a dissipative mechanism for the energy transferred from the large scales. However, for combustion to occur, the species must first undergo mixing at the small scales and then come into molecular contact. Therefore, global models cannot be used. Recently, a new model for turbulent combustion was developed, in which the combustion is modeled within the subgrid (small scales) using a methodology that simulates the mixing, the molecular transport and the chemical kinetics within each LES grid cell. Finite-rate kinetics can be included without any closure, and this approach actually provides a means to predict the turbulent rates and the turbulent flame speed. The subgrid combustion model requires resolution of the local time scales associated with small-scale mixing, molecular diffusion and chemical kinetics; therefore, within each grid cell, a significant amount of computation must be carried out before the large-scale (LES-resolved) effects are incorporated. This approach is thus uniquely suited for parallel processing and has been implemented on various systems such as the Intel Paragon, IBM SP-2, Cray T3D and SGI Power Challenge (PC) using the system-independent Message Passing Interface (MPI). In this paper, timing data on these machines is reported along with some characteristic results.

  2. Concrete resource analysis of the quantum linear-system algorithm used to compute the electromagnetic scattering cross section of a 2D target

    NASA Astrophysics Data System (ADS)

    Scherer, Artur; Valiron, Benoît; Mau, Siun-Chuon; Alexander, Scott; van den Berg, Eric; Chapuran, Thomas E.

    2017-03-01

    We provide a detailed estimate of the logical resource requirements of the quantum linear-system algorithm (Harrow et al. in Phys Rev Lett 103:150502, 2009), including the recently described elaborations and application to computing the electromagnetic scattering cross section of a metallic target (Clader et al. in Phys Rev Lett 110:250504, 2013). Our resource estimates are based on the standard quantum-circuit model of quantum computation; they comprise circuit width (related to parallelism), circuit depth (total number of steps), the number of qubits and ancilla qubits employed, and the overall number of elementary quantum gate operations as well as more specific gate counts for each elementary fault-tolerant gate from the standard set {X, Y, Z, H, S, T, CNOT}. In order to perform these estimates, we used an approach that combines manual analysis with automated estimates generated via the Quipper quantum programming language and compiler. Our estimates pertain to the explicit example problem size N = 332,020,680, beyond which, according to a crude big-O complexity comparison, the quantum linear-system algorithm is expected to run faster than the best known classical linear-system solving algorithm. For this problem size, a desired calculation accuracy ε = 0.01 requires an approximate circuit width of 340 and circuit depth of order 10^25 if oracle costs are excluded, and a circuit width and circuit depth of order 10^8 and 10^29, respectively, if the resource requirements of oracles are included, indicating that the commonly ignored oracle resources are considerable. In addition to providing detailed logical resource estimates, it is also the purpose of this paper to demonstrate explicitly (using a fine-grained approach rather than relying on coarse big-O asymptotic approximations) how these impressively large numbers arise with an actual circuit implementation of a quantum algorithm. While our estimates may prove to be conservative as more efficient advanced quantum-computation techniques are developed, they nevertheless provide a valid baseline for research targeting a reduction of the algorithmic-level resource requirements, implying that a reduction by many orders of magnitude is necessary for the algorithm to become practical.

  3. Improving Design Efficiency for Large-Scale Heterogeneous Circuits

    NASA Astrophysics Data System (ADS)

    Gregerson, Anthony

    Despite increases in logic density, many Big Data applications must still be partitioned across multiple computing devices in order to meet their strict performance requirements. Among the most demanding of these applications is high-energy physics (HEP), which uses complex computing systems consisting of thousands of FPGAs and ASICs to process the sensor data created by experiments at particle accelerators such as the Large Hadron Collider (LHC). Designing such computing systems is challenging due to the scale of the systems, the exceptionally high-throughput and low-latency performance constraints that necessitate application-specific hardware implementations, the requirement that algorithms are efficiently partitioned across many devices, and the possible need to update the implemented algorithms during the lifetime of the system. In this work, we describe our research to develop flexible architectures for implementing such large-scale circuits on FPGAs. In particular, this work is motivated by (but not limited in scope to) high-energy physics algorithms for the Compact Muon Solenoid (CMS) experiment at the LHC. To make efficient use of logic resources in multi-FPGA systems, we introduce Multi-Personality Partitioning, a novel form of the graph partitioning problem, and present partitioning algorithms that can significantly improve resource utilization on heterogeneous devices while also reducing inter-chip connections. To reduce the high communication costs of Big Data applications, we also introduce Information-Aware Partitioning, a partitioning method that analyzes the data content of application-specific circuits, characterizes their entropy, and selects circuit partitions that enable efficient compression of data between chips. We employ our information-aware partitioning method to improve the performance of the hardware validation platform for evaluating new algorithms for the CMS experiment. Together, these research efforts help to improve the efficiency and decrease the cost of developing the large-scale, heterogeneous circuits needed to enable large-scale applications in high-energy physics and other important areas.
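
    The entropy characterization behind information-aware partitioning can be illustrated with a toy calculation: estimate the empirical entropy of the values each candidate cut signal carries, and prefer low-entropy (highly compressible) cuts for inter-chip links. The Python sketch below is an illustrative assumption about how such a measure could be computed, not the thesis' algorithm.

      import numpy as np

      def empirical_entropy_bits(samples):
          """Shannon entropy (bits/sample) of the values observed on a candidate cut signal."""
          _, counts = np.unique(samples, return_counts=True)
          p = counts / counts.sum()
          return float(-(p * np.log2(p)).sum())

      def rank_cuts_by_compressibility(cut_traces):
          """Order candidate cuts so low-entropy (highly compressible) signals cross chips.

          `cut_traces` maps a cut name to the sequence of values that signal takes
          during representative operation; lower entropy suggests cheaper inter-chip links.
          """
          return sorted(cut_traces, key=lambda name: empirical_entropy_bits(cut_traces[name]))

      traces = {
          "trigger_flags": np.random.choice([0, 1], size=10_000, p=[0.95, 0.05]),
          "raw_adc":       np.random.randint(0, 256, size=10_000),
      }
      print(rank_cuts_by_compressibility(traces))   # trigger_flags ranks first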

  4. TomoMiner and TomoMinerCloud: A software platform for large-scale subtomogram structural analysis

    PubMed Central

    Frazier, Zachary; Xu, Min; Alber, Frank

    2017-01-01

    Cryo-electron tomography (cryoET) captures the 3D electron density distribution of macromolecular complexes in a close-to-native state. With the rapid advance of cryoET acquisition technologies, it is possible to generate large numbers (>100,000) of subtomograms, each containing a macromolecular complex. Often, these subtomograms represent a heterogeneous sample due to variations in the structure and composition of a complex in its in situ form, or because the particles are a mixture of different complexes. In this case, subtomograms must be classified. However, classification of large numbers of subtomograms is a time-intensive task and often a limiting bottleneck. This paper introduces an open source software platform, TomoMiner, for large-scale subtomogram classification, template matching, subtomogram averaging, and alignment. Its scalable and robust parallel processing allows efficient classification of tens to hundreds of thousands of subtomograms. Additionally, TomoMiner provides a pre-configured TomoMinerCloud computing service permitting users without sufficient computing resources instant access to TomoMiner's high-performance features. PMID:28552576

  5. Statistics Online Computational Resource for Education

    ERIC Educational Resources Information Center

    Dinov, Ivo D.; Christou, Nicolas

    2009-01-01

    The Statistics Online Computational Resource (http://www.SOCR.ucla.edu) provides one of the largest collections of free Internet-based resources for probability and statistics education. SOCR develops, validates and disseminates two core types of materials--instructional resources and computational libraries. (Contains 2 figures.)

  6. Water resources of the Black Sea Basin at high spatial and temporal resolution

    NASA Astrophysics Data System (ADS)

    Rouholahnejad, Elham; Abbaspour, Karim C.; Srinivasan, Raghvan; Bacu, Victor; Lehmann, Anthony

    2014-07-01

    The pressure on water resources, deteriorating water quality, and uncertainties associated with climate change create an environment of conflict in large and complex river systems. The Black Sea Basin (BSB), in particular, suffers from ecological unsustainability and inadequate resource management leading to severe environmental, social, and economic problems. To better tackle future challenges, we used the Soil and Water Assessment Tool (SWAT) to model the hydrology of the BSB, coupling water quantity, water quality, and crop yield components. The hydrological model of the BSB was calibrated and validated, taking sensitivity and uncertainty analysis into account. River discharges, nitrate loads, and crop yields were used to calibrate the model. Employing grid technology improved calibration computation time by more than an order of magnitude. We calculated components of water resources such as river discharge, infiltration, aquifer recharge, soil moisture, and actual and potential evapotranspiration. Furthermore, available water resources were calculated at subbasin spatial and monthly temporal levels. Within this framework, a comprehensive database of the BSB was created to fill the existing gaps in water resources data in the region. In this paper, we discuss the challenges of building a large-scale model in fine spatial and temporal detail. This study provides the basis for further research on the impacts of climate and land use change on water resources in the BSB.

  7. An Architecture for Cross-Cloud System Management

    NASA Astrophysics Data System (ADS)

    Dodda, Ravi Teja; Smith, Chris; van Moorsel, Aad

    The emergence of the cloud computing paradigm promises flexibility and adaptability through on-demand provisioning of compute resources. As the utilization of cloud resources extends beyond a single provider, for business as well as technical reasons, the issue of effectively managing such resources comes to the fore. Different providers expose different interfaces to their compute resources utilizing varied architectures and implementation technologies. This heterogeneity poses a significant system management problem, and can limit the extent to which the benefits of cross-cloud resource utilization can be realized. We address this problem through the definition of an architecture to facilitate the management of compute resources from different cloud providers in a homogeneous manner. This preserves the flexibility and adaptability promised by the cloud computing paradigm, whilst enabling the benefits of cross-cloud resource utilization to be realized. The practical efficacy of the architecture is demonstrated through an implementation utilizing compute resources managed through different interfaces on the Amazon Elastic Compute Cloud (EC2) service. Additionally, we provide empirical results highlighting the performance differential of these different interfaces, and discuss the impact of this performance differential on efficiency and profitability.

  8. Quantum Computing: Selected Internet Resources for Librarians, Researchers, and the Casually Curious

    ERIC Educational Resources Information Center

    Cirasella, Jill

    2009-01-01

    This article presents an annotated selection of the most important and informative Internet resources for learning about quantum computing, finding quantum computing literature, and tracking quantum computing news. All of the quantum computing resources described in this article are freely available, English-language web sites that fall into one…

  9. Personalized cloud-based bioinformatics services for research and education: use cases and the elasticHPC package

    PubMed Central

    2012-01-01

    Background: Bioinformatics services have been traditionally provided in the form of a web-server that is hosted at institutional infrastructure and serves multiple users. This model, however, is not flexible enough to cope with the increasing number of users, increasing data size, and new requirements in terms of speed and availability of service. The advent of cloud computing suggests a new service model that provides an efficient solution to these problems, based on the concepts of "resources-on-demand" and "pay-as-you-go". However, cloud computing has not yet been introduced within bioinformatics servers due to the lack of usage scenarios and software layers that address the requirements of the bioinformatics domain. Results: In this paper, we provide different use case scenarios for providing cloud computing based services, considering both the technical and financial aspects of the cloud computing service model. These scenarios are for individual users seeking computational power as well as bioinformatics service providers aiming at provision of personalized bioinformatics services to their users. We also present elasticHPC, a software package and a library that facilitates the use of high performance cloud computing resources in general and the implementation of the suggested bioinformatics scenarios in particular. Concrete examples that demonstrate the suggested use case scenarios with whole bioinformatics servers and major sequence analysis tools like BLAST are presented. Experimental results with large datasets are also included to show the advantages of the cloud model. Conclusions: Our use case scenarios and the elasticHPC package are steps towards the provision of cloud based bioinformatics services, which would help in overcoming the data challenge of recent biological research. All resources related to elasticHPC and its web-interface are available at http://www.elasticHPC.org. PMID:23281941

  10. Personalized cloud-based bioinformatics services for research and education: use cases and the elasticHPC package.

    PubMed

    El-Kalioby, Mohamed; Abouelhoda, Mohamed; Krüger, Jan; Giegerich, Robert; Sczyrba, Alexander; Wall, Dennis P; Tonellato, Peter

    2012-01-01

    Bioinformatics services have been traditionally provided in the form of a web-server that is hosted at institutional infrastructure and serves multiple users. This model, however, is not flexible enough to cope with the increasing number of users, increasing data size, and new requirements in terms of speed and availability of service. The advent of cloud computing suggests a new service model that provides an efficient solution to these problems, based on the concepts of "resources-on-demand" and "pay-as-you-go". However, cloud computing has not yet been introduced within bioinformatics servers due to the lack of usage scenarios and software layers that address the requirements of the bioinformatics domain. In this paper, we provide different use case scenarios for providing cloud computing based services, considering both the technical and financial aspects of the cloud computing service model. These scenarios are for individual users seeking computational power as well as bioinformatics service providers aiming at provision of personalized bioinformatics services to their users. We also present elasticHPC, a software package and a library that facilitates the use of high performance cloud computing resources in general and the implementation of the suggested bioinformatics scenarios in particular. Concrete examples that demonstrate the suggested use case scenarios with whole bioinformatics servers and major sequence analysis tools like BLAST are presented. Experimental results with large datasets are also included to show the advantages of the cloud model. Our use case scenarios and the elasticHPC package are steps towards the provision of cloud based bioinformatics services, which would help in overcoming the data challenge of recent biological research. All resources related to elasticHPC and its web-interface are available at http://www.elasticHPC.org.

  11. A framework supporting the development of a Grid portal for analysis based on ROI.

    PubMed

    Ichikawa, K; Date, S; Kaishima, T; Shimojo, S

    2005-01-01

    In our research on brain function analysis, users require two different simultaneous types of processing: interactive processing of a specific part of the data and high-performance batch processing of an entire dataset. The difference between these two types of processing is whether or not the analysis applies to data in the region of interest (ROI). In this study, we propose a Grid portal that has a mechanism to freely assign computing resources to the users on a Grid environment according to the users' two different types of processing requirements. We constructed a Grid portal which integrates interactive processing and batch processing through the following two mechanisms. First, a job steering mechanism controls job execution based on user-tagged priority among organizations with heterogeneous computing resources; interactive jobs are processed in preference to batch jobs by this mechanism. Second, a priority-based result delivery mechanism administrates a ranking of data significance. The portal ensures a turn-around time for interactive processing through the priority-based job controlling mechanism, and provides the users with quality of service (QoS) for interactive processing. The users can access the analysis results of interactive jobs in preference to the analysis results of batch jobs. The Grid portal has also achieved high-performance computation of MEG analysis with batch processing on the Grid environment. The priority-based job controlling mechanism makes it possible to freely assign computing resources according to the users' requirements. Furthermore, the achievement of high-performance computation contributes greatly to the overall progress of brain science. The portal has thus made it possible for the users to flexibly include large computational power in what they want to analyze.
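
    The job steering idea, interactive ROI jobs dispatched in preference to batch jobs, can be illustrated with a toy priority queue; the Python sketch below uses hypothetical job names and is not the portal's implementation.

      import heapq
      import itertools

      INTERACTIVE, BATCH = 0, 1          # lower value = higher scheduling priority

      class JobSteering:
          """Toy scheduler that always dispatches interactive (ROI) jobs before batch jobs."""
          def __init__(self):
              self._queue = []
              self._tiebreak = itertools.count()   # FIFO order within a priority class

          def submit(self, name, priority):
              heapq.heappush(self._queue, (priority, next(self._tiebreak), name))

          def next_job(self):
              return heapq.heappop(self._queue)[2] if self._queue else None

      sched = JobSteering()
      sched.submit("meg_full_dataset_batch", BATCH)
      sched.submit("roi_interactive_view", INTERACTIVE)
      print(sched.next_job())   # -> roi_interactive_view, despite being submitted later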

  12. DOE Office of Scientific and Technical Information (OSTI.GOV)

    None, None

    The Second SIAM Conference on Computational Science and Engineering was held in San Diego from February 10-12, 2003. Total conference attendance was 553. This is a 23% increase in attendance over the first conference. The focus of this conference was to draw attention to the tremendous range of major computational efforts on large problems in science and engineering, to promote the interdisciplinary culture required to meet these large-scale challenges, and to encourage the training of the next generation of computational scientists. Computational Science & Engineering (CS&E) is now widely accepted, along with theory and experiment, as a crucial third mode of scientific investigation and engineering design. Aerospace, automotive, biological, chemical, semiconductor, and other industrial sectors now rely on simulation for technical decision support. For federal agencies also, CS&E has become an essential support for decisions on resources, transportation, and defense. CS&E is, by nature, interdisciplinary. It grows out of physical applications and it depends on computer architecture, but at its heart are powerful numerical algorithms and sophisticated computer science techniques. From an applied mathematics perspective, much of CS&E has involved analysis, but the future surely includes optimization and design, especially in the presence of uncertainty. Another mathematical frontier is the assimilation of very large data sets through such techniques as adaptive multi-resolution, automated feature search, and low-dimensional parameterization. The themes of the 2003 conference included, but were not limited to: Advanced Discretization Methods; Computational Biology and Bioinformatics; Computational Chemistry and Chemical Engineering; Computational Earth and Atmospheric Sciences; Computational Electromagnetics; Computational Fluid Dynamics; Computational Medicine and Bioengineering; Computational Physics and Astrophysics; Computational Solid Mechanics and Materials; CS&E Education; Meshing and Adaptivity; Multiscale and Multiphysics Problems; Numerical Algorithms for CS&E; Discrete and Combinatorial Algorithms for CS&E; Inverse Problems; Optimal Design, Optimal Control, and Inverse Problems; Parallel and Distributed Computing; Problem-Solving Environments; Software and Middleware Systems; Uncertainty Estimation and Sensitivity Analysis; and Visualization and Computer Graphics.

  13. Automated Topographic Change Detection via Dem Differencing at Large Scales Using The Arcticdem Database

    NASA Astrophysics Data System (ADS)

    Candela, S. G.; Howat, I.; Noh, M. J.; Porter, C. C.; Morin, P. J.

    2016-12-01

    In the last decade, high resolution satellite imagery has become an increasingly accessible tool for geoscientists to quantify changes in the Arctic land surface due to geophysical, ecological and anthropogenic processes. However, the trade-off between spatial coverage and spatial-temporal resolution has limited detailed, process-level change detection over large (i.e. continental) scales. The ArcticDEM project utilized over 300,000 Worldview image pairs to produce a nearly 100% coverage elevation model (above 60°N), offering the first polar, high-spatial, high-resolution (2-8 m by region) dataset, often with multiple repeats in areas of particular interest to geo-scientists. A dataset of this size (nearly 250 TB) offers endless new avenues of scientific inquiry, but quickly becomes unmanageable computationally and logistically for the computing resources available to the average scientist. Here we present TopoDiff, a framework for a generalized, automated workflow that requires minimal input from the end user about a study site, and utilizes cloud computing resources to provide a temporally sorted and differenced dataset, ready for geostatistical analysis. This hands-off approach allows the end user to focus on the science, without having to manage thousands of files or petabytes of data. At the same time, TopoDiff provides a consistent and accurate workflow for image sorting, selection, and co-registration, enabling cross-comparisons between research projects.
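
    The differencing step at the core of such a workflow is simple once DEMs are co-registered and temporally sorted. The Python sketch below assumes two co-registered DEM arrays with a nodata value and only illustrates the per-pixel change computation, not TopoDiff's cloud orchestration.

      import numpy as np

      def dem_difference(dem_early, dem_late, nodata=-9999.0, min_valid=None):
          """Elevation change between two co-registered DEM arrays of equal shape.

          Returns the per-pixel difference (late minus early) with nodata cells masked.
          Co-registration and temporal sorting are assumed to have been done already.
          """
          early = np.ma.masked_equal(np.asarray(dem_early, dtype=float), nodata)
          late = np.ma.masked_equal(np.asarray(dem_late, dtype=float), nodata)
          change = late - early
          if min_valid is not None:
              # Optionally ignore implausible elevations (e.g. sensor artifacts).
              change = np.ma.masked_where((early < min_valid) | (late < min_valid), change)
          return change

      # Toy example: 3 m of uniform surface lowering with one nodata cell.
      a = np.full((4, 4), 100.0); a[0, 0] = -9999.0
      b = np.full((4, 4), 97.0)
      dh = dem_difference(a, b)
      print(float(dh.mean()))   # ~ -3.0 over valid cells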

  14. Lessons learned from implementing a national infrastructure in Sweden for storage and analysis of next-generation sequencing data

    PubMed Central

    2013-01-01

    Analyzing and storing data and results from next-generation sequencing (NGS) experiments is a challenging task, hampered by ever-increasing data volumes and frequent updates of analysis methods and tools. Storage and computation have grown beyond the capacity of personal computers and there is a need for suitable e-infrastructures for processing. Here we describe UPPNEX, an implementation of such an infrastructure, tailored to the needs of data storage and analysis of NGS data in Sweden serving various labs and multiple instruments from the major sequencing technology platforms. UPPNEX comprises resources for high-performance computing, large-scale and high-availability storage, an extensive bioinformatics software suite, up-to-date reference genomes and annotations, a support function with system and application experts as well as a web portal and support ticket system. UPPNEX applications are numerous and diverse, and include whole genome-, de novo- and exome sequencing, targeted resequencing, SNP discovery, RNASeq, and methylation analysis. There are over 300 projects that utilize UPPNEX and include large undertakings such as the sequencing of the flycatcher and Norwegian spruce. We describe the strategic decisions made when investing in hardware, setting up maintenance and support, allocating resources, and illustrate major challenges such as managing data growth. We conclude with summarizing our experiences and observations with UPPNEX to date, providing insights into the successful and less successful decisions made. PMID:23800020

  15. A comprehensive overview of computational resources to aid in precision genome editing with engineered nucleases.

    PubMed

    Periwal, Vinita

    2017-07-01

    Genome editing with engineered nucleases (zinc finger nucleases, TAL effector nucleases, and clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated nucleases) has recently been shown to have great promise in a variety of therapeutic and biotechnological applications. However, their exploitation in genetic analysis and clinical settings largely depends on their specificity for the intended genomic target. Large and complex genomes often contain highly homologous/repetitive sequences, which limits the specificity of genome editing tools and could result in off-target activity. Over the past few years, various computational approaches have been developed to assist the design process and predict/reduce the off-target activity of these nucleases. These tools could be efficiently used to guide the design of constructs for engineered nucleases and evaluate results after genome editing. This review provides a comprehensive overview of various databases, tools, web servers and resources for genome editing and compares their features and functionalities. Additionally, it also describes tools that have been developed to analyse post-genome editing results. The article also discusses important design parameters that could be considered while designing these nucleases. This review is intended to be a quick reference guide for experimentalists as well as computational biologists working in the field of genome editing with engineered nucleases. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
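
    Most off-target predictors ultimately reason about near matches between a nuclease's target sequence and the rest of the genome. As a purely illustrative toy (real tools use indexed search, PAM rules and empirical scoring models, none of which are shown here), a brute-force mismatch scan might look like this:

      def hamming(a, b):
          """Number of mismatched bases between two equal-length sequences."""
          return sum(x != y for x, y in zip(a, b))

      def naive_offtarget_scan(genome, guide, max_mismatches=3):
          """Slide a guide sequence along a genome string and report near matches.

          Brute-force illustration only; positions with at most `max_mismatches`
          mismatches are returned as (position, mismatch_count) pairs.
          """
          k = len(guide)
          hits = []
          for pos in range(len(genome) - k + 1):
              mm = hamming(genome[pos:pos + k], guide)
              if mm <= max_mismatches:
                  hits.append((pos, mm))
          return hits

      genome = "ACGTACGTTTGACCGGTAACGTACGATTGACCGGTA"
      print(naive_offtarget_scan(genome, "ACGTACGTTTGACCGGTAAC", max_mismatches=2))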

  16. Grid computing technology for hydrological applications

    NASA Astrophysics Data System (ADS)

    Lecca, G.; Petitdidier, M.; Hluchy, L.; Ivanovic, M.; Kussul, N.; Ray, N.; Thieron, V.

    2011-06-01

    Advances in e-Infrastructure promise to revolutionize sensing systems and the way in which data are collected and assimilated, and complex water systems are simulated and visualized. According to the EU Infrastructure 2010 work-programme, data and compute infrastructures and their underlying technologies, whether oriented to tackle scientific challenges or complex problem solving in engineering, are expected to converge into so-called knowledge infrastructures, leading to more effective research, education and innovation in the next decade and beyond. Grid technology is recognized as a fundamental component of e-Infrastructures. Nevertheless, this emerging paradigm highlights several topics, including data management, algorithm optimization, security, performance (speed, throughput, bandwidth, etc.), and scientific cooperation and collaboration issues, that require further examination to fully exploit it and to better inform future research policies. The paper illustrates the results of six different surface and subsurface hydrology applications that have been deployed on the Grid. All the applications aim to answer strong requirements from civil society at large, relative to natural and anthropogenic risks. Grid technology has been successfully tested to improve flood prediction, groundwater resources management and Black Sea hydrological survey by providing large computing resources. It is also shown that Grid technology facilitates e-cooperation among partners by means of services for authentication and authorization, seamless access to distributed data sources, data protection and access rights, and standardization.

  17. Methodologies for optimal resource allocation to the national space program and new space utilizations. Volume 1: Technical description

    NASA Technical Reports Server (NTRS)

    1971-01-01

    The optimal allocation of resources to the national space program over an extended time period requires the solution of a large combinatorial problem in which the program elements are interdependent. The computer model uses an accelerated search technique to solve this problem. The model contains a large number of options selectable by the user to provide flexible input and a broad range of output for use in sensitivity analyses of all entering elements. Examples of these options are budget smoothing under varied appropriation levels, entry of inflation and discount effects, and probabilistic output which provides quantified degrees of certainty that program costs will remain within the planned budget. Criteria and related analytic procedures were established for identifying potential new space program directions. Used in combination with the optimal resource allocation model, new space applications can be analyzed in realistic perspective, including the advantage gained from existing space program plant and on-going programs such as the space transportation system.

  18. Grid Computing and Collaboration Technology in Support of Fusion Energy Sciences

    NASA Astrophysics Data System (ADS)

    Schissel, D. P.

    2004-11-01

    The SciDAC Initiative is creating a computational grid designed to advance scientific understanding in fusion research by facilitating collaborations, enabling more effective integration of experiments, theory and modeling, and allowing more efficient use of experimental facilities. The philosophy is that data, codes, analysis routines, visualization tools, and communication tools should be thought of as easy-to-use, network-available services. Access to services is stressed rather than portability. Services share the same basic security infrastructure so that stakeholders can control their own resources, which helps ensure fair use of resources. The collaborative control room is being developed using the open-source Access Grid software that enables secure group-to-group collaboration with capabilities beyond teleconferencing, including application sharing and control. The ability to effectively integrate off-site scientists into a dynamic control room will be critical to the success of future international projects like ITER. Grid computing, the secure integration of computer systems over high-speed networks to provide on-demand access to data analysis capabilities and related functions, is being deployed as an alternative to traditional resource sharing among institutions. The first grid computational service deployed was the transport code TRANSP and included tools for run preparation, submission, monitoring and management. This approach saves user sites from the laborious effort of maintaining a complex code while at the same time reducing the burden on developers by avoiding the support of a large number of heterogeneous installations. This tutorial will present the philosophy behind an advanced collaborative environment, give specific examples, and discuss its usage beyond FES.

  19. Does Cloud Computing in the Atmospheric Sciences Make Sense? A case study of hybrid cloud computing at NASA Langley Research Center

    NASA Astrophysics Data System (ADS)

    Nguyen, L.; Chee, T.; Minnis, P.; Spangenberg, D.; Ayers, J. K.; Palikonda, R.; Vakhnin, A.; Dubois, R.; Murphy, P. R.

    2014-12-01

    The processing, storage and dissemination of satellite cloud and radiation products produced at NASA Langley Research Center are key activities for the Climate Science Branch. A constellation of systems operates in sync to accomplish these goals. Because of the complexity involved with operating such intricate systems, there are both high failure rates and high costs for hardware and system maintenance. Cloud computing has the potential to ameliorate cost and complexity issues. Over time, the cloud computing model has evolved and hybrid systems comprising off-site as well as on-site resources are now common. Towards our mission of providing the highest quality research products to the widest audience, we have explored the use of the Amazon Web Services (AWS) Cloud and Storage and present a case study of our results and efforts. This project builds upon NASA Langley Cloud and Radiation Group's experience with operating large and complex computing infrastructures in a reliable and cost effective manner to explore novel ways to leverage cloud computing resources in the atmospheric science environment. Our case study presents the project requirements and then examines the fit of AWS with the LaRC computing model. We also discuss the evaluation metrics, feasibility, and outcomes and close the case study with the lessons we learned that would apply to others interested in exploring the implementation of the AWS system in their own atmospheric science computing environments.

  20. Towards Large-area Field-scale Operational Evapotranspiration for Water Use Mapping

    NASA Astrophysics Data System (ADS)

    Senay, G. B.; Friedrichs, M.; Morton, C.; Huntington, J. L.; Verdin, J.

    2017-12-01

    Field-scale evapotranspiration (ET) estimates are needed for improving surface and groundwater use and water budget studies. Ideally, field-scale ET estimates would be available at regional to national levels and cover long time periods. Because of the large data storage and computational requirements associated with processing field-scale satellite imagery such as Landsat, numerous challenges remain in developing operational ET estimates over large areas for detailed water use and availability studies. However, the combination of new science, data availability, and cloud computing technology is enabling unprecedented capabilities for ET mapping. To demonstrate this capability, we used Google's Earth Engine cloud computing platform to create nationwide annual ET estimates with 30-meter resolution Landsat (~16,000 images) and gridded weather data using the Operational Simplified Surface Energy Balance (SSEBop) model in support of the National Water Census, a USGS research program designed to build decision support capacity for water management agencies and other natural resource managers. By leveraging Google's Earth Engine Application Programming Interface (API) and developing software in a collaborative, open-platform environment, we rapidly advance from research towards applications for large-area field-scale ET mapping. Cloud computing of the Landsat image archive, combined with other satellite, climate, and weather data, is creating previously unimaginable opportunities for assessing ET model behavior and uncertainty, and ultimately provides the ability for more robust operational monitoring and assessment of water use at field scales.
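    The server-side processing style this relies on can be sketched with the Earth Engine Python API, as below; the collection ID, dates, region, and the simple mean reduction are assumptions chosen for illustration and are not the SSEBop model implementation.

```python
# Minimal Earth Engine sketch (not SSEBop itself): reduce a year of Landsat 8
# surface-reflectance scenes over a region entirely server-side, so no imagery
# is downloaded locally. Collection ID, dates, and region are example values.
import ee

ee.Initialize()

region = ee.Geometry.Rectangle([-100.0, 36.0, -99.0, 37.0])   # example AOI
landsat = (ee.ImageCollection("LANDSAT/LC08/C02/T1_L2")
           .filterDate("2015-01-01", "2016-01-01")
           .filterBounds(region))

# A stand-in per-year computation; a real ET model would combine thermal bands
# with gridded weather data at this point.
annual_mean = landsat.mean().clip(region)

stats = annual_mean.reduceRegion(
    reducer=ee.Reducer.mean(), geometry=region, scale=30, maxPixels=1e9)
print(stats.getInfo())
```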

  1. Identifying the impact of G-quadruplexes on Affymetrix 3' arrays using cloud computing.

    PubMed

    Memon, Farhat N; Owen, Anne M; Sanchez-Graillet, Olivia; Upton, Graham J G; Harrison, Andrew P

    2010-01-15

    A tetramer quadruplex structure is formed by four parallel strands of DNA/RNA containing runs of guanine. These quadruplexes are able to form because guanine can Hoogsteen hydrogen bond to other guanines, and a tetrad of guanines can form a stable arrangement. Recently we discovered that probes on Affymetrix GeneChips that contain runs of guanine do not measure gene expression reliably. We associate this finding with the likelihood that quadruplexes are forming on the surface of GeneChips. In order to cope with the rapidly expanding size of GeneChip array datasets in the public domain, we are exploring the use of cloud computing to replicate our experiments on 3' arrays and examine the effect of the location of G-spots (runs of guanines). Cloud computing is a recently introduced high-performance solution that takes advantage of the computational infrastructure of large organisations such as Amazon and Google. We expect that cloud computing will become widely adopted because it enables bioinformaticians to avoid capital expenditure on expensive computing resources and to pay a cloud computing provider only for what is used. Moreover, beyond financial efficiency, cloud computing is an ecologically friendly technology that enables efficient data-sharing, and we expect it to be faster for development purposes. Here we propose the advantageous use of cloud computing to perform a large data-mining analysis of public domain 3' arrays.
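    The core screening step, flagging probes whose sequences contain a run of guanines, can be sketched in a few lines of Python; the length-four threshold and the example 25-mer probe sequences below are illustrative assumptions, not the study's exact criteria.

```python
# Flag probes containing a "G-spot" (run of consecutive guanines); the
# threshold of four G's is an assumption for illustration.
import re

G_RUN = re.compile(r"G{4,}")

def has_g_run(probe_seq):
    """Return True if the probe sequence contains four or more consecutive G's."""
    return bool(G_RUN.search(probe_seq.upper()))

probes = ["ATCGGGGTACGATCGATCGATCGAT", "ATCGATCGATCGATCGATCGATCGA"]
flagged = [p for p in probes if has_g_run(p)]
print(flagged)   # probes whose intensities may be affected by quadruplex formation
```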

  2. A cloud-based workflow to quantify transcript-expression levels in public cancer compendia

    PubMed Central

    Tatlow, PJ; Piccolo, Stephen R.

    2016-01-01

    Public compendia of sequencing data are now measured in petabytes. Accordingly, it is infeasible for researchers to transfer these data to local computers. Recently, the National Cancer Institute began exploring opportunities to work with molecular data in cloud-computing environments. With this approach, it becomes possible for scientists to take their tools to the data and thereby avoid large data transfers. It also becomes feasible to scale computing resources to the needs of a given analysis. We quantified transcript-expression levels for 12,307 RNA-Sequencing samples from the Cancer Cell Line Encyclopedia and The Cancer Genome Atlas. We used two cloud-based configurations and examined the performance and cost profiles of each configuration. Using preemptible virtual machines, we processed the samples for as little as $0.09 (USD) per sample. As the samples were processed, we collected performance metrics, which helped us track the duration of each processing step and quantified computational resources used at different stages of sample processing. Although the computational demands of reference alignment and expression quantification have decreased considerably, there remains a critical need for researchers to optimize preprocessing steps. We have stored the software, scripts, and processed data in a publicly accessible repository (https://osf.io/gqrz9). PMID:27982081
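    The per-sample cost figure follows from simple arithmetic over instance pricing and runtime; the sketch below illustrates the calculation with placeholder numbers, which are assumptions rather than the values measured in the study.

```python
# Back-of-the-envelope cost per sample on a preemptible VM. The hourly price,
# runtime, and preemption overhead are placeholder assumptions.
def cost_per_sample(hourly_price_usd, hours_per_sample, preemption_overhead=0.10):
    """Expected cost, inflating runtime by the fraction of work lost to preemptions."""
    return hourly_price_usd * hours_per_sample * (1.0 + preemption_overhead)

print(round(cost_per_sample(hourly_price_usd=0.04, hours_per_sample=2.0), 3))
```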

  3. The Montage architecture for grid-enabled science processing of large, distributed datasets

    NASA Technical Reports Server (NTRS)

    Jacob, Joseph C.; Katz, Daniel S .; Prince, Thomas; Berriman, Bruce G.; Good, John C.; Laity, Anastasia C.; Deelman, Ewa; Singh, Gurmeet; Su, Mei-Hui

    2004-01-01

    Montage is an Earth Science Technology Office (ESTO) Computational Technologies (CT) Round III Grand Challenge investigation to deploy a portable, compute-intensive, custom astronomical image mosaicking service for the National Virtual Observatory (NVO). Although Montage is developing a compute- and data-intensive service for the astronomy community, we are also helping to address a problem that spans both Earth and Space science, namely how to efficiently access and process multi-terabyte, distributed datasets. In both communities, the datasets are massive, and are stored in distributed archives that are, in most cases, remote from the available computational resources. Therefore, state-of-the-art computational grid technologies are a key element of the Montage portal architecture. This paper describes the aspects of the Montage design that are applicable to both the Earth and Space science communities.

  4. A Scheduling Algorithm for Computational Grids that Minimizes Centralized Processing in Genome Assembly of Next-Generation Sequencing Data

    PubMed Central

    Lima, Jakelyne; Cerdeira, Louise Teixeira; Bol, Erick; Schneider, Maria Paula Cruz; Silva, Artur; Azevedo, Vasco; Abelém, Antônio Jorge Gomes

    2012-01-01

    Improvements in genome sequencing techniques have resulted in the generation of huge volumes of data. As a consequence of this progress, the genome assembly stage demands even more computational power, since the incoming sequence files contain large amounts of data. To speed up the process, it is often necessary to distribute the workload among a group of machines. However, this requires hardware and software solutions specially configured for this purpose. Grid computing tries to simplify this process of aggregating resources, but does not always offer the best possible performance due to the heterogeneity and decentralized management of its resources. Thus, it is necessary to develop software that takes these peculiarities into account. To achieve this, we developed an algorithm that adapts the de novo assembly software ABySS to operate efficiently in grids. We ran ABySS with and without our algorithm in the grid simulator SimGrid. Tests showed that our algorithm is viable, flexible, and scalable even in a heterogeneous environment, and that it improved genome assembly time in computational grids without changing the quality of the assembly. PMID:22461785
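    The general idea of dividing work in proportion to node capability, rather than uniformly, can be illustrated as follows; this is a generic sketch, not the scheduling algorithm evaluated in the paper.

```python
# Illustrative only: split N sequence reads across heterogeneous grid nodes in
# proportion to a benchmarked speed score, so faster nodes receive
# proportionally larger assembly work units.
def split_by_speed(total_reads, node_speeds):
    total_speed = sum(node_speeds.values())
    shares = {node: int(total_reads * s / total_speed) for node, s in node_speeds.items()}
    # Hand any rounding remainder to the fastest node.
    remainder = total_reads - sum(shares.values())
    fastest = max(node_speeds, key=node_speeds.get)
    shares[fastest] += remainder
    return shares

print(split_by_speed(1_000_000, {"node-a": 1.0, "node-b": 2.5, "node-c": 0.5}))
```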

  5. De-quantisation

    NASA Astrophysics Data System (ADS)

    Gruska, Jozef

    2012-06-01

    One of the most basic tasks in quantum information processing, communication and security (QIPCC) research, theoretically deep and practically important, is to establish bounds on how important inherently quantum resources really are for speeding up computations. This area of research is producing a variety of results which imply, often in very unexpected and counter-intuitive ways, that: (a) surprisingly large classes of quantum circuits and algorithms can be efficiently simulated on classical computers; (b) the border line between quantum processes that can and cannot be efficiently simulated on classical computers is often surprisingly thin; (c) the addition of a seemingly very simple resource or tool often enormously increases the power of available quantum tools. These discoveries have also shed new light on our understanding of quantum phenomena and quantum physics, and on the potential of its inherently quantum and often mysterious-looking phenomena. The paper motivates and surveys research and its outcomes in the area of de-quantisation, and in particular presents various approaches, and their outcomes, concerning efficient classical simulations of various families of quantum circuits and algorithms. To motivate this area of research, some outcomes in the area of de-randomisation of classical randomised computations are also presented.

  6. RabbitQR: fast and flexible big data processing at LSST data rates using existing, shared-use hardware

    NASA Astrophysics Data System (ADS)

    Kotulla, Ralf; Gopu, Arvind; Hayashi, Soichi

    2016-08-01

    Processing astronomical data to science readiness was and remains a challenge, in particular for multi-detector instruments such as wide-field imagers. One such instrument, the WIYN One Degree Imager, is available to the astronomical community at large and, in order to be scientifically useful to its varied user community on a short timescale, provides its users fully calibrated data in addition to the underlying raw data. However, time-efficient re-processing of the often large datasets with improved calibration data and/or software requires more than just a large number of CPU cores and disk space. This is particularly relevant if all computing resources are general purpose and shared with a large number of users in a typical university setup. Our approach to this challenge is a flexible framework combining the best of both high-performance (large number of nodes, internal communication) and high-throughput (flexible/variable number of nodes, no dedicated hardware) computing. Based on the Advanced Message Queuing Protocol, we developed a Server-Manager-Worker framework. In addition to the server directing the workflow and the workers executing the actual work, the manager maintains a list of available workers, adds and/or removes individual workers from the worker pool, and re-assigns workers to different tasks. This provides the flexibility to optimize the worker pool for the current task and workload, improves load balancing, and makes the most efficient use of the available resources. We present performance benchmarks and scaling tests showing that, today and using existing, commodity shared-use hardware, we can process data with data throughputs (including data reduction and calibration) approaching those expected in the early 2020s for future observatories such as the Large Synoptic Survey Telescope.
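    A minimal worker in this style can be written against AMQP with the pika client, as sketched below; the queue name, host, and task format are assumptions, and this is not the RabbitQR code itself.

```python
# Minimal AMQP worker in the spirit of the Server-Manager-Worker pattern,
# written with the pika client. Queue name and host are example assumptions.
import pika

def handle_task(ch, method, properties, body):
    # A real worker would run a reduction/calibration step on the named file.
    print(f"processing {body.decode()}")
    ch.basic_ack(delivery_tag=method.delivery_tag)   # report completion upstream

connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()
channel.queue_declare(queue="reduction-tasks", durable=True)
channel.basic_qos(prefetch_count=1)                  # one task at a time per worker
channel.basic_consume(queue="reduction-tasks", on_message_callback=handle_task)
channel.start_consuming()
```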

  7. Blast2GO goes grid: developing a grid-enabled prototype for functional genomics analysis.

    PubMed

    Aparicio, G; Götz, S; Conesa, A; Segrelles, D; Blanquer, I; García, J M; Hernandez, V; Robles, M; Talon, M

    2006-01-01

    The vast amount and complexity of data generated in genomic research implies that new, dedicated and powerful computational tools need to be developed to meet their analysis requirements. Blast2GO (B2G) is a bioinformatics tool for Gene Ontology-based DNA or protein sequence annotation and function-based data mining. The application has been developed with the aim of offering an easy-to-use tool for functional genomics research. Typical B2G users are middle-size genomics labs carrying out sequencing, EST and microarray projects, handling datasets of up to several thousand sequences. In the current version of B2G, the power and analytical potential of both annotation and function data-mining is somewhat restricted by the computational power behind each particular installation. In order to offer the possibility of enhanced computational capacity within this bioinformatics application, a Grid component is being developed. A prototype has been conceived for the particular problem of speeding up the Blast searches to obtain fast results for large datasets. Many efforts have been reported in the literature concerning the speeding up of Blast searches, but few of them deal with the use of large heterogeneous production Grid infrastructures. These are the infrastructures that could reach the largest number of resources and the best load balancing for data access. The Grid Service under development will analyse requests based on the number of sequences, splitting them according to the available resources. Lower-level computation will be performed through MPIBLAST. The software architecture is based on the WSRF standard.
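    The splitting step can be illustrated with a short, generic FASTA partitioner that produces one chunk per available resource before the parallel Blast jobs are dispatched; the round-robin policy below is an assumption for illustration, not the Grid Service's actual splitter.

```python
# Generic illustration: divide an input FASTA file into one chunk per
# available compute resource, distributing sequences round-robin.
def split_fasta(path, n_chunks):
    with open(path) as fh:
        records, current = [], []
        for line in fh:
            if line.startswith(">") and current:
                records.append(current)
                current = []
            current.append(line)
        if current:
            records.append(current)
    chunks = [[] for _ in range(n_chunks)]
    for i, rec in enumerate(records):
        chunks[i % n_chunks].extend(rec)     # round-robin by sequence
    return chunks
```

Each chunk would then be written out and submitted as an independent Blast job on one of the available resources.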

  8. Universal quantum computation using all-optical hybrid encoding

    NASA Astrophysics Data System (ADS)

    Guo, Qi; Cheng, Liu-Yong; Wang, Hong-Fu; Zhang, Shou

    2015-04-01

    By employing displacement operations, single-photon subtractions, and weak cross-Kerr nonlinearity, we propose an alternative way of implementing several universal quantum logic gates for all-optical hybrid qubits encoded in both a single-photon polarization state and a coherent state. Since these schemes can be implemented straightforwardly using only local operations, without a teleportation procedure, fewer physical resources and simpler operations are required than in existing schemes. With the help of displacement operations, a large phase shift of the coherent state can be obtained via currently available tiny cross-Kerr nonlinearities. Thus, all of these schemes are nearly deterministic and feasible under current technology, which makes them suitable for large-scale quantum computing. Project supported by the National Natural Science Foundation of China (Grant Nos. 61465013, 11465020, and 11264042).

  9. IP-Based Video Modem Extender Requirements

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pierson, L G; Boorman, T M; Howe, R E

    2003-12-16

    Visualization is one of the keys to understanding large complex data sets such as those generated by the large computing resources purchased and developed by the Advanced Simulation and Computing program (aka ASCI). In order to be convenient to researchers, visualization data must be distributed to offices and large complex visualization theaters. Currently, local distribution of the visual data is accomplished by distance-limited modems and RGB switches that simply do not scale to hundreds of users across local, metropolitan, and WAN distances without incurring large costs in fiber plant installation and maintenance. Wide area application over the DOE Complex is infeasible using these limited-distance RGB extenders. On the other hand, Internet Protocols (IP) over Ethernet is a scalable, well-proven technology that can distribute large volumes of data over these distances. Visual data has been distributed at lower resolutions over IP in industrial applications. This document describes requirements of the ASCI program in visual signal distribution for the purpose of identifying industrial partners willing to develop products to meet ASCI's needs.

  10. Next generation communications satellites: multiple access and network studies

    NASA Technical Reports Server (NTRS)

    Meadows, H. E.; Schwartz, M.; Stern, T. E.; Ganguly, S.; Kraimeche, B.; Matsuo, K.; Gopal, I.

    1982-01-01

    Efficient resource allocation and network design for satellite systems serving heterogeneous user populations with large numbers of small direct-to-user Earth stations are discussed. Focus is on TDMA systems involving a high degree of frequency reuse by means of satellite-switched multiple beams (SSMB) with varying degrees of onboard processing. Algorithms for the efficient utilization of the satellite resources were developed. The effect of skewed traffic, overlapping beams and batched arrivals in packet-switched SSMB systems, integration of stream and bursty traffic, and optimal circuit scheduling in SSMB systems: performance bounds and computational complexity are discussed.

  11. Integrating Information Technologies Into Large Organizations

    NASA Technical Reports Server (NTRS)

    Gottlich, Gretchen; Meyer, John M.; Nelson, Michael L.; Bianco, David J.

    1997-01-01

    NASA Langley Research Center's product is aerospace research information. To this end, Langley uses information technology tools in three distinct ways. First, information technology tools are used in the production of information via computation, analysis, data collection and reduction. Second, information technology tools assist in streamlining business processes, particularly those that are primarily communication based. By applying these information tools to administrative activities, Langley spends fewer resources on managing itself and can allocate more resources for research. Third, Langley uses information technology tools to disseminate its aerospace research information, resulting in faster turn around time from the laboratory to the end-customer.

  12. NMRbox: A Resource for Biomolecular NMR Computation.

    PubMed

    Maciejewski, Mark W; Schuyler, Adam D; Gryk, Michael R; Moraru, Ion I; Romero, Pedro R; Ulrich, Eldon L; Eghbalnia, Hamid R; Livny, Miron; Delaglio, Frank; Hoch, Jeffrey C

    2017-04-25

    Advances in computation have been enabling many recent advances in biomolecular applications of NMR. Due to the wide diversity of applications of NMR, the number and variety of software packages for processing and analyzing NMR data is quite large, with labs relying on dozens, if not hundreds of software packages. Discovery, acquisition, installation, and maintenance of all these packages is a burdensome task. Because the majority of software packages originate in academic labs, persistence of the software is compromised when developers graduate, funding ceases, or investigators turn to other projects. To simplify access to and use of biomolecular NMR software, foster persistence, and enhance reproducibility of computational workflows, we have developed NMRbox, a shared resource for NMR software and computation. NMRbox employs virtualization to provide a comprehensive software environment preconfigured with hundreds of software packages, available as a downloadable virtual machine or as a Platform-as-a-Service supported by a dedicated compute cloud. Ongoing development includes a metadata harvester to regularize, annotate, and preserve workflows and facilitate and enhance data depositions to BioMagResBank, and tools for Bayesian inference to enhance the robustness and extensibility of computational analyses. In addition to facilitating use and preservation of the rich and dynamic software environment for biomolecular NMR, NMRbox fosters the development and deployment of a new class of metasoftware packages. NMRbox is freely available to not-for-profit users. Copyright © 2017 Biophysical Society. All rights reserved.

  13. Computer-intensive simulation of solid-state NMR experiments using SIMPSON.

    PubMed

    Tošner, Zdeněk; Andersen, Rasmus; Stevensson, Baltzar; Edén, Mattias; Nielsen, Niels Chr; Vosegaard, Thomas

    2014-09-01

    Conducting large-scale solid-state NMR simulations requires fast computer software, potentially in combination with efficient computational resources, to complete within a reasonable time frame. Such simulations may involve large spin systems, multiple-parameter fitting of experimental spectra, or multiple-pulse experiment design using parameter scans, non-linear optimization, or optimal control procedures. To efficiently accommodate such simulations, we here present an improved version of the widely distributed open-source SIMPSON NMR simulation software package adapted to contemporary high-performance hardware setups. The software is optimized for fast performance on standard stand-alone computers, multi-core processors, and large clusters of identical nodes. We describe the novel features for fast computation, including internal matrix manipulations, propagator setups and acquisition strategies. For efficient calculation of powder averages, we implemented the interpolation method of Alderman, Solum, and Grant, as well as the recently introduced fast Wigner transform interpolation technique. The potential of the optimal control toolbox is greatly enhanced by higher-precision gradients in combination with the efficient optimization algorithm known as limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS). In addition, advanced parallelization can be used in all types of calculations, providing significant time reductions. SIMPSON thus reflects current knowledge in the field of numerical simulations of solid-state NMR experiments. The efficiency and novel features are demonstrated on representative simulations. Copyright © 2014 Elsevier Inc. All rights reserved.
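    For readers unfamiliar with the optimizer mentioned above, the following SciPy call shows limited-memory BFGS on a toy objective; it is a generic illustration, not a SIMPSON optimal-control functional.

```python
# Generic L-BFGS-B illustration with SciPy; the objective is a toy function,
# not a pulse-sequence optimal-control functional.
import numpy as np
from scipy.optimize import minimize

def objective(x):
    return np.sum((x - 1.0) ** 2) + 0.1 * np.sum(x ** 4)

x0 = np.zeros(10)
result = minimize(objective, x0, method="L-BFGS-B")
print(result.x.round(3), result.fun)
```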

  14. The Integration of CloudStack and OCCI/OpenNebula with DIRAC

    NASA Astrophysics Data System (ADS)

    Méndez Muñoz, Víctor; Fernández Albor, Víctor; Graciani Diaz, Ricardo; Casajús Ramo, Adriàn; Fernández Pena, Tomás; Merino Arévalo, Gonzalo; José Saborido Silva, Juan

    2012-12-01

    The increasing availability of Cloud resources is emerging as a realistic alternative to the Grid as a paradigm for enabling scientific communities to access large distributed computing resources. The DIRAC framework for distributed computing provides an easy way to efficiently access resources from both systems. This paper explains the integration of DIRAC with two open-source Cloud Managers: OpenNebula (taking advantage of the OCCI standard) and CloudStack. These are computing tools for managing the complexity and heterogeneity of distributed data center infrastructures, allowing virtual clusters to be created on demand, including public, private and hybrid clouds. This approach required developing an extension to the previous DIRAC Virtual Machine engine, which was developed for Amazon EC2, to allow connection with these new cloud managers. In the OpenNebula case, the development has been based on the CernVM Virtual Software Appliance with appropriate contextualization, while in the case of CloudStack the infrastructure has been kept more general, which permits other Virtual Machine sources and operating systems to be used. In both cases, the CernVM File System has been used to facilitate software distribution to the computing nodes. With the resulting infrastructure, the cloud resources are transparent to the users through a friendly interface, like the DIRAC Web Portal. The main purpose of this integration is to obtain a system that can manage cloud and grid resources at the same time. This particular feature pushes DIRAC towards a new conceptual denomination as interware, integrating different middleware. Users from different communities do not need to care about the installation of the standard software that is available at the nodes, nor about the operating system of the host machine, which is transparent to the user. This paper presents an analysis of the overhead of the virtual layer, with tests comparing the proposed approach to the existing Grid solution.

  15. New Resources for Computer-Aided Legal Research: An Assessment of the Usefulness of the DIALOG System in Securities Regulation Studies.

    ERIC Educational Resources Information Center

    Gruner, Richard; Heron, Carol E.

    1984-01-01

    Examines usefulness of DIALOG as legal research tool through use of DIALOG's DIALINDEX database to identify those databases among almost 200 available that contain large numbers of records related to federal securities regulation. Eight databases selected for further study are detailed. Twenty-six footnotes, database statistics, and samples are…

  16. Khan Academy as Supplemental Instruction: A Controlled Study of a Computer-Based Mathematics Intervention

    ERIC Educational Resources Information Center

    Kelly, Daniel P.; Rutherford, Teomara

    2017-01-01

    Khan Academy is a large and popular open educational resource (OER) with little empirical study into its impact on student achievement in mathematics when used in schools. In this study, we examined the use of Khan Academy as a mathematics intervention among seventh grade students over a 4-week period versus a control group. We also compared…

  17. A Malicious Pattern Detection Engine for Embedded Security Systems in the Internet of Things

    PubMed Central

    Oh, Doohwan; Kim, Deokho; Ro, Won Woo

    2014-01-01

    With the emergence of the Internet of Things (IoT), a large number of physical objects in daily life have been aggressively connected to the Internet. As the number of objects connected to networks increases, the security systems face a critical challenge due to the global connectivity and accessibility of the IoT. However, it is difficult to adapt traditional security systems to the objects in the IoT, because of their limited computing power and memory size. In light of this, we present a lightweight security system that uses a novel malicious pattern-matching engine. We limit the memory usage of the proposed system in order to make it work on resource-constrained devices. To mitigate performance degradation due to limitations of computation power and memory, we propose two novel techniques, auxiliary shifting and early decision. Through both techniques, we can efficiently reduce the number of matching operations on resource-constrained systems. Experiments and performance analyses show that our proposed system achieves a maximum speedup of 2.14 with an IoT object and provides scalable performance for a large number of patterns. PMID:25521382

  18. Folding Proteins at 500 ns/hour with Work Queue.

    PubMed

    Abdul-Wahid, Badi'; Yu, Li; Rajan, Dinesh; Feng, Haoyun; Darve, Eric; Thain, Douglas; Izaguirre, Jesús A

    2012-10-01

    Molecular modeling is a field that traditionally has large computational costs. Until recently, most simulation techniques relied on long trajectories, which inherently have poor scalability. A new class of methods is proposed that requires only a large number of short calculations, and for which minimal communication between computer nodes is required. We considered one of the more accurate variants called Accelerated Weighted Ensemble Dynamics (AWE) and for which distributed computing can be made efficient. We implemented AWE using the Work Queue framework for task management and applied it to an all atom protein model (Fip35 WW domain). We can run with excellent scalability by simultaneously utilizing heterogeneous resources from multiple computing platforms such as clouds (Amazon EC2, Microsoft Azure), dedicated clusters, grids, on multiple architectures (CPU/GPU, 32/64bit), and in a dynamic environment in which processes are regularly added or removed from the pool. This has allowed us to achieve an aggregate sampling rate of over 500 ns/hour. As a comparison, a single process typically achieves 0.1 ns/hour.
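    The pattern of farming out many short, independent calculations with minimal inter-node communication can be sketched with the Python standard library, as below; this is a simplified stand-in for illustration, not the Work Queue API or the AWE resampling logic.

```python
# Generic "many short, independent tasks" pattern; each task stands in for a
# short MD trajectory segment run by one walker.
from concurrent.futures import ProcessPoolExecutor, as_completed
import random

def short_md_segment(walker_id):
    """Stand-in for a short simulation segment; returns a fake end-state value."""
    return walker_id, random.random()

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=8) as pool:
        futures = [pool.submit(short_md_segment, i) for i in range(256)]
        results = [f.result() for f in as_completed(futures)]
    # A resampling/weighting step over `results` would follow in a real AWE iteration.
    print(len(results), "segments completed")
```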

  19. Folding Proteins at 500 ns/hour with Work Queue

    PubMed Central

    Abdul-Wahid, Badi’; Yu, Li; Rajan, Dinesh; Feng, Haoyun; Darve, Eric; Thain, Douglas; Izaguirre, Jesús A.

    2014-01-01

    Molecular modeling is a field that traditionally has large computational costs. Until recently, most simulation techniques relied on long trajectories, which inherently have poor scalability. A new class of methods is proposed that requires only a large number of short calculations, and for which minimal communication between computer nodes is required. We considered one of the more accurate variants called Accelerated Weighted Ensemble Dynamics (AWE) and for which distributed computing can be made efficient. We implemented AWE using the Work Queue framework for task management and applied it to an all atom protein model (Fip35 WW domain). We can run with excellent scalability by simultaneously utilizing heterogeneous resources from multiple computing platforms such as clouds (Amazon EC2, Microsoft Azure), dedicated clusters, grids, on multiple architectures (CPU/GPU, 32/64bit), and in a dynamic environment in which processes are regularly added or removed from the pool. This has allowed us to achieve an aggregate sampling rate of over 500 ns/hour. As a comparison, a single process typically achieves 0.1 ns/hour. PMID:25540799

  20. Global Static Indexing for Real-Time Exploration of Very Large Regular Grids

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pascucci, V; Frank, R

    2001-07-23

    In this paper we introduce a new indexing scheme for progressive traversal and visualization of large regular grids. We demonstrate the potential of our approach by providing a tool that displays at interactive rates planar slices of scalar field data with very modest computing resources. We obtain unprecedented results both in terms of absolute performance and, more importantly, in terms of scalability. On a laptop computer we provide real time interaction with a 2048³ grid (8 giga-nodes) using only 20 MB of memory. On an SGI Onyx we slice interactively an 8192³ grid (1/2 tera-nodes) using only 60 MB of memory. The scheme relies simply on the determination of an appropriate reordering of the rectilinear grid data and a progressive construction of the output slice. The reordering minimizes the amount of I/O performed during the out-of-core computation. The progressive and asynchronous computation of the output provides flexible quality/speed tradeoffs and a time-critical and interruptible user interface.
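    One common way to reorder a regular grid so that spatially close nodes are also close on disk is bit interleaving (Morton/Z-order), sketched below; it is given as a generic illustration, and the paper's actual indexing scheme may differ in detail.

```python
# Illustrative 3D bit-interleaving (Morton/Z-order) index for reordering a
# regular grid; 11 bits per axis covers a 2048^3 grid.
def morton3(i, j, k, bits=11):
    code = 0
    for b in range(bits):
        code |= ((i >> b) & 1) << (3 * b)
        code |= ((j >> b) & 1) << (3 * b + 1)
        code |= ((k >> b) & 1) << (3 * b + 2)
    return code

# Neighbouring grid nodes map to nearby indices, which reduces out-of-core I/O.
print(morton3(3, 5, 1), morton3(4, 5, 1))
```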

  1. Middleware for big data processing: test results

    NASA Astrophysics Data System (ADS)

    Gankevich, I.; Gaiduchok, V.; Korkhov, V.; Degtyarev, A.; Bogdanov, A.

    2017-12-01

    Dealing with large volumes of data is resource-consuming work which is more and more often delegated not only to a single computer but also to a whole distributed computing system at once. As the number of computers in a distributed system increases, the amount of effort put into effective management of the system grows. When the system reaches some critical size, much effort has to be put into improving its fault tolerance. It is difficult to estimate when a particular distributed system needs such facilities for a given workload, so instead they should be implemented in middleware which works efficiently with a distributed system of any size. It is also difficult to estimate whether a volume of data is large or not, so the middleware should also work with data of any volume. In other words, the purpose of the middleware is to provide facilities that adapt the distributed computing system to a given workload. In this paper we introduce such a middleware appliance. Tests show that this middleware is well suited for typical HPC and big data workloads and that its performance is comparable with well-known alternatives.

  2. NGScloud: RNA-seq analysis of non-model species using cloud computing.

    PubMed

    Mora-Márquez, Fernando; Vázquez-Poletti, José Luis; López de Heredia, Unai

    2018-05-03

    RNA-seq analysis usually requires large computing infrastructures. NGScloud is a bioinformatic system developed to analyze RNA-seq data using Amazon's cloud computing services, which permit access to ad hoc computing infrastructure scaled according to the complexity of the experiment, so that its costs and run times can be optimized. The application provides a user-friendly front-end to operate Amazon's hardware resources and to control a workflow of RNA-seq analysis oriented to non-model species, incorporating the cluster concept, which allows parallel runs of common RNA-seq analysis programs in several virtual machines for faster analysis. NGScloud is freely available at https://github.com/GGFHF/NGScloud/. A manual detailing installation and how-to-use instructions is available with the distribution. unai.lopezdeheredia@upm.es.

  3. Remote sensing image ship target detection method based on visual attention model

    NASA Astrophysics Data System (ADS)

    Sun, Yuejiao; Lei, Wuhu; Ren, Xiaodong

    2017-11-01

    Traditional methods of detecting ship targets in remote sensing images mostly use a sliding window to search the whole image exhaustively. However, the target usually occupies only a small fraction of the image, so this approach has high computational complexity for large-format visible image data. The bottom-up selective attention mechanism can selectively allocate computing resources according to visual stimuli, thus improving computational efficiency and reducing the difficulty of analysis. With this in mind, a method of ship target detection in remote sensing images based on a visual attention model is proposed in this paper. The experimental results show that the proposed method can reduce the computational complexity while improving the detection accuracy, thereby improving the detection efficiency of ship targets in remote sensing images.

  4. Distributed information system (water fact sheet)

    USGS Publications Warehouse

    Harbaugh, A.W.

    1986-01-01

    During 1982-85, the Water Resources Division (WRD) of the U.S. Geological Survey (USGS) installed over 70 large minicomputers in offices across the country to support its mission in the science of hydrology. These computers are connected by a communications network that allows information to be shared among computers in each office. The computers and network together are known as the Distributed Information System (DIS). The computers are accessed through the use of more than 1500 terminals and minicomputers. The WRD has three fundamentally different needs for computing: data management; hydrologic analysis; and administration. Data management accounts for 50% of the computational workload of WRD because hydrologic data are collected in all 50 states, Puerto Rico, and the Pacific trust territories. Hydrologic analysis consists of 40% of the computational workload of WRD. Cost accounting, payroll, personnel records, and planning for WRD programs occupies an estimated 10% of the computer workload. The DIS communications network is shown on a map. (Lantz-PTT)

  5. FPGA-Based Multimodal Embedded Sensor System Integrating Low- and Mid-Level Vision

    PubMed Central

    Botella, Guillermo; Martín H., José Antonio; Santos, Matilde; Meyer-Baese, Uwe

    2011-01-01

    Motion estimation is a low-level vision task that is especially relevant due to its wide range of applications in the real world. Many of the best motion estimation algorithms include some of the features that are found in mammalians, which would demand huge computational resources and therefore are not usually available in real-time. In this paper we present a novel bioinspired sensor based on the synergy between optical flow and orthogonal variant moments. The bioinspired sensor has been designed for Very Large Scale Integration (VLSI) using properties of the mammalian cortical motion pathway. This sensor combines low-level primitives (optical flow and image moments) in order to produce a mid-level vision abstraction layer. The results are described through experiments showing the validity of the proposed system and an analysis of the computational resources and performance of the applied algorithms. PMID:22164069

  6. FPGA-based multimodal embedded sensor system integrating low- and mid-level vision.

    PubMed

    Botella, Guillermo; Martín H, José Antonio; Santos, Matilde; Meyer-Baese, Uwe

    2011-01-01

    Motion estimation is a low-level vision task that is especially relevant due to its wide range of applications in the real world. Many of the best motion estimation algorithms include some of the features that are found in mammalians, which would demand huge computational resources and therefore are not usually available in real-time. In this paper we present a novel bioinspired sensor based on the synergy between optical flow and orthogonal variant moments. The bioinspired sensor has been designed for Very Large Scale Integration (VLSI) using properties of the mammalian cortical motion pathway. This sensor combines low-level primitives (optical flow and image moments) in order to produce a mid-level vision abstraction layer. The results are described through experiments showing the validity of the proposed system and an analysis of the computational resources and performance of the applied algorithms.

  7. Towards a Unified Architecture for Data-Intensive Seismology in VERCE

    NASA Astrophysics Data System (ADS)

    Klampanos, I.; Spinuso, A.; Trani, L.; Krause, A.; Garcia, C. R.; Atkinson, M.

    2013-12-01

    Modern seismology involves managing, storing and processing large datasets, typically geographically distributed across organisations. Performing computational experiments using these data generates more data, which in turn have to be managed, further analysed and frequently be made available within or outside the scientific community. As part of the EU-funded project VERCE (http://verce.eu), we research and develop a number of use-cases, interfacing technologies to satisfy the data-intensive requirements of modern seismology. Our solution seeks to support: (1) familiar programming environments to develop and execute experiments, in particular via Python/ObsPy, (2) a unified view of heterogeneous computing resources, public or private, through the adoption of workflows, (3) monitoring the experiments and validating the data products at varying granularities, via a comprehensive provenance system, (4) reproducibility of experiments and consistency in collaboration, via a shared registry of processing units and contextual metadata (computing resources, data, etc.) Here, we provide a brief account of these components and their roles in the proposed architecture. Our design integrates heterogeneous distributed systems, while allowing researchers to retain current practices and control data handling and execution via higher-level abstractions. At the core of our solution lies the workflow language Dispel. While Dispel can be used to express workflows at fine detail, it may also be used as part of meta- or job-submission workflows. User interaction can be provided through a visual editor or through custom applications on top of parameterisable workflows, which is the approach VERCE follows. According to our design, the scientist may use versions of Dispel/workflow processing elements offered by the VERCE library or override them introducing custom scientific code, using ObsPy. This approach has the advantage that, while the scientist uses a familiar tool, the resulting workflow can be executed on a number of underlying stream-processing engines, such as STORM or OGSA-DAI, transparently. While making efficient use of arbitrarily distributed resources and large data-sets is of priority, such processing requires adequate provenance tracking and monitoring. Hiding computation and orchestration details via a workflow system, allows us to embed provenance harvesting where appropriate without impeding the user's regular working patterns. Our provenance model is based on the W3C PROV standard and can provide information of varying granularity regarding execution, systems and data consumption/production. A video demonstrating a prototype provenance exploration tool can be found at http://bit.ly/15t0Fz0. Keeping experimental methodology and results open and accessible, as well as encouraging reproducibility and collaboration, is of central importance to modern science. As our users are expected to be based at different geographical locations, to have access to different computing resources and to employ customised scientific codes, the use of a shared registry of workflow components, implementations, data and computing resources is critical.
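    A typical per-trace processing unit that such workflows wrap can be expressed directly in ObsPy, as in the sketch below; the filter band is an arbitrary example value.

```python
# Minimal ObsPy processing unit of the kind wrapped as a workflow element;
# read() with no argument loads ObsPy's bundled example waveforms.
from obspy import read

stream = read()                              # example waveform data
stream.detrend("demean")                     # remove the mean offset
stream.filter("bandpass", freqmin=0.5, freqmax=5.0)
for trace in stream:
    print(trace.id, trace.stats.npts, trace.data.max())
```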

  8. Utilization and viability of biologically-inspired algorithms in a dynamic multiagent camera surveillance system

    NASA Astrophysics Data System (ADS)

    Mundhenk, Terrell N.; Dhavale, Nitin; Marmol, Salvador; Calleja, Elizabeth; Navalpakkam, Vidhya; Bellman, Kirstie; Landauer, Chris; Arbib, Michael A.; Itti, Laurent

    2003-10-01

    In view of the growing complexity of computational tasks and their design, we propose that certain interactive systems may be better designed by utilizing computational strategies based on the study of the human brain. Compared with current engineering paradigms, brain theory offers the promise of improved self-organization and adaptation to the current environment, freeing the programmer from having to address those issues in a procedural manner when designing and implementing large-scale complex systems. To advance this hypothesis, we discuss a multi-agent surveillance system where 12 agent CPUs, each with its own camera, compete and cooperate to monitor a large room. To cope with the overload of image data streaming from 12 cameras, we take inspiration from the primate's visual system, which allows the animal to perform a real-time selection of the few most conspicuous locations in visual input. This is accomplished by having each camera agent utilize the bottom-up, saliency-based visual attention algorithm of Itti and Koch (Vision Research 2000;40(10-12):1489-1506) to scan the scene for objects of interest. Real-time operation is achieved using a distributed version that runs on a 16-CPU Beowulf cluster composed of the agent computers. The algorithm guides cameras to track and monitor salient objects based on maps of color, orientation, intensity, and motion. To spread camera viewpoints or create cooperation in monitoring highly salient targets, camera agents bias each other by increasing or decreasing the weight of different feature vectors in other cameras, using mechanisms similar to the excitation and suppression that have been documented in electrophysiology, psychophysics and imaging studies of low-level visual processing. In addition, if cameras need to compete for computing resources, allocation of computational time is weighted based upon the history of each camera: a camera agent that has a history of seeing more salient targets is more likely to obtain computational resources. The system demonstrates the viability of biologically inspired systems in real-time tracking. In future work we plan to implement additional biological mechanisms for cooperative management of both the sensor and processing resources in this system, including top-down biasing for target specificity as well as novelty and the activity of the tracked object in relation to sensitive features of the environment.
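    The weighted combination of feature maps that the agents bias in each other can be sketched as follows; this is a simplified stand-in for the Itti-Koch saliency computation, with invented map sizes and weights.

```python
# Schematic saliency combination: per-feature weights can be raised or lowered
# by other camera agents to spread viewpoints or focus on a shared target.
import numpy as np

def saliency(color, intensity, orientation, motion, weights):
    maps = {"color": color, "intensity": intensity,
            "orientation": orientation, "motion": motion}
    total = np.zeros_like(color, dtype=float)
    for name, feature_map in maps.items():
        total += weights[name] * feature_map
    return total / sum(weights.values())

h, w = 120, 160
weights = {"color": 1.0, "intensity": 1.0, "orientation": 1.0, "motion": 2.0}
maps = [np.random.rand(h, w) for _ in range(4)]
s = saliency(*maps, weights)
print(np.unravel_index(np.argmax(s), s.shape))   # most conspicuous location
```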

  9. Resource Aware Intelligent Network Services (RAINS) Final Technical Report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lehman, Tom; Yang, Xi

    The Resource Aware Intelligent Network Services (RAINS) project conducted research and developed technologies in the area of cyber infrastructure resource modeling and computation. The goal of this work was to provide a foundation to enable intelligent, software defined services which span the network AND the resources which connect to the network. A Multi-Resource Service Plane (MRSP) was defined, which allows resource owners/managers to locate and place themselves, from a topology and service availability perspective, within the dynamic networked cyberinfrastructure ecosystem. The MRSP enables the presentation of integrated topology views and computation results which can include resources across the spectrum of compute, storage, and networks. The MRSP developed by the RAINS project includes the following key components: i) Multi-Resource Service (MRS) Ontology/Multi-Resource Markup Language (MRML), ii) Resource Computation Engine (RCE), iii) Modular Driver Framework (to allow integration of a variety of external resources). The MRS/MRML is a general and extensible modeling framework that allows resource owners to model, or describe, a wide variety of resource types. All resources are described using three categories of elements: Resources, Services, and Relationships between the elements. This modeling framework defines a common method for the transformation of cyber infrastructure resources into data in the form of MRML models. In order to realize this infrastructure datification, the RAINS project developed a model based computation system, the RAINS Computation Engine (RCE). The RCE has the ability to ingest, process, integrate, and compute based on automatically generated MRML models. The RCE interacts with the resources through system drivers which are specific to the type of external network or resource controller. The RAINS project developed a modular and pluggable driver system which allows a variety of resource controllers to automatically generate, maintain, and distribute MRML based resource descriptions. Once all of the resource topologies are absorbed by the RCE, a connected graph of the full distributed system topology is constructed, which forms the basis for computation and workflow processing. The RCE includes a Modular Computation Element (MCE) framework which allows tailoring of the computation process to the specific set of resources under control and the services desired. The input and output of an MCE are both model data based on the MRS/MRML ontology and schema. RAINS project accomplishments include: development of a general and extensible multi-resource modeling framework; design of a Resource Computation Engine (RCE) system with the following key capabilities: absorbing a variety of multi-resource model types and building integrated models, a novel architecture which uses model based communications across the full stack, flexible provision of abstract or intent based user facing interfaces, and workflow processing based on model descriptions; release of the RCE as open source software; deployment of the RCE in the University of Maryland/Mid-Atlantic Crossroads ScienceDMZ in prototype mode, with a plan under way to transition to production; deployment at the Argonne National Laboratory DTN Facility in prototype mode; and selection of the RCE by the DOE SENSE (SDN for End-to-end Networked Science at the Exascale) project as the basis for their orchestration service.
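    The step of composing independently described resource topologies into one connected graph for computation can be illustrated with a small networkx sketch; the node names and topology below are invented, and this is not the MRML data model.

```python
# Illustrative only: merge per-owner resource topologies into one connected
# graph, then compute a path across the composed system.
import networkx as nx

site_a = nx.Graph([("hostA", "switchA"), ("switchA", "wan-gw-A")])
site_b = nx.Graph([("hostB", "switchB"), ("switchB", "wan-gw-B")])
wan = nx.Graph([("wan-gw-A", "wan-gw-B")])

full_topology = nx.compose_all([site_a, site_b, wan])
print(nx.shortest_path(full_topology, "hostA", "hostB"))
```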

  10. Optimization of tomographic reconstruction workflows on geographically distributed resources

    DOE PAGES

    Bicer, Tekin; Gursoy, Doga; Kettimuthu, Rajkumar; ...

    2016-01-01

    New technological advancements in synchrotron light sources enable data acquisitions at unprecedented levels. This emergent trend affects not only the size of the generated data but also the need for larger computational resources. Although beamline scientists and users have access to local computational resources, these are typically limited and can result in extended execution times. Applications that are based on iterative processing, as in tomographic reconstruction methods, require high-performance compute clusters for timely analysis of data. Here, we focus on time-sensitive analysis and processing of Advanced Photon Source data on geographically distributed resources. Two main challenges are considered: (i) modeling of the performance of tomographic reconstruction workflows and (ii) transparent execution of these workflows on distributed resources. For the former, three main stages are considered: (i) data transfer between storage and computational resources, (ii) wait/queue time of reconstruction jobs at compute resources, and (iii) computation of reconstruction tasks. These performance models allow evaluation and estimation of the execution time of any given iterative tomographic reconstruction workflow that runs on geographically distributed resources. For the latter challenge, a workflow management system is built, which can automate the execution of workflows and minimize the user interaction with the underlying infrastructure. The system utilizes Globus to perform secure and efficient data transfer operations. The proposed models and the workflow management system are evaluated by using three high-performance computing and two storage resources, all of which are geographically distributed. Workflows were created with different computational requirements using two compute-intensive tomographic reconstruction algorithms. Experimental evaluation shows that the proposed models and system can be used for selecting the optimum resources, which in turn can provide up to 3.13× speedup (on experimented resources). Furthermore, the error rates of the models range between 2.1 and 23.3% (considering workflow execution times), where the accuracy of the model estimations increases with higher computational demands in reconstruction tasks.
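    The three-stage performance model described above reduces to a simple sum, as sketched below; the link speed, queue wait, and compute time are placeholder assumptions rather than fitted model parameters.

```python
# Sketch of the three-stage time estimate:
#   workflow time = data transfer + queue wait + reconstruction compute.
def estimate_workflow_time(dataset_gb, transfer_gbps, queue_wait_s, compute_time_s):
    transfer_s = dataset_gb * 8.0 / transfer_gbps   # GB -> Gb, divided by link rate
    return transfer_s + queue_wait_s + compute_time_s

# Example: 200 GB of sinograms over a 5 Gb/s link, 10 min of queueing, 1 h of compute.
print(estimate_workflow_time(200, 5.0, 600, 3600) / 3600, "hours")
```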

  11. Optimization of tomographic reconstruction workflows on geographically distributed resources

    PubMed Central

    Bicer, Tekin; Gürsoy, Doǧa; Kettimuthu, Rajkumar; De Carlo, Francesco; Foster, Ian T.

    2016-01-01

    New technological advancements in synchrotron light sources enable data acquisitions at unprecedented levels. This emergent trend affects not only the size of the generated data but also the need for larger computational resources. Although beamline scientists and users have access to local computational resources, these are typically limited and can result in extended execution times. Applications that are based on iterative processing, as in tomographic reconstruction methods, require high-performance compute clusters for timely analysis of data. Here, we focus on time-sensitive analysis and processing of Advanced Photon Source data on geographically distributed resources. Two main challenges are considered: (i) modeling of the performance of tomographic reconstruction workflows and (ii) transparent execution of these workflows on distributed resources. For the former, three main stages are considered: (i) data transfer between storage and computational resources, (ii) wait/queue time of reconstruction jobs at compute resources, and (iii) computation of reconstruction tasks. These performance models allow evaluation and estimation of the execution time of any given iterative tomographic reconstruction workflow that runs on geographically distributed resources. For the latter challenge, a workflow management system is built, which can automate the execution of workflows and minimize the user interaction with the underlying infrastructure. The system utilizes Globus to perform secure and efficient data transfer operations. The proposed models and the workflow management system are evaluated by using three high-performance computing and two storage resources, all of which are geographically distributed. Workflows were created with different computational requirements using two compute-intensive tomographic reconstruction algorithms. Experimental evaluation shows that the proposed models and system can be used for selecting the optimum resources, which in turn can provide up to 3.13× speedup (on experimented resources). Moreover, the error rates of the models range between 2.1 and 23.3% (considering workflow execution times), where the accuracy of the model estimations increases with higher computational demands in reconstruction tasks. PMID:27359149

  12. Optimization of tomographic reconstruction workflows on geographically distributed resources

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bicer, Tekin; Gursoy, Doga; Kettimuthu, Rajkumar

    New technological advancements in synchrotron light sources enable data acquisitions at unprecedented levels. This emergent trend affects not only the size of the generated data but also the need for larger computational resources. Although beamline scientists and users have access to local computational resources, these are typically limited and can result in extended execution times. Applications that are based on iterative processing, as in tomographic reconstruction methods, require high-performance compute clusters for timely analysis of data. Here, we focus on time-sensitive analysis and processing of Advanced Photon Source data on geographically distributed resources. Two main challenges are considered: (i) modeling of the performance of tomographic reconstruction workflows and (ii) transparent execution of these workflows on distributed resources. For the former, three main stages are considered: (i) data transfer between storage and computational resources, (ii) wait/queue time of reconstruction jobs at compute resources, and (iii) computation of reconstruction tasks. These performance models allow evaluation and estimation of the execution time of any given iterative tomographic reconstruction workflow that runs on geographically distributed resources. For the latter challenge, a workflow management system is built, which can automate the execution of workflows and minimize the user interaction with the underlying infrastructure. The system utilizes Globus to perform secure and efficient data transfer operations. The proposed models and the workflow management system are evaluated by using three high-performance computing and two storage resources, all of which are geographically distributed. Workflows were created with different computational requirements using two compute-intensive tomographic reconstruction algorithms. Experimental evaluation shows that the proposed models and system can be used for selecting the optimum resources, which in turn can provide up to 3.13× speedup (on experimented resources). Furthermore, the error rates of the models range between 2.1 and 23.3% (considering workflow execution times), where the accuracy of the model estimations increases with higher computational demands in reconstruction tasks.

  13. Enabling opportunistic resources for CMS Computing Operations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hufnagel, Dirk

    With the increased pressure on computing brought by the higher energy and luminosity from the LHC in Run 2, CMS Computing Operations expects to require the ability to utilize opportunistic resources (resources not owned by, or a priori configured for, CMS) to meet peak demands. In addition to our dedicated resources we look to add computing resources from non-CMS grids, cloud resources, and national supercomputing centers. CMS uses the HTCondor/glideinWMS job submission infrastructure for all its batch processing, so such resources will need to be transparently integrated into its glideinWMS pool. Bosco and parrot wrappers are used to enable access and bring the CMS environment into these non-CMS resources. Finally, we describe our strategy to supplement our native capabilities with opportunistic resources and our experience so far using them.

  14. Enabling opportunistic resources for CMS Computing Operations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hufnagel, Dick

    With the increased pressure on computing brought by the higher energy and luminosity from the LHC in Run 2, CMS Computing Operations expects to require the ability to utilize “opportunistic” resources — resources not owned by, or a priori configured for CMS — to meet peak demands. In addition to our dedicated resources we look to add computing resources from non CMS grids, cloud resources, and national supercomputing centers. CMS uses the HTCondor/glideinWMS job submission infrastructure for all its batch processing, so such resources will need to be transparently integrated into its glideinWMS pool. Bosco and parrot wrappers are used to enable access and bring the CMS environment into these non CMS resources. Here we describe our strategy to supplement our native capabilities with opportunistic resources and our experience so far using them.

  15. Enabling opportunistic resources for CMS Computing Operations

    DOE PAGES

    Hufnagel, Dirk

    2015-12-23

    With the increased pressure on computing brought by the higher energy and luminosity from the LHC in Run 2, CMS Computing Operations expects to require the ability to utilize opportunistic resources (resources not owned by, or a priori configured for, CMS) to meet peak demands. In addition to our dedicated resources we look to add computing resources from non-CMS grids, cloud resources, and national supercomputing centers. CMS uses the HTCondor/glideinWMS job submission infrastructure for all its batch processing, so such resources will need to be transparently integrated into its glideinWMS pool. Bosco and parrot wrappers are used to enable access and bring the CMS environment into these non-CMS resources. Finally, we describe our strategy to supplement our native capabilities with opportunistic resources and our experience so far using them.

  16. Back to basics: naked-eye astronomical observation

    NASA Astrophysics Data System (ADS)

    Barclay, Charles

    2003-09-01

    For pupils of both sexes and all ages from about six upwards, the subject of Astronomy holds many fascinations - the rapid changes in knowledge, the large resource of available IT packages and above all the beautiful pictures from Hubble and the large Earth-based telescopes. This article, however, stresses the excitement and importance of naked-eye (unaided) first-hand observation, where light pollution allows, and suggests some techniques that may be used to enthuse and introduce youngsters to the glory of the night sky without recourse to computer screens.

  17. Key Technology Research on Open Architecture for The Sharing of Heterogeneous Geographic Analysis Models

    NASA Astrophysics Data System (ADS)

    Yue, S. S.; Wen, Y. N.; Lv, G. N.; Hu, D.

    2013-10-01

    In recent years, the rapid development of cloud computing technologies has laid a critical foundation for efficiently solving complicated geographic problems. However, it is still difficult to realize the cooperative operation of massive heterogeneous geographical models. Traditional cloud architectures tend to provide centralized solutions to end users, with the required resources offered by large enterprises or special agencies; from the perspective of resource utilization, this is a closed framework. Solving comprehensive geographic problems requires integrating multifarious heterogeneous geographical models and data. In this case, an open computing platform is needed, with which model owners can conveniently package and deploy their models into the cloud, while model users can search, access and utilize those models with cloud facilities. Based on this concept, this article studies open cloud service strategies for the sharing of heterogeneous geographic analysis models. The key technologies, namely a unified cloud interface strategy, a sharing platform based on cloud services, and a computing platform based on cloud services, are discussed in detail, and related experiments are conducted for further verification.

  18. Intrusion Prevention and Detection in Grid Computing - The ALICE Case

    NASA Astrophysics Data System (ADS)

    Gomez, Andres; Lara, Camilo; Kebschull, Udo

    2015-12-01

    Grids allow users flexible on-demand usage of computing resources through remote communication networks. A remarkable example of a Grid in High Energy Physics (HEP) research is the one used by the ALICE experiment at the European Organization for Nuclear Research (CERN). Physicists can submit jobs used to process the huge amount of particle collision data produced by the Large Hadron Collider (LHC). Grids face complex security challenges. They are interesting targets for attackers seeking huge computational resources. Since users can execute arbitrary code on the worker nodes of the Grid sites, special care must be taken in this environment. Automatic tools to harden and monitor this scenario are required. Currently, there is no integrated solution for such a requirement. This paper describes a new security framework that allows execution of job payloads in a sandboxed context. It also monitors process behavior with a Machine Learning approach to detect intrusions, even when new attack methods or zero-day vulnerabilities are exploited. We plan to implement the proposed framework as a software prototype that will be tested as a component of the ALICE Grid middleware.

  19. Building high-performance system for processing a daily large volume of Chinese satellites imagery

    NASA Astrophysics Data System (ADS)

    Deng, Huawu; Huang, Shicun; Wang, Qi; Pan, Zhiqiang; Xin, Yubin

    2014-10-01

    The number of Earth observation satellites from China has increased dramatically in recent years, and those satellites are acquiring a large volume of imagery daily. As the main portal for image processing and distribution from those Chinese satellites, the China Centre for Resources Satellite Data and Application (CRESDA) has been working with PCI Geomatics during the last three years to solve two issues in this regard: processing the large volume of data (about 1,500 scenes or 1 TB per day) in a timely manner and generating geometrically accurate orthorectified products. After three years of research and development, a high-performance system has been built and successfully delivered. The system has a service-oriented architecture and can be deployed to a cluster of computers that may be configured with high-end computing power. The high performance is gained by, first, parallelizing the image processing algorithms using high-performance graphics processing unit (GPU) cards and multiple cores from multiple CPUs and, second, distributing processing tasks to a cluster of computing nodes. While achieving performance up to thirty times (and even more) faster than the traditional practice, a particular methodology was developed to improve the geometric accuracy of images acquired from Chinese satellites (including HJ-1 A/B, ZY-1-02C, ZY-3, GF-1, etc.). The methodology consists of fully automatic collection of dense ground control points (GCP) from various resources and the application of those points to improve the photogrammetric model of the images. The delivered system is up and running at CRESDA for pre-operational production and is generating a good return on investment by eliminating a great amount of manual labor and increasing daily data throughput more than tenfold with fewer operators. Future work, such as development of more performance-optimized algorithms, robust image matching methods and application workflows, is identified to improve the system in the coming years.

  20. Arctic Boreal Vulnerability Experiment (ABoVE) Science Cloud

    NASA Astrophysics Data System (ADS)

    Duffy, D.; Schnase, J. L.; McInerney, M.; Webster, W. P.; Sinno, S.; Thompson, J. H.; Griffith, P. C.; Hoy, E.; Carroll, M.

    2014-12-01

    The effects of climate change are being revealed at alarming rates in the Arctic and Boreal regions of the planet. NASA's Terrestrial Ecology Program has launched a major field campaign to study these effects over the next 5 to 8 years. The Arctic Boreal Vulnerability Experiment (ABoVE) will challenge scientists to take measurements in the field, study remote observations, and even run models to better understand the impacts of a rapidly changing climate for areas of Alaska and western Canada. The NASA Center for Climate Simulation (NCCS) at the Goddard Space Flight Center (GSFC) has partnered with the Terrestrial Ecology Program to create a science cloud designed for this field campaign - the ABoVE Science Cloud. The cloud combines traditional high performance computing with emerging technologies to create an environment specifically designed for large-scale climate analytics. The ABoVE Science Cloud utilizes (1) virtualized high-speed InfiniBand networks, (2) a combination of high-performance file systems and object storage, and (3) virtual system environments tailored for data intensive, science applications. At the center of the architecture is a large object storage environment, much like a traditional high-performance file system, that supports data proximal processing using technologies like MapReduce on a Hadoop Distributed File System (HDFS). Surrounding the storage is a cloud of high performance compute resources with many processing cores and large memory coupled to the storage through an InfiniBand network. Virtual systems can be tailored to a specific scientist and provisioned on the compute resources with extremely high-speed network connectivity to the storage and to other virtual systems. In this talk, we will present the architectural components of the science cloud and examples of how it is being used to meet the needs of the ABoVE campaign. In our experience, the science cloud approach significantly lowers the barriers and risks to organizations that require high performance computing solutions and provides the NCCS with the agility required to meet our customers' rapidly increasing and evolving requirements.

  1. Dawn Usage, Scheduling, and Governance Model

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Louis, S

    2009-11-02

    This document describes Dawn use, scheduling, and governance concerns. Users started running full-machine science runs in early April 2009 during the initial open shakedown period. Scheduling Dawn while in the Open Computing Facility (OCF) was controlled and coordinated via phone calls, emails, and a small number of controlled banks. With Dawn moving to the Secure Computing Facility (SCF) in fall of 2009, a more detailed scheduling and governance model is required. The three major objectives are: (1) Ensure Dawn resources are allocated on a program priority-driven basis; (2) Utilize Dawn resources on the job mixes for which they were intended; and (3) Minimize idle cycles through use of partitions, banks and proper job mix. The SCF workload for Dawn will be inherently different than Purple or BG/L, and therefore needs a different approach. Dawn's primary function is to permit adequate access for tri-lab code development in preparation for Sequoia, and in particular for weapons multi-physics codes in support of UQ. A second purpose is to provide time allocations for large-scale science runs and for UQ suite calculations to advance SSP program priorities. This proposed governance model will be the basis for initial time allocation of Dawn computing resources for the science and UQ workloads that merit priority on this class of resource, either because they cannot be reasonably attempted on any other resources due to size of problem, or because of the unavailability of sizable allocations on other ASC capability or capacity platforms. This proposed model intends to make the most effective use of Dawn as possible, but without being overly constrained by more formal proposal processes such as those now used for Purple CCCs.

  2. Elastic Extension of a CMS Computing Centre Resources on External Clouds

    NASA Astrophysics Data System (ADS)

    Codispoti, G.; Di Maria, R.; Aiftimiei, C.; Bonacorsi, D.; Calligola, P.; Ciaschini, V.; Costantini, A.; Dal Pra, S.; DeGirolamo, D.; Grandi, C.; Michelotto, D.; Panella, M.; Peco, G.; Sapunenko, V.; Sgaravatto, M.; Taneja, S.; Zizzi, G.

    2016-10-01

    After the successful LHC data taking in Run-I and in view of the future runs, the LHC experiments are facing new challenges in the design and operation of the computing facilities. The computing infrastructure for Run-II is dimensioned to cope at most with the average amount of data recorded. The usage peaks, as already observed in Run-I, may however originate large backlogs, thus delaying the completion of the data reconstruction and ultimately the data availability for physics analysis. In order to cope with the production peaks, CMS - along the lines followed by other LHC experiments - is exploring the opportunity to access Cloud resources provided by external partners or commercial providers. Specific use cases have already been explored and successfully exploited during Long Shutdown 1 (LS1) and the first part of Run 2. In this work we present the proof of concept of the elastic extension of a CMS site, specifically the Bologna Tier-3, on an external OpenStack infrastructure. We focus on the “Cloud Bursting” of a CMS Grid site using a newly designed LSF configuration that allows the dynamic registration of new worker nodes to LSF. In this approach, the dynamically added worker nodes instantiated on the OpenStack infrastructure are transparently accessed by the LHC Grid tools and at the same time serve as an extension of the farm for local usage. The amount of allocated resources can thus be elastically adapted to the needs of the CMS experiment and local users. Moreover, direct access to and integration of OpenStack resources with the CMS workload management system is explored. In this paper we present this approach, report on the performance of the on-demand allocated resources, and discuss the lessons learned and the next steps.

  3. The Numerical Propulsion System Simulation: An Overview

    NASA Technical Reports Server (NTRS)

    Lytle, John K.

    2000-01-01

    Advances in computational technology and in physics-based modeling are making large-scale, detailed simulations of complex systems possible within the design environment. For example, the integration of computing, communications, and aerodynamics has reduced the time required to analyze major propulsion system components from days and weeks to minutes and hours. This breakthrough has enabled the detailed simulation of major propulsion system components to become a routine part of designing systems, providing the designer with critical information about the components early in the design process. This paper describes the development of the numerical propulsion system simulation (NPSS), a modular and extensible framework for the integration of multicomponent and multidisciplinary analysis tools using geographically distributed resources such as computing platforms, data bases, and people. The analysis is currently focused on large-scale modeling of complete aircraft engines. This will provide the product developer with a "virtual wind tunnel" that will reduce the number of hardware builds and tests required during the development of advanced aerospace propulsion systems.

  4. Fast computation of quadrupole and hexadecapole approximations in microlensing with a single point-source evaluation

    NASA Astrophysics Data System (ADS)

    Cassan, Arnaud

    2017-07-01

    The exoplanet detection rate from gravitational microlensing has grown significantly in recent years thanks to a great enhancement of resources and improved observational strategies. Current observatories include ground-based wide-field and/or robotic world-wide networks of telescopes, as well as space-based observatories such as the Spitzer and Kepler/K2 satellites. This results in a large quantity of data to be processed and analysed, which is a challenge for modelling codes because of the complexity of the parameter space to be explored and the intensive computations required to evaluate the models. In this work, I present a method that computes the quadrupole and hexadecapole approximations of the finite-source magnification more efficiently than previously available codes, with routines about six times and four times faster, respectively. The quadrupole takes just about twice the time of a point-source evaluation, which advocates for generalizing its use to large portions of the light curves. The corresponding routines are available as open-source python codes.
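
    As background for the timing comparison above, the sketch below (not from the paper) shows the point-source point-lens magnification that serves as the baseline evaluation; the finite-source "quadrupole" correction is only indicated schematically, with a placeholder coefficient c2 standing in for the expression derived in the paper.

        import numpy as np

        def point_source_magnification(u):
            """Standard point-source point-lens magnification A(u) = (u^2 + 2) / (u * sqrt(u^2 + 4))."""
            u = np.asarray(u, dtype=float)
            return (u**2 + 2.0) / (u * np.sqrt(u**2 + 4.0))

        def quadrupole_magnification(u, rho, c2):
            """Schematic finite-source correction A ~ A0 * (1 + c2 * rho**2); c2 is a
            placeholder, NOT the published coefficient, and rho is the source radius
            in Einstein-radius units."""
            return point_source_magnification(u) * (1.0 + c2 * rho**2)

        print(point_source_magnification(0.1))               # high magnification close to the lens
        print(quadrupole_magnification(0.1, rho=0.01, c2=0.5))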

  5. Performance analysis of a large-grain dataflow scheduling paradigm

    NASA Technical Reports Server (NTRS)

    Young, Steven D.; Wills, Robert W.

    1993-01-01

    A paradigm for scheduling computations on a network of multiprocessors using large-grain data flow scheduling at run time is described and analyzed. The computations to be scheduled must follow a static flow graph, while the schedule itself will be dynamic (i.e., determined at run time). Many applications characterized by static flow exist, and they include real-time control and digital signal processing. With the advent of computer-aided software engineering (CASE) tools for capturing software designs in dataflow-like structures, macro-dataflow scheduling becomes increasingly attractive, if not necessary. For parallel implementations, using the macro-dataflow method allows the scheduling to be insulated from the application designer and enables the maximum utilization of available resources. Further, by allowing multitasking, processor utilizations can approach 100 percent while they maintain maximum speedup. Extensive simulation studies are performed on 4-, 8-, and 16-processor architectures that reflect the effects of communication delays, scheduling delays, algorithm class, and multitasking on performance and speedup gains.

  6. Toward a Data Scalable Solution for Facilitating Discovery of Science Resources

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Weaver, Jesse R.; Castellana, Vito G.; Morari, Alessandro

    Science is increasingly motivated by the need to process larger quantities of data. It is facing severe challenges in data collection, management, and processing, so much so that the computational demands of “data scaling” are competing with, and in many fields surpassing, the traditional objective of decreasing processing time. Example domains with large datasets include astronomy, biology, genomics, climate/weather, and material sciences. This paper presents a real-world use case in which we wish to answer queries provided by domain scientists in order to facilitate discovery of relevant science resources. The problem is that the metadata for these science resources is very large and is growing quickly, rapidly increasing the need for a data scaling solution. We propose a system – SGEM – designed for answering graph-based queries over large datasets on cluster architectures, and we report performance results for queries on the current RDESC dataset of nearly 1.4 billion triples, and on the well-known BSBM SPARQL query benchmark.

  7. Intermediate Palomar Transient Factory: Realtime Image Subtraction Pipeline

    DOE PAGES

    Cao, Yi; Nugent, Peter E.; Kasliwal, Mansi M.

    2016-09-28

    A fast-turnaround pipeline for realtime data reduction plays an essential role in discovering and permitting follow-up observations of young supernovae and fast-evolving transients in modern time-domain surveys. In this paper, we present the realtime image subtraction pipeline in the intermediate Palomar Transient Factory. By using high-performance computing, efficient databases, and machine-learning algorithms, this pipeline manages to reliably deliver transient candidates within 10 minutes of images being taken. Our experience in using high-performance computing resources to process big data in astronomy serves as a trailblazer for dealing with data from large-scale time-domain facilities in the near future.

  8. Intermediate Palomar Transient Factory: Realtime Image Subtraction Pipeline

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cao, Yi; Nugent, Peter E.; Kasliwal, Mansi M.

    A fast-turnaround pipeline for realtime data reduction plays an essential role in discovering and permitting follow-up observations of young supernovae and fast-evolving transients in modern time-domain surveys. In this paper, we present the realtime image subtraction pipeline in the intermediate Palomar Transient Factory. By using high-performance computing, efficient databases, and machine-learning algorithms, this pipeline manages to reliably deliver transient candidates within 10 minutes of images being taken. Our experience in using high-performance computing resources to process big data in astronomy serves as a trailblazer for dealing with data from large-scale time-domain facilities in the near future.

  9. BRYNTRN: A baryon transport model

    NASA Technical Reports Server (NTRS)

    Wilson, John W.; Townsend, Lawrence W.; Nealy, John E.; Chun, Sang Y.; Hong, B. S.; Buck, Warren W.; Lamkin, S. L.; Ganapol, Barry D.; Khan, Ferdous; Cucinotta, Francis A.

    1989-01-01

    The development of an interaction data base and a numerical solution to the transport of baryons through an arbitrary shield material based on a straight ahead approximation of the Boltzmann equation are described. The code is most accurate for continuous energy boundary values, but gives reasonable results for discrete spectra at the boundary using even a relatively coarse energy grid (30 points) and large spatial increments (1 cm in H2O). The resulting computer code is self-contained, efficient and ready to use. The code requires only a very small fraction of the computer resources required for Monte Carlo codes.

  10. Facilitating Navigation Through Large Archives

    NASA Technical Reports Server (NTRS)

    Shelton, Robert O.; Smith, Stephanie L.; Troung, Dat; Hodgson, Terry R.

    2005-01-01

    Automated Visual Access (AVA) is a computer program that effectively makes a large collection of information visible in a manner that enables a user to quickly and efficiently locate information resources, with minimal need for conventional keyword searches and perusal of complex hierarchical directory systems. AVA includes three key components: (1) a taxonomy that comprises a collection of words and phrases, clustered according to meaning, that are used to classify information resources; (2) a statistical indexing and scoring engine; and (3) a component that generates a graphical user interface that uses the scoring data to generate a visual map of resources and topics. The top level of an AVA display is a pictorial representation of an information archive. The user enters the depicted archive by either clicking on a depiction of subject area cluster, selecting a topic from a list, or entering a query into a text box. The resulting display enables the user to view candidate information entities at various levels of detail. Resources are grouped spatially by topic with greatest generality at the top layer and increasing detail with depth. The user can zoom in or out of specific sites or into greater or lesser content detail.

  11. Optimal estimation and scheduling in aquifer management using the rapid feedback control method

    NASA Astrophysics Data System (ADS)

    Ghorbanidehno, Hojat; Kokkinaki, Amalia; Kitanidis, Peter K.; Darve, Eric

    2017-12-01

    Management of water resources systems often involves a large number of parameters, as in the case of large, spatially heterogeneous aquifers, and a large number of "noisy" observations, as in the case of pressure observation in wells. Optimizing the operation of such systems requires both searching among many possible solutions and utilizing new information as it becomes available. However, the computational cost of this task increases rapidly with the size of the problem to the extent that textbook optimization methods are practically impossible to apply. In this paper, we present a new computationally efficient technique as a practical alternative for optimally operating large-scale dynamical systems. The proposed method, which we term Rapid Feedback Controller (RFC), provides a practical approach for combined monitoring, parameter estimation, uncertainty quantification, and optimal control for linear and nonlinear systems with a quadratic cost function. For illustration, we consider the case of a weakly nonlinear uncertain dynamical system with a quadratic objective function, specifically a two-dimensional heterogeneous aquifer management problem. To validate our method, we compare our results with the linear quadratic Gaussian (LQG) method, which is the basic approach for feedback control. We show that the computational cost of the RFC scales only linearly with the number of unknowns, a great improvement compared to the basic LQG control with a computational cost that scales quadratically. We demonstrate that the RFC method can obtain the optimal control values at a greatly reduced computational cost compared to the conventional LQG algorithm with small and controllable losses in the accuracy of the state and parameter estimation.

  12. Computing arrival times of firefighting resources for initial attack

    Treesearch

    Romain M. Mees

    1978-01-01

    Dispatching of firefighting resources requires instantaneous or precalculated decisions. A FORTRAN computer program has been developed that can provide a list of resources in order of computed arrival time for initial attack on a fire. The program requires an accurate description of the existing road system and a list of all resources available on a planning unit....
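
    A present-day sketch of the same idea (the original is a FORTRAN program; this illustration uses Python and a made-up three-node road network): travel times from the fire to each node are computed with a shortest-path search, and the available resources are then listed in order of arrival.

        import heapq

        def travel_times(graph, source):
            """Shortest travel times (minutes) from `source` to every road node.
            graph: {node: [(neighbor, minutes), ...]} -- Dijkstra's algorithm."""
            dist = {source: 0.0}
            heap = [(0.0, source)]
            while heap:
                d, node = heapq.heappop(heap)
                if d > dist.get(node, float("inf")):
                    continue
                for nbr, minutes in graph.get(node, []):
                    nd = d + minutes
                    if nd < dist.get(nbr, float("inf")):
                        dist[nbr] = nd
                        heapq.heappush(heap, (nd, nbr))
            return dist

        # Hypothetical road network (edge weights in minutes) and resource base locations.
        roads = {"A": [("B", 10), ("C", 25)], "B": [("A", 10), ("C", 8)], "C": [("A", 25), ("B", 8)]}
        resources = {"engine_1": "A", "crew_7": "B", "dozer_2": "C"}

        to_fire = travel_times(roads, source="C")             # fire located at node C
        dispatch_order = sorted(resources, key=lambda r: to_fire.get(resources[r], float("inf")))
        print(dispatch_order)                                 # resources in order of computed arrival time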

  13. Efficient preparation of large-block-code ancilla states for fault-tolerant quantum computation

    NASA Astrophysics Data System (ADS)

    Zheng, Yi-Cong; Lai, Ching-Yi; Brun, Todd A.

    2018-03-01

    Fault-tolerant quantum computation (FTQC) schemes that use multiqubit large block codes can potentially reduce the resource overhead to a great extent. A major obstacle is the requirement for a large number of clean ancilla states of different types without correlated errors inside each block. These ancilla states are usually logical stabilizer states of the data-code blocks, which are generally difficult to prepare if the code size is large. Previously, we have proposed an ancilla distillation protocol for Calderbank-Shor-Steane (CSS) codes by classical error-correcting codes. It was assumed that the quantum gates in the distillation circuit were perfect; however, in reality, noisy quantum gates may introduce correlated errors that are not treatable by the protocol. In this paper, we show that additional postselection by another classical error-detecting code can be applied to remove almost all correlated errors. Consequently, the revised protocol is fully fault tolerant and capable of preparing a large set of stabilizer states sufficient for FTQC using large block codes. At the same time, the yield rate can be boosted from O(t^-2) to O(1) in practice for an [[n, k, d = 2t+1]] code.

  14. OpenMP parallelization of a gridded SWAT (SWATG)

    NASA Astrophysics Data System (ADS)

    Zhang, Ying; Hou, Jinliang; Cao, Yongpan; Gu, Juan; Huang, Chunlin

    2017-12-01

    Large-scale, long-term and high spatial resolution simulation is a common issue in environmental modeling. A Gridded Hydrologic Response Unit (HRU)-based Soil and Water Assessment Tool (SWATG) that integrates a grid modeling scheme with different spatial representations also presents such problems. The long run times limit applications of very high resolution, large-scale watershed modeling. The OpenMP (Open Multi-Processing) parallel programming interface is integrated with SWATG (the result is called SWATGP) to accelerate grid modeling at the HRU level. This parallel implementation takes better advantage of the computational power of a shared-memory computer system. We conducted two experiments at multiple temporal and spatial scales of hydrological modeling using SWATG and SWATGP on a high-end server. At 500-m resolution, SWATGP was found to be up to nine times faster than SWATG in modeling a roughly 2000 km2 watershed with a single CPU and a 15-thread configuration. The study results demonstrate that parallel models save considerable time relative to traditional sequential simulation runs. Parallel computation of environmental models is beneficial for model applications, especially at large spatial and temporal scales and at high resolutions. The proposed SWATGP model is thus a promising tool for large-scale and high-resolution water resources research and management, in addition to offering data fusion and model coupling ability.
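
    The parallelization pattern described above, a loop over independent HRUs split across a team of threads, can be illustrated with the following sketch. Since OpenMP targets C/Fortran codes, the sketch substitutes Python's multiprocessing pool for the thread team; the per-HRU water-balance function and its inputs are hypothetical placeholders.

        from multiprocessing import Pool

        def simulate_hru(hru):
            """Placeholder per-HRU water balance (SCS curve-number runoff); stands in
            for the per-grid-cell SWAT computation that the paper parallelizes."""
            precip_mm, retention_mm = hru
            if precip_mm <= 0.2 * retention_mm:
                return 0.0
            return (precip_mm - 0.2 * retention_mm) ** 2 / (precip_mm + 0.8 * retention_mm)

        if __name__ == "__main__":
            # Hypothetical forcing for 100,000 gridded HRUs (rainfall mm, retention parameter mm).
            hrus = [(25.0, 60.0)] * 100_000
            with Pool(processes=8) as pool:          # shared work pool, analogous to an OpenMP thread team
                runoff = pool.map(simulate_hru, hrus, chunksize=2000)
            print(sum(runoff))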

  15. A Review of Computer Science Resources for Learning and Teaching with K-12 Computing Curricula: An Australian Case Study

    ERIC Educational Resources Information Center

    Falkner, Katrina; Vivian, Rebecca

    2015-01-01

    To support teachers to implement Computer Science curricula into classrooms from the very first year of school, teachers, schools and organisations seek quality curriculum resources to support implementation and teacher professional development. Until now, many Computer Science resources and outreach initiatives have targeted K-12 school-age…

  16. From transistor to trapped-ion computers for quantum chemistry.

    PubMed

    Yung, M-H; Casanova, J; Mezzacapo, A; McClean, J; Lamata, L; Aspuru-Guzik, A; Solano, E

    2014-01-07

    Over the last few decades, quantum chemistry has progressed through the development of computational methods based on modern digital computers. However, these methods can hardly fulfill the exponentially-growing resource requirements when applied to large quantum systems. As pointed out by Feynman, this restriction is intrinsic to all computational models based on classical physics. Recently, the rapid advancement of trapped-ion technologies has opened new possibilities for quantum control and quantum simulations. Here, we present an efficient toolkit that exploits both the internal and motional degrees of freedom of trapped ions for solving problems in quantum chemistry, including molecular electronic structure, molecular dynamics, and vibronic coupling. We focus on applications that go beyond the capacity of classical computers, but may be realizable on state-of-the-art trapped-ion systems. These results allow us to envision a new paradigm of quantum chemistry that shifts from the current transistor to a near-future trapped-ion-based technology.

  17. Computer graphics for management: An abstract of capabilities and applications of the EIS system

    NASA Technical Reports Server (NTRS)

    Solem, B. J.

    1975-01-01

    The Executive Information Services (EIS) system, developed as a computer-based, time-sharing tool for making and implementing management decisions, and including computer graphics capabilities, is described. The following resources are available through the EIS languages: a centralized corporate/gov't data base, customized and working data bases, report writing, general computational capability, specialized routines, modeling/programming capability, and graphics. Nearly all EIS graphs can be created by a single, on-line instruction. A large number of options are available, such as selection of graphic form, line control, shading, placement on the page, multiple images on a page, control of scaling and labeling, plotting of cumulative data sets, optional grid lines, and stack charts. The following are examples of areas in which the EIS system may be used: research, estimating services, planning, budgeting, performance measurement, and national computer hook-up negotiations.

  18. From transistor to trapped-ion computers for quantum chemistry

    PubMed Central

    Yung, M.-H.; Casanova, J.; Mezzacapo, A.; McClean, J.; Lamata, L.; Aspuru-Guzik, A.; Solano, E.

    2014-01-01

    Over the last few decades, quantum chemistry has progressed through the development of computational methods based on modern digital computers. However, these methods can hardly fulfill the exponentially-growing resource requirements when applied to large quantum systems. As pointed out by Feynman, this restriction is intrinsic to all computational models based on classical physics. Recently, the rapid advancement of trapped-ion technologies has opened new possibilities for quantum control and quantum simulations. Here, we present an efficient toolkit that exploits both the internal and motional degrees of freedom of trapped ions for solving problems in quantum chemistry, including molecular electronic structure, molecular dynamics, and vibronic coupling. We focus on applications that go beyond the capacity of classical computers, but may be realizable on state-of-the-art trapped-ion systems. These results allow us to envision a new paradigm of quantum chemistry that shifts from the current transistor to a near-future trapped-ion-based technology. PMID:24395054

  19. Performance Analysis of Cloud Computing Architectures Using Discrete Event Simulation

    NASA Technical Reports Server (NTRS)

    Stocker, John C.; Golomb, Andrew M.

    2011-01-01

    Cloud computing offers the economic benefit of on-demand resource allocation to meet changing enterprise computing needs. However, the flexibility of cloud computing is at a disadvantage compared to traditional hosting when it comes to providing predictable application and service performance. Cloud computing relies on resource scheduling in a virtualized, network-centric server environment, which makes static performance analysis infeasible. We developed a discrete event simulation model to evaluate the overall effectiveness of organizations in executing their workflow in traditional and cloud computing architectures. The two-part model framework characterizes both the demand, using a probability distribution for each type of service request, and the enterprise computing resource constraints. Our simulations provide quantitative analysis to design and provision computing architectures that maximize overall mission effectiveness. We share our analysis of key resource constraints in cloud computing architectures and findings on the appropriateness of cloud computing in various applications.
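
    A minimal sketch (not the NASA model) of the kind of discrete event simulation the abstract describes: service requests with exponentially distributed inter-arrival and service times contend for a fixed pool of servers, and the mean queueing delay is compared across two hypothetical resource pools.

        import heapq, random

        def simulate(n_servers, arrival_rate, service_rate, n_requests, seed=1):
            """Toy discrete-event model: requests compete for a fixed pool of servers;
            returns the mean time a request waits for a free server."""
            random.seed(seed)
            t = 0.0
            free_at = [0.0] * n_servers            # next-free times of each server
            heapq.heapify(free_at)
            waits = []
            for _ in range(n_requests):
                t += random.expovariate(arrival_rate)          # next request arrives
                earliest = heapq.heappop(free_at)              # soonest-available server
                start = max(t, earliest)
                waits.append(start - t)
                heapq.heappush(free_at, start + random.expovariate(service_rate))
            return sum(waits) / len(waits)

        # Hypothetical workload: compare a small dedicated pool with a larger elastic one.
        print(simulate(n_servers=4, arrival_rate=3.0, service_rate=1.0, n_requests=50_000))
        print(simulate(n_servers=8, arrival_rate=3.0, service_rate=1.0, n_requests=50_000))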

  20. Resource Balancing Control Allocation

    NASA Technical Reports Server (NTRS)

    Frost, Susan A.; Bodson, Marc

    2010-01-01

    Next generation aircraft with a large number of actuators will require advanced control allocation methods to compute the actuator commands needed to follow desired trajectories while respecting system constraints. Previously, algorithms were proposed to minimize the l1 or l2 norms of the tracking error and of the control effort. The paper discusses the alternative choice of using the l1 norm for minimization of the tracking error and a normalized l(infinity) norm, or sup norm, for minimization of the control effort. The algorithm computes the norm of the actuator deflections scaled by the actuator limits. Minimization of the control effort then translates into the minimization of the maximum actuator deflection as a percentage of its range of motion. The paper shows how the problem can be solved effectively by converting it into a linear program and solving it using a simplex algorithm. Properties of the algorithm are investigated through examples. In particular, the min-max criterion results in a type of resource balancing, where the resources are the control surfaces and the algorithm balances these resources to achieve the desired command. A study of the sensitivity of the algorithms to the data is presented, which shows that the normalized l(infinity) algorithm has the lowest sensitivity, although high sensitivities are observed whenever the limits of performance are reached.
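
    A small sketch of the min-max allocation idea described above, cast as a linear program and solved with scipy.optimize.linprog; the effectiveness matrix, commanded moments, and actuator limits are made-up numbers, and the sketch assumes the command can be met exactly (it omits the tracking-error term treated in the paper).

        import numpy as np
        from scipy.optimize import linprog

        # Minimize s subject to B u = d and |u_i| <= s * u_lim_i,
        # i.e. minimize the largest actuator deflection as a fraction of its range.
        B = np.array([[1.0, 0.8, -0.5, 0.3],
                      [0.2, -1.0, 0.6, 0.9]])           # 2 moments, 4 actuators (illustrative)
        d = np.array([0.4, -0.2])                        # commanded moments (illustrative)
        u_lim = np.array([0.5, 0.4, 0.6, 0.3])           # actuator deflection limits (illustrative)

        n = B.shape[1]
        c = np.zeros(n + 1); c[-1] = 1.0                 # objective: the scale factor s
        A_ub = np.vstack([np.hstack([np.eye(n), -u_lim[:, None]]),     #  u_i - s*u_lim_i <= 0
                          np.hstack([-np.eye(n), -u_lim[:, None]])])   # -u_i - s*u_lim_i <= 0
        b_ub = np.zeros(2 * n)
        A_eq = np.hstack([B, np.zeros((B.shape[0], 1))])                # B u = d
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=d,
                      bounds=[(None, None)] * n + [(0, None)])
        u, s = res.x[:n], res.x[-1]
        print(u, s)                                      # s = worst-case fraction of actuator travel used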

  1. Biometric Methods for Secure Communications in Body Sensor Networks: Resource-Efficient Key Management and Signal-Level Data Scrambling

    NASA Astrophysics Data System (ADS)

    Bui, Francis Minhthang; Hatzinakos, Dimitrios

    2007-12-01

    As electronic communications become more prevalent, mobile and universal, the threats of data compromise also loom correspondingly larger. In the context of a body sensor network (BSN), which permits pervasive monitoring of potentially sensitive medical data, security and privacy concerns are particularly important. It is a challenge to implement traditional security infrastructures in these types of lightweight networks since they are by design limited in both computational and communication resources. Biometrics has emerged as a key enabling technology for secure communications in BSNs. In this work, we present two complementary approaches which exploit physiological signals to address security issues: (1) a resource-efficient key management system for generating and distributing cryptographic keys to constituent sensors in a BSN; (2) a novel data scrambling method, based on interpolation and random sampling, that is envisioned as a potential alternative to conventional symmetric encryption algorithms for certain types of data. The former targets the resource constraints in BSNs, while the latter addresses the fuzzy variability of biometric signals, which has largely precluded the direct application of conventional encryption. Using electrocardiogram (ECG) signals as biometrics, the resulting computer simulations demonstrate the feasibility and efficacy of these methods for delivering secure communications in BSNs.

  2. Operating Dedicated Data Centers - Is It Cost-Effective?

    NASA Astrophysics Data System (ADS)

    Ernst, M.; Hogue, R.; Hollowell, C.; Strecker-Kellog, W.; Wong, A.; Zaytsev, A.

    2014-06-01

    The advent of cloud computing centres such as Amazon's EC2 and Google's Computing Engine has elicited comparisons with dedicated computing clusters. Discussions on appropriate usage of cloud resources (both academic and commercial) and costs have ensued. This presentation discusses a detailed analysis of the costs of operating and maintaining the RACF (RHIC and ATLAS Computing Facility) compute cluster at Brookhaven National Lab and compares them with the cost of cloud computing resources under various usage scenarios. An extrapolation of likely future cost effectiveness of dedicated computing resources is also presented.
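
    To make the comparison concrete, here is a back-of-the-envelope sketch of the kind of cost accounting involved (every figure below is a hypothetical placeholder, not a RACF or commercial-cloud price): the effective dedicated cost per core-hour depends strongly on the utilization the facility sustains.

        # Illustrative dedicated-vs-cloud cost comparison; all numbers are made up.
        def dedicated_cost_per_core_hour(capex, lifetime_years, annual_opex, cores, utilization):
            """Amortized cost of owned hardware per *used* core-hour."""
            used_hours = lifetime_years * 365 * 24 * utilization
            return (capex + annual_opex * lifetime_years) / (cores * used_hours)

        cloud_price = 0.05                                # assumed on-demand $/core-hour
        for util in (0.5, 0.7, 0.9):
            own = dedicated_cost_per_core_hour(capex=2.0e6, lifetime_years=4,
                                               annual_opex=4.0e5, cores=10_000, utilization=util)
            print(f"utilization {util:.0%}: dedicated ${own:.3f}/core-h vs cloud ${cloud_price:.3f}/core-h")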

  3. Final Report for ALCC Allocation: Predictive Simulation of Complex Flow in Wind Farms

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Barone, Matthew F.; Ananthan, Shreyas; Churchfield, Matt

    This report documents work performed using ALCC computing resources granted under a proposal submitted in February 2016, with the resource allocation period spanning the period July 2016 through June 2017. The award allocation was 10.7 million processor-hours at the National Energy Research Scientific Computing Center. The simulations performed were in support of two projects: the Atmosphere to Electrons (A2e) project, supported by the DOE EERE office; and the Exascale Computing Project (ECP), supported by the DOE Office of Science. The project team for both efforts consists of staff scientists and postdocs from Sandia National Laboratories and the National Renewable Energy Laboratory. At the heart of these projects is the open-source computational-fluid-dynamics (CFD) code, Nalu. Nalu solves the low-Mach-number Navier-Stokes equations using an unstructured-grid discretization. Nalu leverages the open-source Trilinos solver library and the Sierra Toolkit (STK) for parallelization and I/O. This report documents baseline computational performance of the Nalu code on problems of direct relevance to the wind plant physics application - namely, Large Eddy Simulation (LES) of an atmospheric boundary layer (ABL) flow and wall-modeled LES of a flow past a static wind turbine rotor blade. Parallel performance of Nalu and its constituent solver routines residing in the Trilinos library has been assessed previously under various campaigns. However, both Nalu and Trilinos have been, and remain, in active development and resources have not been available previously to rigorously track code performance over time. With the initiation of the ECP, it is important to establish and document baseline code performance on the problems of interest. This will allow the project team to identify and target any deficiencies in performance, as well as highlight any performance bottlenecks as we exercise the code on a greater variety of platforms and at larger scales. The current study is rather modest in scale, examining performance on problem sizes of O(100 million) elements and core counts up to 8k cores. This will be expanded as more computational resources become available to the projects.

  4. PanDA for ATLAS distributed computing in the next decade

    NASA Astrophysics Data System (ADS)

    Barreiro Megino, F. H.; De, K.; Klimentov, A.; Maeno, T.; Nilsson, P.; Oleynik, D.; Padolski, S.; Panitkin, S.; Wenaus, T.; ATLAS Collaboration

    2017-10-01

    The Production and Distributed Analysis (PanDA) system has been developed to meet ATLAS production and analysis requirements for a data-driven workload management system capable of operating at the Large Hadron Collider (LHC) data processing scale. Heterogeneous resources used by the ATLAS experiment are distributed worldwide at hundreds of sites, thousands of physicists analyse the data remotely, the volume of processed data is beyond the exabyte scale, dozens of scientific applications are supported, while data processing requires more than a few billion hours of computing usage per year. PanDA performed very well over the last decade including the LHC Run 1 data taking period. However, it was decided to upgrade the whole system concurrently with the LHC’s first long shutdown in order to cope with rapidly changing computing infrastructure. After two years of reengineering efforts, PanDA has embedded capabilities for fully dynamic and flexible workload management. The static batch job paradigm was discarded in favor of a more automated and scalable model. Workloads are dynamically tailored for optimal usage of resources, with the brokerage taking network traffic and forecasts into account. Computing resources are partitioned based on dynamic knowledge of their status and characteristics. The pilot has been re-factored around a plugin structure for easier development and deployment. Bookkeeping is handled with both coarse and fine granularities for efficient utilization of pledged or opportunistic resources. An in-house security mechanism authenticates the pilot and data management services in off-grid environments such as volunteer computing and private local clusters. The PanDA monitor has been extensively optimized for performance and extended with analytics to provide aggregated summaries of the system as well as drill-down to operational details. Many other features are planned or have recently been implemented, and the system has been adopted by non-LHC experiments, such as bioinformatics groups successfully running Paleomix (microbial genome and metagenome) payloads on supercomputers. In this paper we will focus on the new and planned features that are most important to the next decade of distributed computing workload management.

  5. A Simple XML Producer-Consumer Protocol

    NASA Technical Reports Server (NTRS)

    Smith, Warren; Gunter, Dan; Quesnel, Darcy; Biegel, Bryan (Technical Monitor)

    2001-01-01

    There are many different projects from government, academia, and industry that provide services for delivering events in distributed environments. The problem with these event services is that they are not general enough to support all uses and they speak different protocols so that they cannot interoperate. We require such interoperability when we, for example, wish to analyze the performance of an application in a distributed environment. Such an analysis might require performance information from the application, computer systems, networks, and scientific instruments. In this work we propose and evaluate a standard XML-based protocol for the transmission of events in distributed systems. One recent trend in government and academic research is the development and deployment of computational grids. Computational grids are large-scale distributed systems that typically consist of high-performance compute, storage, and networking resources. Examples of such computational grids are the DOE Science Grid, the NASA Information Power Grid (IPG), and the NSF Partnerships for Advanced Computing Infrastructure (PACIs). The major effort to deploy these grids is in the area of developing the software services to allow users to execute applications on these large and diverse sets of resources. These services include security, execution of remote applications, managing remote data, access to information about resources and services, and so on. There are several toolkits for providing these services such as Globus, Legion, and Condor. As part of these efforts to develop computational grids, the Global Grid Forum is working to standardize the protocols and APIs used by various grid services. This standardization will allow interoperability between the client and server software of the toolkits that are providing the grid services. The goal of the Performance Working Group of the Grid Forum is to standardize protocols and representations related to the storage and distribution of performance data. These standard protocols and representations must support tasks such as profiling parallel applications, monitoring the status of computers and networks, and monitoring the performance of services provided by a computational grid. This paper describes a proposed protocol and data representation for the exchange of events in a distributed system. The protocol exchanges messages formatted in XML and it can be layered atop any low-level communication protocol such as TCP or UDP. Further, we describe Java and C++ implementations of this protocol and discuss their performance. The next section will provide some further background information. Section 3 describes the main communication patterns of our protocol. Section 4 describes how we represent events and related information using XML. Section 5 describes our protocol and Section 6 discusses the performance of two implementations of the protocol. Finally, an appendix provides the XML Schema definition of our protocol and event information.
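
    To make the representation concrete, here is a small sketch of what an XML-encoded event might look like and how a consumer could parse it; the element and attribute names are illustrative assumptions, not the schema proposed in the paper.

        import xml.etree.ElementTree as ET

        def make_event(source, event_type, value, timestamp):
            """Producer side: serialize one monitoring event to XML bytes."""
            ev = ET.Element("event", {"type": event_type, "source": source})
            ET.SubElement(ev, "timestamp").text = str(timestamp)
            ET.SubElement(ev, "value").text = str(value)
            return ET.tostring(ev)

        def parse_event(payload):
            """Consumer side: recover the fields from the XML message."""
            ev = ET.fromstring(payload)
            return ev.get("source"), ev.get("type"), float(ev.find("value").text)

        msg = make_event("node017.grid.example.org", "cpu.load", 0.83, 1001123456.0)
        print(msg.decode())
        print(parse_event(msg))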

  6. Computing the Envelope for Stepwise-Constant Resource Allocations

    NASA Technical Reports Server (NTRS)

    Muscettola, Nicola; Clancy, Daniel (Technical Monitor)

    2002-01-01

    Computing tight resource-level bounds is a fundamental problem in the construction of flexible plans with resource utilization. In this paper we describe an efficient algorithm that builds a resource envelope, the tightest possible such bound. The algorithm is based on transforming the temporal network of resource consuming and producing events into a flow network with nodes equal to the events and edges equal to the necessary predecessor links between events. A staged maximum flow problem on the network is then used to compute the time of occurrence and the height of each step of the resource envelope profile. Each stage has the same computational complexity of solving a maximum flow problem on the entire flow network. This makes this method computationally feasible and promising for use in the inner loop of flexible-time scheduling algorithms.

  7. Post Graduations in Technologies and Computing Applied to Education: From F2F Classes to Multimedia Online Open Courses

    ERIC Educational Resources Information Center

    Marques, Bertil P.; Carvalho, Piedade; Escudeiro, Paula; Barata, Ana; Silva, Ana; Queiros, Sandra

    2017-01-01

    Promoted by the significant increase of large-scale internet access, many audiences have turned to the web and to its resources for learning and inspiration, with diverse sets of skills and intents. In this context, Multimedia Online Open Courses (MOOC) consist of learning models supported by user-friendly web tools that allow anyone with minimum…

  8. Globus | Informatics Technology for Cancer Research (ITCR)

    Cancer.gov

    Globus software services provide secure cancer research data transfer, synchronization, and sharing in distributed environments at large scale. These services can be integrated into applications and research data gateways, leveraging Globus identity management, single sign-on, search, and authorization capabilities. Globus Genomics integrates Globus with the Galaxy genomics workflow engine and Amazon Web Services to enable cancer genomics analysis that can elastically scale compute resources with demand.

  9. Demands of Social Change as a Function of the Political Context, Institutional Filters, and Psychosocial Resources

    ERIC Educational Resources Information Center

    Tomasik, Martin J.; Silbereisen, Rainer K.

    2009-01-01

    Individually experienced demands of current social change in the domains of work and family were assessed in a large sample of adults from two Western and two Eastern federal states of Germany. For each domain of life, a cumulated index was computed representing the load with highly endorsed demands and this was compared across political regions,…

  10. Theoretical and experimental study of a new algorithm for factoring numbers

    NASA Astrophysics Data System (ADS)

    Tamma, Vincenzo

    The security of codes, for example in credit card and government information, relies on the fact that the factorization of a large integer N is a rather costly process on a classical digital computer. Such security is endangered by Shor's algorithm, which employs entangled quantum systems to find, with a polynomial number of resources, the period of a function which is connected with the factors of N. We can surely expect a possible future realization of such a method for large numbers, but so far the period of Shor's function has been computed only for the number 15. Inspired by Shor's idea, our work aims at methods of factorization based on the periodicity measurement of a given continuous periodic "factoring function" which is physically implementable using an analogue computer. In particular, we have focused on both the theoretical and the experimental analysis of Gauss sums with continuous arguments, leading to a new factorization algorithm. The procedure allows, for the first time, several numbers to be factored by measuring the periodicity of Gauss sums in first-order "factoring" interference processes. We experimentally implemented this idea by exploiting polychromatic optical interference in the visible range with a multi-path interferometer, and achieved the factorization of seven-digit numbers. The physical principle behind this "factoring" interference procedure can potentially be exploited also on entangled systems, such as multi-photon entangled states, in order to achieve a polynomial scaling in the number of resources.
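
    The underlying arithmetic test can be sketched as follows (a standard truncated Gauss sum, not the optical implementation): the normalized sum has modulus 1 when the trial divisor l is a factor of N and is suppressed otherwise. The composite N, truncation order M, and threshold below are illustrative and may need tuning to suppress so-called ghost factors.

        import cmath

        def gauss_sum(N, ell, M):
            """Normalized truncated Gauss sum A_N^(M)(l) = (1/M) * sum_{m=0}^{M-1} exp(2*pi*i*m^2*N/l).
            Its modulus is exactly 1 when l divides N and is suppressed otherwise (for suitable M)."""
            return abs(sum(cmath.exp(2j * cmath.pi * m * m * N / ell) for m in range(M))) / M

        N = 263 * 397                                   # small composite used only for illustration
        hits = [l for l in range(2, 400) if gauss_sum(N, l, M=20) > 0.9]
        print(hits)                                     # expect the factors 263 and 397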

  11. MultiPhyl: a high-throughput phylogenomics webserver using distributed computing

    PubMed Central

    Keane, Thomas M.; Naughton, Thomas J.; McInerney, James O.

    2007-01-01

    With the number of fully sequenced genomes increasing steadily, there is greater interest in performing large-scale phylogenomic analyses from large numbers of individual gene families. Maximum likelihood (ML) has been shown repeatedly to be one of the most accurate methods for phylogenetic construction. Recently, there have been a number of algorithmic improvements in maximum-likelihood-based tree search methods. However, it can still take a long time to analyse the evolutionary history of many gene families using a single computer. Distributed computing refers to a method of combining the computing power of multiple computers in order to perform some larger overall calculation. In this article, we present the first high-throughput implementation of a distributed phylogenetics platform, MultiPhyl, capable of using the idle computational resources of many heterogeneous non-dedicated machines to form a phylogenetics supercomputer. MultiPhyl allows a user to upload hundreds or thousands of amino acid or nucleotide alignments simultaneously and perform computationally intensive tasks such as model selection, tree searching and bootstrapping of each of the alignments using many desktop machines. The program implements a set of 88 amino acid models and 56 nucleotide maximum likelihood models and a variety of statistical methods for choosing between alternative models. A MultiPhyl webserver is available for public use at: http://www.cs.nuim.ie/distributed/multiphyl.php. PMID:17553837

  12. Application of high level wavefunction methods in quantum mechanics/molecular mechanics hybrid schemes.

    PubMed

    Mata, Ricardo A

    2010-05-21

    In this Perspective, several developments in the field of quantum mechanics/molecular mechanics (QM/MM) approaches are reviewed. Emphasis is placed on the use of correlated wavefunction theory and new state of the art methods for the treatment of large quantum systems. Until recently, computational chemistry approaches to large/complex chemical problems have seldom been considered as tools for quantitative predictions. However, due to the tremendous development of computational resources and new quantum chemical methods, it is nowadays possible to describe the electronic structure of biomolecules at levels of theory which a decade ago were only possible for system sizes of up to 20 atoms. These advances are here outlined in the context of QM/MM. The article concludes with a short outlook on upcoming developments and possible bottlenecks for future applications.

  13. A Primer on Infectious Disease Bacterial Genomics

    PubMed Central

    Petkau, Aaron; Knox, Natalie; Graham, Morag; Van Domselaar, Gary

    2016-01-01

    The number of large-scale genomics projects is increasing due to the availability of affordable high-throughput sequencing (HTS) technologies. The use of HTS for bacterial infectious disease research is attractive because one whole-genome sequencing (WGS) run can replace multiple assays for bacterial typing, molecular epidemiology investigations, and more in-depth pathogenomic studies. The computational resources and bioinformatics expertise required to accommodate and analyze the large amounts of data pose new challenges for researchers embarking on genomics projects for the first time. Here, we present a comprehensive overview of a bacterial genomics project from beginning to end, with a particular focus on the planning and computational requirements for HTS data, and provide a general understanding of the analytical concepts needed to develop a workflow that will meet the objectives and goals of HTS projects. PMID:28590251

  14. Multilevel UQ strategies for large-scale multiphysics applications: PSAAP II solar receiver

    NASA Astrophysics Data System (ADS)

    Jofre, Lluis; Geraci, Gianluca; Iaccarino, Gianluca

    2017-06-01

    Uncertainty quantification (UQ) plays a fundamental part in building confidence in predictive science. Of particular interest is the case of modeling and simulating engineering applications where, due to the inherent complexity, many uncertainties naturally arise, e.g. domain geometry, operating conditions, errors induced by modeling assumptions, etc. In this regard, one of the pacing items, especially in high-fidelity computational fluid dynamics (CFD) simulations, is the large amount of computing resources typically required to propagate incertitude through the models. Upcoming exascale supercomputers will significantly increase the available computational power. However, UQ approaches cannot rely only on brute-force Monte Carlo (MC) sampling; the large number of uncertainty sources and the presence of nonlinearities in the solution will make straightforward MC analysis unaffordable. Therefore, this work explores the multilevel MC strategy, and its extension to multi-fidelity and time convergence, to accelerate the estimation of the effect of uncertainties. The approach is described in detail, and its performance demonstrated on a radiated turbulent particle-laden flow case relevant to solar energy receivers (PSAAP II: Particle-laden turbulence in a radiation environment). Investigation funded by DoE's NNSA under PSAAP II.
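
    A minimal sketch (not the PSAAP II solver) of the multilevel MC estimator the abstract refers to: the coarsest level is sampled heavily, and each finer level only contributes a correction term estimated from far fewer samples, with the same random input used on both levels of a correction. The toy forward model and sample counts are illustrative.

        import random

        def model(x, level):
            """Hypothetical forward model at fidelity `level`: a crude quadrature of
            (x*t)^2 over [0,1] whose cost and accuracy both grow with level."""
            n = 2 ** (level + 2)
            return sum((x * (i + 0.5) / n) ** 2 for i in range(n)) / n

        def mlmc_estimate(levels, samples_per_level, seed=0):
            """Multilevel MC: coarse mean plus a telescoping sum of level corrections."""
            random.seed(seed)
            total = 0.0
            for lvl, n_samp in zip(levels, samples_per_level):
                acc = 0.0
                for _ in range(n_samp):
                    x = random.gauss(1.0, 0.1)                 # uncertain input parameter
                    fine = model(x, lvl)
                    coarse = model(x, lvl - 1) if lvl > 0 else 0.0
                    acc += fine - coarse
                total += acc / n_samp
            return total

        print(mlmc_estimate(levels=[0, 1, 2, 3], samples_per_level=[4000, 1000, 250, 60]))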

  15. Using Swarming Agents for Scalable Security in Large Network Environments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Crouse, Michael; White, Jacob L.; Fulp, Errin W.

    2011-09-23

    The difficulty of securing computer infrastructures increases as they grow in size and complexity. Network-based security solutions such as IDS and firewalls cannot scale because of the exponentially increasing computational costs inherent in detecting the rapidly growing number of threat signatures. Host-based solutions like virus scanners and IDS suffer similar issues, and these are compounded when enterprises try to monitor them in a centralized manner. Swarm-based autonomous agent systems like digital ants and artificial immune systems can provide a scalable security solution for large network environments. The digital ants approach offers a biologically inspired design where each ant in the virtual colony can detect atoms of evidence that may help identify a possible threat. By assembling the atomic evidence from different ant types, the colony may detect the threat. This decentralized approach can require, on average, fewer computational resources than traditional centralized solutions; however, there are limits to its scalability. This paper describes how dividing a large infrastructure into smaller managed enclaves allows the digital ant framework to operate effectively in larger environments. Experimental results show that using smaller enclaves allows for more consistent distribution of agents and results in faster response times.

  16. Computing Properties of Hadrons, Nuclei and Nuclear Matter from Quantum Chromodynamics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Savage, Martin J.

    This project was part of a coordinated software development effort that the nuclear physics lattice QCD community pursues to ensure that lattice calculations can make optimal use of present and forthcoming leadership-class and dedicated hardware, including that of the national laboratories, and to prepare for the exploitation of future computational resources in the exascale era. The UW team improved and extended software libraries used in lattice QCD calculations related to multi-nucleon systems, enhanced production codes related to load balancing multi-nucleon production on large-scale computing platforms, developed SQLite (addressable database) interfaces to efficiently archive and analyze multi-nucleon data, and developed a Mathematica interface to the SQLite databases.
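
    The project's actual SQLite interfaces are not reproduced here; the snippet below is only a generic sketch, using Python's built-in sqlite3 module, of how per-configuration lattice measurements might be archived and queried. The table name, columns, and values are hypothetical.

      import sqlite3

      # Hypothetical schema for archiving per-configuration correlator measurements.
      conn = sqlite3.connect("multinucleon.sqlite")
      conn.execute("""CREATE TABLE IF NOT EXISTS correlators (
                          ensemble TEXT, config INTEGER, channel TEXT,
                          t INTEGER, value REAL)""")

      rows = [("ens_a", 100, "two_nucleon", t, 1.0 / (t + 1)) for t in range(8)]
      conn.executemany("INSERT INTO correlators VALUES (?, ?, ?, ?, ?)", rows)
      conn.commit()

      # Average over configurations for one channel, ordered in time.
      for t, avg in conn.execute(
              "SELECT t, AVG(value) FROM correlators "
              "WHERE channel = ? GROUP BY t ORDER BY t", ("two_nucleon",)):
          print(t, avg)
      conn.close()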

  17. A FPGA-based architecture for real-time image matching

    NASA Astrophysics Data System (ADS)

    Wang, Jianhui; Zhong, Sheng; Xu, Wenhui; Zhang, Weijun; Cao, Zhiguo

    2013-10-01

    Image matching is a fundamental task in computer vision. It is used to establish correspondence between two images of the same scene taken from different viewpoints or at different times. However, its large computational complexity has been a challenge for most embedded systems. This paper proposes a single-FPGA image matching system, which consists of SIFT feature detection, BRIEF descriptor extraction, and BRIEF matching. It optimizes the FPGA architecture for SIFT feature detection to reduce FPGA resource utilization, and also implements BRIEF description and matching on the FPGA. The proposed system performs image matching at 30 fps (frames per second) on 1280x720 images. Its processing speed can meet the demands of most real-life computer vision applications.
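
    A much simplified, software-only sketch of the BRIEF idea the system implements in hardware: binary descriptors built from random pairwise intensity tests and compared by Hamming distance. The patch size, number of bits, and sampling pattern below are arbitrary choices, not the paper's FPGA design.

      import numpy as np

      def brief_descriptor(patch, pairs):
          # Each bit records whether intensity at point a is less than at point b.
          return np.array([patch[ay, ax] < patch[by, bx]
                           for (ay, ax), (by, bx) in pairs], dtype=np.uint8)

      def hamming(d1, d2):
          return int(np.count_nonzero(d1 != d2))

      rng = np.random.default_rng(1)
      patch_size, n_bits = 16, 128
      pairs = [(tuple(rng.integers(0, patch_size, 2)),
                tuple(rng.integers(0, patch_size, 2))) for _ in range(n_bits)]

      patch_a = rng.integers(0, 256, (patch_size, patch_size))
      patch_b = patch_a + rng.integers(-5, 6, (patch_size, patch_size))  # similar patch
      patch_c = rng.integers(0, 256, (patch_size, patch_size))           # unrelated patch

      da, db, dc = (brief_descriptor(p, pairs) for p in (patch_a, patch_b, patch_c))
      print(hamming(da, db), hamming(da, dc))  # similar patches give the smaller distance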

  18. Information processing using a single dynamical node as complex system

    PubMed Central

    Appeltant, L.; Soriano, M.C.; Van der Sande, G.; Danckaert, J.; Massar, S.; Dambre, J.; Schrauwen, B.; Mirasso, C.R.; Fischer, I.

    2011-01-01

    Novel methods for information processing are highly desired in our information-driven society. Inspired by the brain's ability to process information, the recently introduced paradigm known as 'reservoir computing' shows that complex networks can efficiently perform computation. Here we introduce a novel architecture that reduces the usually required large number of elements to a single nonlinear node with delayed feedback. Through an electronic implementation, we experimentally and numerically demonstrate excellent performance in a speech recognition benchmark. Complementary numerical studies also show excellent performance for a time series prediction benchmark. These results prove that delay-dynamical systems, even in their simplest manifestation, can perform efficient information processing. This finding paves the way to feasible and resource-efficient technological implementations of reservoir computing. PMID:21915110
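
    A toy numerical sketch of the single-node-with-delayed-feedback idea: the input is multiplied by a random mask, passed repeatedly through one nonlinear map, the responses along the delay line act as 'virtual nodes', and only a linear readout is trained (here by ridge regression). All parameters and the one-step sine-prediction task are illustrative, not the paper's electronic implementation or benchmarks.

      import numpy as np

      rng = np.random.default_rng(0)
      n_virtual = 50                      # virtual nodes along the delay line
      mask = rng.uniform(-1.0, 1.0, n_virtual)

      def reservoir_states(u, eta=0.5, gamma=0.05):
          # One nonlinear node with delayed feedback, time-multiplexed over the mask.
          x = np.zeros(n_virtual)
          states = np.empty((len(u), n_virtual))
          for k, u_k in enumerate(u):
              for i in range(n_virtual):
                  x[i] = np.tanh(eta * x[i - 1] + gamma * mask[i] * u_k)
              states[k] = x
          return states

      # Toy task: one-step-ahead prediction of a noisy sine wave.
      t = np.arange(600)
      u = np.sin(0.1 * t) + 0.05 * rng.standard_normal(len(t))
      X, y = reservoir_states(u)[:-1], u[1:]

      # Ridge-regression readout (the only trained part of the system).
      lam = 1e-3
      w = np.linalg.solve(X.T @ X + lam * np.eye(n_virtual), X.T @ y)
      print("train MSE:", float(np.mean((X @ w - y) ** 2)))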

  19. Computer-assisted qualitative data analysis software.

    PubMed

    Cope, Diane G

    2014-05-01

    Advances in technology have provided new approaches for data collection methods and analysis for researchers. Data collection is no longer limited to paper-and-pencil format, and numerous methods are now available through Internet and electronic resources. With these techniques, researchers are not burdened with entering data manually and data analysis is facilitated by software programs. Quantitative research is supported by the use of computer software and provides ease in the management of large data sets and rapid analysis of numeric statistical methods. New technologies are emerging to support qualitative research with the availability of computer-assisted qualitative data analysis software (CAQDAS). CAQDAS will be presented with a discussion of advantages, limitations, controversial issues, and recommendations for this type of software use.

  20. Use of graphics in the design office at the Military Aircraft Division of the British Aircraft Corporation

    NASA Technical Reports Server (NTRS)

    Coles, W. A.

    1975-01-01

    The CAD/CAM interactive computer graphics system was described; uses to which it has been put were shown, and current developments of the system were outlined. The system supports batch, time sharing, and fully interactive graphic processing. Engineers using the system may switch between these methods of data processing and problem solving to make the best use of the available resources. It is concluded that the introduction of on-line computing in the form of teletypes, storage tubes, and fully interactive graphics has resulted in large increases in productivity and reduced timescales in the geometric computing, numerical lofting and part programming areas, together with a greater utilization of the system in the technical departments.

  1. Distributed Accounting on the Grid

    NASA Technical Reports Server (NTRS)

    Thigpen, William; Hacker, Thomas J.; McGinnis, Laura F.; Athey, Brian D.

    2001-01-01

    By the late 1990s, the Internet was adequately equipped to move vast amounts of data between HPC (High Performance Computing) systems, and efforts were initiated to link the national infrastructure of high performance computational and data storage resources together into a general computational utility 'grid', analogous to the national electrical power grid infrastructure. The purpose of the Computational Grid is to provide dependable, consistent, pervasive, and inexpensive access to computational resources for the computing community in the form of a computing utility. This paper presents a fully distributed view of Grid usage accounting and a methodology for allocating Grid computational resources for use on a Grid computing system.

  2. Distributed Problem Solving: Adaptive Networks with a Computer Intermediary Resource. Intelligent Executive Computer Communication

    DTIC Science & Technology

    1991-06-01

    Proceedings of The National Conference on Artificial Intelligence, pages 181-184, The American Association for Artificial Intelligence, Pittsburgh...Intermediary Resource: Intelligent Executive Computer Communication, John Lyman and Carla J. Conaway, University of California at Los Angeles for Contracting...(Include Security Classification) Interim Report: Distributed Problem Solving: Adaptive Networks With a Computer Intermediary Resource: Intelligent

  3. MIDAS, prototype Multivariate Interactive Digital Analysis System for large area earth resources surveys. Volume 1: System description

    NASA Technical Reports Server (NTRS)

    Christenson, D.; Gordon, M.; Kistler, R.; Kriegler, F.; Lampert, S.; Marshall, R.; Mclaughlin, R.

    1977-01-01

    A third-generation, fast, low-cost, multispectral recognition system (MIDAS) able to keep pace with the large quantity and high rates of data acquisition from large regions with present and projected sensors is described. The program can process a complete ERTS frame in forty seconds and provide a color map of sixteen constituent categories in a few minutes. A principal objective of the MIDAS program is to provide a system well interfaced with the human operator and thus to obtain large overall reductions in turn-around time and significant gains in throughput. The hardware and software generated in the overall program are described. The system contains a midi-computer to control the various high-speed processing elements in the data path, a preprocessor to condition data, and a classifier which implements an all-digital prototype multivariate Gaussian maximum likelihood or Bayesian decision algorithm. Sufficient software was developed to perform signature extraction, control the preprocessor, compute classifier coefficients, control the classifier operation, operate the color display and printer, and diagnose operation.
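
    The multivariate Gaussian maximum likelihood decision rule that MIDAS implemented in hardware can be restated in a few lines of modern Python; the two synthetic 'spectral classes' below only illustrate the rule itself, not the MIDAS preprocessor or display pipeline.

      import numpy as np

      def fit_class(samples):
          mean = samples.mean(axis=0)
          cov = np.cov(samples, rowvar=False)
          return mean, np.linalg.inv(cov), np.log(np.linalg.det(cov))

      def log_likelihood(x, mean, inv_cov, log_det):
          d = x - mean
          return -0.5 * (log_det + d @ inv_cov @ d)   # constant terms omitted

      rng = np.random.default_rng(0)
      classes = {
          "water":      rng.multivariate_normal([10, 30], [[4, 1], [1, 3]], 200),
          "vegetation": rng.multivariate_normal([40, 60], [[6, 2], [2, 5]], 200),
      }
      params = {name: fit_class(s) for name, s in classes.items()}

      pixel = np.array([38.0, 58.0])
      label = max(params, key=lambda c: log_likelihood(pixel, *params[c]))
      print(label)   # expected: "vegetation"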

  4. DEM Based Modeling: Grid or TIN? The Answer Depends

    NASA Astrophysics Data System (ADS)

    Ogden, F. L.; Moreno, H. A.

    2015-12-01

    The availability of petascale supercomputing power has enabled process-based hydrological simulations on large watersheds and two-way coupling with mesoscale atmospheric models. Of course, with increasing watershed scale come corresponding increases in watershed complexity, including wide-ranging water management infrastructure and objectives and ever-increasing demands for forcing data. Simulations of large watersheds using grid-based models apply a fixed resolution over the entire watershed. In large watersheds, this means an enormous number of grid cells, or a coarsening of the grid resolution to reduce memory requirements. One alternative to grid-based methods is the triangular irregular network (TIN) approach. TINs provide the flexibility of variable resolution, which allows optimization of computational resources by providing high resolution where necessary and low resolution elsewhere. TINs also increase the required effort in model setup, parameter estimation, and coupling with forcing data, which are often gridded. This presentation discusses the costs and benefits of the use of TINs compared to grid-based methods, in the context of large watershed simulations within the traditional gridded WRF-HYDRO framework and the new TIN-based ADHydro high performance computing watershed simulator.

  5. Fast probabilistic file fingerprinting for big data

    PubMed Central

    2013-01-01

    Background Biological data acquisition is raising new challenges, both in data analysis and handling. Not only is it proving hard to analyze the data at the rate it is generated today, but simply reading and transferring data files can be prohibitively slow due to their size. This primarily concerns logistics within and between data centers, but is also important for workstation users in the analysis phase. Common usage patterns, such as comparing and transferring files, are proving computationally expensive and are tying down shared resources. Results We present an efficient method for calculating file uniqueness for large scientific data files that takes less computational effort than existing techniques. This method, called Probabilistic Fast File Fingerprinting (PFFF), exploits the variation present in biological data and computes file fingerprints by sampling randomly from the file instead of reading it in full. Consequently, it has a flat performance characteristic, correlated with data variation rather than file size. We demonstrate that probabilistic fingerprinting can be as reliable as existing hashing techniques, with provably negligible risk of collisions. We measure the performance of the algorithm on a number of data storage and access technologies, identifying its strengths as well as limitations. Conclusions Probabilistic fingerprinting may significantly reduce the use of computational resources when comparing very large files. Utilisation of probabilistic fingerprinting techniques can increase the speed of common file-related workflows, both in the data center and for workbench analysis. The implementation of the algorithm is available as an open-source tool named pfff, as a command-line tool as well as a C library. The tool can be downloaded from http://biit.cs.ut.ee/pfff. PMID:23445565
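
    A simplified sketch of the sampling idea behind probabilistic fingerprinting, not the pfff tool itself: hash a fixed number of pseudo-randomly chosen blocks instead of the whole file, so the cost stays flat with respect to file size. The block count, block size, and seeded offset selection are hypothetical parameters.

      import hashlib
      import os
      import random

      def sampled_fingerprint(path, n_blocks=64, block_size=64, seed=0):
          size = os.path.getsize(path)
          h = hashlib.sha256(str(size).encode())          # mix in the file size
          offsets = sorted(random.Random(seed).randrange(size)
                           for _ in range(n_blocks)) if size else []
          with open(path, "rb") as f:
              for off in offsets:
                  f.seek(off)
                  h.update(f.read(block_size))            # hash only sampled blocks
          return h.hexdigest()

      # Example: files differing within any sampled block get different
      # fingerprints; the cost does not grow with file size.
      with open("demo.bin", "wb") as f:
          f.write(os.urandom(1 << 20))
      print(sampled_fingerprint("demo.bin"))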

  6. Computational biology in the cloud: methods and new insights from computing at scale.

    PubMed

    Kasson, Peter M

    2013-01-01

    The past few years have seen both explosions in the size of biological data sets and the proliferation of new, highly flexible on-demand computing capabilities. The sheer amount of information available from genomic and metagenomic sequencing, high-throughput proteomics, and experimental and simulation datasets on molecular structure and dynamics affords an opportunity for greatly expanded insight, but it creates new challenges of scale for computation, storage, and interpretation of petascale data. Cloud computing resources have the potential to help solve these problems by offering a utility model of computing and storage: near-unlimited capacity, the ability to burst usage, and cheap and flexible payment models. Effective use of cloud computing on large biological datasets requires dealing with non-trivial problems of scale and robustness, since performance-limiting factors can change substantially when a dataset grows by a factor of 10,000 or more. New computing paradigms are thus often needed. The use of cloud platforms also creates new opportunities to share data, reduce duplication, and provide easy reproducibility by making the datasets and computational methods easily available.

  7. Grid accounting service: state and future development

    NASA Astrophysics Data System (ADS)

    Levshina, T.; Sehgal, C.; Bockelman, B.; Weitzel, D.; Guru, A.

    2014-06-01

    During the last decade, large-scale federated distributed infrastructures have been continually developed and expanded. One of the crucial components of a cyber-infrastructure is an accounting service that collects data related to resource utilization and identity of users using resources. The accounting service is important for verifying pledged resource allocation per particular groups and users, providing reports for funding agencies and resource providers, and understanding hardware provisioning requirements. It can also be used for end-to-end troubleshooting as well as billing purposes. In this work we describe Gratia, a federated accounting service jointly developed at Fermilab and Holland Computing Center at University of Nebraska-Lincoln. The Open Science Grid, Fermilab, HCC, and several other institutions have used Gratia in production for several years. The current development activities include expanding Virtual Machines provisioning information, XSEDE allocation usage accounting, and Campus Grids resource utilization. We also identify the direction of future work: improvement and expansion of Cloud accounting, persistent and elastic storage space allocation, and the incorporation of WAN and LAN network metrics.

  8. CyberShake: Running Seismic Hazard Workflows on Distributed HPC Resources

    NASA Astrophysics Data System (ADS)

    Callaghan, S.; Maechling, P. J.; Graves, R. W.; Gill, D.; Olsen, K. B.; Milner, K. R.; Yu, J.; Jordan, T. H.

    2013-12-01

    As part of its program of earthquake system science research, the Southern California Earthquake Center (SCEC) has developed a simulation platform, CyberShake, to perform physics-based probabilistic seismic hazard analysis (PSHA) using 3D deterministic wave propagation simulations. CyberShake performs PSHA by simulating a tensor-valued wavefield of Strain Green Tensors, and then using seismic reciprocity to calculate synthetic seismograms for about 415,000 events per site of interest. These seismograms are processed to compute ground motion intensity measures, which are then combined with probabilities from an earthquake rupture forecast to produce a site-specific hazard curve. Seismic hazard curves for hundreds of sites in a region can be used to calculate a seismic hazard map, representing the seismic hazard for a region. We present a recently completed PSHA study in which we calculated four CyberShake seismic hazard maps for the Southern California area to compare how CyberShake hazard results are affected by different SGT computational codes (AWP-ODC and AWP-RWG) and different community velocity models (Community Velocity Model - SCEC (CVM-S4) v11.11 and Community Velocity Model - Harvard (CVM-H) v11.9). We present our approach to running workflow applications on distributed HPC resources, including systems without support for remote job submission. We show how our approach extends the benefits of scientific workflows, such as job and data management, to large-scale applications on Track 1 and Leadership class open-science HPC resources. We used our distributed workflow approach to perform CyberShake Study 13.4 on two new NSF open-science HPC computing resources, Blue Waters and Stampede, executing over 470 million tasks to calculate physics-based hazard curves for 286 locations in the Southern California region. For each location, we calculated seismic hazard curves with two different community velocity models and two different SGT codes, resulting in over 1100 hazard curves. We will report on the performance of this CyberShake study, four times larger than previous studies. Additionally, we will examine the challenges we face applying these workflow techniques to additional open-science HPC systems and discuss whether our workflow solutions continue to provide value to our large-scale PSHA calculations.

  9. Mobile-Cloud Assisted Video Summarization Framework for Efficient Management of Remote Sensing Data Generated by Wireless Capsule Sensors

    PubMed Central

    Mehmood, Irfan; Sajjad, Muhammad; Baik, Sung Wook

    2014-01-01

    Wireless capsule endoscopy (WCE) has great advantages over traditional endoscopy because it is portable and easy to use, especially for remote health-monitoring services. However, during the WCE process, the large amount of captured video data demands a significant amount of computation to analyze and retrieve informative video frames. In order to facilitate efficient WCE data collection and browsing tasks, we present a resource- and bandwidth-aware WCE video summarization framework that extracts the representative keyframes of the WCE video contents by removing redundant and non-informative frames. For redundancy elimination, we use Jeffrey-divergence between color histograms and inter-frame Boolean series-based correlation of color channels. To remove non-informative frames, multi-fractal texture features are extracted to assist the classification using an ensemble-based classifier. Owing to the limited WCE resources, it is impossible for the WCE system to perform computationally intensive video summarization tasks. To resolve computational challenges, a mobile-cloud architecture is incorporated, which provides resizable computing capacities by adaptively offloading video summarization tasks between the client and the cloud server. The qualitative and quantitative results are encouraging and show that the proposed framework saves information transmission cost and bandwidth, as well as the valuable time of data analysts in browsing remote sensing data. PMID:25225874
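
    A small sketch of the redundancy measure mentioned above, using one common symmetrized definition of the Jeffrey divergence between normalized color histograms; the 'frames' are random arrays and any redundancy threshold is left unspecified, so this only illustrates the metric, not the full summarization framework.

      import numpy as np

      def jeffrey_divergence(p, q, eps=1e-12):
          # Symmetric divergence between two normalized histograms p and q.
          p, q = p + eps, q + eps
          m = 0.5 * (p + q)
          return float(np.sum(p * np.log(p / m) + q * np.log(q / m)))

      def color_histogram(frame, bins=32):
          hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
          return hist / hist.sum()

      rng = np.random.default_rng(0)
      frame1 = rng.integers(0, 256, (120, 160, 3))
      frame2 = np.clip(frame1 + rng.integers(-3, 4, frame1.shape), 0, 255)  # near-duplicate
      frame3 = rng.integers(100, 256, (120, 160, 3))                        # brighter, different frame

      h1, h2, h3 = map(color_histogram, (frame1, frame2, frame3))
      print(jeffrey_divergence(h1, h2), jeffrey_divergence(h1, h3))
      # a frame would be treated as redundant when the divergence falls below a threshold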

  10. Design and Analysis of Self-Adapted Task Scheduling Strategies in Wireless Sensor Networks

    PubMed Central

    Guo, Wenzhong; Xiong, Naixue; Chao, Han-Chieh; Hussain, Sajid; Chen, Guolong

    2011-01-01

    In a wireless sensor network (WSN), the usage of resources is usually highly related to the execution of tasks which consume a certain amount of computing and communication bandwidth. Parallel processing among sensors is a promising solution to provide the demanded computation capacity in WSNs. Task allocation and scheduling is a typical problem in the area of high performance computing. Although task allocation and scheduling in wired processor networks have been well studied in the past, their counterparts for WSNs remain largely unexplored. Existing traditional high performance computing solutions cannot be directly implemented in WSNs due to limitations such as limited resource availability and the shared communication medium. In this paper, a self-adapted task scheduling strategy for WSNs is presented. First, a multi-agent-based architecture for WSNs is proposed and a mathematical model of dynamic alliance is constructed for the task allocation problem. Then an effective discrete particle swarm optimization (PSO) algorithm for the dynamic alliance (DPSO-DA), with a well-designed particle position code and fitness function, is proposed. A mutation operator which can effectively improve the algorithm's ability of global search and population diversity is also introduced in this algorithm. Finally, the simulation results show that the proposed solution achieves significantly better performance than other algorithms. PMID:22163971
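
    The paper's particle position code and dynamic-alliance fitness are specific to its WSN model; the sketch below only shows the generic binary PSO machinery it builds on (sigmoid-mapped velocities, personal and global bests, and a bit-flip mutation for diversity), applied to a made-up sensor-selection objective.

      import numpy as np

      rng = np.random.default_rng(0)
      n_particles, n_bits, iters = 20, 12, 100
      cost = rng.uniform(1, 5, n_bits)                 # made-up per-sensor energy cost

      def fitness(bits):
          # Made-up objective: reward the number of selected sensors,
          # penalize the total energy cost of the alliance.
          return bits.sum() - 0.3 * (bits * cost).sum()

      x = rng.integers(0, 2, (n_particles, n_bits))
      v = np.zeros((n_particles, n_bits))
      pbest, pbest_f = x.copy(), np.array([fitness(p) for p in x])
      gbest = pbest[pbest_f.argmax()].copy()

      for _ in range(iters):
          r1, r2 = rng.random(x.shape), rng.random(x.shape)
          v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (gbest - x)
          x = (rng.random(x.shape) < 1 / (1 + np.exp(-v))).astype(int)
          flip = rng.random(x.shape) < 0.01            # bit-flip mutation for diversity
          x[flip] ^= 1
          f = np.array([fitness(p) for p in x])
          better = f > pbest_f
          pbest[better], pbest_f[better] = x[better], f[better]
          gbest = pbest[pbest_f.argmax()].copy()

      print(gbest, float(pbest_f.max()))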

  11. Extended outlook: description, utilization, and daily applications of cloud technology in radiology.

    PubMed

    Gerard, Perry; Kapadia, Neil; Chang, Patricia T; Acharya, Jay; Seiler, Michael; Lefkovitz, Zvi

    2013-12-01

    The purpose of this article is to discuss the concept of cloud technology, its role in medical applications and radiology, the role of the radiologist in using and accessing these vast resources of information, and privacy concerns and HIPAA compliance strategies. Cloud computing is the delivery of shared resources, software, and information to computers and other devices as a metered service. This technology has a promising role in the sharing of patient medical information and appears to be particularly suited for application in radiology, given the field's inherent need for storage and access to large amounts of data. The radiology cloud has significant strengths, such as providing centralized storage and access, reducing unnecessary repeat radiologic studies, and potentially allowing radiologic second opinions more easily. There are significant cost advantages to cloud computing because of a decreased need for infrastructure and equipment by the institution. Private clouds may be used to ensure secure storage of data and compliance with HIPAA. In choosing a cloud service, there are important aspects, such as disaster recovery plans, uptime, and security audits, that must be considered. Given that the field of radiology has become almost exclusively digital in recent years, the future of secure storage and easy access to imaging studies lies within cloud computing technology.

  12. Mobile-cloud assisted video summarization framework for efficient management of remote sensing data generated by wireless capsule sensors.

    PubMed

    Mehmood, Irfan; Sajjad, Muhammad; Baik, Sung Wook

    2014-09-15

    Wireless capsule endoscopy (WCE) has great advantages over traditional endoscopy because it is portable and easy to use, especially for remote health-monitoring services. However, during the WCE process, the large amount of captured video data demands a significant amount of computation to analyze and retrieve informative video frames. In order to facilitate efficient WCE data collection and browsing tasks, we present a resource- and bandwidth-aware WCE video summarization framework that extracts the representative keyframes of the WCE video contents by removing redundant and non-informative frames. For redundancy elimination, we use Jeffrey-divergence between color histograms and inter-frame Boolean series-based correlation of color channels. To remove non-informative frames, multi-fractal texture features are extracted to assist the classification using an ensemble-based classifier. Owing to the limited WCE resources, it is impossible for the WCE system to perform computationally intensive video summarization tasks. To resolve computational challenges, a mobile-cloud architecture is incorporated, which provides resizable computing capacities by adaptively offloading video summarization tasks between the client and the cloud server. The qualitative and quantitative results are encouraging and show that the proposed framework saves information transmission cost and bandwidth, as well as the valuable time of data analysts in browsing remote sensing data.

  13. An Analysis of Cloud Computing with Amazon Web Services for the Atmospheric Science Data Center

    NASA Astrophysics Data System (ADS)

    Gleason, J. L.; Little, M. M.

    2013-12-01

    NASA science and engineering efforts rely heavily on compute and data handling systems. The nature of NASA science data is such that it is not restricted to NASA users; instead, it is widely shared across a globally distributed user community including scientists, educators, policy decision makers, and the public. Therefore NASA science computing is a candidate use case for cloud computing, where compute resources are outsourced to an external vendor. Amazon Web Services (AWS) is a commercial cloud computing service developed to use excess computing capacity at Amazon, and it potentially provides an alternative to costly and potentially underutilized dedicated acquisitions whenever NASA scientists or engineers require additional data processing. AWS aims to provide a simplified avenue for NASA scientists and researchers to share large, complex data sets with external partners and the public. AWS has been used extensively by JPL for a wide range of computing needs and was previously tested on a NASA Agency basis during the Nebula testing program. Its ability to support the needs of the Langley Science Directorate still has to be evaluated by integrating it with real-world operational needs across NASA, along with the associated maturity that would come with that experience. The strengths and weaknesses of this architecture and its ability to support general science and engineering applications have been demonstrated during the previous testing. The Langley Office of the Chief Information Officer, in partnership with the Atmospheric Sciences Data Center (ASDC), has established a pilot business interface to utilize AWS cloud computing resources on an organization- and project-level pay-per-use model. This poster discusses an effort to evaluate the feasibility of the pilot business interface from a project-level perspective by specifically using a processing scenario involving the Clouds and Earth's Radiant Energy System (CERES) project.

  14. Computer-Based Resource Accounting Model for Automobile Technology Impact Assessment

    DOT National Transportation Integrated Search

    1976-10-01

    A computer-implemented resource accounting model has been developed for assessing resource impacts of future automobile technology options. The resources tracked are materials, energy, capital, and labor. The model has been used in support of the Int...

  15. System Resource Allocations | High-Performance Computing | NREL

    Science.gov Websites

    To use NREL's high-performance computing (HPC) resources, users request system resource allocations: compute hours on NREL HPC systems, including Peregrine and Eagle, and storage space (in terabytes) on Peregrine, Eagle, and Gyrfalcon. Allocations are principally done in response to an annual call for allocations.

  16. Computers as learning resources in the health sciences: impact and issues.

    PubMed Central

    Ellis, L B; Hannigan, G G

    1986-01-01

    Starting with two computer terminals in 1972, the Health Sciences Learning Resources Center of the University of Minnesota Bio-Medical Library expanded its instructional facilities to ten terminals and thirty-five microcomputers by 1985. Computer use accounted for 28% of total center circulation. The impact of these resources on health sciences curricula is described and issues related to use, support, and planning are raised and discussed. Judged by their acceptance and educational value, computers are successful health sciences learning resources at the University of Minnesota. PMID:3518843

  17. An emulator for minimizing finite element analysis implementation resources

    NASA Technical Reports Server (NTRS)

    Melosh, R. J.; Utku, S.; Salama, M.; Islam, M.

    1982-01-01

    A finite element analysis emulator providing a basis for efficiently establishing an optimum computer implementation strategy when many calculations are involved is described. The SCOPE emulator determines the computer resources required as a function of the structural model, the structural load-deflection equation characteristics, the storage allocation plan, and the computer hardware capabilities. Thereby, it provides data for trading off analysis implementation options to arrive at the best strategy. The models contained in SCOPE lead to micro-operation counts for each finite element operation as well as overall computer resource cost estimates. Application of SCOPE to the Memphis-Arkansas bridge analysis provides measures of the accuracy of resource assessments. Data indicate that predictions are within 17.3 percent for calculation times and within 3.2 percent for peripheral storage resources for the ELAS code.

  18. Dynamic virtual machine allocation policy in cloud computing complying with service level agreement using CloudSim

    NASA Astrophysics Data System (ADS)

    Aneri, Parikh; Sumathy, S.

    2017-11-01

    Cloud computing provides services over the internet, supplying application resources and data to users based on their demand. Cloud computing rests on a consumer-provider model: the cloud provider supplies resources that consumers access, through the cloud computing model, to build their applications according to their demand. A cloud data center is a bulk of resources in a shared-pool architecture for cloud users to access. Virtualization is the heart of the cloud computing model; it provides virtual machines with application-specific configurations, and applications are free to choose their own configuration. On one hand there is a huge number of resources, and on the other hand the data center has to serve a huge number of requests effectively. Therefore, the resource allocation policy and scheduling policy play a very important role in allocating and managing resources in this cloud computing model. This paper proposes a load balancing policy based on the Hungarian algorithm. The Hungarian algorithm provides a dynamic load balancing policy with a monitor component; the monitor component helps increase cloud resource utilization by managing the Hungarian algorithm, monitoring its state, and altering that state based on artificial intelligence. CloudSim, used in this proposal, is an extensible toolkit that simulates the cloud computing environment.
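
    A minimal sketch of the assignment step described above, using SciPy's implementation of the Hungarian algorithm to map tasks to virtual machines so that the total estimated execution time is minimized; the cost matrix is synthetic and the proposal's monitor component is not modeled.

      import numpy as np
      from scipy.optimize import linear_sum_assignment

      rng = np.random.default_rng(0)
      n_tasks, n_vms = 6, 6

      # Synthetic cost matrix: estimated execution time of task i on VM j,
      # e.g. task length divided by the VM's MIPS rating.
      lengths = rng.uniform(2000, 10000, n_tasks)
      mips = rng.uniform(500, 2500, n_vms)
      cost = lengths[:, None] / mips[None, :]

      tasks, vms = linear_sum_assignment(cost)          # Hungarian algorithm
      for t, m in zip(tasks, vms):
          print(f"task {t} -> vm {m}  ({cost[t, m]:.2f} s)")
      print("total time:", float(cost[tasks, vms].sum()))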

  19. A Fast Approach to Automatic Detection of Brain Lesions

    PubMed Central

    Koley, Subhranil; Chakraborty, Chandan; Mainero, Caterina; Fischl, Bruce; Aganj, Iman

    2017-01-01

    Template matching is a popular approach to computer-aided detection of brain lesions from magnetic resonance (MR) images. The outcomes are often sufficient for localizing lesions and assisting clinicians in diagnosis. However, processing large MR volumes with three-dimensional (3D) templates is demanding in terms of computational resources, hence the importance of reducing the computational complexity of template matching, particularly in situations in which time is crucial (e.g. emergent stroke). In view of this, we make use of 3D Gaussian templates with varying radii and propose a new method to compute the normalized cross-correlation coefficient as a similarity metric between the MR volume and the template to detect brain lesions. Contrary to the conventional fast Fourier transform (FFT) based approach, whose runtime grows as O(N log N) with the number of voxels, the proposed method computes the cross-correlation in O(N). We show through our experiments that the proposed method outperforms the FFT approach in terms of computational time, and retains comparable accuracy. PMID:29082383
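
    The paper's O(N) computation is not reproduced here; the snippet below only spells out the normalized cross-correlation score itself, evaluated directly between a 3D Gaussian template and one equally sized sub-volume, which is the quantity both the FFT-based and the proposed approaches compute.

      import numpy as np

      def gaussian_template(radius, sigma):
          ax = np.arange(-radius, radius + 1)
          zz, yy, xx = np.meshgrid(ax, ax, ax, indexing="ij")
          return np.exp(-(xx**2 + yy**2 + zz**2) / (2 * sigma**2))

      def ncc(patch, template):
          # Normalized cross-correlation between two equally sized volumes.
          a = patch - patch.mean()
          b = template - template.mean()
          return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b)))

      rng = np.random.default_rng(0)
      volume = rng.normal(size=(64, 64, 64))
      tpl = gaussian_template(radius=4, sigma=2.0)
      volume[20:29, 30:39, 10:19] += 3.0 * tpl      # embed a blob-like "lesion"

      print(ncc(volume[20:29, 30:39, 10:19], tpl))  # high score at the lesion
      print(ncc(volume[0:9, 0:9, 0:9], tpl))        # low score in background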

  20. OpenCL-based vicinity computation for 3D multiresolution mesh compression

    NASA Astrophysics Data System (ADS)

    Hachicha, Soumaya; Elkefi, Akram; Ben Amar, Chokri

    2017-03-01

    3D multiresolution mesh compression systems are still widely addressed in many domains, and they increasingly require volumetric data to be processed in real time. Performance is therefore becoming constrained by material resource usage and the need for an overall reduction in computational time. In this paper, our contribution lies entirely in computing, in real time, the triangle neighborhoods of 3D progressive meshes for a robust compression algorithm based on the scan-based wavelet transform (WT) technique. The originality of this latter algorithm is to compute the WT with minimum memory usage by processing data as they are acquired. However, with large data, this technique is considered poor in terms of computational complexity. For that reason, this work exploits the GPU to accelerate the computation using OpenCL as a heterogeneous programming language. Experiments demonstrate that, aside from the portability across various platforms and the flexibility guaranteed by the OpenCL-based implementation, this method achieves a speedup factor of 5 compared to the sequential CPU implementation.
