parallel file system: Topics by Science.gov

Sample records for parallel file system

The Galley Parallel File System

NASA Technical Reports Server (NTRS)

Nieuwejaar, Nils; Kotz, David

1996-01-01

As the I/O needs of parallel scientific applications increase, file systems for multiprocessors are being designed to provide applications with parallel access to multiple disks. Many parallel file systems present applications with a conventional Unix-like interface that allows the application to access multiple disks transparently. The interface conceals the parallelism within the file system, which increases the ease of programmability, but makes it difficult or impossible for sophisticated programmers and libraries to use knowledge about their I/O needs to exploit that parallelism. Furthermore, most current parallel file systems are optimized for a different workload than they are being asked to support. We introduce Galley, a new parallel file system that is intended to efficiently support realistic parallel workloads. We discuss Galley's file structure and application interface, as well as an application that has been implemented using that interface.
File concepts for parallel I/O

NASA Technical Reports Server (NTRS)

Crockett, Thomas W.

1989-01-01

The subject of input/output (I/O) was often neglected in the design of parallel computer systems, although for many problems I/O rates will limit the speedup attainable. The I/O problem is addressed by considering the role of files in parallel systems. The notion of parallel files is introduced. Parallel files provide for concurrent access by multiple processes, and utilize parallelism in the I/O system to improve performance. Parallel files can also be used conventionally by sequential programs. A set of standard parallel file organizations is proposed, organizations are suggested, using multiple storage devices. Problem areas are also identified and discussed.
The Galley Parallel File System

NASA Technical Reports Server (NTRS)

Nieuwejaar, Nils; Kotz, David

1996-01-01

Most current multiprocessor file systems are designed to use multiple disks in parallel, using the high aggregate bandwidth to meet the growing I/0 requirements of parallel scientific applications. Many multiprocessor file systems provide applications with a conventional Unix-like interface, allowing the application to access multiple disks transparently. This interface conceals the parallelism within the file system, increasing the ease of programmability, but making it difficult or impossible for sophisticated programmers and libraries to use knowledge about their I/O needs to exploit that parallelism. In addition to providing an insufficient interface, most current multiprocessor file systems are optimized for a different workload than they are being asked to support. We introduce Galley, a new parallel file system that is intended to efficiently support realistic scientific multiprocessor workloads. We discuss Galley's file structure and application interface, as well as the performance advantages offered by that interface.
RAMA: A file system for massively parallel computers

NASA Technical Reports Server (NTRS)

Miller, Ethan L.; Katz, Randy H.

1993-01-01

This paper describes a file system design for massively parallel computers which makes very efficient use of a few disks per processor. This overcomes the traditional I/O bottleneck of massively parallel machines by storing the data on disks within the high-speed interconnection network. In addition, the file system, called RAMA, requires little inter-node synchronization, removing another common bottleneck in parallel processor file systems. Support for a large tertiary storage system can easily be integrated in lo the file system; in fact, RAMA runs most efficiently when tertiary storage is used.
Characterizing parallel file-access patterns on a large-scale multiprocessor

NASA Technical Reports Server (NTRS)

Purakayastha, A.; Ellis, Carla; Kotz, David; Nieuwejaar, Nils; Best, Michael L.

1995-01-01

High-performance parallel file systems are needed to satisfy tremendous I/O requirements of parallel scientific applications. The design of such high-performance parallel file systems depends on a comprehensive understanding of the expected workload, but so far there have been very few usage studies of multiprocessor file systems. This paper is part of the CHARISMA project, which intends to fill this void by measuring real file-system workloads on various production parallel machines. In particular, we present results from the CM-5 at the National Center for Supercomputing Applications. Our results are unique because we collect information about nearly every individual I/O request from the mix of jobs running on the machine. Analysis of the traces leads to various recommendations for parallel file-system design.
Cooperative storage of shared files in a parallel computing system with dynamic block size

DOEpatents

Bent, John M.; Faibish, Sorin; Grider, Gary

2015-11-10

Improved techniques are provided for parallel writing of data to a shared object in a parallel computing system. A method is provided for storing data generated by a plurality of parallel processes to a shared object in a parallel computing system. The method is performed by at least one of the processes and comprises: dynamically determining a block size for storing the data; exchanging a determined amount of the data with at least one additional process to achieve a block of the data having the dynamically determined block size; and writing the block of the data having the dynamically determined block size to a file system. The determined block size comprises, e.g., a total amount of the data to be stored divided by the number of parallel processes. The file system comprises, for example, a log structured virtual parallel file system, such as a Parallel Log-Structured File System (PLFS).
Performance of the Galley Parallel File System

NASA Technical Reports Server (NTRS)

Nieuwejaar, Nils; Kotz, David

1996-01-01

As the input/output (I/O) needs of parallel scientific applications increase, file systems for multiprocessors are being designed to provide applications with parallel access to multiple disks. Many parallel file systems present applications with a conventional Unix-like interface that allows the application to access multiple disks transparently. This interface conceals the parallism within the file system, which increases the ease of programmability, but makes it difficult or impossible for sophisticated programmers and libraries to use knowledge about their I/O needs to exploit that parallelism. Furthermore, most current parallel file systems are optimized for a different workload than they are being asked to support. We introduce Galley, a new parallel file system that is intended to efficiently support realistic parallel workloads. Initial experiments, reported in this paper, indicate that Galley is capable of providing high-performance 1/O to applications the applications that rely on them. In Section 3 we describe that access data in patterns that have been observed to be common.
pcircle - A Suite of Scalable Parallel File System Tools

DOE Office of Scientific and Technical Information (OSTI.GOV)

WANG, FEIYI

2015-10-01

Most of the software related to file system are written for conventional local file system, they are serialized and can't take advantage of the benefit of a large scale parallel file system. "pcircle" software builds on top of ubiquitous MPI in cluster computing environment and "work-stealing" pattern to provide a scalable, high-performance suite of file system tools. In particular - it implemented parallel data copy and parallel data checksumming, with advanced features such as async progress report, checkpoint and restart, as well as integrity checking.
Small file aggregation in a parallel computing system

DOEpatents

Faibish, Sorin; Bent, John M.; Tzelnic, Percy; Grider, Gary; Zhang, Jingwang

2014-09-02

Techniques are provided for small file aggregation in a parallel computing system. An exemplary method for storing a plurality of files generated by a plurality of processes in a parallel computing system comprises aggregating the plurality of files into a single aggregated file; and generating metadata for the single aggregated file. The metadata comprises an offset and a length of each of the plurality of files in the single aggregated file. The metadata can be used to unpack one or more of the files from the single aggregated file.
Methods and apparatus for multi-resolution replication of files in a parallel computing system using semantic information

DOEpatents

Faibish, Sorin; Bent, John M.; Tzelnic, Percy; Grider, Gary; Torres, Aaron

2015-10-20

Techniques are provided for storing files in a parallel computing system using different resolutions. A method is provided for storing at least one file generated by a distributed application in a parallel computing system. The file comprises one or more of a complete file and a sub-file. The method comprises the steps of obtaining semantic information related to the file; generating a plurality of replicas of the file with different resolutions based on the semantic information; and storing the file and the plurality of replicas of the file in one or more storage nodes of the parallel computing system. The different resolutions comprise, for example, a variable number of bits and/or a different sub-set of data elements from the file. A plurality of the sub-files can be merged to reproduce the file.
Storing files in a parallel computing system based on user-specified parser function

DOEpatents

Faibish, Sorin; Bent, John M; Tzelnic, Percy; Grider, Gary; Manzanares, Adam; Torres, Aaron

2014-10-21

Techniques are provided for storing files in a parallel computing system based on a user-specified parser function. A plurality of files generated by a distributed application in a parallel computing system are stored by obtaining a parser from the distributed application for processing the plurality of files prior to storage; and storing one or more of the plurality of files in one or more storage nodes of the parallel computing system based on the processing by the parser. The plurality of files comprise one or more of a plurality of complete files and a plurality of sub-files. The parser can optionally store only those files that satisfy one or more semantic requirements of the parser. The parser can also extract metadata from one or more of the files and the extracted metadata can be stored with one or more of the plurality of files and used for searching for files.
Methods and apparatus for capture and storage of semantic information with sub-files in a parallel computing system

DOEpatents

Faibish, Sorin; Bent, John M; Tzelnic, Percy; Grider, Gary; Torres, Aaron

2015-02-03

Techniques are provided for storing files in a parallel computing system using sub-files with semantically meaningful boundaries. A method is provided for storing at least one file generated by a distributed application in a parallel computing system. The file comprises one or more of a complete file and a plurality of sub-files. The method comprises the steps of obtaining a user specification of semantic information related to the file; providing the semantic information as a data structure description to a data formatting library write function; and storing the semantic information related to the file with one or more of the sub-files in one or more storage nodes of the parallel computing system. The semantic information provides a description of data in the file. The sub-files can be replicated based on semantically meaningful boundaries.
Cloud object store for checkpoints of high performance computing applications using decoupling middleware

DOEpatents

Bent, John M.; Faibish, Sorin; Grider, Gary

2016-04-19

Cloud object storage is enabled for checkpoints of high performance computing applications using a middleware process. A plurality of files, such as checkpoint files, generated by a plurality of processes in a parallel computing system are stored by obtaining said plurality of files from said parallel computing system; converting said plurality of files to objects using a log structured file system middleware process; and providing said objects for storage in a cloud object storage system. The plurality of processes may run, for example, on a plurality of compute nodes. The log structured file system middleware process may be embodied, for example, as a Parallel Log-Structured File System (PLFS). The log structured file system middleware process optionally executes on a burst buffer node.
Integration experiences and performance studies of A COTS parallel archive systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chen, Hsing-bung; Scott, Cody; Grider, Bary

2010-01-01

Current and future Archive Storage Systems have been asked to (a) scale to very high bandwidths, (b) scale in metadata performance, (c) support policy-based hierarchical storage management capability, (d) scale in supporting changing needs of very large data sets, (e) support standard interface, and (f) utilize commercial-off-the-shelf(COTS) hardware. Parallel file systems have been asked to do the same thing but at one or more orders of magnitude faster in performance. Archive systems continue to move closer to file systems in their design due to the need for speed and bandwidth, especially metadata searching speeds such as more caching and lessmore » robust semantics. Currently the number of extreme highly scalable parallel archive solutions is very small especially those that will move a single large striped parallel disk file onto many tapes in parallel. We believe that a hybrid storage approach of using COTS components and innovative software technology can bring new capabilities into a production environment for the HPC community much faster than the approach of creating and maintaining a complete end-to-end unique parallel archive software solution. In this paper, we relay our experience of integrating a global parallel file system and a standard backup/archive product with a very small amount of additional code to provide a scalable, parallel archive. Our solution has a high degree of overlap with current parallel archive products including (a) doing parallel movement to/from tape for a single large parallel file, (b) hierarchical storage management, (c) ILM features, (d) high volume (non-single parallel file) archives for backup/archive/content management, and (e) leveraging all free file movement tools in Linux such as copy, move, ls, tar, etc. We have successfully applied our working COTS Parallel Archive System to the current world's first petaflop/s computing system, LANL's Roadrunner, and demonstrated its capability to address requirements of future archival storage systems.« less
Integration experiments and performance studies of a COTS parallel archive system

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chen, Hsing-bung; Scott, Cody; Grider, Gary

2010-06-16

Current and future Archive Storage Systems have been asked to (a) scale to very high bandwidths, (b) scale in metadata performance, (c) support policy-based hierarchical storage management capability, (d) scale in supporting changing needs of very large data sets, (e) support standard interface, and (f) utilize commercial-off-the-shelf (COTS) hardware. Parallel file systems have been asked to do the same thing but at one or more orders of magnitude faster in performance. Archive systems continue to move closer to file systems in their design due to the need for speed and bandwidth, especially metadata searching speeds such as more caching andmore » less robust semantics. Currently the number of extreme highly scalable parallel archive solutions is very small especially those that will move a single large striped parallel disk file onto many tapes in parallel. We believe that a hybrid storage approach of using COTS components and innovative software technology can bring new capabilities into a production environment for the HPC community much faster than the approach of creating and maintaining a complete end-to-end unique parallel archive software solution. In this paper, we relay our experience of integrating a global parallel file system and a standard backup/archive product with a very small amount of additional code to provide a scalable, parallel archive. Our solution has a high degree of overlap with current parallel archive products including (a) doing parallel movement to/from tape for a single large parallel file, (b) hierarchical storage management, (c) ILM features, (d) high volume (non-single parallel file) archives for backup/archive/content management, and (e) leveraging all free file movement tools in Linux such as copy, move, Is, tar, etc. We have successfully applied our working COTS Parallel Archive System to the current world's first petafiop/s computing system, LANL's Roadrunner machine, and demonstrated its capability to address requirements of future archival storage systems.« less
Cloud object store for archive storage of high performance computing data using decoupling middleware

DOEpatents

Bent, John M.; Faibish, Sorin; Grider, Gary

2015-06-30

Cloud object storage is enabled for archived data, such as checkpoints and results, of high performance computing applications using a middleware process. A plurality of archived files, such as checkpoint files and results, generated by a plurality of processes in a parallel computing system are stored by obtaining the plurality of archived files from the parallel computing system; converting the plurality of archived files to objects using a log structured file system middleware process; and providing the objects for storage in a cloud object storage system. The plurality of processes may run, for example, on a plurality of compute nodes. The log structured file system middleware process may be embodied, for example, as a Parallel Log-Structured File System (PLFS). The log structured file system middleware process optionally executes on a burst buffer node.
Checkpoint-Restart in User Space

DOE Office of Scientific and Technical Information (OSTI.GOV)

CRUISE implements a user-space file system that stores data in main memory and transparently spills over to other storage, like local flash memory or the parallel file system, as needed. CRUISE also exposes file contents fo remote direct memory access, allowing external tools to copy files to the parallel file system in the background with reduced CPU interruption.
Storing files in a parallel computing system using list-based index to identify replica files

DOE Office of Scientific and Technical Information (OSTI.GOV)

Faibish, Sorin; Bent, John M.; Tzelnic, Percy

Improved techniques are provided for storing files in a parallel computing system using a list-based index to identify file replicas. A file and at least one replica of the file are stored in one or more storage nodes of the parallel computing system. An index for the file comprises at least one list comprising a pointer to a storage location of the file and a storage location of the at least one replica of the file. The file comprises one or more of a complete file and one or more sub-files. The index may also comprise a checksum value formore » one or more of the file and the replica(s) of the file. The checksum value can be evaluated to validate the file and/or the file replica(s). A query can be processed using the list.« less
Storing files in a parallel computing system based on user or application specification

DOE Office of Scientific and Technical Information (OSTI.GOV)

Faibish, Sorin; Bent, John M.; Nick, Jeffrey M.

2016-03-29

Techniques are provided for storing files in a parallel computing system based on a user-specification. A plurality of files generated by a distributed application in a parallel computing system are stored by obtaining a specification from the distributed application indicating how the plurality of files should be stored; and storing one or more of the plurality of files in one or more storage nodes of a multi-tier storage system based on the specification. The plurality of files comprise a plurality of complete files and/or a plurality of sub-files. The specification can optionally be processed by a daemon executing on onemore » or more nodes in a multi-tier storage system. The specification indicates how the plurality of files should be stored, for example, identifying one or more storage nodes where the plurality of files should be stored.« less
Tuning HDF5 subfiling performance on parallel file systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Byna, Suren; Chaarawi, Mohamad; Koziol, Quincey

Subfiling is a technique used on parallel file systems to reduce locking and contention issues when multiple compute nodes interact with the same storage target node. Subfiling provides a compromise between the single shared file approach that instigates the lock contention problems on parallel file systems and having one file per process, which results in generating a massive and unmanageable number of files. In this paper, we evaluate and tune the performance of recently implemented subfiling feature in HDF5. In specific, we explain the implementation strategy of subfiling feature in HDF5, provide examples of using the feature, and evaluate andmore » tune parallel I/O performance of this feature with parallel file systems of the Cray XC40 system at NERSC (Cori) that include a burst buffer storage and a Lustre disk-based storage. We also evaluate I/O performance on the Cray XC30 system, Edison, at NERSC. Our results show performance benefits of 1.2X to 6X performance advantage with subfiling compared to writing a single shared HDF5 file. We present our exploration of configurations, such as the number of subfiles and the number of Lustre storage targets to storing files, as optimization parameters to obtain superior I/O performance. Based on this exploration, we discuss recommendations for achieving good I/O performance as well as limitations with using the subfiling feature.« less

Flexibility and Performance of Parallel File Systems

NASA Technical Reports Server (NTRS)

Kotz, David; Nieuwejaar, Nils

1996-01-01

As we gain experience with parallel file systems, it becomes increasingly clear that a single solution does not suit all applications. For example, it appears to be impossible to find a single appropriate interface, caching policy, file structure, or disk-management strategy. Furthermore, the proliferation of file-system interfaces and abstractions make applications difficult to port. We propose that the traditional functionality of parallel file systems be separated into two components: a fixed core that is standard on all platforms, encapsulating only primitive abstractions and interfaces, and a set of high-level libraries to provide a variety of abstractions and application-programmer interfaces (API's). We present our current and next-generation file systems as examples of this structure. Their features, such as a three-dimensional file structure, strided read and write interfaces, and I/O-node programs, are specifically designed with the flexibility and performance necessary to support a wide range of applications.
Parallel file system with metadata distributed across partitioned key-value store c

DOEpatents

Bent, John M.; Faibish, Sorin; Grider, Gary; Torres, Aaron

2017-09-19

Improved techniques are provided for storing metadata associated with a plurality of sub-files associated with a single shared file in a parallel file system. The shared file is generated by a plurality of applications executing on a plurality of compute nodes. A compute node implements a Parallel Log Structured File System (PLFS) library to store at least one portion of the shared file generated by an application executing on the compute node and metadata for the at least one portion of the shared file on one or more object storage servers. The compute node is also configured to implement a partitioned data store for storing a partition of the metadata for the shared file, wherein the partitioned data store communicates with partitioned data stores on other compute nodes using a message passing interface. The partitioned data store can be implemented, for example, using Multidimensional Data Hashing Indexing Middleware (MDHIM).
Dynamic file-access characteristics of a production parallel scientific workload

NASA Technical Reports Server (NTRS)

Kotz, David; Nieuwejaar, Nils

1994-01-01

Multiprocessors have permitted astounding increases in computational performance, but many cannot meet the intense I/O requirements of some scientific applications. An important component of any solution to this I/O bottleneck is a parallel file system that can provide high-bandwidth access to tremendous amounts of data in parallel to hundreds or thousands of processors. Most successful systems are based on a solid understanding of the expected workload, but thus far there have been no comprehensive workload characterizations of multiprocessor file systems. This paper presents the results of a three week tracing study in which all file-related activity on a massively parallel computer was recorded. Our instrumentation differs from previous efforts in that it collects information about every I/O request and about the mix of jobs running in a production environment. We also present the results of a trace-driven caching simulation and recommendations for designers of multiprocessor file systems.
File-access characteristics of parallel scientific workloads

NASA Technical Reports Server (NTRS)

Nieuwejaar, Nils; Kotz, David; Purakayastha, Apratim; Best, Michael; Ellis, Carla Schlatter

1995-01-01

Phenomenal improvements in the computational performance of multiprocessors have not been matched by comparable gains in I/O system performance. This imbalance has resulted in I/O becoming a significant bottleneck for many scientific applications. One key to overcoming this bottleneck is improving the performance of parallel file systems. The design of a high-performance parallel file system requires a comprehensive understanding of the expected workload. Unfortunately, until recently, no general workload studies of parallel file systems have been conducted. The goal of the CHARISMA project was to remedy this problem by characterizing the behavior of several production workloads, on different machines, at the level of individual reads and writes. The first set of results from the CHARISMA project describe the workloads observed on an Intel iPSC/860 and a Thinking Machines CM-5. This paper is intended to compare and contrast these two workloads for an understanding of their essential similarities and differences, isolating common trends and platform-dependent variances. Using this comparison, we are able to gain more insight into the general principles that should guide parallel file-system design.
Characterizing parallel file-access patterns on a large-scale multiprocessor

NASA Technical Reports Server (NTRS)

Purakayastha, Apratim; Ellis, Carla Schlatter; Kotz, David; Nieuwejaar, Nils; Best, Michael

1994-01-01

Rapid increases in the computational speeds of multiprocessors have not been matched by corresponding performance enhancements in the I/O subsystem. To satisfy the large and growing I/O requirements of some parallel scientific applications, we need parallel file systems that can provide high-bandwidth and high-volume data transfer between the I/O subsystem and thousands of processors. Design of such high-performance parallel file systems depends on a thorough grasp of the expected workload. So far there have been no comprehensive usage studies of multiprocessor file systems. Our CHARISMA project intends to fill this void. The first results from our study involve an iPSC/860 at NASA Ames. This paper presents results from a different platform, the CM-5 at the National Center for Supercomputing Applications. The CHARISMA studies are unique because we collect information about every individual read and write request and about the entire mix of applications running on the machines. The results of our trace analysis lead to recommendations for parallel file system design. First the file system should support efficient concurrent access to many files, and I/O requests from many jobs under varying load conditions. Second, it must efficiently manage large files kept open for long periods. Third, it should expect to see small requests predominantly sequential access patterns, application-wide synchronous access, no concurrent file-sharing between jobs appreciable byte and block sharing between processes within jobs, and strong interprocess locality. Finally, the trace data suggest that node-level write caches and collective I/O request interfaces may be useful in certain environments.
A Next-Generation Parallel File System Environment for the OLCF

DOE Office of Scientific and Technical Information (OSTI.GOV)

Dillow, David A; Fuller, Douglas; Gunasekaran, Raghul

2012-01-01

When deployed in 2008/2009 the Spider system at the Oak Ridge National Laboratory s Leadership Computing Facility (OLCF) was the world s largest scale Lustre parallel file system. Envisioned as a shared parallel file system capable of delivering both the bandwidth and capacity requirements of the OLCF s diverse computational environment, Spider has since become a blueprint for shared Lustre environments deployed worldwide. Designed to support the parallel I/O requirements of the Jaguar XT5 system and other smallerscale platforms at the OLCF, the upgrade to the Titan XK6 heterogeneous system will begin to push the limits of Spider s originalmore » design by mid 2013. With a doubling in total system memory and a 10x increase in FLOPS, Titan will require both higher bandwidth and larger total capacity. Our goal is to provide a 4x increase in total I/O bandwidth from over 240GB=sec today to 1TB=sec and a doubling in total capacity. While aggregate bandwidth and total capacity remain important capabilities, an equally important goal in our efforts is dramatically increasing metadata performance, currently the Achilles heel of parallel file systems at leadership. We present in this paper an analysis of our current I/O workloads, our operational experiences with the Spider parallel file systems, the high-level design of our Spider upgrade, and our efforts in developing benchmarks that synthesize our performance requirements based on our workload characterization studies.« less
Tuning HDF5 for Lustre File Systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Howison, Mark; Koziol, Quincey; Knaak, David

2010-09-24

HDF5 is a cross-platform parallel I/O library that is used by a wide variety of HPC applications for the flexibility of its hierarchical object-database representation of scientific data. We describe our recent work to optimize the performance of the HDF5 and MPI-IO libraries for the Lustre parallel file system. We selected three different HPC applications to represent the diverse range of I/O requirements, and measured their performance on three different systems to demonstrate the robustness of our optimizations across different file system configurations and to validate our optimization strategy. We demonstrate that the combined optimizations improve HDF5 parallel I/O performancemore » by up to 33 times in some cases running close to the achievable peak performance of the underlying file system and demonstrate scalable performance up to 40,960-way concurrency.« less
High-Performance, Multi-Node File Copies and Checksums for Clustered File Systems

NASA Technical Reports Server (NTRS)

Kolano, Paul Z.; Ciotti, Robert B.

2012-01-01

Modern parallel file systems achieve high performance using a variety of techniques, such as striping files across multiple disks to increase aggregate I/O bandwidth and spreading disks across multiple servers to increase aggregate interconnect bandwidth. To achieve peak performance from such systems, it is typically necessary to utilize multiple concurrent readers/writers from multiple systems to overcome various singlesystem limitations, such as number of processors and network bandwidth. The standard cp and md5sum tools of GNU coreutils found on every modern Unix/Linux system, however, utilize a single execution thread on a single CPU core of a single system, and hence cannot take full advantage of the increased performance of clustered file systems. Mcp and msum are drop-in replacements for the standard cp and md5sum programs that utilize multiple types of parallelism and other optimizations to achieve maximum copy and checksum performance on clustered file systems. Multi-threading is used to ensure that nodes are kept as busy as possible. Read/write parallelism allows individual operations of a single copy to be overlapped using asynchronous I/O. Multinode cooperation allows different nodes to take part in the same copy/checksum. Split-file processing allows multiple threads to operate concurrently on the same file. Finally, hash trees allow inherently serial checksums to be performed in parallel. Mcp and msum provide significant performance improvements over standard cp and md5sum using multiple types of parallelism and other optimizations. The total speed-ups from all improvements are significant. Mcp improves cp performance over 27x, msum improves md5sum performance almost 19x, and the combination of mcp and msum improves verified copies via cp and md5sum by almost 22x. These improvements come in the form of drop-in replacements for cp and md5sum, so are easily used and are available for download as open source software at http://mutil.sourceforge.net.
Parallel checksumming of data chunks of a shared data object using a log-structured file system

DOEpatents

Bent, John M.; Faibish, Sorin; Grider, Gary

2016-09-06

Checksum values are generated and used to verify the data integrity. A client executing in a parallel computing system stores a data chunk to a shared data object on a storage node in the parallel computing system. The client determines a checksum value for the data chunk; and provides the checksum value with the data chunk to the storage node that stores the shared object. The data chunk can be stored on the storage node with the corresponding checksum value as part of the shared object. The storage node may be part of a Parallel Log-Structured File System (PLFS), and the client may comprise, for example, a Log-Structured File System client on a compute node or burst buffer. The checksum value can be evaluated when the data chunk is read from the storage node to verify the integrity of the data that is read.
Rethinking key–value store for parallel I/O optimization

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kougkas, Anthony; Eslami, Hassan; Sun, Xian-He

2015-01-26

Key-value stores are being widely used as the storage system for large-scale internet services and cloud storage systems. However, they are rarely used in HPC systems, where parallel file systems are the dominant storage solution. In this study, we examine the architecture differences and performance characteristics of parallel file systems and key-value stores. We propose using key-value stores to optimize overall Input/Output (I/O) performance, especially for workloads that parallel file systems cannot handle well, such as the cases with intense data synchronization or heavy metadata operations. We conducted experiments with several synthetic benchmarks, an I/O benchmark, and a real application.more » We modeled the performance of these two systems using collected data from our experiments, and we provide a predictive method to identify which system offers better I/O performance given a specific workload. The results show that we can optimize the I/O performance in HPC systems by utilizing key-value stores.« less
KNBD: A Remote Kernel Block Server for Linux

NASA Technical Reports Server (NTRS)

Becker, Jeff

1999-01-01

I am developing a prototype of a Linux remote disk block server whose purpose is to serve as a lower level component of a parallel file system. Parallel file systems are an important component of high performance supercomputers and clusters. Although supercomputer vendors such as SGI and IBM have their own custom solutions, there has been a void and hence a demand for such a system on Beowulf-type PC Clusters. Recently, the Parallel Virtual File System (PVFS) project at Clemson University has begun to address this need (1). Although their system provides much of the functionality of (and indeed was inspired by) the equivalent file systems in the commercial supercomputer market, their system is all in user-space. Migrating their 10 services to the kernel could provide a performance boost, by obviating the need for expensive system calls. Thanks to Pavel Machek, the Linux kernel has provided the network block device (2) with kernels 2.1.101 and later. You can configure this block device to redirect reads and writes to a remote machine's disk. This can be used as a building block for constructing a striped file system across several nodes.
Adding Data Management Services to Parallel File Systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Brandt, Scott

2015-03-04

The objective of this project, called DAMASC for “Data Management in Scientific Computing”, is to coalesce data management with parallel file system management to present a declarative interface to scientists for managing, querying, and analyzing extremely large data sets efficiently and predictably. Managing extremely large data sets is a key challenge of exascale computing. The overhead, energy, and cost of moving massive volumes of data demand designs where computation is close to storage. In current architectures, compute/analysis clusters access data in a physically separate parallel file system and largely leave it scientist to reduce data movement. Over the past decadesmore » the high-end computing community has adopted middleware with multiple layers of abstractions and specialized file formats such as NetCDF-4 and HDF5. These abstractions provide a limited set of high-level data processing functions, but have inherent functionality and performance limitations: middleware that provides access to the highly structured contents of scientific data files stored in the (unstructured) file systems can only optimize to the extent that file system interfaces permit; the highly structured formats of these files often impedes native file system performance optimizations. We are developing Damasc, an enhanced high-performance file system with native rich data management services. Damasc will enable efficient queries and updates over files stored in their native byte-stream format while retaining the inherent performance of file system data storage via declarative queries and updates over views of underlying files. Damasc has four key benefits for the development of data-intensive scientific code: (1) applications can use important data-management services, such as declarative queries, views, and provenance tracking, that are currently available only within database systems; (2) the use of these services becomes easier, as they are provided within a familiar file-based ecosystem; (3) common optimizations, e.g., indexing and caching, are readily supported across several file formats, avoiding effort duplication; and (4) performance improves significantly, as data processing is integrated more tightly with data storage. Our key contributions are: SciHadoop which explores changes to MapReduce assumption by taking advantage of semantics of structured data while preserving MapReduce’s failure and resource management; DataMods which extends common abstractions of parallel file systems so they become programmable such that they can be extended to natively support a variety of data models and can be hooked into emerging distributed runtimes such as Stanford’s Legion; and Miso which combines Hadoop and relational data warehousing to minimize time to insight, taking into account the overhead of ingesting data into data warehousing.« less
Data Partitioning and Load Balancing in Parallel Disk Systems

NASA Technical Reports Server (NTRS)

Scheuermann, Peter; Weikum, Gerhard; Zabback, Peter

1997-01-01

Parallel disk systems provide opportunities for exploiting I/O parallelism in two possible waves, namely via inter-request and intra-request parallelism. In this paper we discuss the main issues in performance tuning of such systems, namely striping and load balancing, and show their relationship to response time and throughput. We outline the main components of an intelligent, self-reliant file system that aims to optimize striping by taking into account the requirements of the applications and performs load balancing by judicious file allocation and dynamic redistributions of the data when access patterns change. Our system uses simple but effective heuristics that incur only little overhead. We present performance experiments based on synthetic workloads and real-life traces.
Nemesis I: Parallel Enhancements to ExodusII

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hennigan, Gary L.; John, Matthew S.; Shadid, John N.

2006-03-28

NEMESIS I is an enhancement to the EXODUS II finite element database model used to store and retrieve data for unstructured parallel finite element analyses. NEMESIS I adds data structures which facilitate the partitioning of a scalar (standard serial) EXODUS II file onto parallel disk systems found on many parallel computers. Since the NEMESIS I application programming interface (APl)can be used to append information to an existing EXODUS II files can be used on files which contain NEMESIS I information. The NEMESIS I information is written and read via C or C++ callable functions which compromise the NEMESIS I API.
Storage of sparse files using parallel log-structured file system

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bent, John M.; Faibish, Sorin; Grider, Gary

A sparse file is stored without holes by storing a data portion of the sparse file using a parallel log-structured file system; and generating an index entry for the data portion, the index entry comprising a logical offset, physical offset and length of the data portion. The holes can be restored to the sparse file upon a reading of the sparse file. The data portion can be stored at a logical end of the sparse file. Additional storage efficiency can optionally be achieved by (i) detecting a write pattern for a plurality of the data portions and generating a singlemore » patterned index entry for the plurality of the patterned data portions; and/or (ii) storing the patterned index entries for a plurality of the sparse files in a single directory, wherein each entry in the single directory comprises an identifier of a corresponding sparse file.« less
Lessons Learned in Deploying the World s Largest Scale Lustre File System

DOE Office of Scientific and Technical Information (OSTI.GOV)

Dillow, David A; Fuller, Douglas; Wang, Feiyi

2010-01-01

The Spider system at the Oak Ridge National Laboratory's Leadership Computing Facility (OLCF) is the world's largest scale Lustre parallel file system. Envisioned as a shared parallel file system capable of delivering both the bandwidth and capacity requirements of the OLCF's diverse computational environment, the project had a number of ambitious goals. To support the workloads of the OLCF's diverse computational platforms, the aggregate performance and storage capacity of Spider exceed that of our previously deployed systems by a factor of 6x - 240 GB/sec, and 17x - 10 Petabytes, respectively. Furthermore, Spider supports over 26,000 clients concurrently accessing themore » file system, which exceeds our previously deployed systems by nearly 4x. In addition to these scalability challenges, moving to a center-wide shared file system required dramatically improved resiliency and fault-tolerance mechanisms. This paper details our efforts in designing, deploying, and operating Spider. Through a phased approach of research and development, prototyping, deployment, and transition to operations, this work has resulted in a number of insights into large-scale parallel file system architectures, from both the design and the operational perspectives. We present in this paper our solutions to issues such as network congestion, performance baselining and evaluation, file system journaling overheads, and high availability in a system with tens of thousands of components. We also discuss areas of continued challenges, such as stressed metadata performance and the need for file system quality of service alongside with our efforts to address them. Finally, operational aspects of managing a system of this scale are discussed along with real-world data and observations.« less
[PVFS 2000: An operational parallel file system for Beowulf

NASA Technical Reports Server (NTRS)

Ligon, Walt

2004-01-01

The approach has been to develop Parallel Virtual File System version 2 (PVFS2) , retaining the basic philosophy of the original file system but completely rewriting the code. It shows the architecture of the server and client components. BMI - BMI is the network abstraction layer. It is designed with a common driver and modules for each protocol supported. The interface is non-blocking, and provides mechanisms for optimizations including pinning user buffers. Currently TCP/IP and GM(Myrinet) modules have been implemented. Trove -Trove is the storage abstraction layer. It provides for storing both data spaces and name/value pairs. Trove can also be implemented using different underlying storage mechanisms including native files, raw disk partitions, SQL and other databases. The current implementation uses native files for data spaces and Berkeley db for name/value pairs.
The Scalable Checkpoint/Restart Library

DOE Office of Scientific and Technical Information (OSTI.GOV)

Moody, A.

The Scalable Checkpoint/Restart (SCR) library provides an interface that codes may use to worite our and read in application-level checkpoints in a scalable fashion. In the current implementation, checkpoint files are cached in local storage (hard disk or RAM disk) on the compute nodes. This technique provides scalable aggregate bandwidth and uses storage resources that are fully dedicated to the job. This approach addresses the two common drawbacks of checkpointing a large-scale application to a shared parallel file system, namely, limited bandwidth and file system contention. In fact, on current platforms, SCR scales linearly with the number of compute nodes.more » It has been benchmarked as high as 720GB/s on 1094 nodes of Atlas, which is nearly two orders of magnitude faster thanthe parallel file system.« less
Toward Millions of File System IOPS on Low-Cost, Commodity Hardware

PubMed Central

Zheng, Da; Burns, Randal; Szalay, Alexander S.

2013-01-01

We describe a storage system that removes I/O bottlenecks to achieve more than one million IOPS based on a user-space file abstraction for arrays of commodity SSDs. The file abstraction refactors I/O scheduling and placement for extreme parallelism and non-uniform memory and I/O. The system includes a set-associative, parallel page cache in the user space. We redesign page caching to eliminate CPU overhead and lock-contention in non-uniform memory architecture machines. We evaluate our design on a 32 core NUMA machine with four, eight-core processors. Experiments show that our design delivers 1.23 million 512-byte read IOPS. The page cache realizes the scalable IOPS of Linux asynchronous I/O (AIO) and increases user-perceived I/O performance linearly with cache hit rates. The parallel, set-associative cache matches the cache hit rates of the global Linux page cache under real workloads. PMID:24402052
Toward Millions of File System IOPS on Low-Cost, Commodity Hardware.

PubMed

Zheng, Da; Burns, Randal; Szalay, Alexander S

2013-01-01

We describe a storage system that removes I/O bottlenecks to achieve more than one million IOPS based on a user-space file abstraction for arrays of commodity SSDs. The file abstraction refactors I/O scheduling and placement for extreme parallelism and non-uniform memory and I/O. The system includes a set-associative, parallel page cache in the user space. We redesign page caching to eliminate CPU overhead and lock-contention in non-uniform memory architecture machines. We evaluate our design on a 32 core NUMA machine with four, eight-core processors. Experiments show that our design delivers 1.23 million 512-byte read IOPS. The page cache realizes the scalable IOPS of Linux asynchronous I/O (AIO) and increases user-perceived I/O performance linearly with cache hit rates. The parallel, set-associative cache matches the cache hit rates of the global Linux page cache under real workloads.

File-System Workload on a Scientific Multiprocessor

NASA Technical Reports Server (NTRS)

Kotz, David; Nieuwejaar, Nils

1995-01-01

Many scientific applications have intense computational and I/O requirements. Although multiprocessors have permitted astounding increases in computational performance, the formidable I/O needs of these applications cannot be met by current multiprocessors a their I/O subsystems. To prevent I/O subsystems from forever bottlenecking multiprocessors and limiting the range of feasible applications, new I/O subsystems must be designed. The successful design of computer systems (both hardware and software) depends on a thorough understanding of their intended use. A system designer optimizes the policies and mechanisms for the cases expected to most common in the user's workload. In the case of multiprocessor file systems, however, designers have been forced to build file systems based only on speculation about how they would be used, extrapolating from file-system characterizations of general-purpose workloads on uniprocessor and distributed systems or scientific workloads on vector supercomputers (see sidebar on related work). To help these system designers, in June 1993 we began the Charisma Project, so named because the project sought to characterize 1/0 in scientific multiprocessor applications from a variety of production parallel computing platforms and sites. The Charisma project is unique in recording individual read and write requests-in live, multiprogramming, parallel workloads (rather than from selected or nonparallel applications). In this article, we present the first results from the project: a characterization of the file-system workload an iPSC/860 multiprocessor running production, parallel scientific applications at NASA's Ames Research Center.
Optimizing Input/Output Using Adaptive File System Policies

NASA Technical Reports Server (NTRS)

Madhyastha, Tara M.; Elford, Christopher L.; Reed, Daniel A.

1996-01-01

Parallel input/output characterization studies and experiments with flexible resource management algorithms indicate that adaptivity is crucial to file system performance. In this paper we propose an automatic technique for selecting and refining file system policies based on application access patterns and execution environment. An automatic classification framework allows the file system to select appropriate caching and pre-fetching policies, while performance sensors provide feedback used to tune policy parameters for specific system environments. To illustrate the potential performance improvements possible using adaptive file system policies, we present results from experiments involving classification-based and performance-based steering.
An Approach Using Parallel Architecture to Storage DICOM Images in Distributed File System

NASA Astrophysics Data System (ADS)

Soares, Tiago S.; Prado, Thiago C.; Dantas, M. A. R.; de Macedo, Douglas D. J.; Bauer, Michael A.

2012-02-01

Telemedicine is a very important area in medical field that is expanding daily motivated by many researchers interested in improving medical applications. In Brazil was started in 2005, in the State of Santa Catarina has a developed server called the CyclopsDCMServer, which the purpose to embrace the HDF for the manipulation of medical images (DICOM) using a distributed file system. Since then, many researches were initiated in order to seek better performance. Our approach for this server represents an additional parallel implementation in I/O operations since HDF version 5 has an essential feature for our work which supports parallel I/O, based upon the MPI paradigm. Early experiments using four parallel nodes, provide good performance when compare to the serial HDF implemented in the CyclopsDCMServer.
A model for optimizing file access patterns using spatio-temporal parallelism

DOE Office of Scientific and Technical Information (OSTI.GOV)

Boonthanome, Nouanesengsy; Patchett, John; Geveci, Berk

2013-01-01

For many years now, I/O read time has been recognized as the primary bottleneck for parallel visualization and analysis of large-scale data. In this paper, we introduce a model that can estimate the read time for a file stored in a parallel filesystem when given the file access pattern. Read times ultimately depend on how the file is stored and the access pattern used to read the file. The file access pattern will be dictated by the type of parallel decomposition used. We employ spatio-temporal parallelism, which combines both spatial and temporal parallelism, to provide greater flexibility to possible filemore » access patterns. Using our model, we were able to configure the spatio-temporal parallelism to design optimized read access patterns that resulted in a speedup factor of approximately 400 over traditional file access patterns.« less
Parallel compression of data chunks of a shared data object using a log-structured file system

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bent, John M.; Faibish, Sorin; Grider, Gary

2016-10-25

Techniques are provided for parallel compression of data chunks being written to a shared object. A client executing on a compute node or a burst buffer node in a parallel computing system stores a data chunk generated by the parallel computing system to a shared data object on a storage node by compressing the data chunk; and providing the data compressed data chunk to the storage node that stores the shared object. The client and storage node may employ Log-Structured File techniques. The compressed data chunk can be de-compressed by the client when the data chunk is read. A storagemore » node stores a data chunk as part of a shared object by receiving a compressed version of the data chunk from a compute node; and storing the compressed version of the data chunk to the shared data object on the storage node.« less
Collectively loading programs in a multiple program multiple data environment

DOE Office of Scientific and Technical Information (OSTI.GOV)

Aho, Michael E.; Attinella, John E.; Gooding, Thomas M.

Techniques are disclosed for loading programs efficiently in a parallel computing system. In one embodiment, nodes of the parallel computing system receive a load description file which indicates, for each program of a multiple program multiple data (MPMD) job, nodes which are to load the program. The nodes determine, using collective operations, a total number of programs to load and a number of programs to load in parallel. The nodes further generate a class route for each program to be loaded in parallel, where the class route generated for a particular program includes only those nodes on which the programmore » needs to be loaded. For each class route, a node is selected using a collective operation to be a load leader which accesses a file system to load the program associated with a class route and broadcasts the program via the class route to other nodes which require the program.« less
Experimental Analysis of File Transfer Rates over Wide-Area Dedicated Connections

DOE Office of Scientific and Technical Information (OSTI.GOV)

Rao, Nageswara S.; Liu, Qiang; Sen, Satyabrata

2016-12-01

File transfers over dedicated connections, supported by large parallel file systems, have become increasingly important in high-performance computing and big data workflows. It remains a challenge to achieve peak rates for such transfers due to the complexities of file I/O, host, and network transport subsystems, and equally importantly, their interactions. We present extensive measurements of disk-to-disk file transfers using Lustre and XFS file systems mounted on multi-core servers over a suite of 10 Gbps emulated connections with 0-366 ms round trip times. Our results indicate that large buffer sizes and many parallel flows do not always guarantee high transfer rates.more » Furthermore, large variations in the measured rates necessitate repeated measurements to ensure confidence in inferences based on them. We propose a new method to efficiently identify the optimal joint file I/O and network transport parameters using a small number of measurements. We show that for XFS and Lustre with direct I/O, this method identifies configurations achieving 97% of the peak transfer rate while probing only 12% of the parameter space.« less
Prefetching in file systems for MIMD multiprocessors

NASA Technical Reports Server (NTRS)

Kotz, David F.; Ellis, Carla Schlatter

1990-01-01

The question of whether prefetching blocks on the file into the block cache can effectively reduce overall execution time of a parallel computation, even under favorable assumptions, is considered. Experiments have been conducted with an interleaved file system testbed on the Butterfly Plus multiprocessor. Results of these experiments suggest that (1) the hit ratio, the accepted measure in traditional caching studies, may not be an adequate measure of performance when the workload consists of parallel computations and parallel file access patterns, (2) caching with prefetching can significantly improve the hit ratio and the average time to perform an I/O (input/output) operation, and (3) an improvement in overall execution time has been observed in most cases. In spite of these gains, prefetching sometimes results in increased execution times (a negative result, given the optimistic nature of the study). The authors explore why it is not trivial to translate savings on individual I/O requests into consistently better overall performance and identify the key problems that need to be addressed in order to improve the potential of prefetching techniques in the environment.
Parallel File System I/O Performance Testing On LANL Clusters

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wiens, Isaac Christian; Green, Jennifer Kathleen

2016-08-18

These are slides from a presentation on parallel file system I/O performance testing on LANL clusters. I/O is a known bottleneck for HPC applications. Performance optimization of I/O is often required. This summer project entailed integrating IOR under Pavilion and automating the results analysis. The slides cover the following topics: scope of the work, tools utilized, IOR-Pavilion test workflow, build script, IOR parameters, how parameters are passed to IOR, *run_ior: functionality, Python IOR-Output Parser, Splunk data format, Splunk dashboard and features, and future work.
Next Generation Parallelization Systems for Processing and Control of PDS Image Node Assets

NASA Astrophysics Data System (ADS)

Verma, R.

2017-06-01

We present next-generation parallelization tools to help Planetary Data System (PDS) Imaging Node (IMG) better monitor, process, and control changes to nearly 650 million file assets and over a dozen machines on which they are referenced or stored.
''Towards a High-Performance and Robust Implementation of MPI-IO on Top of GPFS''

DOE Office of Scientific and Technical Information (OSTI.GOV)

Prost, J.P.; Tremann, R.; Blackwore, R.

2000-01-11

MPI-IO/GPFS is a prototype implementation of the I/O chapter of the Message Passing Interface (MPI) 2 standard. It uses the IBM General Parallel File System (GPFS), with prototyped extensions, as the underlying file system. this paper describes the features of this prototype which support its high performance and robustness. The use of hints at the file system level and at the MPI-IO level allows tailoring the use of the file system to the application needs. Error handling in collective operations provides robust error reporting and deadlock prevention in case of returning errors.
nem_spread Ver. 5.10

DOE Office of Scientific and Technical Information (OSTI.GOV)

HENNIGAN, GARY; SHADID, JOHN; SJAARDEMA, GREGORY

2009-06-08

Nem_spread reads it's input command file (default name nem_spread.inp), takes the named ExodusII geometry definition and spreads out the geometry (and optionally results) contained in that file out to a parallel disk system. The decomposition is taken from a scalar Nemesis load balance file generated by the companion utility nem_slice.
Life and dynamic capacity modeling for aircraft transmissions

NASA Technical Reports Server (NTRS)

Savage, Michael

1991-01-01

A computer program to simulate the dynamic capacity and life of parallel shaft aircraft transmissions is presented. Five basic configurations can be analyzed: single mesh, compound, parallel, reverted, and single plane reductions. In execution, the program prompts the user for the data file prefix name, takes input from a ASCII file, and writes its output to a second ASCII file with the same prefix name. The input data file includes the transmission configuration, the input shaft torque and speed, and descriptions of the transmission geometry and the component gears and bearings. The program output file describes the transmission, its components, their capabilities, locations, and loads. It also lists the dynamic capability, ninety percent reliability, and mean life of each component and the transmission as a system. Here, the program, its input and output files, and the theory behind the operation of the program are described.
Log-less metadata management on metadata server for parallel file systems.

PubMed

Liao, Jianwei; Xiao, Guoqiang; Peng, Xiaoning

2014-01-01

This paper presents a novel metadata management mechanism on the metadata server (MDS) for parallel and distributed file systems. In this technique, the client file system backs up the sent metadata requests, which have been handled by the metadata server, so that the MDS does not need to log metadata changes to nonvolatile storage for achieving highly available metadata service, as well as better performance improvement in metadata processing. As the client file system backs up certain sent metadata requests in its memory, the overhead for handling these backup requests is much smaller than that brought by the metadata server, while it adopts logging or journaling to yield highly available metadata service. The experimental results show that this newly proposed mechanism can significantly improve the speed of metadata processing and render a better I/O data throughput, in contrast to conventional metadata management schemes, that is, logging or journaling on MDS. Besides, a complete metadata recovery can be achieved by replaying the backup logs cached by all involved clients, when the metadata server has crashed or gone into nonoperational state exceptionally.
Log-Less Metadata Management on Metadata Server for Parallel File Systems

PubMed Central

Xiao, Guoqiang; Peng, Xiaoning

2014-01-01

This paper presents a novel metadata management mechanism on the metadata server (MDS) for parallel and distributed file systems. In this technique, the client file system backs up the sent metadata requests, which have been handled by the metadata server, so that the MDS does not need to log metadata changes to nonvolatile storage for achieving highly available metadata service, as well as better performance improvement in metadata processing. As the client file system backs up certain sent metadata requests in its memory, the overhead for handling these backup requests is much smaller than that brought by the metadata server, while it adopts logging or journaling to yield highly available metadata service. The experimental results show that this newly proposed mechanism can significantly improve the speed of metadata processing and render a better I/O data throughput, in contrast to conventional metadata management schemes, that is, logging or journaling on MDS. Besides, a complete metadata recovery can be achieved by replaying the backup logs cached by all involved clients, when the metadata server has crashed or gone into nonoperational state exceptionally. PMID:24892093
Development of the Large-Scale Statistical Analysis System of Satellites Observations Data with Grid Datafarm Architecture

NASA Astrophysics Data System (ADS)

Yamamoto, K.; Murata, K.; Kimura, E.; Honda, R.

2006-12-01

In the Solar-Terrestrial Physics (STP) field, the amount of satellite observation data has been increasing every year. It is necessary to solve the following three problems to achieve large-scale statistical analyses of plenty of data. (i) More CPU power and larger memory and disk size are required. However, total powers of personal computers are not enough to analyze such amount of data. Super-computers provide a high performance CPU and rich memory area, but they are usually separated from the Internet or connected only for the purpose of programming or data file transfer. (ii) Most of the observation data files are managed at distributed data sites over the Internet. Users have to know where the data files are located. (iii) Since no common data format in the STP field is available now, users have to prepare reading program for each data by themselves. To overcome the problems (i) and (ii), we constructed a parallel and distributed data analysis environment based on the Gfarm reference implementation of the Grid Datafarm architecture. The Gfarm shares both computational resources and perform parallel distributed processings. In addition, the Gfarm provides the Gfarm filesystem which can be as virtual directory tree among nodes. The Gfarm environment is composed of three parts; a metadata server to manage distributed files information, filesystem nodes to provide computational resources and a client to throw a job into metadata server and manages data processing schedulings. In the present study, both data files and data processes are parallelized on the Gfarm with 6 file system nodes: CPU clock frequency of each node is Pentium V 1GHz, 256MB memory and40GB disk. To evaluate performances of the present Gfarm system, we scanned plenty of data files, the size of which is about 300MB for each, in three processing methods: sequential processing in one node, sequential processing by each node and parallel processing by each node. As a result, in comparison between the number of files and the elapsed time, parallel and distributed processing shorten the elapsed time to 1/5 than sequential processing. On the other hand, sequential processing times were shortened in another experiment, whose file size is smaller than 100KB. In this case, the elapsed time to scan one file is within one second. It implies that disk swap took place in case of parallel processing by each node. We note that the operation became unstable when the number of the files exceeded 1000. To overcome the problem (iii), we developed an original data class. This class supports our reading of data files with various data formats since it converts them into an original data format since it defines schemata for every type of data and encapsulates the structure of data files. In addition, since this class provides a function of time re-sampling, users can easily convert multiple data (array) with different time resolution into the same time resolution array. Finally, using the Gfarm, we achieved a high performance environment for large-scale statistical data analyses. It should be noted that the present method is effective only when one data file size is large enough. At present, we are restructuring the new Gfarm environment with 8 nodes: CPU is Athlon 64 x2 Dual Core 2GHz, 2GB memory and 1.2TB disk (using RAID0) for each node. Our original class is to be implemented on the new Gfarm environment. In the present talk, we show the latest results with applying the present system for data analyses with huge number of satellite observation data files.
ATLAS software configuration and build tool optimisation

NASA Astrophysics Data System (ADS)

Rybkin, Grigory; Atlas Collaboration

2014-06-01

ATLAS software code base is over 6 million lines organised in about 2000 packages. It makes use of some 100 external software packages, is developed by more than 400 developers and used by more than 2500 physicists from over 200 universities and laboratories in 6 continents. To meet the challenge of configuration and building of this software, the Configuration Management Tool (CMT) is used. CMT expects each package to describe its build targets, build and environment setup parameters, dependencies on other packages in a text file called requirements, and each project (group of packages) to describe its policies and dependencies on other projects in a text project file. Based on the effective set of configuration parameters read from the requirements files of dependent packages and project files, CMT commands build the packages, generate the environment for their use, or query the packages. The main focus was on build time performance that was optimised within several approaches: reduction of the number of reads of requirements files that are now read once per package by a CMT build command that generates cached requirements files for subsequent CMT build commands; introduction of more fine-grained build parallelism at package task level, i.e., dependent applications and libraries are compiled in parallel; code optimisation of CMT commands used for build; introduction of package level build parallelism, i. e., parallelise the build of independent packages. By default, CMT launches NUMBER-OF-PROCESSORS build commands in parallel. The other focus was on CMT commands optimisation in general that made them approximately 2 times faster. CMT can generate a cached requirements file for the environment setup command, which is especially useful for deployment on distributed file systems like AFS or CERN VMFS. The use of parallelism, caching and code optimisation significantly-by several times-reduced software build time, environment setup time, increased the efficiency of multi-core computing resources utilisation, and considerably improved software developer and user experience.
Using the Parallel Computing Toolbox with MATLAB on the Peregrine System |

Science.gov Websites

parallel pool took %g seconds.\\n', toc) % "single program multiple data" spmd fprintf('Worker %d says Hello World!\\n', labindex) end delete(gcp); % close the parallel pool exit To run the script on a compute node, create the file helloWorld.sub: #!/bin/bash #PBS -l walltime=05:00 #PBS -l nodes=1 #PBS -N
Template based parallel checkpointing in a massively parallel computer system

DOEpatents

Archer, Charles Jens [Rochester, MN; Inglett, Todd Alan [Rochester, MN

2009-01-13

A method and apparatus for a template based parallel checkpoint save for a massively parallel super computer system using a parallel variation of the rsync protocol, and network broadcast. In preferred embodiments, the checkpoint data for each node is compared to a template checkpoint file that resides in the storage and that was previously produced. Embodiments herein greatly decrease the amount of data that must be transmitted and stored for faster checkpointing and increased efficiency of the computer system. Embodiments are directed to a parallel computer system with nodes arranged in a cluster with a high speed interconnect that can perform broadcast communication. The checkpoint contains a set of actual small data blocks with their corresponding checksums from all nodes in the system. The data blocks may be compressed using conventional non-lossy data compression algorithms to further reduce the overall checkpoint size.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Ilsche, Thomas; Schuchart, Joseph; Cope, Joseph

Event tracing is an important tool for understanding the performance of parallel applications. As concurrency increases in leadership-class computing systems, the quantity of performance log data can overload the parallel file system, perturbing the application being observed. In this work we present a solution for event tracing at leadership scales. We enhance the I/O forwarding system software to aggregate and reorganize log data prior to writing to the storage system, significantly reducing the burden on the underlying file system for this type of traffic. Furthermore, we augment the I/O forwarding system with a write buffering capability to limit the impactmore » of artificial perturbations from log data accesses on traced applications. To validate the approach, we modify the Vampir tracing tool to take advantage of this new capability and show that the approach increases the maximum traced application size by a factor of 5x to more than 200,000 processors.« less

MPI-IO: A Parallel File I/O Interface for MPI Version 0.3

NASA Technical Reports Server (NTRS)

Corbett, Peter; Feitelson, Dror; Hsu, Yarsun; Prost, Jean-Pierre; Snir, Marc; Fineberg, Sam; Nitzberg, Bill; Traversat, Bernard; Wong, Parkson

1995-01-01

Thanks to MPI [9], writing portable message passing parallel programs is almost a reality. One of the remaining problems is file I/0. Although parallel file systems support similar interfaces, the lack of a standard makes developing a truly portable program impossible. Further, the closest thing to a standard, the UNIX file interface, is ill-suited to parallel computing. Working together, IBM Research and NASA Ames have drafted MPI-I0, a proposal to address the portable parallel I/0 problem. In a nutshell, this proposal is based on the idea that I/0 can be modeled as message passing: writing to a file is like sending a message, and reading from a file is like receiving a message. MPI-IO intends to leverage the relatively wide acceptance of the MPI interface in order to create a similar I/0 interface. The above approach can be materialized in different ways. The current proposal represents the result of extensive discussions (and arguments), but is by no means finished. Many changes can be expected as additional participants join the effort to define an interface for portable I/0. This document is organized as follows. The remainder of this section includes a discussion of some issues that have shaped the style of the interface. Section 2 presents an overview of MPI-IO as it is currently defined. It specifies what the interface currently supports and states what would need to be added to the current proposal to make the interface more complete and robust. The next seven sections contain the interface definition itself. Section 3 presents definitions and conventions. Section 4 contains functions for file control, most notably open. Section 5 includes functions for independent I/O, both blocking and nonblocking. Section 6 includes functions for collective I/O, both blocking and nonblocking. Section 7 presents functions to support system-maintained file pointers, and shared file pointers. Section 8 presents constructors that can be used to define useful filetypes (the role of filetypes is explained in Section 2 below). Section 9 presents how the error handling mechanism of MPI is supported by the MPI-IO interface. All this is followed by a set of appendices, which contain information about issues that have not been totally resolved yet, and about design considerations. The reader can find there the motivation behind some of our design choices. More information on this would definitely be welcome and will be included in a further release of this document. The first appendix contains a description of MPI-I0's 'hints' structure which is used when opening a file. Appendix B is a discussion of various issues in the support for file pointers. Appendix C explains what we mean in talking about atomic access. Appendix D provides detailed examples of filetype constructors, and Appendix E contains a collection of arguments for and against various design decisions.
Distributed Storage Algorithm for Geospatial Image Data Based on Data Access Patterns.

PubMed

Pan, Shaoming; Li, Yongkai; Xu, Zhengquan; Chong, Yanwen

2015-01-01

Declustering techniques are widely used in distributed environments to reduce query response time through parallel I/O by splitting large files into several small blocks and then distributing those blocks among multiple storage nodes. Unfortunately, however, many small geospatial image data files cannot be further split for distributed storage. In this paper, we propose a complete theoretical system for the distributed storage of small geospatial image data files based on mining the access patterns of geospatial image data using their historical access log information. First, an algorithm is developed to construct an access correlation matrix based on the analysis of the log information, which reveals the patterns of access to the geospatial image data. Then, a practical heuristic algorithm is developed to determine a reasonable solution based on the access correlation matrix. Finally, a number of comparative experiments are presented, demonstrating that our algorithm displays a higher total parallel access probability than those of other algorithms by approximately 10-15% and that the performance can be further improved by more than 20% by simultaneously applying a copy storage strategy. These experiments show that the algorithm can be applied in distributed environments to help realize parallel I/O and thereby improve system performance.
High performance data transfer

NASA Astrophysics Data System (ADS)

Cottrell, R.; Fang, C.; Hanushevsky, A.; Kreuger, W.; Yang, W.

2017-10-01

The exponentially increasing need for high speed data transfer is driven by big data, and cloud computing together with the needs of data intensive science, High Performance Computing (HPC), defense, the oil and gas industry etc. We report on the Zettar ZX software. This has been developed since 2013 to meet these growing needs by providing high performance data transfer and encryption in a scalable, balanced, easy to deploy and use way while minimizing power and space utilization. In collaboration with several commercial vendors, Proofs of Concept (PoC) consisting of clusters have been put together using off-the- shelf components to test the ZX scalability and ability to balance services using multiple cores, and links. The PoCs are based on SSD flash storage that is managed by a parallel file system. Each cluster occupies 4 rack units. Using the PoCs, between clusters we have achieved almost 200Gbps memory to memory over two 100Gbps links, and 70Gbps parallel file to parallel file with encryption over a 5000 mile 100Gbps link.
Data General Corporation Advanced Operating System/Virtual Storage (AOS/ VS). Revision 7.60

DTIC Science & Technology

1989-02-22

control list for each directory and data file. An access control list includes the users who can and cannot access files as well as the access...and any required data, it can -5- February 22, 1989 Final Evaluation Report Data General AOS/VS SYSTEM OVERVIEW operate asynchronously and in parallel...memory. The IOC can perform the data transfer without further interventiin from the CPU. The I/O channels interface with the processor or system
LVFS: A Scalable Petabye/Exabyte Data Storage System

NASA Astrophysics Data System (ADS)

Golpayegani, N.; Halem, M.; Masuoka, E. J.; Ye, G.; Devine, N. K.

2013-12-01

Managing petabytes of data with hundreds of millions of files is the first step necessary towards an effective big data computing and collaboration environment in a distributed system. We describe here the MODAPS LAADS Virtual File System (LVFS), a new storage architecture which replaces the previous MODAPS operational Level 1 Land Atmosphere Archive Distribution System (LAADS) NFS based approach to storing and distributing datasets from several instruments, such as MODIS, MERIS, and VIIRS. LAADS is responsible for the distribution of over 4 petabytes of data and over 300 million files across more than 500 disks. We present here the first LVFS big data comparative performance results and new capabilities not previously possible with the LAADS system. We consider two aspects in addressing inefficiencies of massive scales of data. First, is dealing in a reliable and resilient manner with the volume and quantity of files in such a dataset, and, second, minimizing the discovery and lookup times for accessing files in such large datasets. There are several popular file systems that successfully deal with the first aspect of the problem. Their solution, in general, is through distribution, replication, and parallelism of the storage architecture. The Hadoop Distributed File System (HDFS), Parallel Virtual File System (PVFS), and Lustre are examples of such file systems that deal with petabyte data volumes. The second aspect deals with data discovery among billions of files, the largest bottleneck in reducing access time. The metadata of a file, generally represented in a directory layout, is stored in ways that are not readily scalable. This is true for HDFS, PVFS, and Lustre as well. Recent experimental file systems, such as Spyglass or Pantheon, have attempted to address this problem through redesign of the metadata directory architecture. LVFS takes a radically different architectural approach by eliminating the need for a separate directory within the file system. The LVFS system replaces the NFS disk mounting approach of LAADS and utilizes the already existing highly optimized metadata database server, which is applicable to most scientific big data intensive compute systems. Thus, LVFS ties the existing storage system with the existing metadata infrastructure system which we believe leads to a scalable exabyte virtual file system. The uniqueness of the implemented design is not limited to LAADS but can be employed with most scientific data processing systems. By utilizing the Filesystem In Userspace (FUSE), a kernel module available in many operating systems, LVFS was able to replace the NFS system while staying POSIX compliant. As a result, the LVFS system becomes scalable to exabyte sizes owing to the use of highly scalable database servers optimized for metadata storage. The flexibility of the LVFS design allows it to organize data on the fly in different ways, such as by region, date, instrument or product without the need for duplication, symbolic links, or any other replication methods. We proposed here a strategic reference architecture that addresses the inefficiencies of scientific petabyte/exabyte file system access through the dynamic integration of the observing system's large metadata file.
Characterizing output bottlenecks in a supercomputer

DOE Office of Scientific and Technical Information (OSTI.GOV)

Xie, Bing; Chase, Jeffrey; Dillow, David A

2012-01-01

Supercomputer I/O loads are often dominated by writes. HPC (High Performance Computing) file systems are designed to absorb these bursty outputs at high bandwidth through massive parallelism. However, the delivered write bandwidth often falls well below the peak. This paper characterizes the data absorption behavior of a center-wide shared Lustre parallel file system on the Jaguar supercomputer. We use a statistical methodology to address the challenges of accurately measuring a shared machine under production load and to obtain the distribution of bandwidth across samples of compute nodes, storage targets, and time intervals. We observe and quantify limitations from competing traffic,more » contention on storage servers and I/O routers, concurrency limitations in the client compute node operating systems, and the impact of variance (stragglers) on coupled output such as striping. We then examine the implications of our results for application performance and the design of I/O middleware systems on shared supercomputers.« less
The Spider Center Wide File System; From Concept to Reality

DOE Office of Scientific and Technical Information (OSTI.GOV)

Shipman, Galen M; Dillow, David A; Oral, H Sarp

2009-01-01

The Leadership Computing Facility (LCF) at Oak Ridge National Laboratory (ORNL) has a diverse portfolio of computational resources ranging from a petascale XT4/XT5 simulation system (Jaguar) to numerous other systems supporting development, visualization, and data analytics. In order to support vastly different I/O needs of these systems Spider, a Lustre-based center wide file system was designed and deployed to provide over 240 GB/s of aggregate throughput with over 10 Petabytes of formatted capacity. A multi-stage InfiniBand network, dubbed as Scalable I/O Network (SION), with over 889 GB/s of bisectional bandwidth was deployed as part of Spider to provide connectivity tomore » our simulation, development, visualization, and other platforms. To our knowledge, while writing this paper, Spider is the largest and fastest POSIX-compliant parallel file system in production. This paper will detail the overall architecture of the Spider system, challenges in deploying and initial testings of a file system of this scale, and novel solutions to these challenges which offer key insights into file system design in the future.« less
A Lightweight, High-performance I/O Management Package for Data-intensive Computing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jun Wang

2007-07-17

File storage systems are playing an increasingly important role in high-performance computing as the performance gap between CPU and disk increases. It could take a long time to develop an entire system from scratch. Solutions will have to be built as extensions to existing systems. If new portable, customized software components are plugged into these systems, better sustained high I/O performance and higher scalability will be achieved, and the development cycle of next-generation of parallel file systems will be shortened. The overall research objective of this ECPI development plan aims to develop a lightweight, customized, high-performance I/O management package namedmore » LightI/O to extend and leverage current parallel file systems used by DOE. During this period, We have developed a novel component in LightI/O and prototype them into PVFS2, and evaluate the resultant prototype—extended PVFS2 system on data-intensive applications. The preliminary results indicate the extended PVFS2 delivers better performance and reliability to users. A strong collaborative effort between the PI at the University of Nebraska Lincoln and the DOE collaborators—Drs Rob Ross and Rajeev Thakur at Argonne National Laboratory who are leading the PVFS2 group makes the project more promising.« less
Distributed File System Utilities to Manage Large DatasetsVersion 0.5

DOE Office of Scientific and Technical Information (OSTI.GOV)

2014-05-21

FileUtils provides a suite of tools to manage large datasets typically created by large parallel MPI applications. They are written in C and use standard POSIX I/Ocalls. The current suite consists of tools to copy, compare, remove, and list. The tools provide dramatic speedup over existing Linux tools, which often run as a single process.
Optimized distributed systems achieve significant performance improvement on sorted merging of massive VCF files.

PubMed

Sun, Xiaobo; Gao, Jingjing; Jin, Peng; Eng, Celeste; Burchard, Esteban G; Beaty, Terri H; Ruczinski, Ingo; Mathias, Rasika A; Barnes, Kathleen; Wang, Fusheng; Qin, Zhaohui S

2018-06-01

Sorted merging of genomic data is a common data operation necessary in many sequencing-based studies. It involves sorting and merging genomic data from different subjects by their genomic locations. In particular, merging a large number of variant call format (VCF) files is frequently required in large-scale whole-genome sequencing or whole-exome sequencing projects. Traditional single-machine based methods become increasingly inefficient when processing large numbers of files due to the excessive computation time and Input/Output bottleneck. Distributed systems and more recent cloud-based systems offer an attractive solution. However, carefully designed and optimized workflow patterns and execution plans (schemas) are required to take full advantage of the increased computing power while overcoming bottlenecks to achieve high performance. In this study, we custom-design optimized schemas for three Apache big data platforms, Hadoop (MapReduce), HBase, and Spark, to perform sorted merging of a large number of VCF files. These schemas all adopt the divide-and-conquer strategy to split the merging job into sequential phases/stages consisting of subtasks that are conquered in an ordered, parallel, and bottleneck-free way. In two illustrating examples, we test the performance of our schemas on merging multiple VCF files into either a single TPED or a single VCF file, which are benchmarked with the traditional single/parallel multiway-merge methods, message passing interface (MPI)-based high-performance computing (HPC) implementation, and the popular VCFTools. Our experiments suggest all three schemas either deliver a significant improvement in efficiency or render much better strong and weak scalabilities over traditional methods. Our findings provide generalized scalable schemas for performing sorted merging on genetics and genomics data using these Apache distributed systems.
Optimized distributed systems achieve significant performance improvement on sorted merging of massive VCF files

PubMed Central

Gao, Jingjing; Jin, Peng; Eng, Celeste; Burchard, Esteban G; Beaty, Terri H; Ruczinski, Ingo; Mathias, Rasika A; Barnes, Kathleen; Wang, Fusheng

2018-01-01

Abstract Background Sorted merging of genomic data is a common data operation necessary in many sequencing-based studies. It involves sorting and merging genomic data from different subjects by their genomic locations. In particular, merging a large number of variant call format (VCF) files is frequently required in large-scale whole-genome sequencing or whole-exome sequencing projects. Traditional single-machine based methods become increasingly inefficient when processing large numbers of files due to the excessive computation time and Input/Output bottleneck. Distributed systems and more recent cloud-based systems offer an attractive solution. However, carefully designed and optimized workflow patterns and execution plans (schemas) are required to take full advantage of the increased computing power while overcoming bottlenecks to achieve high performance. Findings In this study, we custom-design optimized schemas for three Apache big data platforms, Hadoop (MapReduce), HBase, and Spark, to perform sorted merging of a large number of VCF files. These schemas all adopt the divide-and-conquer strategy to split the merging job into sequential phases/stages consisting of subtasks that are conquered in an ordered, parallel, and bottleneck-free way. In two illustrating examples, we test the performance of our schemas on merging multiple VCF files into either a single TPED or a single VCF file, which are benchmarked with the traditional single/parallel multiway-merge methods, message passing interface (MPI)–based high-performance computing (HPC) implementation, and the popular VCFTools. Conclusions Our experiments suggest all three schemas either deliver a significant improvement in efficiency or render much better strong and weak scalabilities over traditional methods. Our findings provide generalized scalable schemas for performing sorted merging on genetics and genomics data using these Apache distributed systems. PMID:29762754
Terascale Cluster for Advanced Turbulent Combustion Simulations

DTIC Science & Technology

2008-07-25

the system We have given the name CATS (for Combustion And Turbulence Simulator) to the terascale system that was obtained through this grant. CATS ...lnfiniBand interconnect. CATS includes an interactive login node and a file server, each holding in excess of 1 terabyte of file storage. The 35 active...compute nodes of CATS enable us to run up to 140-core parallel MPI batch jobs; one node is reserved to run the scheduler. CATS is operated and
Developing a Hadoop-based Middleware for Handling Multi-dimensional NetCDF

NASA Astrophysics Data System (ADS)

Li, Z.; Yang, C. P.; Schnase, J. L.; Duffy, D.; Lee, T. J.

2014-12-01

Climate observations and model simulations are collecting and generating vast amounts of climate data, and these data are ever-increasing and being accumulated in a rapid speed. Effectively managing and analyzing these data are essential for climate change studies. Hadoop, a distributed storage and processing framework for large data sets, has attracted increasing attentions in dealing with the Big Data challenge. The maturity of Infrastructure as a Service (IaaS) of cloud computing further accelerates the adoption of Hadoop in solving Big Data problems. However, Hadoop is designed to process unstructured data such as texts, documents and web pages, and cannot effectively handle the scientific data format such as array-based NetCDF files and other binary data format. In this paper, we propose to build a Hadoop-based middleware for transparently handling big NetCDF data by 1) designing a distributed climate data storage mechanism based on POSIX-enabled parallel file system to enable parallel big data processing with MapReduce, as well as support data access by other systems; 2) modifying the Hadoop framework to transparently processing NetCDF data in parallel without sequencing or converting the data into other file formats, or loading them to HDFS; and 3) seamlessly integrating Hadoop, cloud computing and climate data in a highly scalable and fault-tolerance framework.
NWChem: A comprehensive and scalable open-source solution for large scale molecular simulations

NASA Astrophysics Data System (ADS)

Valiev, M.; Bylaska, E. J.; Govind, N.; Kowalski, K.; Straatsma, T. P.; Van Dam, H. J. J.; Wang, D.; Nieplocha, J.; Apra, E.; Windus, T. L.; de Jong, W. A.

2010-09-01

The latest release of NWChem delivers an open-source computational chemistry package with extensive capabilities for large scale simulations of chemical and biological systems. Utilizing a common computational framework, diverse theoretical descriptions can be used to provide the best solution for a given scientific problem. Scalable parallel implementations and modular software design enable efficient utilization of current computational architectures. This paper provides an overview of NWChem focusing primarily on the core theoretical modules provided by the code and their parallel performance. Program summaryProgram title: NWChem Catalogue identifier: AEGI_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEGI_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Open Source Educational Community License No. of lines in distributed program, including test data, etc.: 11 709 543 No. of bytes in distributed program, including test data, etc.: 680 696 106 Distribution format: tar.gz Programming language: Fortran 77, C Computer: all Linux based workstations and parallel supercomputers, Windows and Apple machines Operating system: Linux, OS X, Windows Has the code been vectorised or parallelized?: Code is parallelized Classification: 2.1, 2.2, 3, 7.3, 7.7, 16.1, 16.2, 16.3, 16.10, 16.13 Nature of problem: Large-scale atomistic simulations of chemical and biological systems require efficient and reliable methods for ground and excited solutions of many-electron Hamiltonian, analysis of the potential energy surface, and dynamics. Solution method: Ground and excited solutions of many-electron Hamiltonian are obtained utilizing density-functional theory, many-body perturbation approach, and coupled cluster expansion. These solutions or a combination thereof with classical descriptions are then used to analyze potential energy surface and perform dynamical simulations. Additional comments: Full documentation is provided in the distribution file. This includes an INSTALL file giving details of how to build the package. A set of test runs is provided in the examples directory. The distribution file for this program is over 90 Mbytes and therefore is not delivered directly when download or Email is requested. Instead a html file giving details of how the program can be obtained is sent. Running time: Running time depends on the size of the chemical system, complexity of the method, number of cpu's and the computational task. It ranges from several seconds for serial DFT energy calculations on a few atoms to several hours for parallel coupled cluster energy calculations on tens of atoms or ab-initio molecular dynamics simulation on hundreds of atoms.
New Parallel computing framework for radiation transport codes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kostin, M.A.; /Michigan State U., NSCL; Mokhov, N.V.

A new parallel computing framework has been developed to use with general-purpose radiation transport codes. The framework was implemented as a C++ module that uses MPI for message passing. The module is significantly independent of radiation transport codes it can be used with, and is connected to the codes by means of a number of interface functions. The framework was integrated with the MARS15 code, and an effort is under way to deploy it in PHITS. Besides the parallel computing functionality, the framework offers a checkpoint facility that allows restarting calculations with a saved checkpoint file. The checkpoint facility canmore » be used in single process calculations as well as in the parallel regime. Several checkpoint files can be merged into one thus combining results of several calculations. The framework also corrects some of the known problems with the scheduling and load balancing found in the original implementations of the parallel computing functionality in MARS15 and PHITS. The framework can be used efficiently on homogeneous systems and networks of workstations, where the interference from the other users is possible.« less
Designing for Peta-Scale in the LSST Database

NASA Astrophysics Data System (ADS)

Kantor, J.; Axelrod, T.; Becla, J.; Cook, K.; Nikolaev, S.; Gray, J.; Plante, R.; Nieto-Santisteban, M.; Szalay, A.; Thakar, A.

2007-10-01

The Large Synoptic Survey Telescope (LSST), a proposed ground-based 8.4 m telescope with a 10 deg^2 field of view, will generate 15 TB of raw images every observing night. When calibration and processed data are added, the image archive, catalogs, and meta-data will grow 15 PB yr^{-1} on average. The LSST Data Management System (DMS) must capture, process, store, index, replicate, and provide open access to this data. Alerts must be triggered within 30 s of data acquisition. To do this in real-time at these data volumes will require advances in data management, database, and file system techniques. This paper describes the design of the LSST DMS and emphasizes features for peta-scale data. The LSST DMS will employ a combination of distributed database and file systems, with schema, partitioning, and indexing oriented for parallel operations. Image files are stored in a distributed file system with references to, and meta-data from, each file stored in the databases. The schema design supports pipeline processing, rapid ingest, and efficient query. Vertical partitioning reduces disk input/output requirements, horizontal partitioning allows parallel data access using arrays of servers and disks. Indexing is extensive, utilizing both conventional RAM-resident indexes and column-narrow, row-deep tag tables/covering indices that are extracted from tables that contain many more attributes. The DMS Data Access Framework is encapsulated in a middleware framework to provide a uniform service interface to all framework capabilities. This framework will provide the automated work-flow, replication, and data analysis capabilities necessary to make data processing and data quality analysis feasible at this scale.
Parallel, Asynchronous Executive (PAX): System concepts, facilities, and architecture

NASA Technical Reports Server (NTRS)

Jones, W. H.

1983-01-01

The Parallel, Asynchronous Executive (PAX) is a software operating system simulation that allows many computers to work on a single problem at the same time. PAX is currently implemented on a UNIVAC 1100/42 computer system. Independent UNIVAC runstreams are used to simulate independent computers. Data are shared among independent UNIVAC runstreams through shared mass-storage files. PAX has achieved the following: (1) applied several computing processes simultaneously to a single, logically unified problem; (2) resolved most parallel processor conflicts by careful work assignment; (3) resolved by means of worker requests to PAX all conflicts not resolved by work assignment; (4) provided fault isolation and recovery mechanisms to meet the problems of an actual parallel, asynchronous processing machine. Additionally, one real-life problem has been constructed for the PAX environment. This is CASPER, a collection of aerodynamic and structural dynamic problem simulation routines. CASPER is not discussed in this report except to provide examples of parallel-processing techniques.
The Automated Instrumentation and Monitoring System (AIMS) reference manual

NASA Technical Reports Server (NTRS)

Yan, Jerry; Hontalas, Philip; Listgarten, Sherry

1993-01-01

Whether a researcher is designing the 'next parallel programming paradigm,' another 'scalable multiprocessor' or investigating resource allocation algorithms for multiprocessors, a facility that enables parallel program execution to be captured and displayed is invaluable. Careful analysis of execution traces can help computer designers and software architects to uncover system behavior and to take advantage of specific application characteristics and hardware features. A software tool kit that facilitates performance evaluation of parallel applications on multiprocessors is described. The Automated Instrumentation and Monitoring System (AIMS) has four major software components: a source code instrumentor which automatically inserts active event recorders into the program's source code before compilation; a run time performance-monitoring library, which collects performance data; a trace file animation and analysis tool kit which reconstructs program execution from the trace file; and a trace post-processor which compensate for data collection overhead. Besides being used as prototype for developing new techniques for instrumenting, monitoring, and visualizing parallel program execution, AIMS is also being incorporated into the run-time environments of various hardware test beds to evaluate their impact on user productivity. Currently, AIMS instrumentors accept FORTRAN and C parallel programs written for Intel's NX operating system on the iPSC family of multi computers. A run-time performance-monitoring library for the iPSC/860 is included in this release. We plan to release monitors for other platforms (such as PVM and TMC's CM-5) in the near future. Performance data collected can be graphically displayed on workstations (e.g. Sun Sparc and SGI) supporting X-Windows (in particular, Xl IR5, Motif 1.1.3).
Distributed Virtual System (DIVIRS) Project

NASA Technical Reports Server (NTRS)

Schorr, Herbert; Neuman, B. Clifford

1993-01-01

As outlined in our continuation proposal 92-ISI-50R (revised) on contract NCC 2-539, we are (1) developing software, including a system manager and a job manager, that will manage available resources and that will enable programmers to program parallel applications in terms of a virtual configuration of processors, hiding the mapping to physical nodes; (2) developing communications routines that support the abstractions implemented in item one; (3) continuing the development of file and information systems based on the virtual system model; and (4) incorporating appropriate security measures to allow the mechanisms developed in items 1 through 3 to be used on an open network. The goal throughout our work is to provide a uniform model that can be applied to both parallel and distributed systems. We believe that multiprocessor systems should exist in the context of distributed systems, allowing them to be more easily shared by those that need them. Our work provides the mechanisms through which nodes on multiprocessors are allocated to jobs running within the distributed system and the mechanisms through which files needed by those jobs can be located and accessed.
DIstributed VIRtual System (DIVIRS) project

NASA Technical Reports Server (NTRS)

Schorr, Herbert; Neuman, B. Clifford

1994-01-01

As outlined in our continuation proposal 92-ISI-. OR (revised) on NASA cooperative agreement NCC2-539, we are (1) developing software, including a system manager and a job manager, that will manage available resources and that will enable programmers to develop and execute parallel applications in terms of a virtual configuration of processors, hiding the mapping to physical nodes; (2) developing communications routines that support the abstractions implemented in item one; (3) continuing the development of file and information systems based on the Virtual System Model; and (4) incorporating appropriate security measures to allow the mechanisms developed in items 1 through 3 to be used on an open network. The goal throughout our work is to provide a uniform model that can be applied to both parallel and distributed systems. We believe that multiprocessor systems should exist in the context of distributed systems, allowing them to be more easily shared by those that need them. Our work provides the mechanisms through which nodes on multiprocessors are allocated to jobs running within the distributed system and the mechanisms through which files needed by those jobs can be located and accessed.

DIstributed VIRtual System (DIVIRS) project

NASA Technical Reports Server (NTRS)

Schorr, Herbert; Neuman, Clifford B.

1995-01-01

As outlined in our continuation proposal 92-ISI-50R (revised) on NASA cooperative agreement NCC2-539, we are (1) developing software, including a system manager and a job manager, that will manage available resources and that will enable programmers to develop and execute parallel applications in terms of a virtual configuration of processors, hiding the mapping to physical nodes; (2) developing communications routines that support the abstractions implemented in item one; (3) continuing the development of file and information systems based on the Virtual System Model; and (4) incorporating appropriate security measures to allow the mechanisms developed in items 1 through 3 to be used on an open network. The goal throughout our work is to provide a uniform model that can be applied to both parallel and distributed systems. We believe that multiprocessor systems should exist in the context of distributed systems, allowing them to be more easily shared by those that need them. Our work provides the mechanisms through which nodes on multiprocessors are allocated to jobs running within the distributed system and the mechanisms through which files needed by those jobs can be located and accessed.
Distributed Virtual System (DIVIRS) project

NASA Technical Reports Server (NTRS)

Schorr, Herbert; Neuman, B. Clifford

1993-01-01

As outlined in the continuation proposal 92-ISI-50R (revised) on NASA cooperative agreement NCC 2-539, the investigators are developing software, including a system manager and a job manager, that will manage available resources and that will enable programmers to develop and execute parallel applications in terms of a virtual configuration of processors, hiding the mapping to physical nodes; developing communications routines that support the abstractions implemented; continuing the development of file and information systems based on the Virtual System Model; and incorporating appropriate security measures to allow the mechanisms developed to be used on an open network. The goal throughout the work is to provide a uniform model that can be applied to both parallel and distributed systems. The authors believe that multiprocessor systems should exist in the context of distributed systems, allowing them to be more easily shared by those that need them. The work provides the mechanisms through which nodes on multiprocessors are allocated to jobs running within the distributed system and the mechanisms through which files needed by those jobs can be located and accessed.
Parallel log structured file system collective buffering to achieve a compact representation of scientific and/or dimensional data

DOEpatents

Grider, Gary A.; Poole, Stephen W.

2015-09-01

Collective buffering and data pattern solutions are provided for storage, retrieval, and/or analysis of data in a collective parallel processing environment. For example, a method can be provided for data storage in a collective parallel processing environment. The method comprises receiving data to be written for a plurality of collective processes within a collective parallel processing environment, extracting a data pattern for the data to be written for the plurality of collective processes, generating a representation describing the data pattern, and saving the data and the representation.
Mission Operations Center (MOC) - Precipitation Processing System (PPS) Interface Software System (MPISS)

NASA Technical Reports Server (NTRS)

Ferrara, Jeffrey; Calk, William; Atwell, William; Tsui, Tina

2013-01-01

MPISS is an automatic file transfer system that implements a combination of standard and mission-unique transfer protocols required by the Global Precipitation Measurement Mission (GPM) Precipitation Processing System (PPS) to control the flow of data between the MOC and the PPS. The primary features of MPISS are file transfers (both with and without PPS specific protocols), logging of file transfer and system events to local files and a standard messaging bus, short term storage of data files to facilitate retransmissions, and generation of file transfer accounting reports. The system includes a graphical user interface (GUI) to control the system, allow manual operations, and to display events in real time. The PPS specific protocols are an enhanced version of those that were developed for the Tropical Rainfall Measuring Mission (TRMM). All file transfers between the MOC and the PPS use the SSH File Transfer Protocol (SFTP). For reports and data files generated within the MOC, no additional protocols are used when transferring files to the PPS. For observatory data files, an additional handshaking protocol of data notices and data receipts is used. MPISS generates and sends to the PPS data notices containing data start and stop times along with a checksum for the file for each observatory data file transmitted. MPISS retrieves the PPS generated data receipts that indicate the success or failure of the PPS to ingest the data file and/or notice. MPISS retransmits the appropriate files as indicated in the receipt when required. MPISS also automatically retrieves files from the PPS. The unique feature of this software is the use of both standard and PPS specific protocols in parallel. The advantage of this capability is that it supports users that require the PPS protocol as well as those that do not require it. The system is highly configurable to accommodate the needs of future users.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Gibson, Garth

Petascale computing infrastructures for scientific discovery make petascale demands on information storage capacity, performance, concurrency, reliability, availability, and manageability. The Petascale Data Storage Institute focuses on the data storage problems found in petascale scientific computing environments, with special attention to community issues such as interoperability, community buy-in, and shared tools. The Petascale Data Storage Institute is a collaboration between researchers at Carnegie Mellon University, National Energy Research Scientific Computing Center, Pacific Northwest National Laboratory, Oak Ridge National Laboratory, Sandia National Laboratory, Los Alamos National Laboratory, University of Michigan, and the University of California at Santa Cruz. Because the Institute focusesmore » on low level files systems and storage systems, its role in improving SciDAC systems was one of supporting application middleware such as data management and system-level performance tuning. In retrospect, the Petascale Data Storage Institute’s most innovative and impactful contribution is the Parallel Log-structured File System (PLFS). Published in SC09, PLFS is middleware that operates in MPI-IO or embedded in FUSE for non-MPI applications. Its function is to decouple concurrently written files into a per-process log file, whose impact (the contents of the single file that the parallel application was concurrently writing) is determined on later reading, rather than during its writing. PLFS is transparent to the parallel application, offering a POSIX or MPI-IO interface, and it shows an order of magnitude speedup to the Chombo benchmark and two orders of magnitude to the FLASH benchmark. Moreover, LANL production applications see speedups of 5X to 28X, so PLFS has been put into production at LANL. Originally conceived and prototyped in a PDSI collaboration between LANL and CMU, it has grown to engage many other PDSI institutes, international partners like AWE, and has a large team at EMC supporting and enhancing it. PLFS is open sourced with a BSD license on sourceforge. Post PDSI funding comes from NNSA and industry sources. Moreover, PLFS has spin out half a dozen or more papers, partnered on research with multiple schools and vendors, and has projects to transparently 1) dis- tribute metadata over independent metadata servers, 2) exploit drastically non-POSIX Hadoop storage for HPC POSIX applications, 3) compress checkpoints on the fly, 4) batch delayed writes for write speed, 5) compress read-back indexes and parallelize their redistribution, 6) double-buffer writes in NAND Flash storage to decouple host blocking during checkpoint from disk write time in the storage system, 7) pack small files into a smaller number of bigger containers. There are two large scale open source Linux software projects that PDSI significantly incubated, though neither were initated in PDSI. These are 1) Ceph, a UCSC parallel object storage research project that has continued to be a vehicle for research, and has become a released part of Linux, and 2) Parallel NFS (pNFS) a portion of the IETF’s NFSv4.1 that brings the core data parallelism found in Lustre, PanFS, PVFS, and Ceph to the industry standard NFS, with released code in Linux 3.0, and its vendor offerings, with products from NetApp, EMC, BlueArc and RedHat. Both are fundamentally supported and advanced by vendor companies now, but were critcally transferred from research demonstration to viable product with funding from PDSI, in part. At this point Lustre remains the primary path to scalable IO in Exascale systems, but both Ceph and pNFS are viable alternatives with different fundamental advantages. Finally, research community building was a big success for PDSI. Through the HECFSIO workshops and HECURA project with NSF PDSI stimulated and helped to steer leveraged funding of over $25M. Through the Petascale (now Parallel) Data Storage Workshop series, www.pdsw.org, colocated with SCxy each year, PDSI created and incubated five offerings of this high-attendance workshop. The workshop has gone on without PDSI support with two more highly successfully workshops, rewriting its organizational structure to be community managed. More than 70 peer reviewed papers have been presented at PDSW workshops.« less
IOPA: I/O-aware parallelism adaption for parallel programs

PubMed Central

Liu, Tao; Liu, Yi; Qian, Chen; Qian, Depei

2017-01-01

With the development of multi-/many-core processors, applications need to be written as parallel programs to improve execution efficiency. For data-intensive applications that use multiple threads to read/write files simultaneously, an I/O sub-system can easily become a bottleneck when too many of these types of threads exist; on the contrary, too few threads will cause insufficient resource utilization and hurt performance. Therefore, programmers must pay much attention to parallelism control to find the appropriate number of I/O threads for an application. This paper proposes a parallelism control mechanism named IOPA that can adjust the parallelism of applications to adapt to the I/O capability of a system and balance computing resources and I/O bandwidth. The programming interface of IOPA is also provided to programmers to simplify parallel programming. IOPA is evaluated using multiple applications with both solid state and hard disk drives. The results show that the parallel applications using IOPA can achieve higher efficiency than those with a fixed number of threads. PMID:28278236
IOPA: I/O-aware parallelism adaption for parallel programs.

PubMed

Liu, Tao; Liu, Yi; Qian, Chen; Qian, Depei

2017-01-01

With the development of multi-/many-core processors, applications need to be written as parallel programs to improve execution efficiency. For data-intensive applications that use multiple threads to read/write files simultaneously, an I/O sub-system can easily become a bottleneck when too many of these types of threads exist; on the contrary, too few threads will cause insufficient resource utilization and hurt performance. Therefore, programmers must pay much attention to parallelism control to find the appropriate number of I/O threads for an application. This paper proposes a parallelism control mechanism named IOPA that can adjust the parallelism of applications to adapt to the I/O capability of a system and balance computing resources and I/O bandwidth. The programming interface of IOPA is also provided to programmers to simplify parallel programming. IOPA is evaluated using multiple applications with both solid state and hard disk drives. The results show that the parallel applications using IOPA can achieve higher efficiency than those with a fixed number of threads.
Visualization and Tracking of Parallel CFD Simulations

NASA Technical Reports Server (NTRS)

Vaziri, Arsi; Kremenetsky, Mark

1995-01-01

We describe a system for interactive visualization and tracking of a 3-D unsteady computational fluid dynamics (CFD) simulation on a parallel computer. CM/AVS, a distributed, parallel implementation of a visualization environment (AVS) runs on the CM-5 parallel supercomputer. A CFD solver is run as a CM/AVS module on the CM-5. Data communication between the solver, other parallel visualization modules, and a graphics workstation, which is running AVS, are handled by CM/AVS. Partitioning of the visualization task, between CM-5 and the workstation, can be done interactively in the visual programming environment provided by AVS. Flow solver parameters can also be altered by programmable interactive widgets. This system partially removes the requirement of storing large solution files at frequent time steps, a characteristic of the traditional 'simulate (yields) store (yields) visualize' post-processing approach.
Multiple Independent File Parallel I/O with HDF5

DOE Office of Scientific and Technical Information (OSTI.GOV)

Miller, M. C.

2016-07-13

The HDF5 library has supported the I/O requirements of HPC codes at Lawrence Livermore National Labs (LLNL) since the late 90’s. In particular, HDF5 used in the Multiple Independent File (MIF) parallel I/O paradigm has supported LLNL code’s scalable I/O requirements and has recently been gainfully used at scales as large as O(10 6) parallel tasks.
Compiler-Directed File Layout Optimization for Hierarchical Storage Systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ding, Wei; Zhang, Yuanrui; Kandemir, Mahmut

File layout of array data is a critical factor that effects the behavior of storage caches, and has so far taken not much attention in the context of hierarchical storage systems. The main contribution of this paper is a compiler-driven file layout optimization scheme for hierarchical storage caches. This approach, fully automated within an optimizing compiler, analyzes a multi-threaded application code and determines a file layout for each disk-resident array referenced by the code, such that the performance of the target storage cache hierarchy is maximized. We tested our approach using 16 I/O intensive application programs and compared its performancemore » against two previously proposed approaches under different cache space management schemes. Our experimental results show that the proposed approach improves the execution time of these parallel applications by 23.7% on average.« less
Compiler-Directed File Layout Optimization for Hierarchical Storage Systems

DOE PAGES

Ding, Wei; Zhang, Yuanrui; Kandemir, Mahmut; ...

2013-01-01

File layout of array data is a critical factor that effects the behavior of storage caches, and has so far taken not much attention in the context of hierarchical storage systems. The main contribution of this paper is a compiler-driven file layout optimization scheme for hierarchical storage caches. This approach, fully automated within an optimizing compiler, analyzes a multi-threaded application code and determines a file layout for each disk-resident array referenced by the code, such that the performance of the target storage cache hierarchy is maximized. We tested our approach using 16 I/O intensive application programs and compared its performancemore » against two previously proposed approaches under different cache space management schemes. Our experimental results show that the proposed approach improves the execution time of these parallel applications by 23.7% on average.« less
Distributing an executable job load file to compute nodes in a parallel computer

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gooding, Thomas M.

Distributing an executable job load file to compute nodes in a parallel computer, the parallel computer comprising a plurality of compute nodes, including: determining, by a compute node in the parallel computer, whether the compute node is participating in a job; determining, by the compute node in the parallel computer, whether a descendant compute node is participating in the job; responsive to determining that the compute node is participating in the job or that the descendant compute node is participating in the job, communicating, by the compute node to a parent compute node, an identification of a data communications linkmore » over which the compute node receives data from the parent compute node; constructing a class route for the job, wherein the class route identifies all compute nodes participating in the job; and broadcasting the executable load file for the job along the class route for the job.« less
High Performance Input/Output for Parallel Computer Systems

NASA Technical Reports Server (NTRS)

Ligon, W. B.

1996-01-01

The goal of our project is to study the I/O characteristics of parallel applications used in Earth Science data processing systems such as Regional Data Centers (RDCs) or EOSDIS. Our approach is to study the runtime behavior of typical programs and the effect of key parameters of the I/O subsystem both under simulation and with direct experimentation on parallel systems. Our three year activity has focused on two items: developing a test bed that facilitates experimentation with parallel I/O, and studying representative programs from the Earth science data processing application domain. The Parallel Virtual File System (PVFS) has been developed for use on a number of platforms including the Tiger Parallel Architecture Workbench (TPAW) simulator, The Intel Paragon, a cluster of DEC Alpha workstations, and the Beowulf system (at CESDIS). PVFS provides considerable flexibility in configuring I/O in a UNIX- like environment. Access to key performance parameters facilitates experimentation. We have studied several key applications fiom levels 1,2 and 3 of the typical RDC processing scenario including instrument calibration and navigation, image classification, and numerical modeling codes. We have also considered large-scale scientific database codes used to organize image data.
Database Reorganization in Parallel Disk Arrays with I/O Service Stealing

NASA Technical Reports Server (NTRS)

Zabback, Peter; Onyuksel, Ibrahim; Scheuermann, Peter; Weikum, Gerhard

1996-01-01

We present a model for data reorganization in parallel disk systems that is geared towards load balancing in an environment with periodic access patterns. Data reorganization is performed by disk cooling, i.e. migrating files or extents from the hottest disks to the coldest ones. We develop an approximate queueing model for determining the effective arrival rates of cooling requests and discuss its use in assessing the costs versus benefits of cooling.
cljam: a library for handling DNA sequence alignment/map (SAM) with parallel processing.

PubMed

Takeuchi, Toshiki; Yamada, Atsuo; Aoki, Takashi; Nishimura, Kunihiro

2016-01-01

Next-generation sequencing can determine DNA bases and the results of sequence alignments are generally stored in files in the Sequence Alignment/Map (SAM) format and the compressed binary version (BAM) of it. SAMtools is a typical tool for dealing with files in the SAM/BAM format. SAMtools has various functions, including detection of variants, visualization of alignments, indexing, extraction of parts of the data and loci, and conversion of file formats. It is written in C and can execute fast. However, SAMtools requires an additional implementation to be used in parallel with, for example, OpenMP (Open Multi-Processing) libraries. For the accumulation of next-generation sequencing data, a simple parallelization program, which can support cloud and PC cluster environments, is required. We have developed cljam using the Clojure programming language, which simplifies parallel programming, to handle SAM/BAM data. Cljam can run in a Java runtime environment (e.g., Windows, Linux, Mac OS X) with Clojure. Cljam can process and analyze SAM/BAM files in parallel and at high speed. The execution time with cljam is almost the same as with SAMtools. The cljam code is written in Clojure and has fewer lines than other similar tools.
78 FR 6382 - Self-Regulatory Organizations; The NASDAQ Stock Market LLC; Notice of Filing and Immediate...

Federal Register 2010, 2011, 2012, 2013, 2014

2013-01-30

... shares other than 100.\\8\\ Moreover, the concept of listing and trading parallel options products of... terms of the Options Disclosure Document. With regard to the impact of this proposal on system capacity... Authority (``OPRA'') have the necessary systems capacity to handle the potential additional traffic...
78 FR 6391 - Self-Regulatory Organizations; NASDAQ OMX BX, Inc.; Notice of Filing and Immediate Effectiveness...

Federal Register 2010, 2011, 2012, 2013, 2014

2013-01-30

... shares other than 100.\\8\\ Moreover, the concept of listing and trading parallel options products of... system capacity, the Exchange has analyzed its capacity and represents that it and the Options Price Reporting Authority (``OPRA'') have the necessary systems capacity to handle the potential additional...
Measurements of file transfer rates over dedicated long-haul connections

DOE Office of Scientific and Technical Information (OSTI.GOV)

Rao, Nageswara S; Settlemyer, Bradley W; Imam, Neena

2016-01-01

Wide-area file transfers are an integral part of several High-Performance Computing (HPC) scenarios. Dedicated network connections with high capacity, low loss rate and low competing traffic, are increasingly being provisioned over current HPC infrastructures to support such transfers. To gain insights into these file transfers, we collected transfer rate measurements for Lustre and xfs file systems between dedicated multi-core servers over emulated 10 Gbps connections with round trip times (rtt) in 0-366 ms range. Memory transfer throughput over these connections is measured using iperf, and file IO throughput on host systems is measured using xddprof. We consider two file systemmore » configurations: Lustre over IB network and xfs over SSD connected to PCI bus. Files are transferred using xdd across these connections, and the transfer rates are measured, which indicate the need to jointly optimize the connection and host file IO parameters to achieve peak transfer rates. In particular, these measurements indicate that (i) peak file transfer rate is lower than peak connection and host IO throughput, in some cases by as much as 50% or lower, (ii) xdd request sizes that achieve peak throughput for host file IO do not necessarily lead to peak file transfer rates, and (iii) parallelism in host IO and TCP transport does not always improve the file transfer rates.« less
MarFS, a Near-POSIX Interface to Cloud Objects

DOE Office of Scientific and Technical Information (OSTI.GOV)

Inman, Jeffrey Thornton; Vining, William Flynn; Ransom, Garrett Wilson

The engineering forces driving development of “cloud” storage have produced resilient, cost-effective storage systems that can scale to 100s of petabytes, with good parallel access and bandwidth. These features would make a good match for the vast storage needs of High-Performance Computing datacenters, but cloud storage gains some of its capability from its use of HTTP-style Representational State Transfer (REST) semantics, whereas most large datacenters have legacy applications that rely on POSIX file-system semantics. MarFS is an open-source project at Los Alamos National Laboratory that allows us to present cloud-style object-storage as a scalable near-POSIX file system. We have alsomore » developed a new storage architecture to improve bandwidth and scalability beyond what’s available in commodity object stores, while retaining their resilience and economy. Additionally, we present a scheme for scaling the POSIX interface to allow billions of files in a single directory and trillions of files in total.« less
MarFS, a Near-POSIX Interface to Cloud Objects

DOE PAGES

Inman, Jeffrey Thornton; Vining, William Flynn; Ransom, Garrett Wilson; ...

2017-01-01

The engineering forces driving development of “cloud” storage have produced resilient, cost-effective storage systems that can scale to 100s of petabytes, with good parallel access and bandwidth. These features would make a good match for the vast storage needs of High-Performance Computing datacenters, but cloud storage gains some of its capability from its use of HTTP-style Representational State Transfer (REST) semantics, whereas most large datacenters have legacy applications that rely on POSIX file-system semantics. MarFS is an open-source project at Los Alamos National Laboratory that allows us to present cloud-style object-storage as a scalable near-POSIX file system. We have alsomore » developed a new storage architecture to improve bandwidth and scalability beyond what’s available in commodity object stores, while retaining their resilience and economy. Additionally, we present a scheme for scaling the POSIX interface to allow billions of files in a single directory and trillions of files in total.« less

DOVIS 2.0: an efficient and easy to use parallel virtual screening tool based on AutoDock 4.0.

PubMed

Jiang, Xiaohui; Kumar, Kamal; Hu, Xin; Wallqvist, Anders; Reifman, Jaques

2008-09-08

Small-molecule docking is an important tool in studying receptor-ligand interactions and in identifying potential drug candidates. Previously, we developed a software tool (DOVIS) to perform large-scale virtual screening of small molecules in parallel on Linux clusters, using AutoDock 3.05 as the docking engine. DOVIS enables the seamless screening of millions of compounds on high-performance computing platforms. In this paper, we report significant advances in the software implementation of DOVIS 2.0, including enhanced screening capability, improved file system efficiency, and extended usability. To keep DOVIS up-to-date, we upgraded the software's docking engine to the more accurate AutoDock 4.0 code. We developed a new parallelization scheme to improve runtime efficiency and modified the AutoDock code to reduce excessive file operations during large-scale virtual screening jobs. We also implemented an algorithm to output docked ligands in an industry standard format, sd-file format, which can be easily interfaced with other modeling programs. Finally, we constructed a wrapper-script interface to enable automatic rescoring of docked ligands by arbitrarily selected third-party scoring programs. The significance of the new DOVIS 2.0 software compared with the previous version lies in its improved performance and usability. The new version makes the computation highly efficient by automating load balancing, significantly reducing excessive file operations by more than 95%, providing outputs that conform to industry standard sd-file format, and providing a general wrapper-script interface for rescoring of docked ligands. The new DOVIS 2.0 package is freely available to the public under the GNU General Public License.
Processing large remote sensing image data sets on Beowulf clusters

USGS Publications Warehouse

Steinwand, Daniel R.; Maddox, Brian; Beckmann, Tim; Schmidt, Gail

2003-01-01

High-performance computing is often concerned with the speed at which floating- point calculations can be performed. The architectures of many parallel computers and/or their network topologies are based on these investigations. Often, benchmarks resulting from these investigations are compiled with little regard to how a large dataset would move about in these systems. This part of the Beowulf study addresses that concern by looking at specific applications software and system-level modifications. Applications include an implementation of a smoothing filter for time-series data, a parallel implementation of the decision tree algorithm used in the Landcover Characterization project, a parallel Kriging algorithm used to fit point data collected in the field on invasive species to a regular grid, and modifications to the Beowulf project's resampling algorithm to handle larger, higher resolution datasets at a national scale. Systems-level investigations include a feasibility study on Flat Neighborhood Networks and modifications of that concept with Parallel File Systems.
PDB Editor: a user-friendly Java-based Protein Data Bank file editor with a GUI.

PubMed

Lee, Jonas; Kim, Sung Hou

2009-04-01

The Protein Data Bank file format is the format most widely used by protein crystallographers and biologists to disseminate and manipulate protein structures. Despite this, there are few user-friendly software packages available to efficiently edit and extract raw information from PDB files. This limitation often leads to many protein crystallographers wasting significant time manually editing PDB files. PDB Editor, written in Java Swing GUI, allows the user to selectively search, select, extract and edit information in parallel. Furthermore, the program is a stand-alone application written in Java which frees users from the hassles associated with platform/operating system-dependent installation and usage. PDB Editor can be downloaded from http://sourceforge.net/projects/pdbeditorjl/.
Purple L1 Milestone Review Panel GPFS Functionality and Performance

DOE Office of Scientific and Technical Information (OSTI.GOV)

Loewe, W E

2006-12-01

The GPFS deliverable for the Purple system requires the functionality and performance necessary for ASC I/O needs. The functionality includes POSIX and MPIIO compatibility, and multi-TB file capability across the entire machine. The bandwidth performance required is 122.15 GB/s, as necessary for productive and defensive I/O requirements, and the metadata performance requirement is 5,000 file stats per second. To determine success for this deliverable, several tools are employed. For functionality testing of POSIX, 10TB-files, and high-node-count capability, the parallel file system bandwidth performance test IOR is used. IOR is an MPI-coordinated application that can write and then read to amore » single shared file or to an individual file per process and check the data integrity of the file(s). The MPIIO functionality is tested with the MPIIO test suite from the MPICH library. Bandwidth performance is tested using IOR for the required 122.15 GB/s sustained write. All IOR tests are performanced with data checking enabled. Metadata performance is tested after ''aging'' the file system with 80% data block usage and 20% inode usage. The fdtree metadata test is expected to create/remove a large directory/file structure in under 20 minutes time, akin to interactive metadata usage. Multiple (10) instances of ''ls -lR'', each performing over 100K stats, are run concurrently in different large directories to demonstrate 5,000 stats/sec.« less
Recent developments in user-job management with Ganga

NASA Astrophysics Data System (ADS)

Currie, R.; Elmsheuser, J.; Fay, R.; Owen, P. H.; Richards, A.; Slater, M.; Sutcliffe, W.; Williams, M.

2015-12-01

The Ganga project was originally developed for use by LHC experiments and has been used extensively throughout Run1 in both LHCb and ATLAS. This document describes some the most recent developments within the Ganga project. There have been improvements in the handling of large scale computational tasks in the form of a new GangaTasks infrastructure. Improvements in file handling through using a new IGangaFile interface makes handling files largely transparent to the end user. In addition to this the performance and usability of Ganga have both been addressed through the development of a new queues system allows for parallel processing of job related tasks.
On Data Transfers Over Wide-Area Dedicated Connections

DOE Office of Scientific and Technical Information (OSTI.GOV)

Rao, Nageswara S.; Liu, Qiang

Dedicated wide-area network connections are employed in big data and high-performance computing scenarios, since the absence of cross-traffic promises to make it easier to analyze and optimize data transfers over them. However, nonlinear transport dynamics and end-system complexity due to multi-core hosts and distributed file systems make these tasks surprisingly challenging. We present an overview of methods to analyze memory and disk file transfers using extensive measurements over 10 Gbps physical and emulated connections with 0–366 ms round trip times (RTTs). For memory transfers, we derive performance profiles of TCP and UDT throughput as a function of RTT, which showmore » concave regions in contrast to entirely convex regions predicted by previous models. These highly desirable concave regions can be expanded by utilizing large buffers and more parallel flows. We also present Poincar´e maps and Lyapunov exponents of TCP and UDT throughputtraces that indicate complex throughput dynamics. For disk file transfers, we show that throughput can be optimized using a combination of parallel I/O and network threads under direct I/O mode. Our initial throughput measurements of Lustre filesystems mounted over long-haul connections using LNet routers show convex profiles indicative of I/O limits.« less
SDS: A Framework for Scientific Data Services

DOE Office of Scientific and Technical Information (OSTI.GOV)

Dong, Bin; Byna, Surendra; Wu, Kesheng

2013-10-31

Large-scale scientific applications typically write their data to parallel file systems with organizations designed to achieve fast write speeds. Analysis tasks frequently read the data in a pattern that is different from the write pattern, and therefore experience poor I/O performance. In this paper, we introduce a prototype framework for bridging the performance gap between write and read stages of data access from parallel file systems. We call this framework Scientific Data Services, or SDS for short. This initial implementation of SDS focuses on reorganizing previously written files into data layouts that benefit read patterns, and transparently directs read callsmore » to the reorganized data. SDS follows a client-server architecture. The SDS Server manages partial or full replicas of reorganized datasets and serves SDS Clients' requests for data. The current version of the SDS client library supports HDF5 programming interface for reading data. The client library intercepts HDF5 calls and transparently redirects them to the reorganized data. The SDS client library also provides a querying interface for reading part of the data based on user-specified selective criteria. We describe the design and implementation of the SDS client-server architecture, and evaluate the response time of the SDS Server and the performance benefits of SDS.« less
Active Storage with Analytics Capabilities and I/O Runtime System for Petascale Systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Choudhary, Alok

Computational scientists must understand results from experimental, observational and computational simulation generated data to gain insights and perform knowledge discovery. As systems approach the petascale range, problems that were unimaginable a few years ago are within reach. With the increasing volume and complexity of data produced by ultra-scale simulations and high-throughput experiments, understanding the science is largely hampered by the lack of comprehensive I/O, storage, acceleration of data manipulation, analysis, and mining tools. Scientists require techniques, tools and infrastructure to facilitate better understanding of their data, in particular the ability to effectively perform complex data analysis, statistical analysis and knowledgemore » discovery. The goal of this work is to enable more effective analysis of scientific datasets through the integration of enhancements in the I/O stack, from active storage support at the file system layer to MPI-IO and high-level I/O library layers. We propose to provide software components to accelerate data analytics, mining, I/O, and knowledge discovery for large-scale scientific applications, thereby increasing productivity of both scientists and the systems. Our approaches include 1) design the interfaces in high-level I/O libraries, such as parallel netCDF, for applications to activate data mining operations at the lower I/O layers; 2) Enhance MPI-IO runtime systems to incorporate the functionality developed as a part of the runtime system design; 3) Develop parallel data mining programs as part of runtime library for server-side file system in PVFS file system; and 4) Prototype an active storage cluster, which will utilize multicore CPUs, GPUs, and FPGAs to carry out the data mining workload.« less
Two-dimensional parallel array technology as a new approach to automated combinatorial solid-phase organic synthesis

PubMed

Brennan; Biddison; Frauendorf; Schwarcz; Keen; Ecker; Davis; Tinder; Swayze

1998-01-01

An automated, 96-well parallel array synthesizer for solid-phase organic synthesis has been designed and constructed. The instrument employs a unique reagent array delivery format, in which each reagent utilized has a dedicated plumbing system. An inert atmosphere is maintained during all phases of a synthesis, and temperature can be controlled via a thermal transfer plate which holds the injection molded reaction block. The reaction plate assembly slides in the X-axis direction, while eight nozzle blocks holding the reagent lines slide in the Y-axis direction, allowing for the extremely rapid delivery of any of 64 reagents to 96 wells. In addition, there are six banks of fixed nozzle blocks, which deliver the same reagent or solvent to eight wells at once, for a total of 72 possible reagents. The instrument is controlled by software which allows the straightforward programming of the synthesis of a larger number of compounds. This is accomplished by supplying a general synthetic procedure in the form of a command file, which calls upon certain reagents to be added to specific wells via lookup in a sequence file. The bottle position, flow rate, and concentration of each reagent is stored in a separate reagent table file. To demonstrate the utility of the parallel array synthesizer, a small combinatorial library of hydroxamic acids was prepared in high throughput mode for biological screening. Approximately 1300 compounds were prepared on a 10 μmole scale (3-5 mg) in a few weeks. The resulting crude compounds were generally >80% pure, and were utilized directly for high throughput screening in antibacterial assays. Several active wells were found, and the activity was verified by solution-phase synthesis of analytically pure material, indicating that the system described herein is an efficient means for the parallel synthesis of compounds for lead discovery. Copyright 1998 John Wiley & Sons, Inc.
Automated quality control in a file-based broadcasting workflow

NASA Astrophysics Data System (ADS)

Zhang, Lina

2014-04-01

Benefit from the development of information and internet technologies, television broadcasting is transforming from inefficient tape-based production and distribution to integrated file-based workflows. However, no matter how many changes have took place, successful broadcasting still depends on the ability to deliver a consistent high quality signal to the audiences. After the transition from tape to file, traditional methods of manual quality control (QC) become inadequate, subjective, and inefficient. Based on China Central Television's full file-based workflow in the new site, this paper introduces an automated quality control test system for accurate detection of hidden troubles in media contents. It discusses the system framework and workflow control when the automated QC is added. It puts forward a QC criterion and brings forth a QC software followed this criterion. It also does some experiments on QC speed by adopting parallel processing and distributed computing. The performance of the test system shows that the adoption of automated QC can make the production effective and efficient, and help the station to achieve a competitive advantage in the media market.
Network issues for large mass storage requirements

NASA Technical Reports Server (NTRS)

Perdue, James

1992-01-01

File Servers and Supercomputing environments need high performance networks to balance the I/O requirements seen in today's demanding computing scenarios. UltraNet is one solution which permits both high aggregate transfer rates and high task-to-task transfer rates as demonstrated in actual tests. UltraNet provides this capability as both a Server-to-Server and Server-to-Client access network giving the supercomputing center the following advantages highest performance Transport Level connections (to 40 MBytes/sec effective rates); matches the throughput of the emerging high performance disk technologies, such as RAID, parallel head transfer devices and software striping; supports standard network and file system applications using SOCKET's based application program interface such as FTP, rcp, rdump, etc.; supports access to the Network File System (NFS) and LARGE aggregate bandwidth for large NFS usage; provides access to a distributed, hierarchical data server capability using DISCOS UniTree product; supports file server solutions available from multiple vendors, including Cray, Convex, Alliant, FPS, IBM, and others.
Data Elevator

DOE Office of Scientific and Technical Information (OSTI.GOV)

BYNA, SUNRENDRA; DONG, BIN; WU, KESHENG

Data Elevator: Efficient Asynchronous Data Movement in Hierarchical Storage Systems Multi-layer storage subsystems, including SSD-based burst buffers and disk-based parallel file systems (PFS), are becoming part of HPC systems. However, software for this storage hierarchy is still in its infancy. Applications may have to explicitly move data among the storage layers. We propose Data Elevator for transparently and efficiently moving data between a burst buffer and a PFS. Users specify the final destination for their data, typically on PFS, Data Elevator intercepts the I/O calls, stages data on burst buffer, and then asynchronously transfers the data to their final destinationmore » in the background. This system allows extensive optimizations, such as overlapping read and write operations, choosing I/O modes, and aligning buffer boundaries. In tests with large-scale scientific applications, Data Elevator is as much as 4.2X faster than Cray DataWarp, the start-of-art software for burst buffer, and 4X faster than directly writing to PFS. The Data Elevator library uses HDF5's Virtual Object Layer (VOL) for intercepting parallel I/O calls that write data to PFS. The intercepted calls are redirected to the Data Elevator, which provides a handle to write the file in a faster and intermediate burst buffer system. Once the application finishes writing the data to the burst buffer, the Data Elevator job uses HDF5 to move the data to final destination in an asynchronous manner. Hence, using the Data Elevator library is currently useful for applications that call HDF5 for writing data files. Also, the Data Elevator depends on the HDF5 VOL functionality.« less
Striped tertiary storage arrays

NASA Technical Reports Server (NTRS)

Drapeau, Ann L.

1993-01-01

Data stripping is a technique for increasing the throughput and reducing the response time of large access to a storage system. In striped magnetic or optical disk arrays, a single file is striped or interleaved across several disks; in a striped tape system, files are interleaved across tape cartridges. Because a striped file can be accessed by several disk drives or tape recorders in parallel, the sustained bandwidth to the file is greater than in non-striped systems, where access to the file are restricted to a single device. It is argued that applying striping to tertiary storage systems will provide needed performance and reliability benefits. The performance benefits of striping for applications using large tertiary storage systems is discussed. It will introduce commonly available tape drives and libraries, and discuss their performance limitations, especially focusing on the long latency of tape accesses. This section will also describe an event-driven tertiary storage array simulator that is being used to understand the best ways of configuring these storage arrays. The reliability problems of magnetic tape devices are discussed, and plans for modeling the overall reliability of striped tertiary storage arrays to identify the amount of error correction required are described. Finally, work being done by other members of the Sequoia group to address latency of accesses, optimizing tertiary storage arrays that perform mostly writes, and compression is discussed.
ParCAT: A Parallel Climate Analysis Toolkit

NASA Astrophysics Data System (ADS)

Haugen, B.; Smith, B.; Steed, C.; Ricciuto, D. M.; Thornton, P. E.; Shipman, G.

2012-12-01

Climate science has employed increasingly complex models and simulations to analyze the past and predict the future of our climate. The size and dimensionality of climate simulation data has been growing with the complexity of the models. This growth in data is creating a widening gap between the data being produced and the tools necessary to analyze large, high dimensional data sets. With single run data sets increasing into 10's, 100's and even 1000's of gigabytes, parallel computing tools are becoming a necessity in order to analyze and compare climate simulation data. The Parallel Climate Analysis Toolkit (ParCAT) provides basic tools that efficiently use parallel computing techniques to narrow the gap between data set size and analysis tools. ParCAT was created as a collaborative effort between climate scientists and computer scientists in order to provide efficient parallel implementations of the computing tools that are of use to climate scientists. Some of the basic functionalities included in the toolkit are the ability to compute spatio-temporal means and variances, differences between two runs and histograms of the values in a data set. ParCAT is designed to facilitate the "heavy lifting" that is required for large, multidimensional data sets. The toolkit does not focus on performing the final visualizations and presentation of results but rather, reducing large data sets to smaller, more manageable summaries. The output from ParCAT is provided in commonly used file formats (NetCDF, CSV, ASCII) to allow for simple integration with other tools. The toolkit is currently implemented as a command line utility, but will likely also provide a C library for developers interested in tighter software integration. Elements of the toolkit are already being incorporated into projects such as UV-CDAT and CMDX. There is also an effort underway to implement portions of the CCSM Land Model Diagnostics package using ParCAT in conjunction with Python and gnuplot. ParCAT is implemented in C to provide efficient file IO. The file IO operations in the toolkit use the parallel-netcdf library; this enables the code to use the parallel IO capabilities of modern HPC systems. Analysis that currently requires an estimated 12+ hours with the traditional CCSM Land Model Diagnostics Package can now be performed in as little as 30 minutes on a single desktop workstation and a few minutes for relatively small jobs completed on modern HPC systems such as ORNL's Jaguar.
Oak Ridge Leadership Computing Facility Position Paper

DOE Office of Scientific and Technical Information (OSTI.GOV)

Oral, H Sarp; Hill, Jason J; Thach, Kevin G

This paper discusses the business, administration, reliability, and usability aspects of storage systems at the Oak Ridge Leadership Computing Facility (OLCF). The OLCF has developed key competencies in architecting and administration of large-scale Lustre deployments as well as HPSS archival systems. Additionally as these systems are architected, deployed, and expanded over time reliability and availability factors are a primary driver. This paper focuses on the implementation of the Spider parallel Lustre file system as well as the implementation of the HPSS archive at the OLCF.
SciSpark: Highly Interactive and Scalable Model Evaluation and Climate Metrics

NASA Astrophysics Data System (ADS)

Wilson, B. D.; Palamuttam, R. S.; Mogrovejo, R. M.; Whitehall, K. D.; Mattmann, C. A.; Verma, R.; Waliser, D. E.; Lee, H.

2015-12-01

Remote sensing data and climate model output are multi-dimensional arrays of massive sizes locked away in heterogeneous file formats (HDF5/4, NetCDF 3/4) and metadata models (HDF-EOS, CF) making it difficult to perform multi-stage, iterative science processing since each stage requires writing and reading data to and from disk. We are developing a lightning fast Big Data technology called SciSpark based on ApacheTM Spark under a NASA AIST grant (PI Mattmann). Spark implements the map-reduce paradigm for parallel computing on a cluster, but emphasizes in-memory computation, "spilling" to disk only as needed, and so outperforms the disk-based ApacheTM Hadoop by 100x in memory and by 10x on disk. SciSpark will enable scalable model evaluation by executing large-scale comparisons of A-Train satellite observations to model grids on a cluster of 10 to 1000 compute nodes. This 2nd generation capability for NASA's Regional Climate Model Evaluation System (RCMES) will compute simple climate metrics at interactive speeds, and extend to quite sophisticated iterative algorithms such as machine-learning based clustering of temperature PDFs, and even graph-based algorithms for searching for Mesocale Convective Complexes. We have implemented a parallel data ingest capability in which the user specifies desired variables (arrays) as several time-sorted lists of URL's (i.e. using OPeNDAP model.nc?varname, or local files). The specified variables are partitioned by time/space and then each Spark node pulls its bundle of arrays into memory to begin a computation pipeline. We also investigated the performance of several N-dim. array libraries (scala breeze, java jblas & netlib-java, and ND4J). We are currently developing science codes using ND4J and studying memory behavior on the JVM. On the pyspark side, many of our science codes already use the numpy and SciPy ecosystems. The talk will cover: the architecture of SciSpark, the design of the scientific RDD (sRDD) data structure, our efforts to integrate climate science algorithms in Python and Scala, parallel ingest and partitioning of A-Train satellite observations from HDF files and model grids from netCDF files, first parallel runs to compute comparison statistics and PDF's, and first metrics quantifying parallel speedups and memory & disk usage.
Software Techniques for Balancing Computation & Communication in Parallel Systems

DTIC Science & Technology

1994-07-01

boer of Tasks: 15 PE Loand Yaltanc: 0.0000 K ] PE Loed Ya tance: 0.0000 Into-Tas Com: LInter-Task Com: 116 Ntwok traffic: ±16 PE LAYMT 1, Networkc...confusion. Because past versions for all files were saved and documented within SCCS, software developers were able to roll back to various combinations of
Current Research into Chemical and Textual Information Retrieval at the Department of Information Studies, University of Sheffield.

ERIC Educational Resources Information Center

Lynch, Michael F.; Willett, Peter

1987-01-01

Discusses research into chemical information and document retrieval systems at the University of Sheffield. Highlights include the use of cluster analysis methods for document retrieval and drug design, representation and searching of files of generic chemical structures, and the application of parallel computer hardware to information retrieval.…
GWM-VI: groundwater management with parallel processing for multiple MODFLOW versions

USGS Publications Warehouse

Banta, Edward R.; Ahlfeld, David P.

2013-01-01

Groundwater Management–Version Independent (GWM–VI) is a new version of the Groundwater Management Process of MODFLOW. The Groundwater Management Process couples groundwater-flow simulation with a capability to optimize stresses on the simulated aquifer based on an objective function and constraints imposed on stresses and aquifer state. GWM–VI extends prior versions of Groundwater Management in two significant ways—(1) it can be used with any version of MODFLOW that meets certain requirements on input and output, and (2) it is structured to allow parallel processing of the repeated runs of the MODFLOW model that are required to solve the optimization problem. GWM–VI uses the same input structure for files that describe the management problem as that used by prior versions of Groundwater Management. GWM–VI requires only minor changes to the input files used by the MODFLOW model. GWM–VI uses the Joint Universal Parameter IdenTification and Evaluation of Reliability Application Programming Interface (JUPITER-API) to implement both version independence and parallel processing. GWM–VI communicates with the MODFLOW model by manipulating certain input files and interpreting results from the MODFLOW listing file and binary output files. Nearly all capabilities of prior versions of Groundwater Management are available in GWM–VI. GWM–VI has been tested with MODFLOW-2005, MODFLOW-NWT (a Newton formulation for MODFLOW-2005), MF2005-FMP2 (the Farm Process for MODFLOW-2005), SEAWAT, and CFP (Conduit Flow Process for MODFLOW-2005). This report provides sample problems that demonstrate a range of applications of GWM–VI and the directory structure and input information required to use the parallel-processing capability.
17 CFR 12.24 - Parallel proceedings.

Code of Federal Regulations, 2010 CFR

2010-04-01

...) Definition. For purposes of this section, a parallel proceeding shall include: (1) An arbitration proceeding... the receivership includes the resolution of claims made by customers; or (3) A petition filed under... any of the foregoing with knowledge of a parallel proceeding shall promptly notify the Commission, by...

Use of Parallel Micro-Platform for the Simulation the Space Exploration

NASA Astrophysics Data System (ADS)

Velasco Herrera, Victor Manuel; Velasco Herrera, Graciela; Rosano, Felipe Lara; Rodriguez Lozano, Salvador; Lucero Roldan Serrato, Karen

The purpose of this work is to create a parallel micro-platform, that simulates the virtual movements of a space exploration in 3D. One of the innovations presented in this design consists of the application of a lever mechanism for the transmission of the movement. The development of such a robot is a challenging task very different of the industrial manipulators due to a totally different target system of requirements. This work presents the study and simulation, aided by computer, of the movement of this parallel manipulator. The development of this model has been developed using the platform of computer aided design Unigraphics, in which it was done the geometric modeled of each one of the components and end assembly (CAD), the generation of files for the computer aided manufacture (CAM) of each one of the pieces and the kinematics simulation of the system evaluating different driving schemes. We used the toolbox (MATLAB) of aerospace and create an adaptive control module to simulate the system.
Dynamic Non-Hierarchical File Systems for Exascale Storage

DOE Office of Scientific and Technical Information (OSTI.GOV)

Long, Darrell E.; Miller, Ethan L

This constitutes the final report for “Dynamic Non-Hierarchical File Systems for Exascale Storage”. The ultimate goal of this project was to improve data management in scientific computing and high-end computing (HEC) applications, and to achieve this goal we proposed: to develop the first, HEC-targeted, file system featuring rich metadata and provenance collection, extreme scalability, and future storage hardware integration as core design goals, and to evaluate and develop a flexible non-hierarchical file system interface suitable for providing more powerful and intuitive data management interfaces to HEC and scientific computing users. Data management is swiftly becoming a serious problem in themore » scientific community – while copious amounts of data are good for obtaining results, finding the right data is often daunting and sometimes impossible. Scientists participating in a Department of Energy workshop noted that most of their time was spent “...finding, processing, organizing, and moving data and it’s going to get much worse”. Scientists should not be forced to become data mining experts in order to retrieve the data they want, nor should they be expected to remember the naming convention they used several years ago for a set of experiments they now wish to revisit. Ideally, locating the data you need would be as easy as browsing the web. Unfortunately, existing data management approaches are usually based on hierarchical naming, a 40 year-old technology designed to manage thousands of files, not exabytes of data. Today’s systems do not take advantage of the rich array of metadata that current high-end computing (HEC) file systems can gather, including content-based metadata and provenance1 information. As a result, current metadata search approaches are typically ad hoc and often work by providing a parallel management system to the “main” file system, as is done in Linux (the locate utility), personal computers, and enterprise search appliances. These search applications are often optimized for a single file system, making it difficult to move files and their metadata between file systems. Users have tried to solve this problem in several ways, including the use of separate databases to index file properties, the encoding of file properties into file names, and separately gathering and managing provenance data, but none of these approaches has worked well, either due to limited usefulness or scalability, or both. Our research addressed several key issues: High-performance, real-time metadata harvesting: extracting important attributes from files dynamically and immediately updating indexes used to improve search; Transparent, automatic, and secure provenance capture: recording the data inputs and processing steps used in the production of each file in the system; Scalable indexing: indexes that are optimized for integration with the file system; Dynamic file system structure: our approach provides dynamic directories similar to those in semantic file systems, but these are the native organization rather than a feature grafted onto a conventional system. In addition to these goals, our research effort will include evaluating the impact of new storage technologies on the file system design and performance. In particular, the indexing and metadata harvesting functions can potentially benefit from the performance improvements promised by new storage class memories.« less
An Optimizing Compiler for Petascale I/O on Leadership Class Architectures

DOE Office of Scientific and Technical Information (OSTI.GOV)

Choudhary, Alok; Kandemir, Mahmut

In high-performance computing systems, parallel I/O architectures usually have very complex hierarchies with multiple layers that collectively constitute an I/O stack, including high-level I/O libraries such as PnetCDF and HDF5, I/O middleware such as MPI-IO, and parallel file systems such as PVFS and Lustre. Our project explored automated instrumentation and compiler support for I/O intensive applications. Our project made significant progress towards understanding the complex I/O hierarchies of high-performance storage systems (including storage caches, HDDs, and SSDs), and designing and implementing state-of-the-art compiler/runtime system technology that targets I/O intensive HPC applications that target leadership class machine. This final report summarizesmore » the major achievements of the project and also points out promising future directions.« less
Parallel object-oriented decision tree system

DOEpatents

Kamath,; Chandrika, Cantu-Paz [Dublin, CA; Erick, [Oakland, CA

2006-02-28

A data mining decision tree system that uncovers patterns, associations, anomalies, and other statistically significant structures in data by reading and displaying data files, extracting relevant features for each of the objects, and using a method of recognizing patterns among the objects based upon object features through a decision tree that reads the data, sorts the data if necessary, determines the best manner to split the data into subsets according to some criterion, and splits the data.
Toward an automated parallel computing environment for geosciences

NASA Astrophysics Data System (ADS)

Zhang, Huai; Liu, Mian; Shi, Yaolin; Yuen, David A.; Yan, Zhenzhen; Liang, Guoping

2007-08-01

Software for geodynamic modeling has not kept up with the fast growing computing hardware and network resources. In the past decade supercomputing power has become available to most researchers in the form of affordable Beowulf clusters and other parallel computer platforms. However, to take full advantage of such computing power requires developing parallel algorithms and associated software, a task that is often too daunting for geoscience modelers whose main expertise is in geosciences. We introduce here an automated parallel computing environment built on open-source algorithms and libraries. Users interact with this computing environment by specifying the partial differential equations, solvers, and model-specific properties using an English-like modeling language in the input files. The system then automatically generates the finite element codes that can be run on distributed or shared memory parallel machines. This system is dynamic and flexible, allowing users to address different problems in geosciences. It is capable of providing web-based services, enabling users to generate source codes online. This unique feature will facilitate high-performance computing to be integrated with distributed data grids in the emerging cyber-infrastructures for geosciences. In this paper we discuss the principles of this automated modeling environment and provide examples to demonstrate its versatility.
Database Objects vs Files: Evaluation of alternative strategies for managing large remote sensing data

NASA Astrophysics Data System (ADS)

Baru, Chaitan; Nandigam, Viswanath; Krishnan, Sriram

2010-05-01

Increasingly, the geoscience user community expects modern IT capabilities to be available in service of their research and education activities, including the ability to easily access and process large remote sensing datasets via online portals such as GEON (www.geongrid.org) and OpenTopography (opentopography.org). However, serving such datasets via online data portals presents a number of challenges. In this talk, we will evaluate the pros and cons of alternative storage strategies for management and processing of such datasets using binary large object implementations (BLOBs) in database systems versus implementation in Hadoop files using the Hadoop Distributed File System (HDFS). The storage and I/O requirements for providing online access to large datasets dictate the need for declustering data across multiple disks, for capacity as well as bandwidth and response time performance. This requires partitioning larger files into a set of smaller files, and is accompanied by the concomitant requirement for managing large numbers of file. Storing these sub-files as blobs in a shared-nothing database implemented across a cluster provides the advantage that all the distributed storage management is done by the DBMS. Furthermore, subsetting and processing routines can be implemented as user-defined functions (UDFs) on these blobs and would run in parallel across the set of nodes in the cluster. On the other hand, there are both storage overheads and constraints, and software licensing dependencies created by such an implementation. Another approach is to store the files in an external filesystem with pointers to them from within database tables. The filesystem may be a regular UNIX filesystem, a parallel filesystem, or HDFS. In the HDFS case, HDFS would provide the file management capability, while the subsetting and processing routines would be implemented as Hadoop programs using the MapReduce model. Hadoop and its related software libraries are freely available. Another consideration is the strategy used for partitioning large data collections, and large datasets within collections, using round-robin vs hash partitioning vs range partitioning methods. Each has different characteristics in terms of spatial locality of data and resultant degree of declustering of the computations on the data. Furthermore, we have observed that, in practice, there can be large variations in the frequency of access to different parts of a large data collection and/or dataset, thereby creating "hotspots" in the data. We will evaluate the ability of different approaches for dealing effectively with such hotspots and alternative strategies for dealing with hotspots.
Decryption-decompression of AES protected ZIP files on GPUs

NASA Astrophysics Data System (ADS)

Duong, Tan Nhat; Pham, Phong Hong; Nguyen, Duc Huu; Nguyen, Thuy Thanh; Le, Hung Duc

2011-10-01

AES is a strong encryption system, so decryption-decompression of AES encrypted ZIP files requires very large computing power and techniques of reducing the password space. This makes implementations of techniques on common computing system not practical. In [1], we reduced the original very large password search space to a much smaller one which surely containing the correct password. Based on reduced set of passwords, in this paper, we parallel decryption, decompression and plain text recognition for encrypted ZIP files by using CUDA computing technology on graphics cards GeForce GTX295 of NVIDIA, to find out the correct password. The experimental results have shown that the speed of decrypting, decompressing, recognizing plain text and finding out the original password increases about from 45 to 180 times (depends on the number of GPUs) compared to sequential execution on the Intel Core 2 Quad Q8400 2.66 GHz. These results have demonstrated the potential applicability of GPUs in this cryptanalysis field.
PHAST--a program for simulating ground-water flow, solute transport, and multicomponent geochemical reactions

USGS Publications Warehouse

Parkhurst, David L.; Kipp, Kenneth L.; Engesgaard, Peter; Charlton, Scott R.

2004-01-01

The computer program PHAST simulates multi-component, reactive solute transport in three-dimensional saturated ground-water flow systems. PHAST is a versatile ground-water flow and solute-transport simulator with capabilities to model a wide range of equilibrium and kinetic geochemical reactions. The flow and transport calculations are based on a modified version of HST3D that is restricted to constant fluid density and constant temperature. The geochemical reactions are simulated with the geochemical model PHREEQC, which is embedded in PHAST. PHAST is applicable to the study of natural and contaminated ground-water systems at a variety of scales ranging from laboratory experiments to local and regional field scales. PHAST can be used in studies of migration of nutrients, inorganic and organic contaminants, and radionuclides; in projects such as aquifer storage and recovery or engineered remediation; and in investigations of the natural rock-water interactions in aquifers. PHAST is not appropriate for unsaturated-zone flow, multiphase flow, density-dependent flow, or waters with high ionic strengths. A variety of boundary conditions are available in PHAST to simulate flow and transport, including specified-head, flux, and leaky conditions, as well as the special cases of rivers and wells. Chemical reactions in PHAST include (1) homogeneous equilibria using an ion-association thermodynamic model; (2) heterogeneous equilibria between the aqueous solution and minerals, gases, surface complexation sites, ion exchange sites, and solid solutions; and (3) kinetic reactions with rates that are a function of solution composition. The aqueous model (elements, chemical reactions, and equilibrium constants), minerals, gases, exchangers, surfaces, and rate expressions may be defined or modified by the user. A number of options are available to save results of simulations to output files. The data may be saved in three formats: a format suitable for viewing with a text editor; a format suitable for exporting to spreadsheets and post-processing programs; or in Hierarchical Data Format (HDF), which is a compressed binary format. Data in the HDF file can be visualized on Windows computers with the program Model Viewer and extracted with the utility program PHASTHDF; both programs are distributed with PHAST. Operator splitting of the flow, transport, and geochemical equations is used to separate the three processes into three sequential calculations. No iterations between transport and reaction calculations are implemented. A three-dimensional Cartesian coordinate system and finite-difference techniques are used for the spatial and temporal discretization of the flow and transport equations. The non-linear chemical equilibrium equations are solved by a Newton-Raphson method, and the kinetic reaction equations are solved by a Runge-Kutta or an implicit method for integrating ordinary differential equations. The PHAST simulator may require large amounts of memory and long Central Processing Unit (CPU) times. To reduce the long CPU times, a parallel version of PHAST has been developed that runs on a multiprocessor computer or on a collection of computers that are networked. The parallel version requires Message Passing Interface, which is currently (2004) freely available. The parallel version is effective in reducing simulation times. This report documents the use of the PHAST simulator, including running the simulator, preparing the input files, selecting the output files, and visualizing the results. It also presents four examples that verify the numerical method and demonstrate the capabilities of the simulator. PHAST requires three input files. Only the flow and transport file is described in detail in this report. The other two files, the chemistry data file and the database file, are identical to PHREEQC files and the detailed description of these files is found in the PHREEQC documentation.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Harms, Kevin; Oral, H. Sarp; Atchley, Scott

The Oak Ridge and Argonne Leadership Computing Facilities are both receiving new systems under the Collaboration of Oak Ridge, Argonne, and Livermore (CORAL) program. Because they are both part of the INCITE program, applications need to be portable between these two facilities. However, the Summit and Aurora systems will be vastly different architectures, including their I/O subsystems. While both systems will have POSIX-compliant parallel file systems, their Burst Buffer technologies will be different. This difference may pose challenges to application portability between facilities. Application developers need to pay attention to specific burst buffer implementations to maximize code portability.
75 FR 40815 - PJM Interconnection, L.L.C.; Notice of Filing

Federal Register 2010, 2011, 2012, 2013, 2014

2010-07-14

... Interconnection, L.L.C.; Notice of Filing July 7, 2010. Take notice that on July 1, 2010, PJM Interconnection, L.L.C. (PJM) filed revised sheets to Schedule 1 of the Amended and Restated Operating Agreement of PJM Interconnection, L.L.C. (Operating Agreement) and the parallel provisions of Attachment K--Appendix of the PJM...
Parallel object-oriented data mining system

DOEpatents

Kamath, Chandrika; Cantu-Paz, Erick

2004-01-06

A data mining system uncovers patterns, associations, anomalies and other statistically significant structures in data. Data files are read and displayed. Objects in the data files are identified. Relevant features for the objects are extracted. Patterns among the objects are recognized based upon the features. Data from the Faint Images of the Radio Sky at Twenty Centimeters (FIRST) sky survey was used to search for bent doubles. This test was conducted on data from the Very Large Array in New Mexico which seeks to locate a special type of quasar (radio-emitting stellar object) called bent doubles. The FIRST survey has generated more than 32,000 images of the sky to date. Each image is 7.1 megabytes, yielding more than 100 gigabytes of image data in the entire data set.
Experiments and Analyses of Data Transfers Over Wide-Area Dedicated Connections

DOE Office of Scientific and Technical Information (OSTI.GOV)

Rao, Nageswara S.; Liu, Qiang; Sen, Satyabrata

Dedicated wide-area network connections are increasingly employed in high-performance computing and big data scenarios. One might expect the performance and dynamics of data transfers over such connections to be easy to analyze due to the lack of competing traffic. However, non-linear transport dynamics and end-system complexities (e.g., multi-core hosts and distributed filesystems) can in fact make analysis surprisingly challenging. We present extensive measurements of memory-to-memory and disk-to-disk file transfers over 10 Gbps physical and emulated connections with 0–366 ms round trip times (RTTs). For memory-to-memory transfers, profiles of both TCP and UDT throughput as a function of RTT show concavemore » and convex regions; large buffer sizes and more parallel flows lead to wider concave regions, which are highly desirable. TCP and UDT both also display complex throughput dynamics, as indicated by their Poincare maps and Lyapunov exponents. For disk-to-disk transfers, we determine that high throughput can be achieved via a combination of parallel I/O threads, parallel network threads, and direct I/O mode. Our measurements also show that Lustre filesystems can be mounted over long-haul connections using LNet routers, although challenges remain in jointly optimizing file I/O and transport method parameters to achieve peak throughput.« less
Operating CFDP in the Interplanetary Internet

NASA Technical Reports Server (NTRS)

Burleigh, S.

2002-01-01

This paper examines the design elements of CCSDS File Delivery Protocol and Interplanetary Internet technologies that will simplify their integration and discusses the resulting new capabilities, such as efficient transmission of large files via multiple relay satellites operating in parallel.
The ATLAS EventIndex: architecture, design choices, deployment and first operation experience

NASA Astrophysics Data System (ADS)

Barberis, D.; Cárdenas Zárate, S. E.; Cranshaw, J.; Favareto, A.; Fernández Casaní, Á.; Gallas, E. J.; Glasman, C.; González de la Hoz, S.; Hřivnáč, J.; Malon, D.; Prokoshin, F.; Salt Cairols, J.; Sánchez, J.; Többicke, R.; Yuan, R.

2015-12-01

The EventIndex is the complete catalogue of all ATLAS events, keeping the references to all files that contain a given event in any processing stage. It replaces the TAG database, which had been in use during LHC Run 1. For each event it contains its identifiers, the trigger pattern and the GUIDs of the files containing it. Major use cases are event picking, feeding the Event Service used on some production sites, and technical checks of the completion and consistency of processing campaigns. The system design is highly modular so that its components (data collection system, storage system based on Hadoop, query web service and interfaces to other ATLAS systems) could be developed separately and in parallel during LSI. The EventIndex is in operation for the start of LHC Run 2. This paper describes the high-level system architecture, the technical design choices and the deployment process and issues. The performance of the data collection and storage systems, as well as the query services, are also reported.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Gooding, Thomas M.

Distributing an executable job load file to compute nodes in a parallel computer, the parallel computer comprising a plurality of compute nodes, including: determining, by a compute node in the parallel computer, whether the compute node is participating in a job; determining, by the compute node in the parallel computer, whether a descendant compute node is participating in the job; responsive to determining that the compute node is participating in the job or that the descendant compute node is participating in the job, communicating, by the compute node to a parent compute node, an identification of a data communications linkmore » over which the compute node receives data from the parent compute node; constructing a class route for the job, wherein the class route identifies all compute nodes participating in the job; and broadcasting the executable load file for the job along the class route for the job.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)

SmartImport.py is a Python source-code file that implements a replacement for the standard Python module importer. The code is derived from knee.py, a file in the standard Python diestribution , and adds functionality to improve the performance of Python module imports in massively parallel contexts.
Bamgineer: Introduction of simulated allele-specific copy number variants into exome and targeted sequence data sets.

PubMed

Samadian, Soroush; Bruce, Jeff P; Pugh, Trevor J

2018-03-01

Somatic copy number variations (CNVs) play a crucial role in development of many human cancers. The broad availability of next-generation sequencing data has enabled the development of algorithms to computationally infer CNV profiles from a variety of data types including exome and targeted sequence data; currently the most prevalent types of cancer genomics data. However, systemic evaluation and comparison of these tools remains challenging due to a lack of ground truth reference sets. To address this need, we have developed Bamgineer, a tool written in Python to introduce user-defined haplotype-phased allele-specific copy number events into an existing Binary Alignment Mapping (BAM) file, with a focus on targeted and exome sequencing experiments. As input, this tool requires a read alignment file (BAM format), lists of non-overlapping genome coordinates for introduction of gains and losses (bed file), and an optional file defining known haplotypes (vcf format). To improve runtime performance, Bamgineer introduces the desired CNVs in parallel using queuing and parallel processing on a local machine or on a high-performance computing cluster. As proof-of-principle, we applied Bamgineer to a single high-coverage (mean: 220X) exome sequence file from a blood sample to simulate copy number profiles of 3 exemplar tumors from each of 10 tumor types at 5 tumor cellularity levels (20-100%, 150 BAM files in total). To demonstrate feasibility beyond exome data, we introduced read alignments to a targeted 5-gene cell-free DNA sequencing library to simulate EGFR amplifications at frequencies consistent with circulating tumor DNA (10, 1, 0.1 and 0.01%) while retaining the multimodal insert size distribution of the original data. We expect Bamgineer to be of use for development and systematic benchmarking of CNV calling algorithms by users using locally-generated data for a variety of applications. The source code is freely available at http://github.com/pughlab/bamgineer.
Parallel, Distributed Scripting with Python

DOE Office of Scientific and Technical Information (OSTI.GOV)

Miller, P J

2002-05-24

Parallel computers used to be, for the most part, one-of-a-kind systems which were extremely difficult to program portably. With SMP architectures, the advent of the POSIX thread API and OpenMP gave developers ways to portably exploit on-the-box shared memory parallelism. Since these architectures didn't scale cost-effectively, distributed memory clusters were developed. The associated MPI message passing libraries gave these systems a portable paradigm too. Having programmers effectively use this paradigm is a somewhat different question. Distributed data has to be explicitly transported via the messaging system in order for it to be useful. In high level languages, the MPI librarymore » gives access to data distribution routines in C, C++, and FORTRAN. But we need more than that. Many reasonable and common tasks are best done in (or as extensions to) scripting languages. Consider sysadm tools such as password crackers, file purgers, etc ... These are simple to write in a scripting language such as Python (an open source, portable, and freely available interpreter). But these tasks beg to be done in parallel. Consider the a password checker that checks an encrypted password against a 25,000 word dictionary. This can take around 10 seconds in Python (6 seconds in C). It is trivial to parallelize if you can distribute the information and co-ordinate the work.« less
Climate Prediction Center - Reanalysis: Atmospheric Data

Science.gov Websites

files; i.e., wgrib for GRIB-2 files wgrib2mv,wgrib2ms parallel processing with wgrib2 grb1to2.pl perl US government, DOC, NWS, NCEP or CPC. All spelling errors are property of the finder. comments
Architecture and method for a burst buffer using flash technology

DOEpatents

Tzelnic, Percy; Faibish, Sorin; Gupta, Uday K.; Bent, John; Grider, Gary Alan; Chen, Hsing-bung

2016-03-15

A parallel supercomputing cluster includes compute nodes interconnected in a mesh of data links for executing an MPI job, and solid-state storage nodes each linked to a respective group of the compute nodes for receiving checkpoint data from the respective compute nodes, and magnetic disk storage linked to each of the solid-state storage nodes for asynchronous migration of the checkpoint data from the solid-state storage nodes to the magnetic disk storage. Each solid-state storage node presents a file system interface to the MPI job, and multiple MPI processes of the MPI job write the checkpoint data to a shared file in the solid-state storage in a strided fashion, and the solid-state storage node asynchronously migrates the checkpoint data from the shared file in the solid-state storage to the magnetic disk storage and writes the checkpoint data to the magnetic disk storage in a sequential fashion.

77 FR 26542 - Black Mountain Hydro, LLC; Notice of Preliminary Permit Application Accepted for Filing and...

Federal Register 2010, 2011, 2012, 2013, 2014

2012-05-04

... the powerhouse to Sierra Pacific Power's Fort Churchill generating station, parallel to an existing... powerhouse to the PDCI and then parallel to the PDCI to the Fort Churchill generating station; and (9...
A series of PDB related databases for everyday needs.

PubMed

Joosten, Robbie P; te Beek, Tim A H; Krieger, Elmar; Hekkelman, Maarten L; Hooft, Rob W W; Schneider, Reinhard; Sander, Chris; Vriend, Gert

2011-01-01

The Protein Data Bank (PDB) is the world-wide repository of macromolecular structure information. We present a series of databases that run parallel to the PDB. Each database holds one entry, if possible, for each PDB entry. DSSP holds the secondary structure of the proteins. PDBREPORT holds reports on the structure quality and lists errors. HSSP holds a multiple sequence alignment for all proteins. The PDBFINDER holds easy to parse summaries of the PDB file content, augmented with essentials from the other systems. PDB_REDO holds re-refined, and often improved, copies of all structures solved by X-ray. WHY_NOT summarizes why certain files could not be produced. All these systems are updated weekly. The data sets can be used for the analysis of properties of protein structures in areas ranging from structural genomics, to cancer biology and protein design.
Development of a Distributed Parallel Computing Framework to Facilitate Regional/Global Gridded Crop Modeling with Various Scenarios

NASA Astrophysics Data System (ADS)

Jang, W.; Engda, T. A.; Neff, J. C.; Herrick, J.

2017-12-01

Many crop models are increasingly used to evaluate crop yields at regional and global scales. However, implementation of these models across large areas using fine-scale grids is limited by computational time requirements. In order to facilitate global gridded crop modeling with various scenarios (i.e., different crop, management schedule, fertilizer, and irrigation) using the Environmental Policy Integrated Climate (EPIC) model, we developed a distributed parallel computing framework in Python. Our local desktop with 14 cores (28 threads) was used to test the distributed parallel computing framework in Iringa, Tanzania which has 406,839 grid cells. High-resolution soil data, SoilGrids (250 x 250 m), and climate data, AgMERRA (0.25 x 0.25 deg) were also used as input data for the gridded EPIC model. The framework includes a master file for parallel computing, input database, input data formatters, EPIC model execution, and output analyzers. Through the master file for parallel computing, the user-defined number of threads of CPU divides the EPIC simulation into jobs. Then, Using EPIC input data formatters, the raw database is formatted for EPIC input data and the formatted data moves into EPIC simulation jobs. Then, 28 EPIC jobs run simultaneously and only interesting results files are parsed and moved into output analyzers. We applied various scenarios with seven different slopes and twenty-four fertilizer ranges. Parallelized input generators create different scenarios as a list for distributed parallel computing. After all simulations are completed, parallelized output analyzers are used to analyze all outputs according to the different scenarios. This saves significant computing time and resources, making it possible to conduct gridded modeling at regional to global scales with high-resolution data. For example, serial processing for the Iringa test case would require 113 hours, while using the framework developed in this study requires only approximately 6 hours, a nearly 95% reduction in computing time.
An Optimizing Compiler for Petascale I/O on Leadership-Class Architectures

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kandemir, Mahmut Taylan; Choudary, Alok; Thakur, Rajeev

In high-performance computing (HPC), parallel I/O architectures usually have very complex hierarchies with multiple layers that collectively constitute an I/O stack, including high-level I/O libraries such as PnetCDF and HDF5, I/O middleware such as MPI-IO, and parallel file systems such as PVFS and Lustre. Our DOE project explored automated instrumentation and compiler support for I/O intensive applications. Our project made significant progress towards understanding the complex I/O hierarchies of high-performance storage systems (including storage caches, HDDs, and SSDs), and designing and implementing state-of-the-art compiler/runtime system technology that targets I/O intensive HPC applications that target leadership class machine. This final reportmore » summarizes the major achievements of the project and also points out promising future directions Two new sections in this report compared to the previous report are IOGenie and SSD/NVM-specific optimizations.« less
Schnek: A C++ library for the development of parallel simulation codes on regular grids

NASA Astrophysics Data System (ADS)

Schmitz, Holger

2018-05-01

A large number of algorithms across the field of computational physics are formulated on grids with a regular topology. We present Schnek, a library that enables fast development of parallel simulations on regular grids. Schnek contains a number of easy-to-use modules that greatly reduce the amount of administrative code for large-scale simulation codes. The library provides an interface for reading simulation setup files with a hierarchical structure. The structure of the setup file is translated into a hierarchy of simulation modules that the developer can specify. The reader parses and evaluates mathematical expressions and initialises variables or grid data. This enables developers to write modular and flexible simulation codes with minimal effort. Regular grids of arbitrary dimension are defined as well as mechanisms for defining physical domain sizes, grid staggering, and ghost cells on these grids. Ghost cells can be exchanged between neighbouring processes using MPI with a simple interface. The grid data can easily be written into HDF5 files using serial or parallel I/O.
Addressing the Tension Between Strong Perimeter Control an Usability

NASA Technical Reports Server (NTRS)

Hinke, Thomas H.; Kolano, Paul Z.; Keller, Chris

2006-01-01

This paper describes a strong perimeter control system for a general purpose processing system, with the perimeter control system taking significant steps to address usability issues, thus mitigating the tension between strong perimeter protection and usability. A secure front end enforces two-factor authentication for all interactive access to an enclave that contains a large supercomputer and various associated systems, with each requiring their own authentication. Usability is addressed through a design in which the user has to perform two-factor authentication at the secure front end in order to gain access to the enclave, while an agent transparently performs public key authentication as needed to authenticate to specific systems within the enclave. The paper then describes a proxy system that allows users to transfer files into the enclave under script control, when the user is not present to perform two-factor authentication. This uses a pre-authorization approach based on public key technology, which is still strongly tied to both two-factor authentication and strict control over where files can be transferred on the target system. Finally the paper describes an approach to support network applications and systems such as grids or parallel file transfer protocols that require the use of many ports through the perimeter. The paper describes a least privilege approach that dynamically opens ports on a host-specific, if-authorized, as-needed, just-in-time basis.
Cloud parallel processing of tandem mass spectrometry based proteomics data.

PubMed

Mohammed, Yassene; Mostovenko, Ekaterina; Henneman, Alex A; Marissen, Rob J; Deelder, André M; Palmblad, Magnus

2012-10-05

Data analysis in mass spectrometry based proteomics struggles to keep pace with the advances in instrumentation and the increasing rate of data acquisition. Analyzing this data involves multiple steps requiring diverse software, using different algorithms and data formats. Speed and performance of the mass spectral search engines are continuously improving, although not necessarily as needed to face the challenges of acquired big data. Improving and parallelizing the search algorithms is one possibility; data decomposition presents another, simpler strategy for introducing parallelism. We describe a general method for parallelizing identification of tandem mass spectra using data decomposition that keeps the search engine intact and wraps the parallelization around it. We introduce two algorithms for decomposing mzXML files and recomposing resulting pepXML files. This makes the approach applicable to different search engines, including those relying on sequence databases and those searching spectral libraries. We use cloud computing to deliver the computational power and scientific workflow engines to interface and automate the different processing steps. We show how to leverage these technologies to achieve faster data analysis in proteomics and present three scientific workflows for parallel database as well as spectral library search using our data decomposition programs, X!Tandem and SpectraST.
Study on parallel and distributed management of RS data based on spatial database

NASA Astrophysics Data System (ADS)

Chen, Yingbiao; Qian, Qinglan; Wu, Hongqiao; Liu, Shijin

2009-10-01

With the rapid development of current earth-observing technology, RS image data storage, management and information publication become a bottle-neck for its appliance and popularization. There are two prominent problems in RS image data storage and management system. First, background server hardly handle the heavy process of great capacity of RS data which stored at different nodes in a distributing environment. A tough burden has put on the background server. Second, there is no unique, standard and rational organization of Multi-sensor RS data for its storage and management. And lots of information is lost or not included at storage. Faced at the above two problems, the paper has put forward a framework for RS image data parallel and distributed management and storage system. This system aims at RS data information system based on parallel background server and a distributed data management system. Aiming at the above two goals, this paper has studied the following key techniques and elicited some revelatory conclusions. The paper has put forward a solid index of "Pyramid, Block, Layer, Epoch" according to the properties of RS image data. With the solid index mechanism, a rational organization for different resolution, different area, different band and different period of Multi-sensor RS image data is completed. In data storage, RS data is not divided into binary large objects to be stored at current relational database system, while it is reconstructed through the above solid index mechanism. A logical image database for the RS image data file is constructed. In system architecture, this paper has set up a framework based on a parallel server of several common computers. Under the framework, the background process is divided into two parts, the common WEB process and parallel process.
Study on parallel and distributed management of RS data based on spatial data base

NASA Astrophysics Data System (ADS)

Chen, Yingbiao; Qian, Qinglan; Liu, Shijin

2006-12-01

With the rapid development of current earth-observing technology, RS image data storage, management and information publication become a bottle-neck for its appliance and popularization. There are two prominent problems in RS image data storage and management system. First, background server hardly handle the heavy process of great capacity of RS data which stored at different nodes in a distributing environment. A tough burden has put on the background server. Second, there is no unique, standard and rational organization of Multi-sensor RS data for its storage and management. And lots of information is lost or not included at storage. Faced at the above two problems, the paper has put forward a framework for RS image data parallel and distributed management and storage system. This system aims at RS data information system based on parallel background server and a distributed data management system. Aiming at the above two goals, this paper has studied the following key techniques and elicited some revelatory conclusions. The paper has put forward a solid index of "Pyramid, Block, Layer, Epoch" according to the properties of RS image data. With the solid index mechanism, a rational organization for different resolution, different area, different band and different period of Multi-sensor RS image data is completed. In data storage, RS data is not divided into binary large objects to be stored at current relational database system, while it is reconstructed through the above solid index mechanism. A logical image database for the RS image data file is constructed. In system architecture, this paper has set up a framework based on a parallel server of several common computers. Under the framework, the background process is divided into two parts, the common WEB process and parallel process.
Status of the calibration and alignment framework at the Belle II experiment

NASA Astrophysics Data System (ADS)

Dossett, D.; Sevior, M.; Ritter, M.; Kuhr, T.; Bilka, T.; Yaschenko, S.; Belle Software Group, II

2017-10-01

The Belle II detector at the Super KEKB e+e-collider plans to take first collision data in 2018. The monetary and CPU time costs associated with storing and processing the data mean that it is crucial for the detector components at Belle II to be calibrated quickly and accurately. A fast and accurate calibration system would allow the high level trigger to increase the efficiency of event selection, and can give users analysis-quality reconstruction promptly. A flexible framework to automate the fast production of calibration constants is being developed in the Belle II Analysis Software Framework (basf2). Detector experts only need to create two components from C++ base classes in order to use the automation system. The first collects data from Belle II event data files and outputs much smaller files to pass to the second component. This runs the main calibration algorithm to produce calibration constants ready for upload into the conditions database. A Python framework coordinates the input files, order of processing, and submission of jobs. Splitting the operation into collection and algorithm processing stages allows the framework to optionally parallelize the collection stage on a batch system.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Sadayappan, Ponnuswamy

Exascale computing systems will provide a thousand-fold increase in parallelism and a proportional increase in failure rate relative to today's machines. Systems software for exascale machines must provide the infrastructure to support existing applications while simultaneously enabling efficient execution of new programming models that naturally express dynamic, adaptive, irregular computation; coupled simulations; and massive data analysis in a highly unreliable hardware environment with billions of threads of execution. We propose a new approach to the data and work distribution model provided by system software based on the unifying formalism of an abstract file system. The proposed hierarchical data model providesmore » simple, familiar visibility and access to data structures through the file system hierarchy, while providing fault tolerance through selective redundancy. The hierarchical task model features work queues whose form and organization are represented as file system objects. Data and work are both first class entities. By exposing the relationships between data and work to the runtime system, information is available to optimize execution time and provide fault tolerance. The data distribution scheme provides replication (where desirable and possible) for fault tolerance and efficiency, and it is hierarchical to make it possible to take advantage of locality. The user, tools, and applications, including legacy applications, can interface with the data, work queues, and one another through the abstract file model. This runtime environment will provide multiple interfaces to support traditional Message Passing Interface applications, languages developed under DARPA's High Productivity Computing Systems program, as well as other, experimental programming models. We will validate our runtime system with pilot codes on existing platforms and will use simulation to validate for exascale-class platforms. In this final report, we summarize research results from the work done at the Ohio State University towards the larger goals of the project listed above.« less
POSIX and Object Distributed Storage Systems Performance Comparison Studies With Real-Life Scenarios in an Experimental Data Taking Context Leveraging OpenStack Swift & Ceph

NASA Astrophysics Data System (ADS)

Poat, M. D.; Lauret, J.; Betts, W.

2015-12-01

The STAR online computing infrastructure has become an intensive dynamic system used for first-hand data collection and analysis resulting in a dense collection of data output. As we have transitioned to our current state, inefficient, limited storage systems have become an impediment to fast feedback to online shift crews. Motivation for a centrally accessible, scalable and redundant distributed storage system had become a necessity in this environment. OpenStack Swift Object Storage and Ceph Object Storage are two eye-opening technologies as community use and development have led to success elsewhere. In this contribution, OpenStack Swift and Ceph have been put to the test with single and parallel I/O tests, emulating real world scenarios for data processing and workflows. The Ceph file system storage, offering a POSIX compliant file system mounted similarly to an NFS share was of particular interest as it aligned with our requirements and was retained as our solution. I/O performance tests were run against the Ceph POSIX file system and have presented surprising results indicating true potential for fast I/O and reliability. STAR'S online compute farm historical use has been for job submission and first hand data analysis. The goal of reusing the online compute farm to maintain a storage cluster and job submission will be an efficient use of the current infrastructure.
Use of geographic information system to display water-quality data from San Juan basin

DOE Office of Scientific and Technical Information (OSTI.GOV)

Thorn, C.R.; Dam, W.L.

1989-09-01

The ARC/INFO geographic information system is creating thematic maps of the San Juan basin as part of the USGS Regional Aquifer-System Analysis program. (Use of trade names is for descriptive purposes only and does not constitute endorsement by the US Geological Survey.) Maps created by a Prime version of ARC/INFO, to be published in a series of Hydrologic Investigations Atlas reports for selected geologic units, will include outcrop patters, water-well locations, and water-quality data. The San Juan basin study area, encompassing about 19,400 mi{sup 2}, can be displayed with ARC/INFO at various scales; on the same scale, generated water-quality mapsmore » can be compared and overlain with other maps such as potentiometric surface and depth to top of a geologic or hydrologic unit. Selected water-quality and well data (including latitude and longitude) are retrieved from the USGS National Water Information System data base for a specified geologic unit. Data are formatted by Fortran programs and read into an INFO data base. Two parallel files - an INFO file containing water-quality data and well data and an ARC file containing the site coordinates - are joined to form the ARC/INFO data base. A file containing a series of commands using Prime's Command Procedure language is used to select coverage, display, and position data on the map. Data interpretation is enhanced by displaying water-quality data throughout the basin in combination with other hydrologic and geologic data.« less
An approach to enhance pnetCDF performance in environmental modeling applications

EPA Science Inventory

Data intensive simulations are often limited by their I/O (input/output) performance, and "novel" techniques need to be developed in order to overcome this limitation. The software package pnetCDF (parallel network Common Data Form), which works with parallel file syste...
Graphical Representation of Parallel Algorithmic Processes

DTIC Science & Technology

1990-12-01

interface with the AAARF main process . The source code for the AAARF class-common library is in the common subdi- rectory and consists of the following files... for public release; distribution unlimited AFIT/GCE/ENG/90D-07 Graphical Representation of Parallel Algorithmic Processes THESIS Presented to the...goal of this study is to develop an algorithm animation facility for parallel processes executing on different architectures, from multiprocessor
War in the Information Age: A Primer for Cyberspace Operations in 21st Century Warfare

DTIC Science & Technology

2010-01-01

funds transfers ( EFT ). 30 Paralleling the rapid expansion of civilian cyberspace use is the increasing use of cyberspace by modern militaries...company files by using a thumb drive to tap the corporate system. Boeing estimated that the stolen documents would have cost it between $5 billion...tactics and intelligence operations such as collecting data, recruiting members of state security services, and setting up phone taps .‖ 69
Draco,Version 6.x.x

DOE Office of Scientific and Technical Information (OSTI.GOV)

Thompson, Kelly; Budge, Kent; Lowrie, Rob

2016-03-03

Draco is an object-oriented component library geared towards numerically intensive, radiation (particle) transport applications built for parallel computing hardware. It consists of semi-independent packages and a robust build system. The packages in Draco provide a set of components that can be used by multiple clients to build transport codes. The build system can also be extracted for use in clients. Software includes smart pointers, Design-by-Contract assertions, unit test framework, wrapped MPI functions, a file parser, unstructured mesh data structures, a random number generator, root finders and an angular quadrature component.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Wang, Jun

Our group has been working with ANL collaborators on the topic bridging the gap between parallel file system and local file system during the course of this project period. We visited Argonne National Lab -- Dr. Robert Ross's group for one week in the past summer 2007. We looked over our current project progress and planned the activities for the incoming years 2008-09. The PI met Dr. Robert Ross several times such as HEC FSIO workshop 08, SC08 and SC10. We explored the opportunities to develop a production system by leveraging our current prototype to (SOGP+PVFS) a new PVFS version.more » We delivered SOGP+PVFS codes to ANL PVFS2 group in 2008.We also talked about exploring a potential project on developing new parallel programming models and runtime systems for data-intensive scalable computing (DISC). The methodology is to evolve MPI towards DISC by incorporating some functions of Google MapReduce parallel programming model. More recently, we are together exploring how to leverage existing works to perform (1) coordination/aggregation of local I/O operations prior to movement over the WAN, (2) efficient bulk data movement over the WAN, (3) latency hiding techniques for latency-intensive operations. Since 2009, we start applying Hadoop/MapReduce to some HEC applications with LANL scientists John Bent and Salman Habib. Another on-going work is to improve checkpoint performance at I/O forwarding Layer for the Road Runner super computer with James Nuetz and Gary Gridder at LANL. Two senior undergraduates from our research group did summer internships about high-performance file and storage system projects in LANL since 2008 for consecutive three years. Both of them are now pursuing Ph.D. degree in our group and will be 4th year in the PhD program in Fall 2011 and go to LANL to advance two above-mentioned works during this winter break. Since 2009, we have been collaborating with several computer scientists (Gary Grider, John bent, Parks Fields, James Nunez, Hsing-Bung Chen, etc) from HPC5 and James Ahrens from Advanced Computing Laboratory in Los Alamos National Laboratory. We hold a weekly conference and/or video meeting on advancing works at two fronts: the hardware/software infrastructure of building large-scale data intensive cluster and research publications. Our group members assist in constructing several onsite LANL data intensive clusters. Two parties have been developing software codes and research papers together using both sides resources.« less
SciSpark: Highly Interactive and Scalable Model Evaluation and Climate Metrics

NASA Astrophysics Data System (ADS)

Wilson, B. D.; Mattmann, C. A.; Waliser, D. E.; Kim, J.; Loikith, P.; Lee, H.; McGibbney, L. J.; Whitehall, K. D.

2014-12-01

Remote sensing data and climate model output are multi-dimensional arrays of massive sizes locked away in heterogeneous file formats (HDF5/4, NetCDF 3/4) and metadata models (HDF-EOS, CF) making it difficult to perform multi-stage, iterative science processing since each stage requires writing and reading data to and from disk. We are developing a lightning fast Big Data technology called SciSpark based on ApacheTM Spark. Spark implements the map-reduce paradigm for parallel computing on a cluster, but emphasizes in-memory computation, "spilling" to disk only as needed, and so outperforms the disk-based ApacheTM Hadoop by 100x in memory and by 10x on disk, and makes iterative algorithms feasible. SciSpark will enable scalable model evaluation by executing large-scale comparisons of A-Train satellite observations to model grids on a cluster of 100 to 1000 compute nodes. This 2nd generation capability for NASA's Regional Climate Model Evaluation System (RCMES) will compute simple climate metrics at interactive speeds, and extend to quite sophisticated iterative algorithms such as machine-learning (ML) based clustering of temperature PDFs, and even graph-based algorithms for searching for Mesocale Convective Complexes. The goals of SciSpark are to: (1) Decrease the time to compute comparison statistics and plots from minutes to seconds; (2) Allow for interactive exploration of time-series properties over seasons and years; (3) Decrease the time for satellite data ingestion into RCMES to hours; (4) Allow for Level-2 comparisons with higher-order statistics or PDF's in minutes to hours; and (5) Move RCMES into a near real time decision-making platform. We will report on: the architecture and design of SciSpark, our efforts to integrate climate science algorithms in Python and Scala, parallel ingest and partitioning (sharding) of A-Train satellite observations from HDF files and model grids from netCDF files, first parallel runs to compute comparison statistics and PDF's, and first metrics quantifying parallel speedups and memory & disk usage.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Haghighat, A.; Sjoden, G.E.; Wagner, J.C.

In the past 10 yr, the Penn State Transport Theory Group (PSTTG) has concentrated its efforts on developing accurate and efficient particle transport codes to address increasing needs for efficient and accurate simulation of nuclear systems. The PSTTG's efforts have primarily focused on shielding applications that are generally treated using multigroup, multidimensional, discrete ordinates (S{sub n}) deterministic and/or statistical Monte Carlo methods. The difficulty with the existing public codes is that they require significant (impractical) computation time for simulation of complex three-dimensional (3-D) problems. For the S{sub n} codes, the large memory requirements are handled through the use of scratchmore » files (i.e., read-from and write-to-disk) that significantly increases the necessary execution time. Further, the lack of flexible features and/or utilities for preparing input and processing output makes these codes difficult to use. The Monte Carlo method becomes impractical because variance reduction (VR) methods have to be used, and normally determination of the necessary parameters for the VR methods is very difficult and time consuming for a complex 3-D problem. For the deterministic method, the authors have developed the 3-D parallel PENTRAN (Parallel Environment Neutral-particle TRANsport) code system that, in addition to a parallel 3-D S{sub n} solver, includes pre- and postprocessing utilities. PENTRAN provides for full phase-space decomposition, memory partitioning, and parallel input/output to provide the capability of solving large problems in a relatively short time. Besides having a modular parallel structure, PENTRAN has several unique new formulations and features that are necessary for achieving high parallel performance. For the Monte Carlo method, the major difficulty currently facing most users is the selection of an effective VR method and its associated parameters. For complex problems, generally, this process is very time consuming and may be complicated due to the possibility of biasing the results. In an attempt to eliminate this problem, the authors have developed the A{sup 3}MCNP (automated adjoint accelerated MCNP) code that automatically prepares parameters for source and transport biasing within a weight-window VR approach based on the S{sub n} adjoint function. A{sup 3}MCNP prepares the necessary input files for performing multigroup, 3-D adjoint S{sub n} calculations using TORT.« less

Request queues for interactive clients in a shared file system of a parallel computing system

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bent, John M.; Faibish, Sorin

Interactive requests are processed from users of log-in nodes. A metadata server node is provided for use in a file system shared by one or more interactive nodes and one or more batch nodes. The interactive nodes comprise interactive clients to execute interactive tasks and the batch nodes execute batch jobs for one or more batch clients. The metadata server node comprises a virtual machine monitor; an interactive client proxy to store metadata requests from the interactive clients in an interactive client queue; a batch client proxy to store metadata requests from the batch clients in a batch client queue;more » and a metadata server to store the metadata requests from the interactive client queue and the batch client queue in a metadata queue based on an allocation of resources by the virtual machine monitor. The metadata requests can be prioritized, for example, based on one or more of a predefined policy and predefined rules.« less
Remote sensing applications for urban planning - The LUMIS project. [Land Use Management Information System

NASA Technical Reports Server (NTRS)

Paul, C. K.; Landini, A. J.; Diegert, C.

1975-01-01

The Santa Monica mountains of Los Angeles consist primarily of complexly folded sedimentary marine strata with igneous and metamorphic rocks at the eastern end of the mountains. With the increased development of the Santa Monicas, a study was conducted to determine the critical land use data items in the mountains. Two information systems developed in parallel are described. One capitalizes on the City's present computer line printer system, and the second utilizes map overlay techniques on an interactive computer terminal. Results concerning population, housing, and land improvement illustrate the successful linking of ordinal and nominal data files in the interactive system.-
MOLA: a bootable, self-configuring system for virtual screening using AutoDock4/Vina on computer clusters.

PubMed

Abreu, Rui Mv; Froufe, Hugo Jc; Queiroz, Maria João Rp; Ferreira, Isabel Cfr

2010-10-28

Virtual screening of small molecules using molecular docking has become an important tool in drug discovery. However, large scale virtual screening is time demanding and usually requires dedicated computer clusters. There are a number of software tools that perform virtual screening using AutoDock4 but they require access to dedicated Linux computer clusters. Also no software is available for performing virtual screening with Vina using computer clusters. In this paper we present MOLA, an easy-to-use graphical user interface tool that automates parallel virtual screening using AutoDock4 and/or Vina in bootable non-dedicated computer clusters. MOLA automates several tasks including: ligand preparation, parallel AutoDock4/Vina jobs distribution and result analysis. When the virtual screening project finishes, an open-office spreadsheet file opens with the ligands ranked by binding energy and distance to the active site. All results files can automatically be recorded on an USB-flash drive or on the hard-disk drive using VirtualBox. MOLA works inside a customized Live CD GNU/Linux operating system, developed by us, that bypass the original operating system installed on the computers used in the cluster. This operating system boots from a CD on the master node and then clusters other computers as slave nodes via ethernet connections. MOLA is an ideal virtual screening tool for non-experienced users, with a limited number of multi-platform heterogeneous computers available and no access to dedicated Linux computer clusters. When a virtual screening project finishes, the computers can just be restarted to their original operating system. The originality of MOLA lies on the fact that, any platform-independent computer available can he added to the cluster, without ever using the computer hard-disk drive and without interfering with the installed operating system. With a cluster of 10 processors, and a potential maximum speed-up of 10x, the parallel algorithm of MOLA performed with a speed-up of 8,64× using AutoDock4 and 8,60× using Vina.
Parallel line analysis: multifunctional software for the biomedical sciences

NASA Technical Reports Server (NTRS)

Swank, P. R.; Lewis, M. L.; Damron, K. L.; Morrison, D. R.

1990-01-01

An easy to use, interactive FORTRAN program for analyzing the results of parallel line assays is described. The program is menu driven and consists of five major components: data entry, data editing, manual analysis, manual plotting, and automatic analysis and plotting. Data can be entered from the terminal or from previously created data files. The data editing portion of the program is used to inspect and modify data and to statistically identify outliers. The manual analysis component is used to test the assumptions necessary for parallel line assays using analysis of covariance techniques and to determine potency ratios with confidence limits. The manual plotting component provides a graphic display of the data on the terminal screen or on a standard line printer. The automatic portion runs through multiple analyses without operator input. Data may be saved in a special file to expedite input at a future time.
Comparison of working length control consistency between hand K-files and Mtwo NiTi rotary system.

PubMed

Krajczár, Károly; Varga, Enikő; Marada, Gyula; Jeges, Sára; Tóth, Vilmos

2016-04-01

The purpose of this study was to investigate the consistency of working length control between hand instrumentation in comparison to engine driven Mtwo nickel-titanium rotary files. Forty extracted maxillary molars were selected and divided onto two parallel groups. The working lengths of the mesiobuccal root canals were estimated. The teeth were fixed in a phantom head. The root canal preparation was carried out group 1 (n=20) with hand K-files, (VDW, Munich, Germany) and group 2 (n=20) with Mtwo instruments (VDW, Munich, Germany). Vestibulo-oral and mesio-distal directional x-ray images were taken before the preparation with #10 K-file, inserted into the mesiobuccal root canal to the working length, and after preparation with #25, #30 and #40 files. Working lenght changes were detected with measurements between the radiological apex and the instrument tips. In the Mtwo group a difference in the working competency (p<0.05) could be noticed only in the vestibulo-oral direction from #10 to #40 file. The hand instrument group showed a significant difference in working length competency for each larger file size (p<0.05) (ANOVA). Regression analysis in the hand instrumentation group indicated a working length decrease with a mean of 0,2 mm after each consecutive file size (p<0.01). The outcome of our trial indicated a high consistency in working length control for root canal preparation under simulated clinical condition using Mtwo rotary files. Mtwo NiTi rotary file did therefore proved to be more accurate in comparison to the conventional hand instrumentation. Working length, Mtwo, nickel-titanium, hand preparation, engine driven preparation.
Comparison of working length control consistency between hand K-files and Mtwo NiTi rotary system

PubMed Central

Krajczár, Károly; Varga, Enikő; Jeges, Sára; Tóth, Vilmos

2016-01-01

Background The purpose of this study was to investigate the consistency of working length control between hand instrumentation in comparison to engine driven Mtwo nickel-titanium rotary files. Material and Methods Forty extracted maxillary molars were selected and divided onto two parallel groups. The working lengths of the mesiobuccal root canals were estimated. The teeth were fixed in a phantom head. The root canal preparation was carried out group 1 (n=20) with hand K-files, (VDW, Munich, Germany) and group 2 (n=20) with Mtwo instruments (VDW, Munich, Germany). Vestibulo-oral and mesio-distal directional x-ray images were taken before the preparation with #10 K-file, inserted into the mesiobuccal root canal to the working length, and after preparation with #25, #30 and #40 files. Working lenght changes were detected with measurements between the radiological apex and the instrument tips. Results In the Mtwo group a difference in the working competency (p<0.05) could be noticed only in the vestibulo-oral direction from #10 to #40 file. The hand instrument group showed a significant difference in working length competency for each larger file size (p<0.05) (ANOVA). Regression analysis in the hand instrumentation group indicated a working length decrease with a mean of 0,2 mm after each consecutive file size (p<0.01). Conclusions The outcome of our trial indicated a high consistency in working length control for root canal preparation under simulated clinical condition using Mtwo rotary files. Mtwo NiTi rotary file did therefore proved to be more accurate in comparison to the conventional hand instrumentation. Key words:Working length, Mtwo, nickel-titanium, hand preparation, engine driven preparation. PMID:27034752
Coupled Ocean/Atmospheric Mesoscale Prediction System (COAMPS), Version 5.0 (User’s Guide)

DTIC Science & Technology

2010-03-30

provides tools for common modeling functions, as well as regridding, data decomposition, and communication on parallel computers. NRL/MR/7320--10...specified gncomDir. If running COAMPS at the DSRC (e.g. BABBAGE, DAVINCI , or EINSTEIN), the global NCOM files will be copied to /scr/[user]/COAMPS/data...the site (DSRC or local) and the platform (BABBAGE. DAVINCI , EINSTEIN, or local machine) on which COAMPS is being run. site=navy_dsrc (for DSRC
HAlign-II: efficient ultra-large multiple sequence alignment and phylogenetic tree reconstruction with distributed and parallel computing.

PubMed

Wan, Shixiang; Zou, Quan

2017-01-01

Multiple sequence alignment (MSA) plays a key role in biological sequence analyses, especially in phylogenetic tree construction. Extreme increase in next-generation sequencing results in shortage of efficient ultra-large biological sequence alignment approaches for coping with different sequence types. Distributed and parallel computing represents a crucial technique for accelerating ultra-large (e.g. files more than 1 GB) sequence analyses. Based on HAlign and Spark distributed computing system, we implement a highly cost-efficient and time-efficient HAlign-II tool to address ultra-large multiple biological sequence alignment and phylogenetic tree construction. The experiments in the DNA and protein large scale data sets, which are more than 1GB files, showed that HAlign II could save time and space. It outperformed the current software tools. HAlign-II can efficiently carry out MSA and construct phylogenetic trees with ultra-large numbers of biological sequences. HAlign-II shows extremely high memory efficiency and scales well with increases in computing resource. THAlign-II provides a user-friendly web server based on our distributed computing infrastructure. HAlign-II with open-source codes and datasets was established at http://lab.malab.cn/soft/halign.
Exploring the use of I/O nodes for computation in a MIMD multiprocessor

NASA Technical Reports Server (NTRS)

Kotz, David; Cai, Ting

1995-01-01

As parallel systems move into the production scientific-computing world, the emphasis will be on cost-effective solutions that provide high throughput for a mix of applications. Cost effective solutions demand that a system make effective use of all of its resources. Many MIMD multiprocessors today, however, distinguish between 'compute' and 'I/O' nodes, the latter having attached disks and being dedicated to running the file-system server. This static division of responsibilities simplifies system management but does not necessarily lead to the best performance in workloads that need a different balance of computation and I/O. Of course, computational processes sharing a node with a file-system service may receive less CPU time, network bandwidth, and memory bandwidth than they would on a computation-only node. In this paper we begin to examine this issue experimentally. We found that high performance I/O does not necessarily require substantial CPU time, leaving plenty of time for application computation. There were some complex file-system requests, however, which left little CPU time available to the application. (The impact on network and memory bandwidth still needs to be determined.) For applications (or users) that cannot tolerate an occasional interruption, we recommend that they continue to use only compute nodes. For tolerant applications needing more cycles than those provided by the compute nodes, we recommend that they take full advantage of both compute and I/O nodes for computation, and that operating systems should make this possible.
Framework for Parallel Preprocessing of Microarray Data Using Hadoop

PubMed Central

2018-01-01

Nowadays, microarray technology has become one of the popular ways to study gene expression and diagnosis of disease. National Center for Biology Information (NCBI) hosts public databases containing large volumes of biological data required to be preprocessed, since they carry high levels of noise and bias. Robust Multiarray Average (RMA) is one of the standard and popular methods that is utilized to preprocess the data and remove the noises. Most of the preprocessing algorithms are time-consuming and not able to handle a large number of datasets with thousands of experiments. Parallel processing can be used to address the above-mentioned issues. Hadoop is a well-known and ideal distributed file system framework that provides a parallel environment to run the experiment. In this research, for the first time, the capability of Hadoop and statistical power of R have been leveraged to parallelize the available preprocessing algorithm called RMA to efficiently process microarray data. The experiment has been run on cluster containing 5 nodes, while each node has 16 cores and 16 GB memory. It compares efficiency and the performance of parallelized RMA using Hadoop with parallelized RMA using affyPara package as well as sequential RMA. The result shows the speed-up rate of the proposed approach outperforms the sequential approach and affyPara approach. PMID:29796018
A DNA-based semantic fusion model for remote sensing data.

PubMed

Sun, Heng; Weng, Jian; Yu, Guangchuang; Massawe, Richard H

2013-01-01

Semantic technology plays a key role in various domains, from conversation understanding to algorithm analysis. As the most efficient semantic tool, ontology can represent, process and manage the widespread knowledge. Nowadays, many researchers use ontology to collect and organize data's semantic information in order to maximize research productivity. In this paper, we firstly describe our work on the development of a remote sensing data ontology, with a primary focus on semantic fusion-driven research for big data. Our ontology is made up of 1,264 concepts and 2,030 semantic relationships. However, the growth of big data is straining the capacities of current semantic fusion and reasoning practices. Considering the massive parallelism of DNA strands, we propose a novel DNA-based semantic fusion model. In this model, a parallel strategy is developed to encode the semantic information in DNA for a large volume of remote sensing data. The semantic information is read in a parallel and bit-wise manner and an individual bit is converted to a base. By doing so, a considerable amount of conversion time can be saved, i.e., the cluster-based multi-processes program can reduce the conversion time from 81,536 seconds to 4,937 seconds for 4.34 GB source data files. Moreover, the size of result file recording DNA sequences is 54.51 GB for parallel C program compared with 57.89 GB for sequential Perl. This shows that our parallel method can also reduce the DNA synthesis cost. In addition, data types are encoded in our model, which is a basis for building type system in our future DNA computer. Finally, we describe theoretically an algorithm for DNA-based semantic fusion. This algorithm enables the process of integration of the knowledge from disparate remote sensing data sources into a consistent, accurate, and complete representation. This process depends solely on ligation reaction and screening operations instead of the ontology.
A DNA-Based Semantic Fusion Model for Remote Sensing Data

PubMed Central

Sun, Heng; Weng, Jian; Yu, Guangchuang; Massawe, Richard H.

2013-01-01

Semantic technology plays a key role in various domains, from conversation understanding to algorithm analysis. As the most efficient semantic tool, ontology can represent, process and manage the widespread knowledge. Nowadays, many researchers use ontology to collect and organize data's semantic information in order to maximize research productivity. In this paper, we firstly describe our work on the development of a remote sensing data ontology, with a primary focus on semantic fusion-driven research for big data. Our ontology is made up of 1,264 concepts and 2,030 semantic relationships. However, the growth of big data is straining the capacities of current semantic fusion and reasoning practices. Considering the massive parallelism of DNA strands, we propose a novel DNA-based semantic fusion model. In this model, a parallel strategy is developed to encode the semantic information in DNA for a large volume of remote sensing data. The semantic information is read in a parallel and bit-wise manner and an individual bit is converted to a base. By doing so, a considerable amount of conversion time can be saved, i.e., the cluster-based multi-processes program can reduce the conversion time from 81,536 seconds to 4,937 seconds for 4.34 GB source data files. Moreover, the size of result file recording DNA sequences is 54.51 GB for parallel C program compared with 57.89 GB for sequential Perl. This shows that our parallel method can also reduce the DNA synthesis cost. In addition, data types are encoded in our model, which is a basis for building type system in our future DNA computer. Finally, we describe theoretically an algorithm for DNA-based semantic fusion. This algorithm enables the process of integration of the knowledge from disparate remote sensing data sources into a consistent, accurate, and complete representation. This process depends solely on ligation reaction and screening operations instead of the ontology. PMID:24116207
Parallelisation study of a three-dimensional environmental flow model

NASA Astrophysics Data System (ADS)

O'Donncha, Fearghal; Ragnoli, Emanuele; Suits, Frank

2014-03-01

There are many simulation codes in the geosciences that are serial and cannot take advantage of the parallel computational resources commonly available today. One model important for our work in coastal ocean current modelling is EFDC, a Fortran 77 code configured for optimal deployment on vector computers. In order to take advantage of our cache-based, blade computing system we restructured EFDC from serial to parallel, thereby allowing us to run existing models more quickly, and to simulate larger and more detailed models that were previously impractical. Since the source code for EFDC is extensive and involves detailed computation, it is important to do such a port in a manner that limits changes to the files, while achieving the desired speedup. We describe a parallelisation strategy involving surgical changes to the source files to minimise error-prone alteration of the underlying computations, while allowing load-balanced domain decomposition for efficient execution on a commodity cluster. The use of conjugate gradient posed particular challenges due to implicit non-local communication posing a hindrance to standard domain partitioning schemes; a number of techniques are discussed to address this in a feasible, computationally efficient manner. The parallel implementation demonstrates good scalability in combination with a novel domain partitioning scheme that specifically handles mixed water/land regions commonly found in coastal simulations. The approach presented here represents a practical methodology to rejuvenate legacy code on a commodity blade cluster with reasonable effort; our solution has direct application to other similar codes in the geosciences.
Enabling High-performance Interactive Geoscience Data Analysis Through Data Placement and Movement Optimization

NASA Astrophysics Data System (ADS)

Zhu, F.; Yu, H.; Rilee, M. L.; Kuo, K. S.; Yu, L.; Pan, Y.; Jiang, H.

2017-12-01

Since the establishment of data archive centers and the standardization of file formats, scientists are required to search metadata catalogs for data needed and download the data files to their local machines to carry out data analysis. This approach has facilitated data discovery and access for decades, but it inevitably leads to data transfer from data archive centers to scientists' computers through low-bandwidth Internet connections. Data transfer becomes a major performance bottleneck in such an approach. Combined with generally constrained local compute/storage resources, they limit the extent of scientists' studies and deprive them of timely outcomes. Thus, this conventional approach is not scalable with respect to both the volume and variety of geoscience data. A much more viable solution is to couple analysis and storage systems to minimize data transfer. In our study, we compare loosely coupled approaches (exemplified by Spark and Hadoop) and tightly coupled approaches (exemplified by parallel distributed database management systems, e.g., SciDB). In particular, we investigate the optimization of data placement and movement to effectively tackle the variety challenge, and boost the popularization of parallelization to address the volume challenge. Our goal is to enable high-performance interactive analysis for a good portion of geoscience data analysis exercise. We show that tightly coupled approaches can concentrate data traffic between local storage systems and compute units, and thereby optimizing bandwidth utilization to achieve a better throughput. Based on our observations, we develop a geoscience data analysis system that tightly couples analysis engines with storages, which has direct access to the detailed map of data partition locations. Through an innovation data partitioning and distribution scheme, our system has demonstrated scalable and interactive performance in real-world geoscience data analysis applications.
Oak Ridge Institutional Cluster Autotune Test Drive Report

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jibonananda, Sanyal; New, Joshua Ryan

2014-02-01

The Oak Ridge Institutional Cluster (OIC) provides general purpose computational resources for the ORNL staff to run computation heavy jobs that are larger than desktop applications but do not quite require the scale and power of the Oak Ridge Leadership Computing Facility (OLCF). This report details the efforts made and conclusions derived in performing a short test drive of the cluster resources on Phase 5 of the OIC. EnergyPlus was used in the analysis as a candidate user program and the overall software environment was evaluated against anticipated challenges experienced with resources such as the shared memory-Nautilus (JICS) and Titanmore » (OLCF). The OIC performed within reason and was found to be acceptable in the context of running EnergyPlus simulations. The number of cores per node and the availability of scratch space per node allow non-traditional desktop focused applications to leverage parallel ensemble execution. Although only individual runs of EnergyPlus were executed, the software environment on the OIC appeared suitable to run ensemble simulations with some modifications to the Autotune workflow. From a standpoint of general usability, the system supports common Linux libraries, compilers, standard job scheduling software (Torque/Moab), and the OpenMPI library (the only MPI library) for MPI communications. The file system is a Panasas file system which literature indicates to be an efficient file system.« less
LHCb Online event processing and filtering

NASA Astrophysics Data System (ADS)

Alessio, F.; Barandela, C.; Brarda, L.; Frank, M.; Franek, B.; Galli, D.; Gaspar, C.; Herwijnen, E. v.; Jacobsson, R.; Jost, B.; Köstner, S.; Moine, G.; Neufeld, N.; Somogyi, P.; Stoica, R.; Suman, S.

2008-07-01

The first level trigger of LHCb accepts one million events per second. After preprocessing in custom FPGA-based boards these events are distributed to a large farm of PC-servers using a high-speed Gigabit Ethernet network. Synchronisation and event management is achieved by the Timing and Trigger system of LHCb. Due to the complex nature of the selection of B-events, which are the main interest of LHCb, a full event-readout is required. Event processing on the servers is parallelised on an event basis. The reduction factor is typically 1/500. The remaining events are forwarded to a formatting layer, where the raw data files are formed and temporarily stored. A small part of the events is also forwarded to a dedicated farm for calibration and monitoring. The files are subsequently shipped to the CERN Tier0 facility for permanent storage and from there to the various Tier1 sites for reconstruction. In parallel files are used by various monitoring and calibration processes running within the LHCb Online system. The entire data-flow is controlled and configured by means of a SCADA system and several databases. After an overview of the LHCb data acquisition and its design principles this paper will emphasize the LHCb event filter system, which is now implemented using the final hardware and will be ready for data-taking for the LHC startup. Control, configuration and security aspects will also be discussed.
Processing EOS MLS Level-2 Data

NASA Technical Reports Server (NTRS)

Snyder, W. Van; Wu, Dong; Read, William; Jiang, Jonathan; Wagner, Paul; Livesey, Nathaniel; Schwartz, Michael; Filipiak, Mark; Pumphrey, Hugh; Shippony, Zvi

2006-01-01

A computer program performs level-2 processing of thermal-microwave-radiance data from observations of the limb of the Earth by the Earth Observing System (EOS) Microwave Limb Sounder (MLS). The purpose of the processing is to estimate the composition and temperature of the atmosphere versus altitude from .8 to .90 km. "Level-2" as used here is a specialists f term signifying both vertical profiles of geophysical parameters along the measurement track of the instrument and processing performed by this or other software to generate such profiles. Designed to be flexible, the program is controlled via a configuration file that defines all aspects of processing, including contents of state and measurement vectors, configurations of forward models, measurement and calibration data to be read, and the manner of inverting the models to obtain the desired estimates. The program can operate in a parallel form in which one instance of the program acts a master, coordinating the work of multiple slave instances on a cluster of computers, each slave operating on a portion of the data. Optionally, the configuration file can be made to instruct the software to produce files of simulated radiances based on state vectors formed from sets of geophysical data-product files taken as input.
User’s Guide for the Coupled Ocean/Atmospheric Mesoscale Prediction System (COAMPS) Version 5.0

DTIC Science & Technology

2010-03-30

provides tools for common modeling functions, as well as regridding, data decomposition, and communication on parallel computers. NRL/MR/7320...specified gncomDir. If running COAMPS at the DSRC (e.g. BABBAGE, DAVINCI , or EINSTEIN), the global NCOM files will be copied to /scr/[user]/COAMPS/data...the site (DSRC or local) and the platform (BABBAGE. DAVINCI , EINSTEIN, or local machine) on which COAMPS is being run. site=navy_dsrc (for DSRC
[Development and evaluation of the medical imaging distribution system with dynamic web application and clustering technology].

PubMed

Yokohama, Noriya; Tsuchimoto, Tadashi; Oishi, Masamichi; Itou, Katsuya

2007-01-20

It has been noted that the downtime of medical informatics systems is often long. Many systems encounter downtimes of hours or even days, which can have a critical effect on daily operations. Such systems remain especially weak in the areas of database and medical imaging data. The scheme design shows the three-layer architecture of the system: application, database, and storage layers. The application layer uses the DICOM protocol (Digital Imaging and Communication in Medicine) and HTTP (Hyper Text Transport Protocol) with AJAX (Asynchronous JavaScript+XML). The database is designed to decentralize in parallel using cluster technology. Consequently, restoration of the database can be done not only with ease but also with improved retrieval speed. In the storage layer, a network RAID (Redundant Array of Independent Disks) system, it is possible to construct exabyte-scale parallel file systems that exploit storage spread. Development and evaluation of the test-bed has been successful in medical information data backup and recovery in a network environment. This paper presents a schematic design of the new medical informatics system that can be accommodated from a recovery and the dynamic Web application for medical imaging distribution using AJAX.
GLAD: a system for developing and deploying large-scale bioinformatics grid.

PubMed

Teo, Yong-Meng; Wang, Xianbing; Ng, Yew-Kwong

2005-03-01

Grid computing is used to solve large-scale bioinformatics problems with gigabytes database by distributing the computation across multiple platforms. Until now in developing bioinformatics grid applications, it is extremely tedious to design and implement the component algorithms and parallelization techniques for different classes of problems, and to access remotely located sequence database files of varying formats across the grid. In this study, we propose a grid programming toolkit, GLAD (Grid Life sciences Applications Developer), which facilitates the development and deployment of bioinformatics applications on a grid. GLAD has been developed using ALiCE (Adaptive scaLable Internet-based Computing Engine), a Java-based grid middleware, which exploits the task-based parallelism. Two bioinformatics benchmark applications, such as distributed sequence comparison and distributed progressive multiple sequence alignment, have been developed using GLAD.

76 FR 20750 - Self-Regulatory Organizations; EDGX Exchange, Inc.; Notice of Filing and Immediate Effectiveness...

Federal Register 2010, 2011, 2012, 2013, 2014

2011-04-13

... on the Exchange's Internet Web site at http://www.directedge.com . \\3\\ A Member is any registered... strategy to the ROUD/ROUE routing strategies is Parallel D or Parallel 2D with the DRT (Dark routing... one method. The Commission will post all comments on the Commission's Internet Web site ( http://www...
DOE Office of Scientific and Technical Information (OSTI.GOV)

Yu, Weikuan; Vetter, Jeffrey S

Parallel NFS (pNFS) is touted as an emergent standard protocol for parallel I/O access in various storage environments. Several pNFS prototypes have been implemented for initial validation and protocol examination. Previous efforts have focused on realizing the pNFS protocol to expose the best bandwidth potential from underlying file and storage systems. In this presentation, we provide an initial characterization of two pNFS prototype implementations, lpNFS (a Lustre-based parallel NFS implementation) and spNFS (another reference implementation from Network Appliance, Inc.). We show that both lpNFS and spNFS can faithfully achieve the primary goal of pNFS, i.e., aggregating I/O bandwidth from manymore » storage servers. However, they both face the challenge of scalable metadata management. Particularly, the throughput of sp-NFS metadata operations degrades significanlty with an increasing number of data servers. Even for the better-performing lpNFS, we discuss its architecture and propose a direct I/O request flow protocol to improve its performance.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)

Collins, Benjamin S.

The Futility package contains the following: 1) Definition of the size of integers and real numbers; 2) A generic Unit test harness; 3) Definitions for some basic extensions to the Fortran language: arbitrary length strings, a parameter list construct, exception handlers, command line processor, timers; 4) Geometry definitions: point, line, plane, box, cylinder, polyhedron; 5) File wrapper functions: standard Fortran input/output files, Fortran binary files, HDF5 files; 6) Parallel wrapper functions: MPI, and Open MP abstraction layers, partitioning algorithms; 7) Math utilities: BLAS, Matrix and Vector definitions, Linear Solver methods and wrappers for other TPLs (PETSC, MKL, etc), preconditioner classes;more » 8) Misc: random number generator, water saturation properties, sorting algorithms.« less
Hospital integrated parallel cluster for fast and cost-efficient image analysis: clinical experience and research evaluation

NASA Astrophysics Data System (ADS)

Erberich, Stephan G.; Hoppe, Martin; Jansen, Christian; Schmidt, Thomas; Thron, Armin; Oberschelp, Walter

2001-08-01

In the last few years more and more University Hospitals as well as private hospitals changed to digital information systems for patient record, diagnostic files and digital images. Not only that patient management becomes easier, it is also very remarkable how clinical research can profit from Picture Archiving and Communication Systems (PACS) and diagnostic databases, especially from image databases. Since images are available on the finger tip, difficulties arise when image data needs to be processed, e.g. segmented, classified or co-registered, which usually demands a lot computational power. Today's clinical environment does support PACS very well, but real image processing is still under-developed. The purpose of this paper is to introduce a parallel cluster of standard distributed systems and its software components and how such a system can be integrated into a hospital environment. To demonstrate the cluster technique we present our clinical experience with the crucial but cost-intensive motion correction of clinical routine and research functional MRI (fMRI) data, as it is processed in our Lab on a daily basis.
Design and Execution of make-like, distributed Analyses based on Spotify’s Pipelining Package Luigi

NASA Astrophysics Data System (ADS)

Erdmann, M.; Fischer, B.; Fischer, R.; Rieger, M.

2017-10-01

In high-energy particle physics, workflow management systems are primarily used as tailored solutions in dedicated areas such as Monte Carlo production. However, physicists performing data analyses are usually required to steer their individual workflows manually which is time-consuming and often leads to undocumented relations between particular workloads. We present a generic analysis design pattern that copes with the sophisticated demands of end-to-end HEP analyses and provides a make-like execution system. It is based on the open-source pipelining package Luigi which was developed at Spotify and enables the definition of arbitrary workloads, so-called Tasks, and the dependencies between them in a lightweight and scalable structure. Further features are multi-user support, automated dependency resolution and error handling, central scheduling, and status visualization in the web. In addition to already built-in features for remote jobs and file systems like Hadoop and HDFS, we added support for WLCG infrastructure such as LSF and CREAM job submission, as well as remote file access through the Grid File Access Library. Furthermore, we implemented automated resubmission functionality, software sandboxing, and a command line interface with auto-completion for a convenient working environment. For the implementation of a t \\overline{{{t}}} H cross section measurement, we created a generic Python interface that provides programmatic access to all external information such as datasets, physics processes, statistical models, and additional files and values. In summary, the setup enables the execution of the entire analysis in a parallelized and distributed fashion with a single command.
A Note on Compiling Fortran

DOE Office of Scientific and Technical Information (OSTI.GOV)

Busby, L. E.

Fortran modules tend to serialize compilation of large Fortran projects, by introducing dependencies among the source files. If file A depends on file B, (A uses a module defined by B), you must finish compiling B before you can begin compiling A. Some Fortran compilers (Intel ifort, GNU gfortran and IBM xlf, at least) offer an option to ‘‘verify syntax’’, with the side effect of also producing any associated Fortran module files. As it happens, this option usually runs much faster than the object code generation and optimization phases. For some projects on some machines, it can be advantageous tomore » compile in two passes: The first pass generates the module files, quickly; the second pass produces the object files, in parallel. We achieve a 3.8× speedup in the case study below.« less
A Massively Parallel Computational Method of Reading Index Files for SOAPsnv.

PubMed

Zhu, Xiaoqian; Peng, Shaoliang; Liu, Shaojie; Cui, Yingbo; Gu, Xiang; Gao, Ming; Fang, Lin; Fang, Xiaodong

2015-12-01

SOAPsnv is the software used for identifying the single nucleotide variation in cancer genes. However, its performance is yet to match the massive amount of data to be processed. Experiments reveal that the main performance bottleneck of SOAPsnv software is the pileup algorithm. The original pileup algorithm's I/O process is time-consuming and inefficient to read input files. Moreover, the scalability of the pileup algorithm is also poor. Therefore, we designed a new algorithm, named BamPileup, aiming to improve the performance of sequential read, and the new pileup algorithm implemented a parallel read mode based on index. Using this method, each thread can directly read the data start from a specific position. The results of experiments on the Tianhe-2 supercomputer show that, when reading data in a multi-threaded parallel I/O way, the processing time of algorithm is reduced to 3.9 s and the application program can achieve a speedup up to 100×. Moreover, the scalability of the new algorithm is also satisfying.
Development of Parallel Computing Framework to Enhance Radiation Transport Code Capabilities for Rare Isotope Beam Facility Design

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kostin, Mikhail; Mokhov, Nikolai; Niita, Koji

A parallel computing framework has been developed to use with general-purpose radiation transport codes. The framework was implemented as a C++ module that uses MPI for message passing. It is intended to be used with older radiation transport codes implemented in Fortran77, Fortran 90 or C. The module is significantly independent of radiation transport codes it can be used with, and is connected to the codes by means of a number of interface functions. The framework was developed and tested in conjunction with the MARS15 code. It is possible to use it with other codes such as PHITS, FLUKA andmore » MCNP after certain adjustments. Besides the parallel computing functionality, the framework offers a checkpoint facility that allows restarting calculations with a saved checkpoint file. The checkpoint facility can be used in single process calculations as well as in the parallel regime. The framework corrects some of the known problems with the scheduling and load balancing found in the original implementations of the parallel computing functionality in MARS15 and PHITS. The framework can be used efficiently on homogeneous systems and networks of workstations, where the interference from the other users is possible.« less
Scalable Molecular Dynamics with NAMD

PubMed Central

Phillips, James C.; Braun, Rosemary; Wang, Wei; Gumbart, James; Tajkhorshid, Emad; Villa, Elizabeth; Chipot, Christophe; Skeel, Robert D.; Kalé, Laxmikant; Schulten, Klaus

2008-01-01

NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. NAMD scales to hundreds of processors on high-end parallel platforms, as well as tens of processors on low-cost commodity clusters, and also runs on individual desktop and laptop computers. NAMD works with AMBER and CHARMM potential functions, parameters, and file formats. This paper, directed to novices as well as experts, first introduces concepts and methods used in the NAMD program, describing the classical molecular dynamics force field, equations of motion, and integration methods along with the efficient electrostatics evaluation algorithms employed and temperature and pressure controls used. Features for steering the simulation across barriers and for calculating both alchemical and conformational free energy differences are presented. The motivations for and a roadmap to the internal design of NAMD, implemented in C++ and based on Charm++ parallel objects, are outlined. The factors affecting the serial and parallel performance of a simulation are discussed. Next, typical NAMD use is illustrated with representative applications to a small, a medium, and a large biomolecular system, highlighting particular features of NAMD, e.g., the Tcl scripting language. Finally, the paper provides a list of the key features of NAMD and discusses the benefits of combining NAMD with the molecular graphics/sequence analysis software VMD and the grid computing/collaboratory software BioCoRE. NAMD is distributed free of charge with source code at www.ks.uiuc.edu. PMID:16222654
76 FR 21076 - Self-Regulatory Organizations; EDGA Exchange, Inc.; Notice of Filing and Immediate Effectiveness...

Federal Register 2010, 2011, 2012, 2013, 2014

2011-04-14

... on the Exchange's Internet Web site at http://www.directedge.com . \\3\\ A Member is any registered... Parallel D or Parallel 2D with the DRT (Dark routing technique) option on BZX. BZX charges $0.0020 per... the DRT (Dark routing technique) option on BZX or SCAN/STGY on Nasdaq OMX Exchange (``Nasdaq.'') BATS...
ArrayBridge: Interweaving declarative array processing with high-performance computing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Xing, Haoyuan; Floratos, Sofoklis; Blanas, Spyros

Scientists are increasingly turning to datacenter-scale computers to produce and analyze massive arrays. Despite decades of database research that extols the virtues of declarative query processing, scientists still write, debug and parallelize imperative HPC kernels even for the most mundane queries. This impedance mismatch has been partly attributed to the cumbersome data loading process; in response, the database community has proposed in situ mechanisms to access data in scientific file formats. Scientists, however, desire more than a passive access method that reads arrays from files. This paper describes ArrayBridge, a bi-directional array view mechanism for scientific file formats, that aimsmore » to make declarative array manipulations interoperable with imperative file-centric analyses. Our prototype implementation of ArrayBridge uses HDF5 as the underlying array storage library and seamlessly integrates into the SciDB open-source array database system. In addition to fast querying over external array objects, ArrayBridge produces arrays in the HDF5 file format just as easily as it can read from it. ArrayBridge also supports time travel queries from imperative kernels through the unmodified HDF5 API, and automatically deduplicates between array versions for space efficiency. Our extensive performance evaluation in NERSC, a large-scale scientific computing facility, shows that ArrayBridge exhibits statistically indistinguishable performance and I/O scalability to the native SciDB storage engine.« less
A performance analysis of advanced I/O architectures for PC-based network file servers

NASA Astrophysics Data System (ADS)

Huynh, K. D.; Khoshgoftaar, T. M.

1994-12-01

In the personal computing and workstation environments, more and more I/O adapters are becoming complete functional subsystems that are intelligent enough to handle I/O operations on their own without much intervention from the host processor. The IBM Subsystem Control Block (SCB) architecture has been defined to enhance the potential of these intelligent adapters by defining services and conventions that deliver command information and data to and from the adapters. In recent years, a new storage architecture, the Redundant Array of Independent Disks (RAID), has been quickly gaining acceptance in the world of computing. In this paper, we would like to discuss critical system design issues that are important to the performance of a network file server. We then present a performance analysis of the SCB architecture and disk array technology in typical network file server environments based on personal computers (PCs). One of the key issues investigated in this paper is whether a disk array can outperform a group of disks (of same type, same data capacity, and same cost) operating independently, not in parallel as in a disk array.
NIF Ignition Target 3D Point Design

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jones, O; Marinak, M; Milovich, J

2008-11-05

We have developed an input file for running 3D NIF hohlraums that is optimized such that it can be run in 1-2 days on parallel computers. We have incorporated increasing levels of automation into the 3D input file: (1) Configuration controlled input files; (2) Common file for 2D and 3D, different types of capsules (symcap, etc.); and (3) Can obtain target dimensions, laser pulse, and diagnostics settings automatically from NIF Campaign Management Tool. Using 3D Hydra calculations to investigate different problems: (1) Intrinsic 3D asymmetry; (2) Tolerance to nonideal 3D effects (e.g. laser power balance, pointing errors); and (3) Syntheticmore » diagnostics.« less
Beyond the single-file fluid limit using transfer matrix method: Exact results for confined parallel hard squares

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gurin, Péter; Varga, Szabolcs

2015-06-14

We extend the transfer matrix method of one-dimensional hard core fluids placed between confining walls for that case where the particles can pass each other and at most two layers can form. We derive an eigenvalue equation for a quasi-one-dimensional system of hard squares confined between two parallel walls, where the pore width is between σ and 3σ (σ is the side length of the square). The exact equation of state and the nearest neighbor distribution functions show three different structures: a fluid phase with one layer, a fluid phase with two layers, and a solid-like structure where the fluidmore » layers are strongly correlated. The structural transition between differently ordered fluids develops continuously with increasing density, i.e., no thermodynamic phase transition occurs. The high density structure of the system consists of clusters with two layers which are broken with particles staying in the middle of the pore.« less
A computer program for two-particle generalized coefficients of fractional parentage

NASA Astrophysics Data System (ADS)

Deveikis, A.; Juodagalvis, A.

2008-10-01

We present a FORTRAN90 program GCFP for the calculation of the generalized coefficients of fractional parentage (generalized CFPs or GCFP). The approach is based on the observation that the multi-shell CFPs can be expressed in terms of single-shell CFPs, while the latter can be readily calculated employing a simple enumeration scheme of antisymmetric A-particle states and an efficient method of construction of the idempotent matrix eigenvectors. The program provides fast calculation of GCFPs for a given particle number and produces results possessing numerical uncertainties below the desired tolerance. A single j-shell is defined by four quantum numbers, (e,l,j,t). A supplemental C++ program parGCFP allows calculation to be done in batches and/or in parallel. Program summaryProgram title:GCFP, parGCFP Catalogue identifier: AEBI_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEBI_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 17 199 No. of bytes in distributed program, including test data, etc.: 88 658 Distribution format: tar.gz Programming language: FORTRAN 77/90 ( GCFP), C++ ( parGCFP) Computer: Any computer with suitable compilers. The program GCFP requires a FORTRAN 77/90 compiler. The auxiliary program parGCFP requires GNU-C++ compatible compiler, while its parallel version additionally requires MPI-1 standard libraries Operating system: Linux (Ubuntu, Scientific) (all programs), also checked on Windows XP ( GCFP, serial version of parGCFP) RAM: The memory demand depends on the computation and output mode. If this mode is not 4, the program GCFP demands the following amounts of memory on a computer with Linux operating system. It requires around 2 MB of RAM for the A=12 system at E⩽2. Computation of the A=50 particle system requires around 60 MB of RAM at E=0 and ˜70 MB at E=2 (note, however, that the calculation of this system will take a very long time). If the computation and output mode is set to 4, the memory demands by GCFP are significantly larger. Calculation of GCFPs of A=12 system at E=1 requires 145 MB. The program parGCFP requires additional 2.5 and 4.5 MB of memory for the serial and parallel version, respectively. Classification: 17.18 Nature of problem: The program GCFP generates a list of two-particle coefficients of fractional parentage for several j-shells with isospin. Solution method: The method is based on the observation that multishell coefficients of fractional parentage can be expressed in terms of single-shell CFPs [1]. The latter are calculated using the algorithm [2,3] for a spectral decomposition of an antisymmetrization operator matrix Y. The coefficients of fractional parentage are those eigenvectors of the antisymmetrization operator matrix Y that correspond to unit eigenvalues. A computer code for these coefficients is available [4]. The program GCFP offers computation of two-particle multishell coefficients of fractional parentage. The program parGCFP allows a batch calculation using one input file. Sets of GCFPs are independent and can be calculated in parallel. Restrictions:A<86 when E=0 (due to the memory constraints); small numbers of particles allow significantly higher excitations, though the shell with j⩾11/2 cannot get full (it is the implementation constraint). Unusual features: Using the program GCFP it is possible to determine allowed particle configurations without the GCFP computation. The GCFPs can be calculated either for all particle configurations at once or for a specified particle configuration. The values of GCFPs can be printed out with a complete specification in either one file or with the parent and daughter configurations printed in separate files. The latter output mode requires additional time and RAM memory. It is possible to restrict the ( J,T) values of the considered particle configurations. (Here J is the total angular momentum and T is the total isospin of the system.) The program parGCFP produces several result files the number of which equals to the number of particle configurations. To work correctly, the program GCFP needs to be compiled to read parameters from the standard input (the default setting). Running time: It depends on the size of the problem. The minimum time is required, if the computation and output mode ( CompMode) is not 4, but the resulting file is larger. A system with A=12 particles at E=0 (all 9411 GCFPs) took around 1 sec on a Pentium4 2.8 GHz processor with 1 MB L2 cache. The program required about 14 min to calculate all 1.3×10 GCFPs of E=1. The time for all 5.5×10 GCFPs of E=2 was about 53 hours. For this number of particles, the calculation time of both E=0 and E=1 with CompMode = 1 and 4 is nearly the same, when no other processes are running. The case of E=2 could not be calculated with CompMode = 4, because the RAM memory was insufficient. In general, the latter CompMode requires a longer computation time, although the resulting files are smaller in size. The program parGCFP puts virtually no time overhead. Its parallel version speeds-up the calculation. However, the results need to be collected from several files created for each configuration. References: [1] J. Levinsonas, Works of Lithuanian SSR Academy of Sciences 4 (1957) 17. [2] A. Deveikis, A. Bončkus, R. Kalinauskas, Lithuanian Phys. J. 41 (2001) 3. [3] A. Deveikis, R.K. Kalinauskas, B.R. Barrett, Ann. Phys. 296 (2002) 287. [4] A. Deveikis, Comput. Phys. Comm. 173 (2005) 186. (CPC Catalogue ID. ADWI_v1_0)
A computer program for converting rectangular coordinates to latitude-longitude coordinates

USGS Publications Warehouse

Rutledge, A.T.

1989-01-01

A computer program was developed for converting the coordinates of any rectangular grid on a map to coordinates on a grid that is parallel to lines of equal latitude and longitude. Using this program in conjunction with groundwater flow models, the user can extract data and results from models with varying grid orientations and place these data into grid structure that is oriented parallel to lines of equal latitude and longitude. All cells in the rectangular grid must have equal dimensions, and all cells in the latitude-longitude grid measure one minute by one minute. This program is applicable if the map used shows lines of equal latitude as arcs and lines of equal longitude as straight lines and assumes that the Earth 's surface can be approximated as a sphere. The program user enters the row number , column number, and latitude and longitude of the midpoint of the cell for three test cells on the rectangular grid. The latitude and longitude of boundaries of the rectangular grid also are entered. By solving sets of simultaneous linear equations, the program calculates coefficients that are used for making the conversion. As an option in the program, the user may build a groundwater model file based on a grid that is parallel to lines of equal latitude and longitude. The program reads a data file based on the rectangular coordinates and automatically forms the new data file. (USGS)
A microkernel design for component-based parallel numerical software systems.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Balay, S.

1999-01-13

What is the minimal software infrastructure and what type of conventions are needed to simplify development of sophisticated parallel numerical application codes using a variety of software components that are not necessarily available as source code? We propose an opaque object-based model where the objects are dynamically loadable from the file system or network. The microkernel required to manage such a system needs to include, at most: (1) a few basic services, namely--a mechanism for loading objects at run time via dynamic link libraries, and consistent schemes for error handling and memory management; and (2) selected methods that all objectsmore » share, to deal with object life (destruction, reference counting, relationships), and object observation (viewing, profiling, tracing). We are experimenting with these ideas in the context of extensible numerical software within the ALICE (Advanced Large-scale Integrated Computational Environment) project, where we are building the microkernel to manage the interoperability among various tools for large-scale scientific simulations. This paper presents some preliminary observations and conclusions from our work with microkernel design.« less
Simulation of a Rotorcraft in Turbulent Flows

DTIC Science & Technology

1991-09-01

Knot) Aircraft Parallel Aircraft Parallel Aircraft Parallel To Ship’s To Port-To-Star- To Starboard- Centerline board Landing To-Port Landing Lineup ...Line Lineup Line 345 to 015/35 340 to 005/45 345 to 005145 016 t,) 040/30 006 to 035!35 006 to 025/40 041 to 180/45 036 to 050/30 026 to 040/30 181 to...WIND /FRA3 LOW REYNOLD’S NUMBER AERODYNAMICS FOR NACA0012 AIRFOIL REQUIRES DS/DM NACA0012/AIRFOIL NO SEQUENTIAL FILES REQUIRED INPUT FOR FORCE FRA3
GENESIS: a hybrid-parallel and multi-scale molecular dynamics simulator with enhanced sampling algorithms for biomolecular and cellular simulations.

PubMed

Jung, Jaewoon; Mori, Takaharu; Kobayashi, Chigusa; Matsunaga, Yasuhiro; Yoda, Takao; Feig, Michael; Sugita, Yuji

2015-07-01

GENESIS (Generalized-Ensemble Simulation System) is a new software package for molecular dynamics (MD) simulations of macromolecules. It has two MD simulators, called ATDYN and SPDYN. ATDYN is parallelized based on an atomic decomposition algorithm for the simulations of all-atom force-field models as well as coarse-grained Go-like models. SPDYN is highly parallelized based on a domain decomposition scheme, allowing large-scale MD simulations on supercomputers. Hybrid schemes combining OpenMP and MPI are used in both simulators to target modern multicore computer architectures. Key advantages of GENESIS are (1) the highly parallel performance of SPDYN for very large biological systems consisting of more than one million atoms and (2) the availability of various REMD algorithms (T-REMD, REUS, multi-dimensional REMD for both all-atom and Go-like models under the NVT, NPT, NPAT, and NPγT ensembles). The former is achieved by a combination of the midpoint cell method and the efficient three-dimensional Fast Fourier Transform algorithm, where the domain decomposition space is shared in real-space and reciprocal-space calculations. Other features in SPDYN, such as avoiding concurrent memory access, reducing communication times, and usage of parallel input/output files, also contribute to the performance. We show the REMD simulation results of a mixed (POPC/DMPC) lipid bilayer as a real application using GENESIS. GENESIS is released as free software under the GPLv2 licence and can be easily modified for the development of new algorithms and molecular models. WIREs Comput Mol Sci 2015, 5:310-323. doi: 10.1002/wcms.1220.
Compact 0-complete trees: A new method for searching large files

DOE Office of Scientific and Technical Information (OSTI.GOV)

Orlandic, R.; Pfaltz, J.L.

1988-01-26

In this report, a novel approach to ordered retrieval in very large files is developed. The method employs a B-tree like search algorithm that is independent of key type or key length because all keys in index blocks are encoded by a 1 byte surrogate. The replacement of actual key sequences by the 1 byte surrogate ensures a maximal possible fan out and greatly reduces the storage overhead of maintaining access indices. Initially, retrieval in binary trie structure is developed. With the aid of a fairly complex recurrence relation, the rather scraggly binary trie is transformed into compact multi-way searchmore » tree. Then the recurrence relation itself is replaced by an unusually simple search algorithm. Then implementation details and empirical performance results are presented. Reduction of index size by 50%--75% opens up the possibility of replicating system-wide indices for parallel access in distributed databases. 23 figs.« less

Use of statecharts in the modelling of dynamic behaviour in the ATLAS DAQ prototype-1

NASA Astrophysics Data System (ADS)

Croll, P.; Duval, P.-Y.; Jones, R.; Kolos, S.; Sari, R. F.; Wheeler, S.

1998-08-01

Many applications within the ATLAS DAQ prototype-1 system have complicated dynamic behaviour which can be successfully modelled in terms of states and transitions between states. Previously, state diagrams implemented as finite-state machines have been used. Although effective, they become ungainly as system size increases. Harel statecharts address this problem by implementing additional features such as hierarchy and concurrency. The CHSM object-oriented language system is freeware which implements Harel statecharts as concurrent, hierarchical, finite-state machines (CHSMs). An evaluation of this language system by the ATLAS DAQ group has shown it to be suitable for describing the dynamic behaviour of typical DAQ applications. The language is currently being used to model the dynamic behaviour of the prototype-1 run-control system. The design is specified by means of a CHSM description file, and C++ code is obtained by running the CHSM compiler on the file. In parallel with the modelling work, a code generator has been developed which translates statecharts, drawn using the StP CASE tool, into the CHSM language. C++ code, describing the dynamic behaviour of the run-control system, has been successfully generated directly from StP statecharts using the CHSM generator and compiler. The validity of the design was tested using the simulation features of the Statemate CASE tool.
Parallel sort with a ranged, partitioned key-value store in a high perfomance computing environment

DOEpatents

Bent, John M.; Faibish, Sorin; Grider, Gary; Torres, Aaron; Poole, Stephen W.

2016-01-26

Improved sorting techniques are provided that perform a parallel sort using a ranged, partitioned key-value store in a high performance computing (HPC) environment. A plurality of input data files comprising unsorted key-value data in a partitioned key-value store are sorted. The partitioned key-value store comprises a range server for each of a plurality of ranges. Each input data file has an associated reader thread. Each reader thread reads the unsorted key-value data in the corresponding input data file and performs a local sort of the unsorted key-value data to generate sorted key-value data. A plurality of sorted, ranged subsets of each of the sorted key-value data are generated based on the plurality of ranges. Each sorted, ranged subset corresponds to a given one of the ranges and is provided to one of the range servers corresponding to the range of the sorted, ranged subset. Each range server sorts the received sorted, ranged subsets and provides a sorted range. A plurality of the sorted ranges are concatenated to obtain a globally sorted result.
Parallel community climate model: Description and user`s guide

DOE Office of Scientific and Technical Information (OSTI.GOV)

Drake, J.B.; Flanery, R.E.; Semeraro, B.D.

This report gives an overview of a parallel version of the NCAR Community Climate Model, CCM2, implemented for MIMD massively parallel computers using a message-passing programming paradigm. The parallel implementation was developed on an Intel iPSC/860 with 128 processors and on the Intel Delta with 512 processors, and the initial target platform for the production version of the code is the Intel Paragon with 2048 processors. Because the implementation uses a standard, portable message-passing libraries, the code has been easily ported to other multiprocessors supporting a message-passing programming paradigm. The parallelization strategy used is to decompose the problem domain intomore » geographical patches and assign each processor the computation associated with a distinct subset of the patches. With this decomposition, the physics calculations involve only grid points and data local to a processor and are performed in parallel. Using parallel algorithms developed for the semi-Lagrangian transport, the fast Fourier transform and the Legendre transform, both physics and dynamics are computed in parallel with minimal data movement and modest change to the original CCM2 source code. Sequential or parallel history tapes are written and input files (in history tape format) are read sequentially by the parallel code to promote compatibility with production use of the model on other computer systems. A validation exercise has been performed with the parallel code and is detailed along with some performance numbers on the Intel Paragon and the IBM SP2. A discussion of reproducibility of results is included. A user`s guide for the PCCM2 version 2.1 on the various parallel machines completes the report. Procedures for compilation, setup and execution are given. A discussion of code internals is included for those who may wish to modify and use the program in their own research.« less
Cloud Engineering Principles and Technology Enablers for Medical Image Processing-as-a-Service.

PubMed

Bao, Shunxing; Plassard, Andrew J; Landman, Bennett A; Gokhale, Aniruddha

2017-04-01

Traditional in-house, laboratory-based medical imaging studies use hierarchical data structures (e.g., NFS file stores) or databases (e.g., COINS, XNAT) for storage and retrieval. The resulting performance from these approaches is, however, impeded by standard network switches since they can saturate network bandwidth during transfer from storage to processing nodes for even moderate-sized studies. To that end, a cloud-based "medical image processing-as-a-service" offers promise in utilizing the ecosystem of Apache Hadoop, which is a flexible framework providing distributed, scalable, fault tolerant storage and parallel computational modules, and HBase, which is a NoSQL database built atop Hadoop's distributed file system. Despite this promise, HBase's load distribution strategy of region split and merge is detrimental to the hierarchical organization of imaging data (e.g., project, subject, session, scan, slice). This paper makes two contributions to address these concerns by describing key cloud engineering principles and technology enhancements we made to the Apache Hadoop ecosystem for medical imaging applications. First, we propose a row-key design for HBase, which is a necessary step that is driven by the hierarchical organization of imaging data. Second, we propose a novel data allocation policy within HBase to strongly enforce collocation of hierarchically related imaging data. The proposed enhancements accelerate data processing by minimizing network usage and localizing processing to machines where the data already exist. Moreover, our approach is amenable to the traditional scan, subject, and project-level analysis procedures, and is compatible with standard command line/scriptable image processing software. Experimental results for an illustrative sample of imaging data reveals that our new HBase policy results in a three-fold time improvement in conversion of classic DICOM to NiFTI file formats when compared with the default HBase region split policy, and nearly a six-fold improvement over a commonly available network file system (NFS) approach even for relatively small file sets. Moreover, file access latency is lower than network attached storage.
VisIO: enabling interactive visualization of ultra-scale, time-series data via high-bandwidth distributed I/O systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mitchell, Christopher J; Ahrens, James P; Wang, Jun

2010-10-15

Petascale simulations compute at resolutions ranging into billions of cells and write terabytes of data for visualization and analysis. Interactive visuaUzation of this time series is a desired step before starting a new run. The I/O subsystem and associated network often are a significant impediment to interactive visualization of time-varying data; as they are not configured or provisioned to provide necessary I/O read rates. In this paper, we propose a new I/O library for visualization applications: VisIO. Visualization applications commonly use N-to-N reads within their parallel enabled readers which provides an incentive for a shared-nothing approach to I/O, similar tomore » other data-intensive approaches such as Hadoop. However, unlike other data-intensive applications, visualization requires: (1) interactive performance for large data volumes, (2) compatibility with MPI and POSIX file system semantics for compatibility with existing infrastructure, and (3) use of existing file formats and their stipulated data partitioning rules. VisIO, provides a mechanism for using a non-POSIX distributed file system to provide linear scaling of 110 bandwidth. In addition, we introduce a novel scheduling algorithm that helps to co-locate visualization processes on nodes with the requested data. Testing using VisIO integrated into Para View was conducted using the Hadoop Distributed File System (HDFS) on TACC's Longhorn cluster. A representative dataset, VPIC, across 128 nodes showed a 64.4% read performance improvement compared to the provided Lustre installation. Also tested, was a dataset representing a global ocean salinity simulation that showed a 51.4% improvement in read performance over Lustre when using our VisIO system. VisIO, provides powerful high-performance I/O services to visualization applications, allowing for interactive performance with ultra-scale, time-series data.« less
PFLOTRAN User Manual: A Massively Parallel Reactive Flow and Transport Model for Describing Surface and Subsurface Processes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lichtner, Peter C.; Hammond, Glenn E.; Lu, Chuan

PFLOTRAN solves a system of generally nonlinear partial differential equations describing multi-phase, multicomponent and multiscale reactive flow and transport in porous materials. The code is designed to run on massively parallel computing architectures as well as workstations and laptops (e.g. Hammond et al., 2011). Parallelization is achieved through domain decomposition using the PETSc (Portable Extensible Toolkit for Scientific Computation) libraries for the parallelization framework (Balay et al., 1997). PFLOTRAN has been developed from the ground up for parallel scalability and has been run on up to 218 processor cores with problem sizes up to 2 billion degrees of freedom. Writtenmore » in object oriented Fortran 90, the code requires the latest compilers compatible with Fortran 2003. At the time of this writing this requires gcc 4.7.x, Intel 12.1.x and PGC compilers. As a requirement of running problems with a large number of degrees of freedom, PFLOTRAN allows reading input data that is too large to fit into memory allotted to a single processor core. The current limitation to the problem size PFLOTRAN can handle is the limitation of the HDF5 file format used for parallel IO to 32 bit integers. Noting that 2 32 = 4; 294; 967; 296, this gives an estimate of the maximum problem size that can be currently run with PFLOTRAN. Hopefully this limitation will be remedied in the near future.« less
Enabling Object Storage via shims for Grid Middleware

NASA Astrophysics Data System (ADS)

Cadellin Skipsey, Samuel; De Witt, Shaun; Dewhurst, Alastair; Britton, David; Roy, Gareth; Crooks, David

2015-12-01

The Object Store model has quickly become the basis of most commercially successful mass storage infrastructure, backing so-called ”Cloud” storage such as Amazon S3, but also underlying the implementation of most parallel distributed storage systems. Many of the assumptions in Object Store design are similar, but not identical, to concepts in the design of Grid Storage Elements, although the requirement for ”POSIX-like” filesystem structures on top of SEs makes the disjunction seem larger. As modern Object Stores provide many features that most Grid SEs do not (block level striping, parallel access, automatic file repair, etc.), it is of interest to see how easily we can provide interfaces to typical Object Stores via plugins and shims for Grid tools, and how well experiments can adapt their data models to them. We present evaluation of, and first-deployment experiences with, (for example) Xrootd-Ceph interfaces for direct object-store access, as part of an initiative within GridPP[1] hosted at RAL. Additionally, we discuss the tradeoffs and experience of developing plugins for the currently-popular Ceph parallel distributed filesystem for the GFAL2 access layer, at Glasgow.
Exploring Hardware-Based Primitives to Enhance Parallel Security Monitoring in a Novel Computing Architecture

DTIC Science & Technology

2007-03-01

software level retrieve state information that can inherently contain more contextual information . As a result, such mechanisms can be applied in more...ease by which state information can be gathered for monitoring purposes. For example, we consider soft security to allow for easier state retrieval ...files are to be checked and what parameters are to be verified. The independent auditor periodically retrieves information pertaining to the files in
Reconstructing evolutionary trees in parallel for massive sequences.

PubMed

Zou, Quan; Wan, Shixiang; Zeng, Xiangxiang; Ma, Zhanshan Sam

2017-12-14

Building the evolutionary trees for massive unaligned DNA sequences is challenging and crucial. However, reconstructing evolutionary tree for ultra-large sequences is hard. Massive multiple sequence alignment is also challenging and time/space consuming. Hadoop and Spark are developed recently, which bring spring light for the classical computational biology problems. In this paper, we tried to solve the multiple sequence alignment and evolutionary reconstruction in parallel. HPTree, which is developed in this paper, can deal with big DNA sequence files quickly. It works well on the >1GB files, and gets better performance than other evolutionary reconstruction tools. Users could use HPTree for reonstructing evolutioanry trees on the computer clusters or cloud platform (eg. Amazon Cloud). HPTree could help on population evolution research and metagenomics analysis. In this paper, we employ the Hadoop and Spark platform and design an evolutionary tree reconstruction software tool for unaligned massive DNA sequences. Clustering and multiple sequence alignment are done in parallel. Neighbour-joining model was employed for the evolutionary tree building. We opened our software together with source codes via http://lab.malab.cn/soft/HPtree/ .
A parallel calibration utility for WRF-Hydro on high performance computers

NASA Astrophysics Data System (ADS)

Wang, J.; Wang, C.; Kotamarthi, V. R.

2017-12-01

A successful modeling of complex hydrological processes comprises establishing an integrated hydrological model which simulates the hydrological processes in each water regime, calibrates and validates the model performance based on observation data, and estimates the uncertainties from different sources especially those associated with parameters. Such a model system requires large computing resources and often have to be run on High Performance Computers (HPC). The recently developed WRF-Hydro modeling system provides a significant advancement in the capability to simulate regional water cycles more completely. The WRF-Hydro model has a large range of parameters such as those in the input table files — GENPARM.TBL, SOILPARM.TBL and CHANPARM.TBL — and several distributed scaling factors such as OVROUGHRTFAC. These parameters affect the behavior and outputs of the model and thus may need to be calibrated against the observations in order to obtain a good modeling performance. Having a parameter calibration tool specifically for automate calibration and uncertainty estimates of WRF-Hydro model can provide significant convenience for the modeling community. In this study, we developed a customized tool using the parallel version of the model-independent parameter estimation and uncertainty analysis tool, PEST, to enabled it to run on HPC with PBS and SLURM workload manager and job scheduler. We also developed a series of PEST input file templates that are specifically for WRF-Hydro model calibration and uncertainty analysis. Here we will present a flood case study occurred in April 2013 over Midwest. The sensitivity and uncertainties are analyzed using the customized PEST tool we developed.
Approaches in highly parameterized inversion-PESTCommander, a graphical user interface for file and run management across networks

USGS Publications Warehouse

Karanovic, Marinko; Muffels, Christopher T.; Tonkin, Matthew J.; Hunt, Randall J.

2012-01-01

Models of environmental systems have become increasingly complex, incorporating increasingly large numbers of parameters in an effort to represent physical processes on a scale approaching that at which they occur in nature. Consequently, the inverse problem of parameter estimation (specifically, model calibration) and subsequent uncertainty analysis have become increasingly computation-intensive endeavors. Fortunately, advances in computing have made computational power equivalent to that of dozens to hundreds of desktop computers accessible through a variety of alternate means: modelers have various possibilities, ranging from traditional Local Area Networks (LANs) to cloud computing. Commonly used parameter estimation software is well suited to take advantage of the availability of such increased computing power. Unfortunately, logistical issues become increasingly important as an increasing number and variety of computers are brought to bear on the inverse problem. To facilitate efficient access to disparate computer resources, the PESTCommander program documented herein has been developed to provide a Graphical User Interface (GUI) that facilitates the management of model files ("file management") and remote launching and termination of "slave" computers across a distributed network of computers ("run management"). In version 1.0 described here, PESTCommander can access and ascertain resources across traditional Windows LANs: however, the architecture of PESTCommander has been developed with the intent that future releases will be able to access computing resources (1) via trusted domains established in Wide Area Networks (WANs) in multiple remote locations and (2) via heterogeneous networks of Windows- and Unix-based operating systems. The design of PESTCommander also makes it suitable for extension to other computational resources, such as those that are available via cloud computing. Version 1.0 of PESTCommander was developed primarily to work with the parameter estimation software PEST; the discussion presented in this report focuses on the use of the PESTCommander together with Parallel PEST. However, PESTCommander can be used with a wide variety of programs and models that require management, distribution, and cleanup of files before or after model execution. In addition to its use with the Parallel PEST program suite, discussion is also included in this report regarding the use of PESTCommander with the Global Run Manager GENIE, which was developed simultaneously with PESTCommander.
A ring transducer system for medical ultrasound research.

PubMed

Waag, Robert C; Fedewa, Russell J

2006-10-01

An ultrasonic ring transducer system has been developed for experimental studies of scattering and imaging. The transducer consists of 2048 rectangular elements with a 2.5-MHz center frequency, a 67% -6 dB bandwidth, and a 0.23-mm pitch arranged in a 150-mm-diameter ring with a 25-mm elevation. At the center frequency, the element size is 0.30lambda x 42lambda and the pitch is 0.38lambda. The system has 128 parallel transmit channels, 16 parallel receive channels, a 2048:128 transmit multiplexer, a 2048:16 receive multiplexer, independently programmable transmit waveforms with 8-bit resolution, and receive amplifiers with time variable gain independently programmable over a 40-dB range. Receive signals are sampled at 20 MHz with 12-bit resolution. Arbitrary transmit and receive apertures can be synthesized. Calibration software minimizes system nonidealities caused by noncircularity of the ring and element-to-element response differences. Application software enables the system to be used by specification of high-level parameters in control files from which low-level hardware-dependent parameters are derived by specialized code. Use of the system is illustrated by producing focused and steered beams, synthesizing a spatially limited plane wave, measuring angular scattering, and forming b-scan images.
BOREAS HYD-9 Hourly and Daily Rainfall Maps for the Southern Study Area

NASA Technical Reports Server (NTRS)

Eley, F. Joe; Hall, Forrest G. (Editor); Knapp, David E. (Editor); Krauss, Terry S.; Smith, David E. (Technical Monitor)

2000-01-01

The Boreal Ecosystem-Atmosphere Study (BOREAS) Hydrology (HYD)-9 team collected data on precipitation and streamflow over portions of the Northern Study Area (NSA) and Southern Study Area (SSA). This data set contains Cartesian maps of rain accumulation for one-hour and daily periods during the summer of 1994 over the SSA only (not the full view of the radar). A parallel set of one-hour maps for the whole radar view has been prepared and is available upon request from the HYD-09 personnel. An incidental benefit of the areal selection was the elimination of some of the less accurate data, because for various reasons the radar rain estimates degrade considerably outside a range of about 100 km. The data are available in tabular ASCII files. The HYD-09 hourly and daily radar rainfall maps for the SSA are available from the Earth Observing System Data and Information System (EOSDIS) Oak Ridge National Laboratory (ORNL) Distributed Active Archive Center (DAAC). The data files are available on a CD-ROM (see document number 20010000884).
An Efficient Format for Nearly Constant-Time Access to Arbitrary Time Intervals in Large Trace Files

DOE PAGES

Chan, Anthony; Gropp, William; Lusk, Ewing

2008-01-01

A powerful method to aid in understanding the performance of parallel applications uses log or trace files containing time-stamped events and states (pairs of events). These trace files can be very large, often hundreds or even thousands of megabytes. Because of the cost of accessing and displaying such files, other methods are often used that reduce the size of the tracefiles at the cost of sacrificing detail or other information. This paper describes a hierarchical trace file format that provides for display of an arbitrary time window in a time independent of the total size of the file and roughlymore » proportional to the number of events within the time window. This format eliminates the need to sacrifice data to achieve a smaller trace file size (since storage is inexpensive, it is necessary only to make efficient use of bandwidth to that storage). The format can be used to organize a trace file or to create a separate file of annotations that may be used with conventional trace files. We present an analysis of the time to access all of the events relevant to an interval of time and we describe experiments demonstrating the performance of this file format.« less
OPAL: An Open-Source MPI-IO Library over Cray XT

DOE Office of Scientific and Technical Information (OSTI.GOV)

Yu, Weikuan; Vetter, Jeffrey S; Canon, Richard Shane

Parallel IO over Cray XT is supported by a vendor-supplied MPI-IO package. This package contains a proprietary ADIO implementation built on top of the sysio library. While it is reasonable to maintain a stable code base for application scientists' convenience, it is also very important to the system developers and researchers to analyze and assess the effectiveness of parallel IO software, and accordingly, tune and optimize the MPI-IO implementation. A proprietary parallel IO code base relinquishes such flexibilities. On the other hand, a generic UFS-based MPI-IO implementation is typically used on many Linux-based platforms. We have developed an open-source MPI-IOmore » package over Lustre, referred to as OPAL (OPportunistic and Adaptive MPI-IO Library over Lustre). OPAL provides a single source-code base for MPI-IO over Lustre on Cray XT and Linux platforms. Compared to Cray implementation, OPAL provides a number of good features, including arbitrary specification of striping patterns and Lustre-stripe aligned file domain partitioning. This paper presents the performance comparisons between OPAL and Cray's proprietary implementation. Our evaluation demonstrates that OPAL achieves the performance comparable to the Cray implementation. We also exemplify the benefits of an open source package in revealing the underpinning of the parallel IO performance.« less
GENESIS: a hybrid-parallel and multi-scale molecular dynamics simulator with enhanced sampling algorithms for biomolecular and cellular simulations

PubMed Central

Jung, Jaewoon; Mori, Takaharu; Kobayashi, Chigusa; Matsunaga, Yasuhiro; Yoda, Takao; Feig, Michael; Sugita, Yuji

2015-01-01

GENESIS (Generalized-Ensemble Simulation System) is a new software package for molecular dynamics (MD) simulations of macromolecules. It has two MD simulators, called ATDYN and SPDYN. ATDYN is parallelized based on an atomic decomposition algorithm for the simulations of all-atom force-field models as well as coarse-grained Go-like models. SPDYN is highly parallelized based on a domain decomposition scheme, allowing large-scale MD simulations on supercomputers. Hybrid schemes combining OpenMP and MPI are used in both simulators to target modern multicore computer architectures. Key advantages of GENESIS are (1) the highly parallel performance of SPDYN for very large biological systems consisting of more than one million atoms and (2) the availability of various REMD algorithms (T-REMD, REUS, multi-dimensional REMD for both all-atom and Go-like models under the NVT, NPT, NPAT, and NPγT ensembles). The former is achieved by a combination of the midpoint cell method and the efficient three-dimensional Fast Fourier Transform algorithm, where the domain decomposition space is shared in real-space and reciprocal-space calculations. Other features in SPDYN, such as avoiding concurrent memory access, reducing communication times, and usage of parallel input/output files, also contribute to the performance. We show the REMD simulation results of a mixed (POPC/DMPC) lipid bilayer as a real application using GENESIS. GENESIS is released as free software under the GPLv2 licence and can be easily modified for the development of new algorithms and molecular models. WIREs Comput Mol Sci 2015, 5:310–323. doi: 10.1002/wcms.1220 PMID:26753008
A General-purpose Framework for Parallel Processing of Large-scale LiDAR Data

NASA Astrophysics Data System (ADS)

Li, Z.; Hodgson, M.; Li, W.

2016-12-01

Light detection and ranging (LiDAR) technologies have proven efficiency to quickly obtain very detailed Earth surface data for a large spatial extent. Such data is important for scientific discoveries such as Earth and ecological sciences and natural disasters and environmental applications. However, handling LiDAR data poses grand geoprocessing challenges due to data intensity and computational intensity. Previous studies received notable success on parallel processing of LiDAR data to these challenges. However, these studies either relied on high performance computers and specialized hardware (GPUs) or focused mostly on finding customized solutions for some specific algorithms. We developed a general-purpose scalable framework coupled with sophisticated data decomposition and parallelization strategy to efficiently handle big LiDAR data. Specifically, 1) a tile-based spatial index is proposed to manage big LiDAR data in the scalable and fault-tolerable Hadoop distributed file system, 2) two spatial decomposition techniques are developed to enable efficient parallelization of different types of LiDAR processing tasks, and 3) by coupling existing LiDAR processing tools with Hadoop, this framework is able to conduct a variety of LiDAR data processing tasks in parallel in a highly scalable distributed computing environment. The performance and scalability of the framework is evaluated with a series of experiments conducted on a real LiDAR dataset using a proof-of-concept prototype system. The results show that the proposed framework 1) is able to handle massive LiDAR data more efficiently than standalone tools; and 2) provides almost linear scalability in terms of either increased workload (data volume) or increased computing nodes with both spatial decomposition strategies. We believe that the proposed framework provides valuable references on developing a collaborative cyberinfrastructure for processing big earth science data in a highly scalable environment.
P43-S Computational Biology Applications Suite for High-Performance Computing (BioHPC.net)

PubMed Central

Pillardy, J.

2007-01-01

One of the challenges of high-performance computing (HPC) is user accessibility. At the Cornell University Computational Biology Service Unit, which is also a Microsoft HPC institute, we have developed a computational biology application suite that allows researchers from biological laboratories to submit their jobs to the parallel cluster through an easy-to-use Web interface. Through this system, we are providing users with popular bioinformatics tools including BLAST, HMMER, InterproScan, and MrBayes. The system is flexible and can be easily customized to include other software. It is also scalable; the installation on our servers currently processes approximately 8500 job submissions per year, many of them requiring massively parallel computations. It also has a built-in user management system, which can limit software and/or database access to specified users. TAIR, the major database of the plant model organism Arabidopsis, and SGN, the international tomato genome database, are both using our system for storage and data analysis. The system consists of a Web server running the interface (ASP.NET C#), Microsoft SQL server (ADO.NET), compute cluster running Microsoft Windows, ftp server, and file server. Users can interact with their jobs and data via a Web browser, ftp, or e-mail. The interface is accessible at http://cbsuapps.tc.cornell.edu/.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Sublet, J.-Ch.; Koning, A.J.; Forrest, R.A.

The reasons for the conversion of the European Activation File, EAF into ENDF-6 format are threefold. First, it significantly enhances the JEFF-3.0 release by the addition of an activation file. Second, to considerably increase its usage by using a recognized, official file format, allowing existing plug-in processes to be effective; and third, to move towards a universal nuclear data file in contrast to the current separate general and special-purpose files. The format chosen for the JEFF-3.0/A file uses reaction cross sections (MF-3), cross sections (MF-10), and multiplicities (MF-9). Having the data in ENDF-6 format allows the ENDF suite of utilitiesmore » and checker codes to be used alongside many other utility, visualizing, and processing codes. It is based on the EAF activation file used for many applications from fission to fusion, including dosimetry, inventories, depletion-transmutation, and geophysics. JEFF-3.0/A takes advantage of four generations of EAF files. Extensive benchmarking activities on these files provide feedback and validation with integral measurements. These, in parallel with a detailed graphical analysis based on EXFOR, have been applied stimulating new measurements, significantly increasing the quality of this activation file. The next step is to include the EAF uncertainty data for all channels into JEFF-3.0/A.« less
ParDRe: faster parallel duplicated reads removal tool for sequencing studies.

PubMed

González-Domínguez, Jorge; Schmidt, Bertil

2016-05-15

Current next generation sequencing technologies often generate duplicated or near-duplicated reads that (depending on the application scenario) do not provide any interesting biological information but can increase memory requirements and computational time of downstream analysis. In this work we present ParDRe, a de novo parallel tool to remove duplicated and near-duplicated reads through the clustering of Single-End or Paired-End sequences from fasta or fastq files. It uses a novel bitwise approach to compare the suffixes of DNA strings and employs hybrid MPI/multithreading to reduce runtime on multicore systems. We show that ParDRe is up to 27.29 times faster than Fulcrum (a representative state-of-the-art tool) on a platform with two 8-core Sandy-Bridge processors. Source code in C ++ and MPI running on Linux systems as well as a reference manual are available at https://sourceforge.net/projects/pardre/ jgonzalezd@udc.es. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

VO-KOREL: A Fourier Disentangling Service of the Virtual Observatory

NASA Astrophysics Data System (ADS)

Škoda, Petr; Hadrava, Petr; Fuchs, Jan

2012-04-01

VO-KOREL is a web service exploiting the technology of the Virtual Observatory for providing astronomers with the intuitive graphical front-end and distributed computing back-end running the most recent version of the Fourier disentangling code KOREL. The system integrates the ideas of the e-shop basket, conserving the privacy of every user by transfer encryption and access authentication, with features of laboratory notebook, allowing the easy housekeeping of both input parameters and final results, as well as it explores a newly emerging technology of cloud computing. While the web-based front-end allows the user to submit data and parameter files, edit parameters, manage a job list, resubmit or cancel running jobs and mainly watching the text and graphical results of a disentangling process, the main part of the back-end is a simple job queue submission system executing in parallel multiple instances of the FORTRAN code KOREL. This may be easily extended for GRID-based deployment on massively parallel computing clusters. The short introduction into underlying technologies is given, briefly mentioning advantages as well as bottlenecks of the design used.
Canal transportation and centering ability of protaper and self-adjusting file system in long oval canals: An ex-vivo cone-beam computed tomography analysis.

PubMed

Shah, Dipali Yogesh; Wadekar, Swati Ishwara; Dadpe, Ashwini Manish; Jadhav, Ganesh Ranganath; Choudhary, Lalit Jayant; Kalra, Dheeraj Deepak

2017-01-01

The purpose of this study was to compare and evaluate the shaping ability of ProTaper (PT) and Self-Adjusting File (SAF) system using cone-beam computed tomography (CBCT) to assess their performance in oval-shaped root canals. Sixty-two mandibular premolars with single oval canals were divided into two experimental groups ( n = 31) according to the systems used: Group I - PT and Group II - SAF. Canals were evaluated before and after instrumentation using CBCT to assess centering ratio and canal transportation at three levels. Data were statistically analyzed using one-way analysis of variance, post hoc Tukey's test, and t -test. The SAF showed better centering ability and lesser canal transportation than the PT only in the buccolingual plane at 6 and 9 mm levels. The shaping ability of the PT was best in the apical third in both the planes. The SAF had statistically significant better centering and lesser canal transportation in the buccolingual as compared to the mesiodistal plane at the middle and coronal levels. The SAF produced significantly less transportation and remained centered than the PT at the middle and coronal levels in the buccolingual plane of oval canals. In the mesiodistal plane, the performance of both the systems was parallel.
OceanXtremes: Scalable Anomaly Detection in Oceanographic Time-Series

NASA Astrophysics Data System (ADS)

Wilson, B. D.; Armstrong, E. M.; Chin, T. M.; Gill, K. M.; Greguska, F. R., III; Huang, T.; Jacob, J. C.; Quach, N.

2016-12-01

The oceanographic community must meet the challenge to rapidly identify features and anomalies in complex and voluminous observations to further science and improve decision support. Given this data-intensive reality, we are developing an anomaly detection system, called OceanXtremes, powered by an intelligent, elastic Cloud-based analytic service backend that enables execution of domain-specific, multi-scale anomaly and feature detection algorithms across the entire archive of 15 to 30-year ocean science datasets.Our parallel analytics engine is extending the NEXUS system and exploits multiple open-source technologies: Apache Cassandra as a distributed spatial "tile" cache, Apache Spark for in-memory parallel computation, and Apache Solr for spatial search and storing pre-computed tile statistics and other metadata. OceanXtremes provides these key capabilities: Parallel generation (Spark on a compute cluster) of 15 to 30-year Ocean Climatologies (e.g. sea surface temperature or SST) in hours or overnight, using simple pixel averages or customizable Gaussian-weighted "smoothing" over latitude, longitude, and time; Parallel pre-computation, tiling, and caching of anomaly fields (daily variables minus a chosen climatology) with pre-computed tile statistics; Parallel detection (over the time-series of tiles) of anomalies or phenomena by regional area-averages exceeding a specified threshold (e.g. high SST in El Nino or SST "blob" regions), or more complex, custom data mining algorithms; Shared discovery and exploration of ocean phenomena and anomalies (facet search using Solr), along with unexpected correlations between key measured variables; Scalable execution for all capabilities on a hybrid Cloud, using our on-premise OpenStack Cloud cluster or at Amazon. The key idea is that the parallel data-mining operations will be run "near" the ocean data archives (a local "network" hop) so that we can efficiently access the thousands of files making up a three decade time-series. The presentation will cover the architecture of OceanXtremes, parallelization of the climatology computation and anomaly detection algorithms using Spark, example results for SST and other time-series, and parallel performance metrics.
DSN Beowulf Cluster-Based VLBI Correlator

NASA Technical Reports Server (NTRS)

Rogstad, Stephen P.; Jongeling, Andre P.; Finley, Susan G.; White, Leslie A.; Lanyi, Gabor E.; Clark, John E.; Goodhart, Charles E.

2009-01-01

The NASA Deep Space Network (DSN) requires a broadband VLBI (very long baseline interferometry) correlator to process data routinely taken as part of the VLBI source Catalogue Maintenance and Enhancement task (CAT M&E) and the Time and Earth Motion Precision Observations task (TEMPO). The data provided by these measurements are a crucial ingredient in the formation of precision deep-space navigation models. In addition, a VLBI correlator is needed to provide support for other VLBI related activities for both internal and external customers. The JPL VLBI Correlator (JVC) was designed, developed, and delivered to the DSN as a successor to the legacy Block II Correlator. The JVC is a full-capability VLBI correlator that uses software processes running on multiple computers to cross-correlate two-antenna broadband noise data. Components of this new system (see Figure 1) consist of Linux PCs integrated into a Beowulf Cluster, an existing Mark5 data storage system, a RAID array, an existing software correlator package (SoftC) originally developed for Delta DOR Navigation processing, and various custom- developed software processes and scripts. Parallel processing on the JVC is achieved by assigning slave nodes of the Beowulf cluster to process separate scans in parallel until all scans have been processed. Due to the single stream sequential playback of the Mark5 data, some ramp-up time is required before all nodes can have access to required scan data. Core functions of each processing step are accomplished using optimized C programs. The coordination and execution of these programs across the cluster is accomplished using Pearl scripts, PostgreSQL commands, and a handful of miscellaneous system utilities. Mark5 data modules are loaded on Mark5 Data systems playback units, one per station. Data processing is started when the operator scans the Mark5 systems and runs a script that reads various configuration files and then creates an experiment-dependent status database used to delegate parallel tasks between nodes and storage areas (see Figure 2). This script forks into three processes: extract, translate, and correlate. Each of these processes iterates on available scan data and updates the status database as the work for each scan is completed. The extract process coordinates and monitors the transfer of data from each of the Mark5s to the Beowulf RAID storage systems. The translate process monitors and executes the data conversion processes on available scan files, and writes the translated files to the slave nodes. The correlate process monitors the execution of SoftC correlation processes on the slave nodes for scans that have completed translation. A comparison of the JVC and the legacy Block II correlator outputs reveals they are well within a formal error, and that the data are comparable with respect to their use in flight navigation. The processing speed of the JVC is improved over the Block II correlator by a factor of 4, largely due to the elimination of the reel-to-reel tape drives used in the Block II correlator.
TU-AB-BRC-07: Efficiency of An IAEA Phase-Space Source for a Low Energy X-Ray Tube Using Egs++

DOE Office of Scientific and Technical Information (OSTI.GOV)

Watson, PGF; Renaud, MA; Seuntjens, J

Purpose: To extend the capability of the EGSnrc C++ class library (egs++) to write and read IAEA phase-space files as a particle source, and to assess the relative efficiency gain in dose calculation using an IAEA phase-space source for modelling a miniature low energy x-ray source. Methods: We created a new ausgab object to score particles exiting a user-defined geometry and write them to an IAEA phase-space file. A new particle source was created to read from IAEA phase-space data. With these tools, a phase-space file was generated for particles exiting a miniature 50 kVp x-ray tube (The INTRABEAM System,more » Carl Zeiss). The phase-space source was validated by comparing calculated PDDs with a full electron source simulation of the INTRABEAM. The dose calculation efficiency gain of the phase-space source was determined relative to the full simulation. The efficiency gain as a function of i) depth in water, and ii) job parallelization was investigated. Results: The phase-space and electron source PDDs were found to agree to 0.5% RMS, comparable to statistical uncertainties. The use of a phase-space source for the INTRABEAM led to a relative efficiency gain of greater than 20 over the full electron source simulation, with an increase of up to a factor of 196. The efficiency gain was found to decrease with depth in water, due to the influence of scattering. Job parallelization (across 2 to 256 cores) was not found to have any detrimental effect on efficiency gain. Conclusion: A set of tools has been developed for writing and reading IAEA phase-space files, which can be used with any egs++ user code. For simulation of a low energy x-ray tube, the use of a phase-space source was found to increase the relative dose calculation efficiency by factor of up to 196. The authors acknowledge partial support by the CREATE Medical Physics Research Training Network grant of the Natural Sciences and Engineering Research Council (Grant No. 432290).« less
Near Real Time Processing Chain for Suomi NPP Satellite Data

NASA Astrophysics Data System (ADS)

Monsorno, Roberto; Cuozzo, Giovanni; Costa, Armin; Mateescu, Gabriel; Ventura, Bartolomeo; Zebisch, Marc

2014-05-01

Since 2009, the EURAC satellite receiving station, located at Corno del Renon, in a free obstacle site at 2260 m a.s.l., has been acquiring data from Aqua and Terra NASA satellites equipped with Moderate Resolution Imaging Spectroradiometer (MODIS) sensors. The experience gained with this local ground segmenthas given the opportunity of adapting and modifying the processing chain for MODIS data to the Suomi NPP, the natural successor to Terra and Aqua satellites. The processing chain, initially implemented by mean of a proprietary system supplied by Seaspace and Advanced Computer System, was further developed by EURAC's Institute for Applied Remote Sensing engineers. Several algorithms have been developed using MODIS and Visible Infrared Imaging Radiometer Suite (VIIRS) data to produce Snow Cover, Particulate Matter estimation and Meteo maps. These products are implemented on a common processor structure based on the use of configuration files and a generic processor. Data and products have then automatically delivered to the customers such as the Autonomous Province of Bolzano-Civil Protection office. For the processing phase we defined two goals: i) the adaptation and implementation of the products already available for MODIS (and possibly new ones) to VIIRS, that is one of the sensors onboard Suomi NPP; ii) the use of an open source processing chain in order to process NPP data in Near Real Time, exploiting the knowledge we acquired on parallel computing. In order to achieve the second goal, the S-NPP data received and ingested are sent as input to RT-STPS (Real-time Software Telemetry Processing System) software developed by the NASA Direct Readout Laboratory 1 (DRL) that gives as output RDR files (Raw Data Record) for VIIRS, ATMS (Advanced Technology Micorwave Sounder) and CrIS (Cross-track Infrared Sounder)sensors. RDR are then transferred to a server equipped with CSPP2 (Community Satellite Processing Package) software developed by the University of Wisconsin. CSPP subdivides the input file in granules, making possible the use of parallel computing, and produces SDR (Science Data Record) and some EDR (Environmental Data Record) products. The integration with the EDRs not yet available with CSPP is realized with the use of SPAs (Science Processing Algorithm) stand-alone version by DRL. The important result of this system consists in the possibility of processing data acquired by the EURAC antenna with open source software and delivering the SDRs, EDRs and higher level products developed internally by EURAC in near real time using a Data Exchange Server. By means of the parallelized CSPP, SDR data are currently available after about 7 minutes since the production of RDR, while we are currently implementing a strategy to get the best possible processing time for the EDRs products that are in principle not parallelizable. 1. http://directreadout.sci.gsfc.nasa.gov/ 2. http://cimss.ssec.wisc.edu/cspp/
Understanding and Improving High-Performance I/O Subsystems

NASA Technical Reports Server (NTRS)

El-Ghazawi, Tarek A.; Frieder, Gideon; Clark, A. James

1996-01-01

This research program has been conducted in the framework of the NASA Earth and Space Science (ESS) evaluations led by Dr. Thomas Sterling. In addition to the many important research findings for NASA and the prestigious publications, the program has helped orienting the doctoral research program of two students towards parallel input/output in high-performance computing. Further, the experimental results in the case of the MasPar were very useful and helpful to MasPar with which the P.I. has had many interactions with the technical management. The contributions of this program are drawn from three experimental studies conducted on different high-performance computing testbeds/platforms, and therefore presented in 3 different segments as follows: 1. Evaluating the parallel input/output subsystem of a NASA high-performance computing testbeds, namely the MasPar MP- 1 and MP-2; 2. Characterizing the physical input/output request patterns for NASA ESS applications, which used the Beowulf platform; and 3. Dynamic scheduling techniques for hiding I/O latency in parallel applications such as sparse matrix computations. This study also has been conducted on the Intel Paragon and has also provided an experimental evaluation for the Parallel File System (PFS) and parallel input/output on the Paragon. This report is organized as follows. The summary of findings discusses the results of each of the aforementioned 3 studies. Three appendices, each containing a key scholarly research paper that details the work in one of the studies are included.
78 FR 9687 - Prineville Energy Storage, LLC; Notice of Preliminary Permit Application Accepted for Filing and...

Federal Register 2010, 2011, 2012, 2013, 2014

2013-02-11

... PDCI to the Ponderosa substation, or (ii) the Bonneville Power Administration (BPA) existing transmission line corridor and then running parallel to the BPA line to the Ponderosa substation; and (9...
High-performance mass storage system for workstations

NASA Technical Reports Server (NTRS)

Chiang, T.; Tang, Y.; Gupta, L.; Cooperman, S.

1993-01-01

Reduced Instruction Set Computer (RISC) workstations and Personnel Computers (PC) are very popular tools for office automation, command and control, scientific analysis, database management, and many other applications. However, when using Input/Output (I/O) intensive applications, the RISC workstations and PC's are often overburdened with the tasks of collecting, staging, storing, and distributing data. Also, by using standard high-performance peripherals and storage devices, the I/O function can still be a common bottleneck process. Therefore, the high-performance mass storage system, developed by Loral AeroSys' Independent Research and Development (IR&D) engineers, can offload a RISC workstation of I/O related functions and provide high-performance I/O functions and external interfaces. The high-performance mass storage system has the capabilities to ingest high-speed real-time data, perform signal or image processing, and stage, archive, and distribute the data. This mass storage system uses a hierarchical storage structure, thus reducing the total data storage cost, while maintaining high-I/O performance. The high-performance mass storage system is a network of low-cost parallel processors and storage devices. The nodes in the network have special I/O functions such as: SCSI controller, Ethernet controller, gateway controller, RS232 controller, IEEE488 controller, and digital/analog converter. The nodes are interconnected through high-speed direct memory access links to form a network. The topology of the network is easily reconfigurable to maximize system throughput for various applications. This high-performance mass storage system takes advantage of a 'busless' architecture for maximum expandability. The mass storage system consists of magnetic disks, a WORM optical disk jukebox, and an 8mm helical scan tape to form a hierarchical storage structure. Commonly used files are kept in the magnetic disk for fast retrieval. The optical disks are used as archive media, and the tapes are used as backup media. The storage system is managed by the IEEE mass storage reference model-based UniTree software package. UniTree software will keep track of all files in the system, will automatically migrate the lesser used files to archive media, and will stage the files when needed by the system. The user can access the files without knowledge of their physical location. The high-performance mass storage system developed by Loral AeroSys will significantly boost the system I/O performance and reduce the overall data storage cost. This storage system provides a highly flexible and cost-effective architecture for a variety of applications (e.g., realtime data acquisition with a signal and image processing requirement, long-term data archiving and distribution, and image analysis and enhancement).
An effective XML based name mapping mechanism within StoRM

NASA Astrophysics Data System (ADS)

Corso, E.; Forti, A.; Ghiselli, A.; Magnoni, L.; Zappi, R.

2008-07-01

In a Grid environment the naming capability allows users to refer to specific data resources in a physical storage system using a high level logical identifier. This logical identifier is typically organized in a file system like structure, a hierarchical tree of names. Storage Resource Manager (SRM) services map the logical identifier to the physical location of data evaluating a set of parameters as the desired quality of services and the VOMS attributes specified in the requests. StoRM is a SRM service developed by INFN and ICTP-EGRID to manage file and space on standard POSIX and high performing parallel and cluster file systems. An upcoming requirement in the Grid data scenario is the orthogonality of the logical name and the physical location of data, in order to refer, with the same identifier, to different copies of data archived in various storage areas with different quality of service. The mapping mechanism proposed in StoRM is based on a XML document that represents the different storage components managed by the service, the storage areas defined by the site administrator, the quality of service they provide and the Virtual Organization that want to use the storage area. An appropriate directory tree is realized in each storage component reflecting the XML schema. In this scenario StoRM is able to identify the physical location of a requested data evaluating the logical identifier and the specified attributes following the XML schema, without querying any database service. This paper presents the namespace schema defined, the different entities represented and the technical details of the StoRM implementation.
AESOP: A Python Library for Investigating Electrostatics in Protein Interactions.

PubMed

Harrison, Reed E S; Mohan, Rohith R; Gorham, Ronald D; Kieslich, Chris A; Morikis, Dimitrios

2017-05-09

Electric fields often play a role in guiding the association of protein complexes. Such interactions can be further engineered to accelerate complex association, resulting in protein systems with increased productivity. This is especially true for enzymes where reaction rates are typically diffusion limited. To facilitate quantitative comparisons of electrostatics in protein families and to describe electrostatic contributions of individual amino acids, we previously developed a computational framework called AESOP. We now implement this computational tool in Python with increased usability and the capability of performing calculations in parallel. AESOP utilizes PDB2PQR and Adaptive Poisson-Boltzmann Solver to generate grid-based electrostatic potential files for protein structures provided by the end user. There are methods within AESOP for quantitatively comparing sets of grid-based electrostatic potentials in terms of similarity or generating ensembles of electrostatic potential files for a library of mutants to quantify the effects of perturbations in protein structure and protein-protein association. Copyright © 2017 Biophysical Society. Published by Elsevier Inc. All rights reserved.
The use of ZFP lossy floating point data compression in tornado-resolving thunderstorm simulations

NASA Astrophysics Data System (ADS)

Orf, L.

2017-12-01

In the field of atmospheric science, numerical models are used to produce forecasts of weather and climate and serve as virtual laboratories for scientists studying atmospheric phenomena. In both operational and research arenas, atmospheric simulations exploiting modern supercomputing hardware can produce a tremendous amount of data. During model execution, the transfer of floating point data from memory to the file system is often a significant bottleneck where I/O can dominate wallclock time. One way to reduce the I/O footprint is to compress the floating point data, which reduces amount of data saved to the file system. In this presentation we introduce LOFS, a file system developed specifically for use in three-dimensional numerical weather models that are run on massively parallel supercomputers. LOFS utilizes the core (in-memory buffered) HDF5 driver and includes compression options including ZFP, a lossy floating point data compression algorithm. ZFP offers several mechanisms for specifying the amount of lossy compression to be applied to floating point data, including the ability to specify the maximum absolute error allowed in each compressed 3D array. We explore different maximum error tolerances in a tornado-resolving supercell thunderstorm simulation for model variables including cloud and precipitation, temperature, wind velocity and vorticity magnitude. We find that average compression ratios exceeding 20:1 in scientifically interesting regions of the simulation domain produce visually identical results to uncompressed data in visualizations and plots. Since LOFS splits the model domain across many files, compression ratios for a given error tolerance can be compared across different locations within the model domain. We find that regions of high spatial variability (which tend to be where scientifically interesting things are occurring) show the lowest compression ratios, whereas regions of the domain with little spatial variability compress extremely well. We observe that the overhead for compressing data with ZFP is low, and that compressing data in memory reduces the amount of memory overhead needed to store the virtual files before they are flushed to disk.
Striped Data Server for Scalable Parallel Data Analysis

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chang, Jin; Gutsche, Oliver; Mandrichenko, Igor

A columnar data representation is known to be an efficient way for data storage, specifically in cases when the analysis is often done based only on a small fragment of the available data structures. A data representation like Apache Parquet is a step forward from a columnar representation, which splits data horizontally to allow for easy parallelization of data analysis. Based on the general idea of columnar data storage, working on the [LDRD Project], we have developed a striped data representation, which, we believe, is better suited to the needs of High Energy Physics data analysis. A traditional columnar approachmore » allows for efficient data analysis of complex structures. While keeping all the benefits of columnar data representations, the striped mechanism goes further by enabling easy parallelization of computations without requiring special hardware. We will present an implementation and some performance characteristics of such a data representation mechanism using a distributed no-SQL database or a local file system, unified under the same API and data representation model. The representation is efficient and at the same time simple so that it allows for a common data model and APIs for wide range of underlying storage mechanisms such as distributed no-SQL databases and local file systems. Striped storage adopts Numpy arrays as its basic data representation format, which makes it easy and efficient to use in Python applications. The Striped Data Server is a web service, which allows to hide the server implementation details from the end user, easily exposes data to WAN users, and allows to utilize well known and developed data caching solutions to further increase data access efficiency. We are considering the Striped Data Server as the core of an enterprise scale data analysis platform for High Energy Physics and similar areas of data processing. We have been testing this architecture with a 2TB dataset from a CMS dark matter search and plan to expand it to multiple 100 TB or even PB scale. We will present the striped format, Striped Data Server architecture and performance test results.« less
SCELib2: the new revision of SCELib, the parallel computational library of molecular properties in the single center approach

NASA Astrophysics Data System (ADS)

Sanna, N.; Morelli, G.

2004-09-01

In this paper we present the new version of the SCELib program (CPC Catalogue identifier ADMG) a full numerical implementation of the Single Center Expansion (SCE) method. The physics involved is that of producing the SCE description of molecular electronic densities, of molecular electrostatic potentials and of molecular perturbed potentials due to a point negative or positive charge. This new revision of the program has been optimized to run in serial as well as in parallel execution mode, to support a larger set of molecular symmetries and to permit the restart of long-lasting calculations. To measure the performance of this new release, a comparative study has been carried out on the most powerful computing architectures in serial and parallel runs. The results of the calculations reported in this paper refer to real cases medium to large molecular systems and they are reported in full details to benchmark at best the parallel architectures the new SCELib code will run on. Program summaryTitle of program: SCELib2 Catalogue identifier: ADGU Program summary URL:http://cpc.cs.qub.ac.uk/summaries/ADGU Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland Reference to previous versions: Comput. Phys. Commun. 128 (2) (2000) 139 (CPC catalogue identifier: ADMG) Does the new version supersede the original program?: Yes Computer for which the program is designed and others on which it has been tested: HP ES45 and rx2600, SUN ES4500, IBM SP and any single CPU workstation based on Alpha, SPARC, POWER, Itanium2 and X86 processors Installations: CASPUR, local Operating systems under which the program has been tested: HP Tru64 V5.X, SUNOS V5.8, IBM AIX V5.X, Linux RedHat V8.0 Programming language used: C Memory required to execute with typical data: 10 Mwords. Up to 2000 Mwords depending on the molecular system and runtime parameters No. of bits in a word: 64 No. of processors used: 1 to 32 Has the code been vectorized or parallelized?: Yes No. of bytes in distributed program, including test data, etc.: 3 798 507 No. of lines in distributed program, including test data, etc.: 187 226 Distribution format: tar.gz Nature of physical problem: In this set of codes an efficient procedure is implemented to describe the wavefunction and related molecular properties of a polyatomic molecular system within the Single Center of Expansion (SCE) approximation. The resulting SCE wavefunction, electron density, electrostatic and exchange/correlation potentials can then be used via a proper Application Programming Interface (API) to describe the target molecular system which can be employed in electron-molecule scattering calculations. The molecular properties expanded over a single center turn out to also be of more general application and some possible uses in quantum chemistry, biomodelling and drug design are also outlined. Method of solution: The polycentre Hartee-Fock solution for a molecule of arbitrary geometry, based on linear combination of Gaussian-Type Orbital (GTO), is expanded over a single center, typically the Center Of Mass (C.O.M.), by means of a Gauss-Legendre/Chebyschev quadrature over the θ, φ angular coordinates. The resulting SCE numerical wavefunction is then used to calculate the one-particle electron density, the electrostatic potential and two different models for the correlation/polarization potentials induced by the impinging electron, which have the correct asymptotic behaviour for the leading dipole molecular polarizabilities. Restrictions on the complexity of the problem: Depending on the molecular system under study and on the operating conditions the program may or may not fit into available RAM memory. In this case a feature of the program is to memory map a disk file in order to efficiently access the memory data through a disk device. Typical running time: The execution time strongly depends on the molecular target description and on the hardware/OS chosen, it is directly proportional to the ( r, θ, φ) grid size and to the number of angular basis functions used. Thus, from the program printout of the main arrays memory occupancy, the user can approximately derive the expected computer time needed for a given calculation executed in serial mode. For parallel executions the overall efficiency must be further taken into account, and this depends on the no. of processors used as well as on the parallel architecture chosen, so a simple general law is at present not determinable. Unusual features of the program: The code has been engineered to use dynamical, runtime determined, global parameters with the aim to have all the data fitted in the RAM memory. Some unusual circumstances, e.g., when using large values of those parameters, may cause the program to run with unexpected performance reductions due to runtime bottlenecks like those caused by memory swap operations which strongly depend on the hardware used. In such cases, a parallel execution of the code is generally sufficient to fix the problem since the data size is partitioned over the available processors. When a suitable parallel system is not available for execution, a mechanism of memory mapped file can be used; with this option on, all the available memory will be used as a buffer for a disk file which contains the whole data set, thus having a better throughput with respect to the traditional swapping/paging of the Unix OS.
Cloud Engineering Principles and Technology Enablers for Medical Image Processing-as-a-Service

PubMed Central

Bao, Shunxing; Plassard, Andrew J.; Landman, Bennett A.; Gokhale, Aniruddha

2017-01-01

Traditional in-house, laboratory-based medical imaging studies use hierarchical data structures (e.g., NFS file stores) or databases (e.g., COINS, XNAT) for storage and retrieval. The resulting performance from these approaches is, however, impeded by standard network switches since they can saturate network bandwidth during transfer from storage to processing nodes for even moderate-sized studies. To that end, a cloud-based “medical image processing-as-a-service” offers promise in utilizing the ecosystem of Apache Hadoop, which is a flexible framework providing distributed, scalable, fault tolerant storage and parallel computational modules, and HBase, which is a NoSQL database built atop Hadoop’s distributed file system. Despite this promise, HBase’s load distribution strategy of region split and merge is detrimental to the hierarchical organization of imaging data (e.g., project, subject, session, scan, slice). This paper makes two contributions to address these concerns by describing key cloud engineering principles and technology enhancements we made to the Apache Hadoop ecosystem for medical imaging applications. First, we propose a row-key design for HBase, which is a necessary step that is driven by the hierarchical organization of imaging data. Second, we propose a novel data allocation policy within HBase to strongly enforce collocation of hierarchically related imaging data. The proposed enhancements accelerate data processing by minimizing network usage and localizing processing to machines where the data already exist. Moreover, our approach is amenable to the traditional scan, subject, and project-level analysis procedures, and is compatible with standard command line/scriptable image processing software. Experimental results for an illustrative sample of imaging data reveals that our new HBase policy results in a three-fold time improvement in conversion of classic DICOM to NiFTI file formats when compared with the default HBase region split policy, and nearly a six-fold improvement over a commonly available network file system (NFS) approach even for relatively small file sets. Moreover, file access latency is lower than network attached storage. PMID:28884169
An approach to enhance pnetCDF performance in ...

EPA Pesticide Factsheets

Data intensive simulations are often limited by their I/O (input/output) performance, and "novel" techniques need to be developed in order to overcome this limitation. The software package pnetCDF (parallel network Common Data Form), which works with parallel file systems, was developed to address this issue by providing parallel I/O capability. This study examines the performance of an application-level data aggregation approach which performs data aggregation along either row or column dimension of MPI (Message Passing Interface) processes on a spatially decomposed domain, and then applies the pnetCDF parallel I/O paradigm. The test was done with three different domain sizes which represent small, moderately large, and large data domains, using a small-scale Community Multiscale Air Quality model (CMAQ) mock-up code. The examination includes comparing I/O performance with traditional serial I/O technique, straight application of pnetCDF, and the data aggregation along row and column dimension before applying pnetCDF. After the comparison, "optimal" I/O configurations of this application-level data aggregation approach were quantified. Data aggregation along the row dimension (pnetCDFcr) works better than along the column dimension (pnetCDFcc) although it may perform slightly worse than the straight pnetCDF method with a small number of processors. When the number of processors becomes larger, pnetCDFcr outperforms pnetCDF significantly. If the number of proces
Benchmark Comparison of Cloud Analytics Methods Applied to Earth Observations

NASA Technical Reports Server (NTRS)

Lynnes, Chris; Little, Mike; Huang, Thomas; Jacob, Joseph; Yang, Phil; Kuo, Kwo-Sen

2016-01-01

Cloud computing has the potential to bring high performance computing capabilities to the average science researcher. However, in order to take full advantage of cloud capabilities, the science data used in the analysis must often be reorganized. This typically involves sharding the data across multiple nodes to enable relatively fine-grained parallelism. This can be either via cloud-based file systems or cloud-enabled databases such as Cassandra, Rasdaman or SciDB. Since storing an extra copy of data leads to increased cost and data management complexity, NASA is interested in determining the benefits and costs of various cloud analytics methods for real Earth Observation cases. Accordingly, NASA's Earth Science Technology Office and Earth Science Data and Information Systems project have teamed with cloud analytics practitioners to run a benchmark comparison on cloud analytics methods using the same input data and analysis algorithms. We have particularly looked at analysis algorithms that work over long time series, because these are particularly intractable for many Earth Observation datasets which typically store data with one or just a few time steps per file. This post will present side-by-side cost and performance results for several common Earth observation analysis operations.
DMFS: A Data Migration File System for NetBSD

NASA Technical Reports Server (NTRS)

Studenmund, William

1999-01-01

I have recently developed dmfs, a Data Migration File System, for NetBSD. This file system is based on the overlay file system, which is discussed in a separate paper, and provides kernel support for the data migration system being developed by my research group here at NASA/Ames. The file system utilizes an underlying file store to provide the file backing, and coordinates user and system access to the files. It stores its internal meta data in a flat file, which resides on a separate file system. Our data migration system provides archiving and file migration services. System utilities scan the dmfs file system for recently modified files, and archive them to two separate tape stores. Once a file has been doubly archived, files larger than a specified size will be truncated to that size, potentially freeing up large amounts of the underlying file store. Some sites will choose to retain none of the file (deleting its contents entirely from the file system) while others may choose to retain a portion, for instance a preamble describing the remainder of the file. The dmfs layer coordinates access to the file, retaining user-perceived access and modification times, file size, and restricting access to partially migrated files to the portion actually resident. When a user process attempts to read from the non-resident portion of a file, it is blocked and the dmfs layer sends a request to a system daemon to restore the file. As more of the file becomes resident, the user process is permitted to begin accessing the now-resident portions of the file. For simplicity, our data migration system divides a file into two portions, a resident portion followed by an optional non-resident portion. Also, a file is in one of three states: fully resident, fully resident and archived, and (partially) non-resident and archived. For a file which is only partially resident, any attempt to write or truncate the file, or to read a non-resident portion, will trigger a file restoration. Truncations and writes are blocked until the file is fully restored so that a restoration which only partially succeed does not leave the file in an indeterminate state with portions existing only on tape and other portions only in the disk file system. We chose layered file system technology as it permits us to focus on the data migration functionality, and permits end system administrators to choose the underlying file store technology. We chose the overlay layered file system instead of the null layer for two reasons: first to permit our layer to better preserve meta data integrity and second to prevent even root processes from accessing migrated files. This is achieved as the underlying file store becomes inaccessible once the dmfs layer is mounted. We are quite pleased with how the layered file system has turned out. Of the 45 vnode operations in NetBSD, 20 (forty-four percent) required no intervention by our file layer - they are passed directly to the underlying file store. Of the twenty five we do intercept, nine (such as vop_create()) are intercepted only to ensure meta data integrity. Most of the functionality was concentrated in five operations: vop_read, vop_write, vop_getattr, vop_setattr, and vop_fcntl. The first four are the core operations for controlling access to migrated files and preserving the user experience. vop_fcntl, a call generated for a certain class of fcntl codes, provides the command channel used by privileged user programs to communicate with the dmfs layer.
Livermore Big Artificial Neural Network Toolkit

DOE Office of Scientific and Technical Information (OSTI.GOV)

Essen, Brian Van; Jacobs, Sam; Kim, Hyojin

2016-07-01

LBANN is a toolkit that is designed to train artificial neural networks efficiently on high performance computing architectures. It is optimized to take advantages of key High Performance Computing features to accelerate neural network training. Specifically it is optimized for low-latency, high bandwidth interconnects, node-local NVRAM, node-local GPU accelerators, and high bandwidth parallel file systems. It is built on top of the open source Elemental distributed-memory dense and spars-direct linear algebra and optimization library that is released under the BSD license. The algorithms contained within LBANN are drawn from the academic literature and implemented to work within a distributed-memory framework.
Accessing files in an Internet: The Jade file system

NASA Technical Reports Server (NTRS)

Peterson, Larry L.; Rao, Herman C.

1991-01-01

Jade is a new distribution file system that provides a uniform way to name and access files in an internet environment. It makes two important contributions. First, Jade is a logical system that integrates a heterogeneous collection of existing file systems, where heterogeneous means that the underlying file systems support different file access protocols. Jade is designed under the restriction that the underlying file system may not be modified. Second, rather than providing a global name space, Jade permits each user to define a private name space. These private name spaces support two novel features: they allow multiple file systems to be mounted under one directory, and they allow one logical name space to mount other logical name spaces. A prototype of the Jade File System was implemented on Sun Workstations running Unix. It consists of interfaces to the Unix file system, the Sun Network File System, the Andrew File System, and FTP. This paper motivates Jade's design, highlights several aspects of its implementation, and illustrates applications that can take advantage of its features.

Accessing files in an internet - The Jade file system

NASA Technical Reports Server (NTRS)

Rao, Herman C.; Peterson, Larry L.

1993-01-01

Jade is a new distribution file system that provides a uniform way to name and access files in an internet environment. It makes two important contributions. First, Jade is a logical system that integrates a heterogeneous collection of existing file systems, where heterogeneous means that the underlying file systems support different file access protocols. Jade is designed under the restriction that the underlying file system may not be modified. Second, rather than providing a global name space, Jade permits each user to define a private name space. These private name spaces support two novel features: they allow multiple file systems to be mounted under one directory, and they allow one logical name space to mount other logical name spaces. A prototype of the Jade File System was implemented on Sun Workstations running Unix. It consists of interfaces to the Unix file system, the Sun Network File System, the Andrew File System, and FTP. This paper motivates Jade's design, highlights several aspects of its implementation, and illustrates applications that can take advantage of its features.
Highly parallel reconfigurable computer architecture for robotic computation having plural processor cells each having right and left ensembles of plural processors

NASA Technical Reports Server (NTRS)

Fijany, Amir (Inventor); Bejczy, Antal K. (Inventor)

1994-01-01

In a computer having a large number of single-instruction multiple data (SIMD) processors, each of the SIMD processors has two sets of three individual processor elements controlled by a master control unit and interconnected among a plurality of register file units where data is stored. The register files input and output data in synchronism with a minor cycle clock under control of two slave control units controlling the register file units connected to respective ones of the two sets of processor elements. Depending upon which ones of the register file units are enabled to store or transmit data during a particular minor clock cycle, the processor elements within an SIMD processor are connected in rings or in pipeline arrays, and may exchange data with the internal bus or with neighboring SIMD processors through interface units controlled by respective ones of the two slave control units.
a Hadoop-Based Distributed Framework for Efficient Managing and Processing Big Remote Sensing Images

NASA Astrophysics Data System (ADS)

Wang, C.; Hu, F.; Hu, X.; Zhao, S.; Wen, W.; Yang, C.

2015-07-01

Various sensors from airborne and satellite platforms are producing large volumes of remote sensing images for mapping, environmental monitoring, disaster management, military intelligence, and others. However, it is challenging to efficiently storage, query and process such big data due to the data- and computing- intensive issues. In this paper, a Hadoop-based framework is proposed to manage and process the big remote sensing data in a distributed and parallel manner. Especially, remote sensing data can be directly fetched from other data platforms into the Hadoop Distributed File System (HDFS). The Orfeo toolbox, a ready-to-use tool for large image processing, is integrated into MapReduce to provide affluent image processing operations. With the integration of HDFS, Orfeo toolbox and MapReduce, these remote sensing images can be directly processed in parallel in a scalable computing environment. The experiment results show that the proposed framework can efficiently manage and process such big remote sensing data.
Mcqueuer

DOE Office of Scientific and Technical Information (OSTI.GOV)

2016-09-12

Mcqueuer is a simple tool that allows anyone from researchers to experienced developers to create multi-node/multi-core jobs by simply creating a file with a list of commands. Users simply combine tasks, which would otherwise each be their own job on the cluster, into a single file that is given to Mcqueuer. Mcqueuer then does the heavy lifting required to process the tasks in parallel in a single multi-node job. In addition, Mcqueuer provides load-balancing, which frees the user from having to worry about complex memory and CPU considerations, and instead focus on the processing itself.
SCORPIO: A Scalable Two-Phase Parallel I/O Library With Application To A Large Scale Subsurface Simulator

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sreepathi, Sarat; Sripathi, Vamsi; Mills, Richard T

2013-01-01

Inefficient parallel I/O is known to be a major bottleneck among scientific applications employed on supercomputers as the number of processor cores grows into the thousands. Our prior experience indicated that parallel I/O libraries such as HDF5 that rely on MPI-IO do not scale well beyond 10K processor cores, especially on parallel file systems (like Lustre) with single point of resource contention. Our previous optimization efforts for a massively parallel multi-phase and multi-component subsurface simulator (PFLOTRAN) led to a two-phase I/O approach at the application level where a set of designated processes participate in the I/O process by splitting themore » I/O operation into a communication phase and a disk I/O phase. The designated I/O processes are created by splitting the MPI global communicator into multiple sub-communicators. The root process in each sub-communicator is responsible for performing the I/O operations for the entire group and then distributing the data to rest of the group. This approach resulted in over 25X speedup in HDF I/O read performance and 3X speedup in write performance for PFLOTRAN at over 100K processor cores on the ORNL Jaguar supercomputer. This research describes the design and development of a general purpose parallel I/O library, SCORPIO (SCalable block-ORiented Parallel I/O) that incorporates our optimized two-phase I/O approach. The library provides a simplified higher level abstraction to the user, sitting atop existing parallel I/O libraries (such as HDF5) and implements optimized I/O access patterns that can scale on larger number of processors. Performance results with standard benchmark problems and PFLOTRAN indicate that our library is able to maintain the same speedups as before with the added flexibility of being applicable to a wider range of I/O intensive applications.« less
Mash-up of techniques between data crawling/transfer, data preservation/stewardship and data processing/visualization technologies on a science cloud system designed for Earth and space science: a report of successful operation and science projects of the NICT Science Cloud

NASA Astrophysics Data System (ADS)

Murata, K. T.

2014-12-01

Data-intensive or data-centric science is 4th paradigm after observational and/or experimental science (1st paradigm), theoretical science (2nd paradigm) and numerical science (3rd paradigm). Science cloud is an infrastructure for 4th science methodology. The NICT science cloud is designed for big data sciences of Earth, space and other sciences based on modern informatics and information technologies [1]. Data flow on the cloud is through the following three techniques; (1) data crawling and transfer, (2) data preservation and stewardship, and (3) data processing and visualization. Original tools and applications of these techniques have been designed and implemented. We mash up these tools and applications on the NICT Science Cloud to build up customized systems for each project. In this paper, we discuss science data processing through these three steps. For big data science, data file deployment on a distributed storage system should be well designed in order to save storage cost and transfer time. We developed a high-bandwidth virtual remote storage system (HbVRS) and data crawling tool, NICTY/DLA and Wide-area Observation Network Monitoring (WONM) system, respectively. Data files are saved on the cloud storage system according to both data preservation policy and data processing plan. The storage system is developed via distributed file system middle-ware (Gfarm: GRID datafarm). It is effective since disaster recovery (DR) and parallel data processing are carried out simultaneously without moving these big data from storage to storage. Data files are managed on our Web application, WSDBank (World Science Data Bank). The big-data on the cloud are processed via Pwrake, which is a workflow tool with high-bandwidth of I/O. There are several visualization tools on the cloud; VirtualAurora for magnetosphere and ionosphere, VDVGE for google Earth, STICKER for urban environment data and STARStouch for multi-disciplinary data. There are 30 projects running on the NICT Science Cloud for Earth and space science. In 2003 56 refereed papers were published. At the end, we introduce a couple of successful results of Earth and space sciences using these three techniques carried out on the NICT Sciences Cloud. [1] http://sc-web.nict.go.jp
The Jade File System. Ph.D. Thesis

NASA Technical Reports Server (NTRS)

Rao, Herman Chung-Hwa

1991-01-01

File systems have long been the most important and most widely used form of shared permanent storage. File systems in traditional time-sharing systems, such as Unix, support a coherent sharing model for multiple users. Distributed file systems implement this sharing model in local area networks. However, most distributed file systems fail to scale from local area networks to an internet. Four characteristics of scalability were recognized: size, wide area, autonomy, and heterogeneity. Owing to size and wide area, techniques such as broadcasting, central control, and central resources, which are widely adopted by local area network file systems, are not adequate for an internet file system. An internet file system must also support the notion of autonomy because an internet is made up by a collection of independent organizations. Finally, heterogeneity is the nature of an internet file system, not only because of its size, but also because of the autonomy of the organizations in an internet. The Jade File System, which provides a uniform way to name and access files in the internet environment, is presented. Jade is a logical system that integrates a heterogeneous collection of existing file systems, where heterogeneous means that the underlying file systems support different file access protocols. Because of autonomy, Jade is designed under the restriction that the underlying file systems may not be modified. In order to avoid the complexity of maintaining an internet-wide, global name space, Jade permits each user to define a private name space. In Jade's design, we pay careful attention to avoiding unnecessary network messages between clients and file servers in order to achieve acceptable performance. Jade's name space supports two novel features: (1) it allows multiple file systems to be mounted under one direction; and (2) it permits one logical name space to mount other logical name spaces. A prototype of Jade was implemented to examine and validate its design. The prototype consists of interfaces to the Unix File System, the Sun Network File System, and the File Transfer Protocol.
Please Move Inactive Files Off the /projects File System | High-Performance

Science.gov Websites

Computing | NREL Please Move Inactive Files Off the /projects File System Please Move Inactive Files Off the /projects File System January 11, 2018 The /projects file system is a shared resource . This year this has created a space crunch - the file system is now about 90% full and we need your help
Canal transportation and centering ability of protaper and self-adjusting file system in long oval canals: An ex-vivo cone-beam computed tomography analysis

PubMed Central

Shah, Dipali Yogesh; Wadekar, Swati Ishwara; Dadpe, Ashwini Manish; Jadhav, Ganesh Ranganath; Choudhary, Lalit Jayant; Kalra, Dheeraj Deepak

2017-01-01

Context and Aims: The purpose of this study was to compare and evaluate the shaping ability of ProTaper (PT) and Self-Adjusting File (SAF) system using cone-beam computed tomography (CBCT) to assess their performance in oval-shaped root canals. Materials and Methods: Sixty-two mandibular premolars with single oval canals were divided into two experimental groups (n = 31) according to the systems used: Group I – PT and Group II – SAF. Canals were evaluated before and after instrumentation using CBCT to assess centering ratio and canal transportation at three levels. Data were statistically analyzed using one-way analysis of variance, post hoc Tukey's test, and t-test. Results: The SAF showed better centering ability and lesser canal transportation than the PT only in the buccolingual plane at 6 and 9 mm levels. The shaping ability of the PT was best in the apical third in both the planes. The SAF had statistically significant better centering and lesser canal transportation in the buccolingual as compared to the mesiodistal plane at the middle and coronal levels. Conclusions: The SAF produced significantly less transportation and remained centered than the PT at the middle and coronal levels in the buccolingual plane of oval canals. In the mesiodistal plane, the performance of both the systems was parallel. PMID:28855757
Parallel processing optimization strategy based on MapReduce model in cloud storage environment

NASA Astrophysics Data System (ADS)

Cui, Jianming; Liu, Jiayi; Li, Qiuyan

2017-05-01

Currently, a large number of documents in the cloud storage process employed the way of packaging after receiving all the packets. From the local transmitter this stored procedure to the server, packing and unpacking will consume a lot of time, and the transmission efficiency is low as well. A new parallel processing algorithm is proposed to optimize the transmission mode. According to the operation machine graphs model work, using MPI technology parallel execution Mapper and Reducer mechanism. It is good to use MPI technology to implement Mapper and Reducer parallel mechanism. After the simulation experiment of Hadoop cloud computing platform, this algorithm can not only accelerate the file transfer rate, but also shorten the waiting time of the Reducer mechanism. It will break through traditional sequential transmission constraints and reduce the storage coupling to improve the transmission efficiency.
SciDB versus Spark: A Preliminary Comparison Based on an Earth Science Use Case

NASA Astrophysics Data System (ADS)

Clune, T.; Kuo, K. S.; Doan, K.; Oloso, A.

2015-12-01

We compare two Big Data technologies, SciDB and Spark, for performance, usability, and extensibility, when applied to a representative Earth science use case. SciDB is a new-generation parallel distributed database management system (DBMS) based on the array data model that is capable of handling multidimensional arrays efficiently but requires lengthy data ingest prior to analysis, whereas Spark is a fast and general engine for large scale data processing that can immediately process raw data files and thereby avoid the ingest process. Once data have been ingested, SciDB is very efficient in database operations such as subsetting. Spark, on the other hand, provides greater flexibility by supporting a wide variety of high-level tools including DBMS's. For the performance aspect of this preliminary comparison, we configure Spark to operate directly on text or binary data files and thereby limit the need for additional tools. Arguably, a more appropriate comparison would involve exploring other configurations of Spark which exploit supported high-level tools, but that is beyond our current resources. To make the comparison as "fair" as possible, we export the arrays produced by SciDB into text files (or converting them to binary files) for the intake by Spark and thereby avoid any additional file processing penalties. The Earth science use case selected for this comparison is the identification and tracking of snowstorms in the NASA Modern Era Retrospective-analysis for Research and Applications (MERRA) reanalysis data. The identification portion of the use case is to flag all grid cells of the MERRA high-resolution hourly data that satisfies our criteria for snowstorm, whereas the tracking portion connects flagged cells adjacent in time and space to form a snowstorm episode. We will report the results of our comparisons at this presentation.
Geologic map of the Devore 7.5' quadrangle, San Bernardino County, California

USGS Publications Warehouse

Morton, Douglas M.; Matti, Jonathan C.

2001-01-01

This Open-File Report contains a digital geologic map database of the Devore 7.5' quadrangle, San Bernardino County, California, that includes: 1. ARC/INFO (Environmental Systems Research Institute) version 7.2.1 coverages of the various components of the geologic map 2. A PostScript (.ps) file to plot the geologic map on a topographic base, containing a Correlation of Map Units diagram, a Description of Map Units, an index map, and a regional structure map 3. Portable Document Format (.pdf) files of: a. This Readme; includes an Appendix, containing metadata details found in devre_met.txt b. The same graphic as plotted in 2 above. (Test plots from this .pdf do not produce 1:24,000-scale maps. Adobe Acrobat page-size settings control map scale.) The Correlation of Map Units and Description of Map Units are in the editorial format of USGS Miscellaneous Investigations Series maps (I-maps) but have not been edited to comply with I-map standards. Within the geologic-map data package, map units are identified by such standard geologic-map criteria as formation name, age, and lithology. Even though this is an author-prepared report, every attempt has been made to closely adhere to the stratigraphic nomenclature of the U.S. Geological Survey. Descriptions of units can be obtained by viewing or plotting the .pdf file (3b above) or plotting the postscript file (2 above). If roads in some areas, especially forest roads that parallel topographic contours, do not show well on plots of the geologic map, we recommend use of the USGS Devore 7.5’ topographic quadrangle in conjunction with the geologic map.
Geologic map of the Fifteenmile Valley 7.5' quadrangle, San Bernardino County, California

USGS Publications Warehouse

Miller, F.K.; Matti, J.C.

2001-01-01

Open-File Report OF 01-132 contains a digital geologic map database of the Fifteenmile Valley 7.5’ quadrangle, San Bernardino County, California that includes: 1. ARC/INFO (Environmental Systems Research Institute, http://www.esri.com) version 7.2.1 coverages of the various elements of the geologic map. 2. A PostScript file to plot the geologic map on a topographic base, and containing a Correlation of Map Units diagram, a Description of Map Units, an index map, and a regional structure map. 3. Portable Document Format (.pdf) files of: a. This Readme; includes in Appendix I, data contained in fif_met.txt b. The same graphic as plotted in 2 above. (Test plots have not produced 1:24,000-scale map sheets. Adobe Acrobat pagesize setting influences map scale.) The Correlation of Map Units (CMU) and Description of Map Units (DMU) is in the editorial format of USGS Miscellaneous Investigations Series (I-series) maps. Within the geologic map data package, map units are identified by standard geologic map criteria such as formation-name, age, and lithology. Even though this is an author-prepared report, every attempt has been made to closely adhere to the stratigraphic nomenclature of the U. S. Geological Survey. Descriptions of units can be obtained by viewing or plotting the .pdf file (3b above) or plotting the postscript file (2 above). If roads in some areas, especially forest roads that parallel topographic contours, do not show well on plots of the geologic map, we recommend use of the USGS Fifteenmile Valley 7.5’ topographic quadrangle in conjunction with the geologic map.
START: a system for flexible analysis of hundreds of genomic signal tracks in few lines of SQL-like queries.

PubMed

Zhu, Xinjie; Zhang, Qiang; Ho, Eric Dun; Yu, Ken Hung-On; Liu, Chris; Huang, Tim H; Cheng, Alfred Sze-Lok; Kao, Ben; Lo, Eric; Yip, Kevin Y

2017-09-22

A genomic signal track is a set of genomic intervals associated with values of various types, such as measurements from high-throughput experiments. Analysis of signal tracks requires complex computational methods, which often make the analysts focus too much on the detailed computational steps rather than on their biological questions. Here we propose Signal Track Query Language (STQL) for simple analysis of signal tracks. It is a Structured Query Language (SQL)-like declarative language, which means one only specifies what computations need to be done but not how these computations are to be carried out. STQL provides a rich set of constructs for manipulating genomic intervals and their values. To run STQL queries, we have developed the Signal Track Analytical Research Tool (START, http://yiplab.cse.cuhk.edu.hk/start/ ), a system that includes a Web-based user interface and a back-end execution system. The user interface helps users select data from our database of around 10,000 commonly-used public signal tracks, manage their own tracks, and construct, store and share STQL queries. The back-end system automatically translates STQL queries into optimized low-level programs and runs them on a computer cluster in parallel. We use STQL to perform 14 representative analytical tasks. By repeating these analyses using bedtools, Galaxy and custom Python scripts, we show that the STQL solution is usually the simplest, and the parallel execution achieves significant speed-up with large data files. Finally, we describe how a biologist with minimal formal training in computer programming self-learned STQL to analyze DNA methylation data we produced from 60 pairs of hepatocellular carcinoma (HCC) samples. Overall, STQL and START provide a generic way for analyzing a large number of genomic signal tracks in parallel easily.
CERN data services for LHC computing

NASA Astrophysics Data System (ADS)

Espinal, X.; Bocchi, E.; Chan, B.; Fiorot, A.; Iven, J.; Lo Presti, G.; Lopez, J.; Gonzalez, H.; Lamanna, M.; Mascetti, L.; Moscicki, J.; Pace, A.; Peters, A.; Ponce, S.; Rousseau, H.; van der Ster, D.

2017-10-01

Dependability, resilience, adaptability and efficiency. Growing requirements require tailoring storage services and novel solutions. Unprecedented volumes of data coming from the broad number of experiments at CERN need to be quickly available in a highly scalable way for large-scale processing and data distribution while in parallel they are routed to tape for long-term archival. These activities are critical for the success of HEP experiments. Nowadays we operate at high incoming throughput (14GB/s during 2015 LHC Pb-Pb run and 11PB in July 2016) and with concurrent complex production work-loads. In parallel our systems provide the platform for the continuous user and experiment driven work-loads for large-scale data analysis, including end-user access and sharing. The storage services at CERN cover the needs of our community: EOS and CASTOR as a large-scale storage; CERNBox for end-user access and sharing; Ceph as data back-end for the CERN OpenStack infrastructure, NFS services and S3 functionality; AFS for legacy distributed-file-system services. In this paper we will summarise the experience in supporting LHC experiments and the transition of our infrastructure from static monolithic systems to flexible components providing a more coherent environment with pluggable protocols, tuneable QoS, sharing capabilities and fine grained ACLs management while continuing to guarantee dependable and robust services.
Scientific workflow and support for high resolution global climate modeling at the Oak Ridge Leadership Computing Facility

NASA Astrophysics Data System (ADS)

Anantharaj, V.; Mayer, B.; Wang, F.; Hack, J.; McKenna, D.; Hartman-Baker, R.

2012-04-01

The Oak Ridge Leadership Computing Facility (OLCF) facilitates the execution of computational experiments that require tens of millions of CPU hours (typically using thousands of processors simultaneously) while generating hundreds of terabytes of data. A set of ultra high resolution climate experiments in progress, using the Community Earth System Model (CESM), will produce over 35,000 files, ranging in sizes from 21 MB to 110 GB each. The execution of the experiments will require nearly 70 Million CPU hours on the Jaguar and Titan supercomputers at OLCF. The total volume of the output from these climate modeling experiments will be in excess of 300 TB. This model output must then be archived, analyzed, distributed to the project partners in a timely manner, and also made available more broadly. Meeting this challenge would require efficient movement of the data, staging the simulation output to a large and fast file system that provides high volume access to other computational systems used to analyze the data and synthesize results. This file system also needs to be accessible via high speed networks to an archival system that can provide long term reliable storage. Ideally this archival system is itself directly available to other systems that can be used to host services making the data and analysis available to the participants in the distributed research project and to the broader climate community. The various resources available at the OLCF now support this workflow. The available systems include the new Jaguar Cray XK6 2.63 petaflops (estimated) supercomputer, the 10 PB Spider center-wide parallel file system, the Lens/EVEREST analysis and visualization system, the HPSS archival storage system, the Earth System Grid (ESG), and the ORNL Climate Data Server (CDS). The ESG features federated services, search & discovery, extensive data handling capabilities, deep storage access, and Live Access Server (LAS) integration. The scientific workflow enabled on these systems, and developed as part of the Ultra-High Resolution Climate Modeling Project, allows users of OLCF resources to efficiently share simulated data, often multi-terabyte in volume, as well as the results from the modeling experiments and various synthesized products derived from these simulations. The final objective in the exercise is to ensure that the simulation results and the enhanced understanding will serve the needs of a diverse group of stakeholders across the world, including our research partners in U.S. Department of Energy laboratories & universities, domain scientists, students (K-12 as well as higher education), resource managers, decision makers, and the general public.
Highway Safety Information System guidebook for the Minnesota state data files. Volume 1 : SAS file formats

DOT National Transportation Integrated Search

2001-02-01

The Minnesota data system includes the following basic files: Accident data (Accident File, Vehicle File, Occupant File); Roadlog File; Reference Post File; Traffic File; Intersection File; Bridge (Structures) File; and RR Grade Crossing File. For ea...
LMC: Logarithmantic Monte Carlo

NASA Astrophysics Data System (ADS)

Mantz, Adam B.

2017-06-01

LMC is a Markov Chain Monte Carlo engine in Python that implements adaptive Metropolis-Hastings and slice sampling, as well as the affine-invariant method of Goodman & Weare, in a flexible framework. It can be used for simple problems, but the main use case is problems where expensive likelihood evaluations are provided by less flexible third-party software, which benefit from parallelization across many nodes at the sampling level. The parallel/adaptive methods use communication through MPI, or alternatively by writing/reading files, and mostly follow the approaches pioneered by CosmoMC (ascl:1106.025).
Scalable Unix tools on parallel processors

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gropp, W.; Lusk, E.

1994-12-31

The introduction of parallel processors that run a separate copy of Unix on each process has introduced new problems in managing the user`s environment. This paper discusses some generalizations of common Unix commands for managing files (e.g. 1s) and processes (e.g. ps) that are convenient and scalable. These basic tools, just like their Unix counterparts, are text-based. We also discuss a way to use these with a graphical user interface (GUI). Some notes on the implementation are provided. Prototypes of these commands are publicly available.
Long-Term file activity patterns in a UNIX workstation environment

NASA Technical Reports Server (NTRS)

Gibson, Timothy J.; Miller, Ethan L.

1998-01-01

As mass storage technology becomes more affordable for sites smaller than supercomputer centers, understanding their file access patterns becomes crucial for developing systems to store rarely used data on tertiary storage devices such as tapes and optical disks. This paper presents a new way to collect and analyze file system statistics for UNIX-based file systems. The collection system runs in user-space and requires no modification of the operating system kernel. The statistics package provides details about file system operations at the file level: creations, deletions, modifications, etc. The paper analyzes four months of file system activity on a university file system. The results confirm previously published results gathered from supercomputer file systems, but differ in several important areas. Files in this study were considerably smaller than those at supercomputer centers, and they were accessed less frequently. Additionally, the long-term creation rate on workstation file systems is sufficiently low so that all data more than a day old could be cheaply saved on a mass storage device, allowing the integration of time travel into every file system.

Implementing the PM Programming Language using MPI and OpenMP - a New Tool for Programming Geophysical Models on Parallel Systems

NASA Astrophysics Data System (ADS)

Bellerby, Tim

2015-04-01

PM (Parallel Models) is a new parallel programming language specifically designed for writing environmental and geophysical models. The language is intended to enable implementers to concentrate on the science behind the model rather than the details of running on parallel hardware. At the same time PM leaves the programmer in control - all parallelisation is explicit and the parallel structure of any given program may be deduced directly from the code. This paper describes a PM implementation based on the Message Passing Interface (MPI) and Open Multi-Processing (OpenMP) standards, looking at issues involved with translating the PM parallelisation model to MPI/OpenMP protocols and considering performance in terms of the competing factors of finer-grained parallelisation and increased communication overhead. In order to maximise portability, the implementation stays within the MPI 1.3 standard as much as possible, with MPI-2 MPI-IO file handling the only significant exception. Moreover, it does not assume a thread-safe implementation of MPI. PM adopts a two-tier abstract representation of parallel hardware. A PM processor is a conceptual unit capable of efficiently executing a set of language tasks, with a complete parallel system consisting of an abstract N-dimensional array of such processors. PM processors may map to single cores executing tasks using cooperative multi-tasking, to multiple cores or even to separate processing nodes, efficiently sharing tasks using algorithms such as work stealing. While tasks may move between hardware elements within a PM processor, they may not move between processors without specific programmer intervention. Tasks are assigned to processors using a nested parallelism approach, building on ideas from Reyes et al. (2009). The main program owns all available processors. When the program enters a parallel statement then either processors are divided out among the newly generated tasks (number of new tasks < number of processors) or tasks are divided out among the available processors (number of tasks > number of processors). Nested parallel statements may further subdivide the processor set owned by a given task. Tasks or processors are distributed evenly by default, but uneven distributions are possible under programmer control. It is also possible to explicitly enable child tasks to migrate within the processor set owned by their parent task, reducing load unbalancing at the potential cost of increased inter-processor message traffic. PM incorporates some programming structures from the earlier MIST language presented at a previous EGU General Assembly, while adopting a significantly different underlying parallelisation model and type system. PM code is available at www.pm-lang.org under an unrestrictive MIT license. Reference Ruymán Reyes, Antonio J. Dorta, Francisco Almeida, Francisco de Sande, 2009. Automatic Hybrid MPI+OpenMP Code Generation with llc, Recent Advances in Parallel Virtual Machine and Message Passing Interface, Lecture Notes in Computer Science Volume 5759, 185-195
10 CFR 13.2 - Definitions.

Code of Federal Regulations, 2010 CFR

2010-01-01

... identity when filing documents and serving participants electronically through the E-Filing system, and... transmitted electronically from the E-Filing system to the submitter confirming receipt of electronic filing... presentation of the docket and a link to its files. E-Filing System means an electronic system that receives...
Communication library for run-time visualization of distributed, asynchronous data

DOE Office of Scientific and Technical Information (OSTI.GOV)

Rowlan, J.; Wightman, B.T.

1994-04-01

In this paper we present a method for collecting and visualizing data generated by a parallel computational simulation during run time. Data distributed across multiple processes is sent across parallel communication lines to a remote workstation, which sorts and queues the data for visualization. We have implemented our method in a set of tools called PORTAL (for Parallel aRchitecture data-TrAnsfer Library). The tools comprise generic routines for sending data from a parallel program (callable from either C or FORTRAN), a semi-parallel communication scheme currently built upon Unix Sockets, and a real-time connection to the scientific visualization program AVS. Our methodmore » is most valuable when used to examine large datasets that can be efficiently generated and do not need to be stored on disk. The PORTAL source libraries, detailed documentation, and a working example can be obtained by anonymous ftp from info.mcs.anl.gov from the file portal.tar.Z from the directory pub/portal.« less
ModelMate - A graphical user interface for model analysis

USGS Publications Warehouse

Banta, Edward R.

2011-01-01

ModelMate is a graphical user interface designed to facilitate use of model-analysis programs with models. This initial version of ModelMate supports one model-analysis program, UCODE_2005, and one model software program, MODFLOW-2005. ModelMate can be used to prepare input files for UCODE_2005, run UCODE_2005, and display analysis results. A link to the GW_Chart graphing program facilitates visual interpretation of results. ModelMate includes capabilities for organizing directories used with the parallel-processing capabilities of UCODE_2005 and for maintaining files in those directories to be identical to a set of files in a master directory. ModelMate can be used on its own or in conjunction with ModelMuse, a graphical user interface for MODFLOW-2005 and PHAST.
Measuring driver satisfaction with an urban arterial before and after deployment of an adaptive timing signal system

DOT National Transportation Integrated Search

2001-02-01

The Minnesota data system includes the following basic files: Accident data (Accident File, Vehicle File, Occupant File); Roadlog File; Reference Post File; Traffic File; Intersection File; Bridge (Structures) File; and RR Grade Crossing File. For ea...
Runtime Detection of C-Style Errors in UPC Code

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pirkelbauer, P; Liao, C; Panas, T

2011-09-29

Unified Parallel C (UPC) extends the C programming language (ISO C 99) with explicit parallel programming support for the partitioned global address space (PGAS), which provides a global memory space with localized partitions to each thread. Like its ancestor C, UPC is a low-level language that emphasizes code efficiency over safety. The absence of dynamic (and static) safety checks allows programmer oversights and software flaws that can be hard to spot. In this paper, we present an extension of a dynamic analysis tool, ROSE-Code Instrumentation and Runtime Monitor (ROSECIRM), for UPC to help programmers find C-style errors involving the globalmore » address space. Built on top of the ROSE source-to-source compiler infrastructure, the tool instruments source files with code that monitors operations and keeps track of changes to the system state. The resulting code is linked to a runtime monitor that observes the program execution and finds software defects. We describe the extensions to ROSE-CIRM that were necessary to support UPC. We discuss complications that arise from parallel code and our solutions. We test ROSE-CIRM against a runtime error detection test suite, and present performance results obtained from running error-free codes. ROSE-CIRM is released as part of the ROSE compiler under a BSD-style open source license.« less
Zebra: A striped network file system

NASA Technical Reports Server (NTRS)

Hartman, John H.; Ousterhout, John K.

1992-01-01

The design of Zebra, a striped network file system, is presented. Zebra applies ideas from log-structured file system (LFS) and RAID research to network file systems, resulting in a network file system that has scalable performance, uses its servers efficiently even when its applications are using small files, and provides high availability. Zebra stripes file data across multiple servers, so that the file transfer rate is not limited by the performance of a single server. High availability is achieved by maintaining parity information for the file system. If a server fails its contents can be reconstructed using the contents of the remaining servers and the parity information. Zebra differs from existing striped file systems in the way it stripes file data: Zebra does not stripe on a per-file basis; instead it stripes the stream of bytes written by each client. Clients write to the servers in units called stripe fragments, which are analogous to segments in an LFS. Stripe fragments contain file blocks that were written recently, without regard to which file they belong. This method of striping has numerous advantages over per-file striping, including increased server efficiency, efficient parity computation, and elimination of parity update.
DMFS: A Data Migration File System for NetBSD

NASA Technical Reports Server (NTRS)

Studenmund, William

2000-01-01

I have recently developed DMFS, a Data Migration File System, for NetBSD. This file system provides kernel support for the data migration system being developed by my research group at NASA/Ames. The file system utilizes an underlying file store to provide the file backing, and coordinates user and system access to the files. It stores its internal metadata in a flat file, which resides on a separate file system. This paper will first describe our data migration system to provide a context for DMFS, then it will describe DMFS. It also will describe the changes to NetBSD needed to make DMFS work. Then it will give an overview of the file archival and restoration procedures, and describe how some typical user actions are modified by DMFS. Lastly, the paper will present simple performance measurements which indicate that there is little performance loss due to the use of the DMFS layer.
78 FR 26066 - Filing of Plats of Survey; NV

Federal Register 2010, 2011, 2012, 2013, 2014

2013-05-03

... plat, in 1 sheet, representing the dependent resurvey of the Fifth Standard Parallel North through... interest lands for Barrick Gold Exploration, Inc. 2. The Supplemental Plat of the following described lands... DEPARTMENT OF THE INTERIOR Bureau of Land Management [LLNV952000 L14200000.BJ0000 241A; 13-08807...
Processing MPI Datatypes Outside MPI

NASA Astrophysics Data System (ADS)

Ross, Robert; Latham, Robert; Gropp, William; Lusk, Ewing; Thakur, Rajeev

The MPI datatype functionality provides a powerful tool for describing structured memory and file regions in parallel applications, enabling noncontiguous data to be operated on by MPI communication and I/O routines. However, no facilities are provided by the MPI standard to allow users to efficiently manipulate MPI datatypes in their own codes.
An Approach to Integrate a Space-Time GIS Data Model with High Performance Computers

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wang, Dali; Zhao, Ziliang; Shaw, Shih-Lung

2011-01-01

In this paper, we describe an approach to integrate a Space-Time GIS data model on a high performance computing platform. The Space-Time GIS data model has been developed on a desktop computing environment. We use the Space-Time GIS data model to generate GIS module, which organizes a series of remote sensing data. We are in the process of porting the GIS module into an HPC environment, in which the GIS modules handle large dataset directly via parallel file system. Although it is an ongoing project, authors hope this effort can inspire further discussions on the integration of GIS on highmore » performance computing platforms.« less
A hybrid parallel architecture for electrostatic interactions in the simulation of dissipative particle dynamics

NASA Astrophysics Data System (ADS)

Yang, Sheng-Chun; Lu, Zhong-Yuan; Qian, Hu-Jun; Wang, Yong-Lei; Han, Jie-Ping

2017-11-01

In this work, we upgraded the electrostatic interaction method of CU-ENUF (Yang, et al., 2016) which first applied CUNFFT (nonequispaced Fourier transforms based on CUDA) to the reciprocal-space electrostatic computation and made the computation of electrostatic interaction done thoroughly in GPU. The upgraded edition of CU-ENUF runs concurrently in a hybrid parallel way that enables the computation parallelizing on multiple computer nodes firstly, then further on the installed GPU in each computer. By this parallel strategy, the size of simulation system will be never restricted to the throughput of a single CPU or GPU. The most critical technical problem is how to parallelize a CUNFFT in the parallel strategy, which is conquered effectively by deep-seated research of basic principles and some algorithm skills. Furthermore, the upgraded method is capable of computing electrostatic interactions for both the atomistic molecular dynamics (MD) and the dissipative particle dynamics (DPD). Finally, the benchmarks conducted for validation and performance indicate that the upgraded method is able to not only present a good precision when setting suitable parameters, but also give an efficient way to compute electrostatic interactions for huge simulation systems. Program Files doi:http://dx.doi.org/10.17632/zncf24fhpv.1 Licensing provisions: GNU General Public License 3 (GPL) Programming language: C, C++, and CUDA C Supplementary material: The program is designed for effective electrostatic interactions of large-scale simulation systems, which runs on particular computers equipped with NVIDIA GPUs. It has been tested on (a) single computer node with Intel(R) Core(TM) i7-3770@ 3.40 GHz (CPU) and GTX 980 Ti (GPU), and (b) MPI parallel computer nodes with the same configurations. Nature of problem: For molecular dynamics simulation, the electrostatic interaction is the most time-consuming computation because of its long-range feature and slow convergence in simulation space, which approximately take up most of the total simulation time. Although the parallel method CU-ENUF (Yang et al., 2016) based on GPU has achieved a qualitative leap compared with previous methods in electrostatic interactions computation, the computation capability is limited to the throughput capacity of a single GPU for super-scale simulation system. Therefore, we should look for an effective method to handle the calculation of electrostatic interactions efficiently for a simulation system with super-scale size. Solution method: We constructed a hybrid parallel architecture, in which CPU and GPU are combined to accelerate the electrostatic computation effectively. Firstly, the simulation system is divided into many subtasks via domain-decomposition method. Then MPI (Message Passing Interface) is used to implement the CPU-parallel computation with each computer node corresponding to a particular subtask, and furthermore each subtask in one computer node will be executed in GPU in parallel efficiently. In this hybrid parallel method, the most critical technical problem is how to parallelize a CUNFFT (nonequispaced fast Fourier transform based on CUDA) in the parallel strategy, which is conquered effectively by deep-seated research of basic principles and some algorithm skills. Restrictions: The HP-ENUF is mainly oriented to super-scale system simulations, in which the performance superiority is shown adequately. However, for a small simulation system containing less than 106 particles, the mode of multiple computer nodes has no apparent efficiency advantage or even lower efficiency due to the serious network delay among computer nodes, than the mode of single computer node. References: (1) S.-C. Yang, H.-J. Qian, Z.-Y. Lu, Appl. Comput. Harmon. Anal. 2016, http://dx.doi.org/10.1016/j.acha.2016.04.009. (2) S.-C. Yang, Y.-L. Wang, G.-S. Jiao, H.-J. Qian, Z.-Y. Lu, J. Comput. Chem. 37 (2016) 378. (3) S.-C. Yang, Y.-L. Zhu, H.-J. Qian, Z.-Y. Lu, Appl. Chem. Res. Chin. Univ., 2017, http://dx.doi.org/10.1007/s40242-016-6354-5. (4) Y.-L. Zhu, H. Liu, Z.-W. Li, H.-J. Qian, G. Milano, Z.-Y. Lu, J. Comput. Chem. 34 (2013) 2197.
Production Maintenance Infrastructure

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jason Gabler, David Skinner

2005-11-01

PMI is a XML framework for formulating tests of software and software environments which operate in a relatively push button manner, i.e., can be automated, and that provide results that are readily consumable/publishable via RSS. Insofar as possible the tests are carried out in manner congruent with real usage. PMI drives shell scripts via a perl program which is charge of timing, validating each test, and controlling the flow through sets of tests. Testing in PMI is built up hierarchically. A suite of tests may start by testing basic functionalities (file system is writable, compiler is found and functions, shellmore » environment behaves as expected, etc.) and work up to large more complicated activities (execution of parallel code, file transfers, etc.) At each step in this hierarchy a failure leads to generation of a text message or RSS that can be tagged as to who should be notified of the failure. There are two functionalities that PMI has been directed at. 1) regular and automated testing of multi user environments and 2) version-wise testing of new software releases prior to their deployment in a production mode.« less
An integrated solution for remote data access

NASA Astrophysics Data System (ADS)

Sapunenko, Vladimir; D'Urso, Domenico; dell'Agnello, Luca; Vagnoni, Vincenzo; Duranti, Matteo

2015-12-01

Data management constitutes one of the major challenges that a geographically- distributed e-Infrastructure has to face, especially when remote data access is involved. We discuss an integrated solution which enables transparent and efficient access to on-line and near-line data through high latency networks. The solution is based on the joint use of the General Parallel File System (GPFS) and of the Tivoli Storage Manager (TSM). Both products, developed by IBM, are well known and extensively used in the HEP computing community. Owing to a new feature introduced in GPFS 3.5, so-called Active File Management (AFM), the definition of a single, geographically-distributed namespace, characterised by automated data flow management between different locations, becomes possible. As a practical example, we present the implementation of AFM-based remote data access between two data centres located in Bologna and Rome, demonstrating the validity of the solution for the use case of the AMS experiment, an astro-particle experiment supported by the INFN CNAF data centre with the large disk space requirements (more than 1.5 PB).
Scientific Data Management Center for Enabling Technologies

DOE Office of Scientific and Technical Information (OSTI.GOV)

Vouk, Mladen A.

Managing scientific data has been identified by the scientific community as one of the most important emerging needs because of the sheer volume and increasing complexity of data being collected. Effectively generating, managing, and analyzing this information requires a comprehensive, end-to-end approach to data management that encompasses all of the stages from the initial data acquisition to the final analysis of the data. Fortunately, the data management problems encountered by most scientific domains are common enough to be addressed through shared technology solutions. Based on community input, we have identified three significant requirements. First, more efficient access to storage systemsmore » is needed. In particular, parallel file system and I/O system improvements are needed to write and read large volumes of data without slowing a simulation, analysis, or visualization engine. These processes are complicated by the fact that scientific data are structured differently for specific application domains, and are stored in specialized file formats. Second, scientists require technologies to facilitate better understanding of their data, in particular the ability to effectively perform complex data analysis and searches over extremely large data sets. Specialized feature discovery and statistical analysis techniques are needed before the data can be understood or visualized. Furthermore, interactive analysis requires techniques for efficiently selecting subsets of the data. Finally, generating the data, collecting and storing the results, keeping track of data provenance, data post-processing, and analysis of results is a tedious, fragmented process. Tools for automation of this process in a robust, tractable, and recoverable fashion are required to enhance scientific exploration. The SDM center was established under the SciDAC program to address these issues. The SciDAC-1 Scientific Data Management (SDM) Center succeeded in bringing an initial set of advanced data management technologies to DOE application scientists in astrophysics, climate, fusion, and biology. Equally important, it established collaborations with these scientists to better understand their science as well as their forthcoming data management and data analytics challenges. Building on our early successes, we have greatly enhanced, robustified, and deployed our technology to these communities. In some cases, we identified new needs that have been addressed in order to simplify the use of our technology by scientists. This report summarizes our work so far in SciDAC-2. Our approach is to employ an evolutionary development and deployment process: from research through prototypes to deployment and infrastructure. Accordingly, we have organized our activities in three layers that abstract the end-to-end data flow described above. We labeled the layers (from bottom to top): a) Storage Efficient Access (SEA), b) Data Mining and Analysis (DMA), c) Scientific Process Automation (SPA). The SEA layer is immediately on top of hardware, operating systems, file systems, and mass storage systems, and provides parallel data access technology, and transparent access to archival storage. The DMA layer, which builds on the functionality of the SEA layer, consists of indexing, feature identification, and parallel statistical analysis technology. The SPA layer, which is on top of the DMA layer, provides the ability to compose scientific workflows from the components in the DMA layer as well as application specific modules. NCSU work performed under this contract was primarily at the SPA layer.« less
Development of a Stiffness-Based Chemistry Load Balancing Scheme, and Optimization of Input/Output and Communication, to Enable Massively Parallel High-Fidelity Internal Combustion Engine Simulations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kodavasal, Janardhan; Harms, Kevin; Srivastava, Priyesh

A closed-cycle gasoline compression ignition engine simulation near top dead center (TDC) was used to profile the performance of a parallel commercial engine computational fluid dynamics code, as it was scaled on up to 4096 cores of an IBM Blue Gene/Q supercomputer. The test case has 9 million cells near TDC, with a fixed mesh size of 0.15 mm, and was run on configurations ranging from 128 to 4096 cores. Profiling was done for a small duration of 0.11 crank angle degrees near TDC during ignition. Optimization of input/output performance resulted in a significant speedup in reading restart files, andmore » in an over 100-times speedup in writing restart files and files for post-processing. Improvements to communication resulted in a 1400-times speedup in the mesh load balancing operation during initialization, on 4096 cores. An improved, “stiffness-based” algorithm for load balancing chemical kinetics calculations was developed, which results in an over 3-times faster run-time near ignition on 4096 cores relative to the original load balancing scheme. With this improvement to load balancing, the code achieves over 78% scaling efficiency on 2048 cores, and over 65% scaling efficiency on 4096 cores, relative to 256 cores.« less
Enhanced pyroelectric and piezoelectric properties of PZT with aligned porosity for energy harvesting applications† †Electronic supplementary information (ESI) available. See DOI: 10.1039/c7ta00967d Click here for additional data file.

PubMed Central

Zhang, Yan; Xie, Mengying; Roscow, James; Bao, Yinxiang; Zhou, Kechao

2017-01-01

This paper demonstrates the significant benefits of exploiting highly aligned porosity in piezoelectric and pyroelectric materials for improved energy harvesting performance. Porous lead zirconate (PZT) ceramics with aligned pore channels and varying fractions of porosity were manufactured in a water-based suspension using freeze-casting. The aligned porous PZT ceramics were characterized in detail for both piezoelectric and pyroelectric properties and their energy harvesting performance figures of merit were assessed parallel and perpendicular to the freezing direction. As a result of the introduction of porosity into the ceramic microstructure, high piezoelectric and pyroelectric harvesting figures of merits were achieved for porous freeze-cast PZT compared to dense PZT due to the reduced permittivity and volume specific heat capacity. Experimental results were compared to parallel and series analytical models with good agreement and the PZT with porosity aligned parallel to the freezing direction exhibited the highest piezoelectric and pyroelectric harvesting response; this was a result of the enhanced interconnectivity of the ferroelectric material along the poling direction and reduced fraction of unpoled material that leads to a higher polarization. A complete thermal energy harvesting system, composed of a parallel-aligned PZT harvester element and an AC/DC converter, was successfully demonstrated by charging a storage capacitor. The maximum energy density generated by the 60 vol% porous parallel-connected PZT when subjected to thermal oscillations was 1653 μJ cm–3, which was 374% higher than that of the dense PZT with an energy density of 446 μJ cm–3. The results are beneficial for the design and manufacture of high performance porous pyroelectric and piezoelectric materials in devices for energy harvesting and sensor applications. PMID:28580142
MPI_XSTAR: MPI-based parallelization of XSTAR program

NASA Astrophysics Data System (ADS)

Danehkar, A.

2017-12-01

MPI_XSTAR parallelizes execution of multiple XSTAR runs using Message Passing Interface (MPI). XSTAR (ascl:9910.008), part of the HEASARC's HEAsoft (ascl:1408.004) package, calculates the physical conditions and emission spectra of ionized gases. MPI_XSTAR invokes XSTINITABLE from HEASoft to generate a job list of XSTAR commands for given physical parameters. The job list is used to make directories in ascending order, where each individual XSTAR is spawned on each processor and outputs are saved. HEASoft's XSTAR2TABLE program is invoked upon the contents of each directory in order to produce table model FITS files for spectroscopy analysis tools.
Sam2bam: High-Performance Framework for NGS Data Preprocessing Tools

PubMed Central

Cheng, Yinhe; Tzeng, Tzy-Hwa Kathy

2016-01-01

This paper introduces a high-throughput software tool framework called sam2bam that enables users to significantly speed up pre-processing for next-generation sequencing data. The sam2bam is especially efficient on single-node multi-core large-memory systems. It can reduce the runtime of data pre-processing in marking duplicate reads on a single node system by 156–186x compared with de facto standard tools. The sam2bam consists of parallel software components that can fully utilize multiple processors, available memory, high-bandwidth storage, and hardware compression accelerators, if available. The sam2bam provides file format conversion between well-known genome file formats, from SAM to BAM, as a basic feature. Additional features such as analyzing, filtering, and converting input data are provided by using plug-in tools, e.g., duplicate marking, which can be attached to sam2bam at runtime. We demonstrated that sam2bam could significantly reduce the runtime of next generation sequencing (NGS) data pre-processing from about two hours to about one minute for a whole-exome data set on a 16-core single-node system using up to 130 GB of memory. The sam2bam could reduce the runtime of NGS data pre-processing from about 20 hours to about nine minutes for a whole-genome sequencing data set on the same system using up to 711 GB of memory. PMID:27861637
Measurement of ridge-spreading movements (Sackungen) at Bald Eagle Mountain, Lake County, Colorado, II : continuation of the 1975-1989 measurements using a Global Positioning System in 1997 and 1999

USGS Publications Warehouse

Varnes, David J.; Coe, J.A.; Godt, J.W.; Savage, W.Z.; Savage, J.E.

2000-01-01

Measurements of ridge-spreading movements at Bald Eagle Mountain in north-central Colorado were reported in USGS Open-File Report 90-543 for the years 1975-1989. Measurements were renewed in 1997 and 1999 using the Global Positioning System (GPS). Movements are generally away from a ridge-top graben and appear to be concentrated along 3 or 4 trenches with uphill facing scarps that are parallel with slope contours. A point just below the lowest trench has moved the most? a total of 8.3 cm horizontally and slightly downward from 1977 to 1999 relative to an assumed stable point on the periphery of the graben. Movements from 1997 to 1999 are less than 1 cm or within the error of measurement.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kurth, Thorsten; Pochinsky, Andrew; Sarje, Abhinav

Practitioners of lattice QCD/QFT have been some of the primary pioneer users of the state-of-the-art high-performance-computing systems, and contribute towards the stress tests of such new machines as soon as they become available. As with all aspects of high-performance-computing, I/O is becoming an increasingly specialized component of these systems. In order to take advantage of the latest available high-performance I/O infrastructure, to ensure reliability and backwards compatibility of data files, and to help unify the data structures used in lattice codes, we have incorporated parallel HDF5 I/O into the SciDAC supported USQCD software stack. Here we present the design andmore » implementation of this I/O framework. Our HDF5 implementation outperforms optimized QIO at the 10-20% level and leaves room for further improvement by utilizing appropriate dataset chunking.« less
Weakly Interacting Symmetric and Anti-Symmetric States in the Bilayer Systems

NASA Astrophysics Data System (ADS)

Marchewka, M.; Sheregii, E. M.; Tralle, I.; Tomaka, G.; Ploch, D.

We have studied the parallel magneto-transport in DQW-structures of two different potential shapes: quasi-rectangular and quasi-triangular. The quantum beats effect was observed in Shubnikov-de Haas (SdH) oscillations for both types of the DQW structures in perpendicular magnetic filed arrangement. We developed a special scheme for the Landau levels energies calculation by means of which we carried out the necessary simulations of beating effect. In order to obtain the agreement between our experimental data and the results of simulations, we introduced two different quasi-Fermi levels which characterize symmetric and anti-symmetric states in DQWs. The existence of two different quasi Fermi-Levels simply means, that one can treat two sub-systems (charge carriers characterized by symmetric and anti-symmetric wave functions) as weakly interacting and having their own rate of establishing the equilibrium state.
CloudMC: a cloud computing application for Monte Carlo simulation.

PubMed

Miras, H; Jiménez, R; Miras, C; Gomà, C

2013-04-21

This work presents CloudMC, a cloud computing application-developed in Windows Azure®, the platform of the Microsoft® cloud-for the parallelization of Monte Carlo simulations in a dynamic virtual cluster. CloudMC is a web application designed to be independent of the Monte Carlo code in which the simulations are based-the simulations just need to be of the form: input files → executable → output files. To study the performance of CloudMC in Windows Azure®, Monte Carlo simulations with penelope were performed on different instance (virtual machine) sizes, and for different number of instances. The instance size was found to have no effect on the simulation runtime. It was also found that the decrease in time with the number of instances followed Amdahl's law, with a slight deviation due to the increase in the fraction of non-parallelizable time with increasing number of instances. A simulation that would have required 30 h of CPU on a single instance was completed in 48.6 min when executed on 64 instances in parallel (speedup of 37 ×). Furthermore, the use of cloud computing for parallel computing offers some advantages over conventional clusters: high accessibility, scalability and pay per usage. Therefore, it is strongly believed that cloud computing will play an important role in making Monte Carlo dose calculation a reality in future clinical practice.
CRITIC2: A program for real-space analysis of quantum chemical interactions in solids

NASA Astrophysics Data System (ADS)

Otero-de-la-Roza, A.; Johnson, Erin R.; Luaña, Víctor

2014-03-01

We present CRITIC2, a program for the analysis of quantum-mechanical atomic and molecular interactions in periodic solids. This code, a greatly improved version of the previous CRITIC program (Otero-de-la Roza et al., 2009), can: (i) find critical points of the electron density and related scalar fields such as the electron localization function (ELF), Laplacian, … (ii) integrate atomic properties in the framework of Bader’s Atoms-in-Molecules theory (QTAIM), (iii) visualize non-covalent interactions in crystals using the non-covalent interactions (NCI) index, (iv) generate relevant graphical representations including lines, planes, gradient paths, contour plots, atomic basins, … and (v) perform transformations between file formats describing scalar fields and crystal structures. CRITIC2 can interface with the output produced by a variety of electronic structure programs including WIEN2k, elk, PI, abinit, Quantum ESPRESSO, VASP, Gaussian, and, in general, any other code capable of writing the scalar field under study to a three-dimensional grid. CRITIC2 is parallelized, completely documented (including illustrative test cases) and publicly available under the GNU General Public License. Catalogue identifier: AECB_v2_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AECB_v2_0.html Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland Licensing provisions: yes No. of lines in distributed program, including test data, etc.: 11686949 No. of bytes in distributed program, including test data, etc.: 337020731 Distribution format: tar.gz Programming language: Fortran 77 and 90. Computer: Workstations. Operating system: Unix, GNU/Linux. Has the code been vectorized or parallelized?: Shared-memory parallelization can be used for most tasks. Classification: 7.3. Catalogue identifier of previous version: AECB_v1_0 Journal reference of previous version: Comput. Phys. Comm. 180 (2009) 157 Nature of problem: Analysis of quantum-chemical interactions in periodic solids by means of atoms-in-molecules and related formalisms. Solution method: Critical point search using Newton’s algorithm, atomic basin integration using bisection, qtree and grid-based algorithms, diverse graphical representations and computation of the non-covalent interactions index on a three-dimensional grid. Additional comments: !!!!! The distribution file for this program is over 330 Mbytes and therefore is not delivered directly when download or Email is requested. Instead a html file giving details of how the program can be obtained is sent. !!!!! Running time: Variable, depending on the crystal and the source of the underlying scalar field.
Collective operations in a file system based execution model

DOEpatents

Shinde, Pravin; Van Hensbergen, Eric

2013-02-12

A mechanism is provided for group communications using a MULTI-PIPE synthetic file system. A master application creates a multi-pipe synthetic file in the MULTI-PIPE synthetic file system, the master application indicating a multi-pipe operation to be performed. The master application then writes a header-control block of the multi-pipe synthetic file specifying at least one of a multi-pipe synthetic file system name, a message type, a message size, a specific destination, or a specification of the multi-pipe operation. Any other application participating in the group communications then opens the same multi-pipe synthetic file. A MULTI-PIPE file system module then implements the multi-pipe operation as identified by the master application. The master application and the other applications then either read or write operation messages to the multi-pipe synthetic file and the MULTI-PIPE synthetic file system module performs appropriate actions.
Collective operations in a file system based execution model

DOEpatents

Shinde, Pravin; Van Hensbergen, Eric

2013-02-19

A mechanism is provided for group communications using a MULTI-PIPE synthetic file system. A master application creates a multi-pipe synthetic file in the MULTI-PIPE synthetic file system, the master application indicating a multi-pipe operation to be performed. The master application then writes a header-control block of the multi-pipe synthetic file specifying at least one of a multi-pipe synthetic file system name, a message type, a message size, a specific destination, or a specification of the multi-pipe operation. Any other application participating in the group communications then opens the same multi-pipe synthetic file. A MULTI-PIPE file system module then implements the multi-pipe operation as identified by the master application. The master application and the other applications then either read or write operation messages to the multi-pipe synthetic file and the MULTI-PIPE synthetic file system module performs appropriate actions.
Aerobraking Maneuver (ABM) Report Generator

NASA Technical Reports Server (NTRS)

Fisher, Forrest; Gladden, Roy; Khanampornpan, Teerapat

2008-01-01

abmREPORT Version 3.1 is a Perl script that extracts vital summarization information from the Mars Reconnaissance Orbiter (MRO) aerobraking ABM build process. This information facilitates sequence reviews, and provides a high-level summarization of the sequence for mission management. The script extracts information from the ENV, SSF, FRF, SCMFmax, and OPTG files and burn magnitude configuration files and presents them in a single, easy-to-check report that provides the majority of the parameters necessary for cross check and verification during the sequence review process. This means that needed information, formerly spread across a number of different files and each in a different format, is all available in this one application. This program is built on the capabilities developed in dragReport and then the scripts evolved as the two tools continued to be developed in parallel.
A revision of the Neotropical tortoise beetle genus Eurypedus Gistel 1834 (Coleoptera: Chrysomelidae).

PubMed

Shin, Chulwoo

2016-09-06

The genus Eurypedus Gistel is revised based on detailed morphological study, including examination of the mouthparts and genitalia. Besides previously known diagnostic characters, such as an oblong and laterally parallel-sided body, narrow elytral lamella, narrow prosternal process between the procoxae, and angled pronotal base, new diagnostic characters are identified: antennal notches on the ventral surface of antennomeres V-XI, a stridulatory file on the vertex, and paired projections on the ventral surface of the pronotum. The distinct stridulatory file is found only in males. The number of ridges of the stridulatory file varies between 48 and 59. Eurypedus thoni Barber (= Cassida oblonga Sturm in Thon) syn. nov. is synonymized with E. peltoides (Boheman). The remaining two species E. peltoides and E. nigrosignatus (Boheman) show distinct distributions separated by the Amazon Basin.
Design and Implementation of a Metadata-rich File System

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ames, S; Gokhale, M B; Maltzahn, C

2010-01-19

Despite continual improvements in the performance and reliability of large scale file systems, the management of user-defined file system metadata has changed little in the past decade. The mismatch between the size and complexity of large scale data stores and their ability to organize and query their metadata has led to a de facto standard in which raw data is stored in traditional file systems, while related, application-specific metadata is stored in relational databases. This separation of data and semantic metadata requires considerable effort to maintain consistency and can result in complex, slow, and inflexible system operation. To address thesemore » problems, we have developed the Quasar File System (QFS), a metadata-rich file system in which files, user-defined attributes, and file relationships are all first class objects. In contrast to hierarchical file systems and relational databases, QFS defines a graph data model composed of files and their relationships. QFS incorporates Quasar, an XPATH-extended query language for searching the file system. Results from our QFS prototype show the effectiveness of this approach. Compared to the de facto standard, the QFS prototype shows superior ingest performance and comparable query performance on user metadata-intensive operations and superior performance on normal file metadata operations.« less
Monte Carlo event generators in atomic collisions: A new tool to tackle the few-body dynamics

NASA Astrophysics Data System (ADS)

Ciappina, M. F.; Kirchner, T.; Schulz, M.

2010-04-01

We present a set of routines to produce theoretical event files, for both single and double ionization of atoms by ion impact, based on a Monte Carlo event generator (MCEG) scheme. Such event files are the theoretical counterpart of the data obtained from a kinematically complete experiment; i.e. they contain the momentum components of all collision fragments for a large number of ionization events. Among the advantages of working with theoretical event files is the possibility to incorporate the conditions present in a real experiment, such as the uncertainties in the measured quantities. Additionally, by manipulating them it is possible to generate any type of cross sections, specially those that are usually too complicated to compute with conventional methods due to a lack of symmetry. Consequently, the numerical effort of such calculations is dramatically reduced. We show examples for both single and double ionization, with special emphasis on a new data analysis tool, called four-body Dalitz plots, developed very recently. Program summaryProgram title: MCEG Catalogue identifier: AEFV_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEFV_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 2695 No. of bytes in distributed program, including test data, etc.: 18 501 Distribution format: tar.gz Programming language: FORTRAN 77 with parallelization directives using scripting Computer: Single machines using Linux and Linux servers/clusters (with cores with any clock speed, cache memory and bits in a word) Operating system: Linux (any version and flavor) and FORTRAN 77 compilers Has the code been vectorised or parallelized?: Yes RAM: 64-128 kBytes (the codes are very cpu intensive) Classification: 2.6 Nature of problem: The code deals with single and double ionization of atoms by ion impact. Conventional theoretical approaches aim at a direct calculation of the corresponding cross sections. This has the important shortcoming that it is difficult to account for the experimental conditions when comparing results to measured data. In contrast, the present code generates theoretical event files of the same type as are obtained in a real experiment. From these event files any type of cross sections can be easily extracted. The theoretical schemes are based on distorted wave formalisms for both processes of interest. Solution method: The codes employ a Monte Carlo Event Generator based on theoretical formalisms to generate event files for both single and double ionization. One of the main advantages of having access to theoretical event files is the possibility of adding the conditions present in real experiments (parameter uncertainties, environmental conditions, etc.) and to incorporate additional physics in the resulting event files (e.g. elastic scattering or other interactions absent in the underlying calculations). Additional comments: The computational time can be dramatically reduced if a large number of processors is used. Since the codes has no communication between processes it is possible to achieve an efficiency of a 100% (this number certainly will be penalized by the queuing waiting time). Running time: Times vary according to the process, single or double ionization, to be simulated, the number of processors and the type of theoretical model. The typical running time is between several hours and up to a few weeks.
Reciproc versus Twisted file for root canal filling removal: assessment of apically extruded debris.

PubMed

Altunbas, Demet; Kutuk, Betul; Toyoglu, Mustafa; Kutlu, Gizem; Kustarci, Alper; Er, Kursat

2016-01-01

The aim of this study was to evaluate the amount of apically extruded debris during endodontic retreatment with different file systems. Sixty extracted human mandibular premolar teeth were used in this study. Root canals of the teeth were instrumented and filled before being randomly assigned to three groups. Guttapercha was removed using the Reciproc system, the Twisted File system (TF), and Hedström-files (H-file). Apically extruded debris was collected and dried in pre-weighed Eppendorf tubes. The amount of extruded debris was assessed with an electronic balance. Data were statistically analyzed using one-way ANOVA, Kruskal-Wallis, and Mann-Whitney U tests. The Reciproc and TF systems extruded significantly less debris than the H-file (p<0.05). However, no significant difference was found between the Reciproc and TF systems. All tested file systems caused apical extrusion of debris. Both the rotary file (TF) and the reciprocating single-file (Reciproc) systems were associated with less apical extrusion compared with the H-file.
Distributed PACS using distributed file system with hierarchical meta data servers.

PubMed

Hiroyasu, Tomoyuki; Minamitani, Yoshiyuki; Miki, Mitsunori; Yokouchi, Hisatake; Yoshimi, Masato

2012-01-01

In this research, we propose a new distributed PACS (Picture Archiving and Communication Systems) which is available to integrate several PACSs that exist in each medical institution. The conventional PACS controls DICOM file into one data-base. On the other hand, in the proposed system, DICOM file is separated into meta data and image data and those are stored individually. Using this mechanism, since file is not always accessed the entire data, some operations such as finding files, changing titles, and so on can be performed in high-speed. At the same time, as distributed file system is utilized, accessing image files can also achieve high-speed access and high fault tolerant. The introduced system has a more significant point. That is the simplicity to integrate several PACSs. In the proposed system, only the meta data servers are integrated and integrated system can be constructed. This system also has the scalability of file access with along to the number of file numbers and file sizes. On the other hand, because meta-data server is integrated, the meta data server is the weakness of this system. To solve this defect, hieratical meta data servers are introduced. Because of this mechanism, not only fault--tolerant ability is increased but scalability of file access is also increased. To discuss the proposed system, the prototype system using Gfarm was implemented. For evaluating the implemented system, file search operating time of Gfarm and NFS were compared.
Toward information management in corporations (2)

NASA Astrophysics Data System (ADS)

Shibata, Mitsuru

If construction of inhouse information management systems in an advanced information society should be positioned along with the social information management, its base making begins with reviewing current paper filing systems. Since the problems which inhere in inhouse information management systems utilizing OA equipments also inhere in paper filing systems, the first step toward full scale inhouse information management should be to grasp and solve the fundamental problems in current filing systems. This paper describes analysis of fundamental problems in filing systems, making new type of offices and analysis of improvement needs in filing systems, and some points in improving filing systems.
American Telephone and Telegraph System V/MLS Release 1.1.2 Running on Unix System V Release 3.1.1

DTIC Science & Technology

1989-10-18

Evaluation Report AT&T System V/MLS SYSTEM OVERVIEW what is specified in the /mls/ passwd file. For a complete description of how this works, see page 62...from the publicly readable files /etc/ passwd and /etclgroup, to the protected files /mlslpasswd and /mls/group. These protected files are ASCII...files which are referred to as "shadow files". October 18, 1989 62 Final Evaluation Report AT&T System V/MLS SYSTEM OVERVIEW Imls/ passwd contains the
Fast parallel algorithm for slicing STL based on pipeline

NASA Astrophysics Data System (ADS)

Ma, Xulong; Lin, Feng; Yao, Bo

2016-05-01

In Additive Manufacturing field, the current researches of data processing mainly focus on a slicing process of large STL files or complicated CAD models. To improve the efficiency and reduce the slicing time, a parallel algorithm has great advantages. However, traditional algorithms can't make full use of multi-core CPU hardware resources. In the paper, a fast parallel algorithm is presented to speed up data processing. A pipeline mode is adopted to design the parallel algorithm. And the complexity of the pipeline algorithm is analyzed theoretically. To evaluate the performance of the new algorithm, effects of threads number and layers number are investigated by a serial of experiments. The experimental results show that the threads number and layers number are two remarkable factors to the speedup ratio. The tendency of speedup versus threads number reveals a positive relationship which greatly agrees with the Amdahl's law, and the tendency of speedup versus layers number also keeps a positive relationship agreeing with Gustafson's law. The new algorithm uses topological information to compute contours with a parallel method of speedup. Another parallel algorithm based on data parallel is used in experiments to show that pipeline parallel mode is more efficient. A case study at last shows a suspending performance of the new parallel algorithm. Compared with the serial slicing algorithm, the new pipeline parallel algorithm can make full use of the multi-core CPU hardware, accelerate the slicing process, and compared with the data parallel slicing algorithm, the new slicing algorithm in this paper adopts a pipeline parallel model, and a much higher speedup ratio and efficiency is achieved.
Parallel Wavefront Analysis for a 4D Interferometer

NASA Technical Reports Server (NTRS)

Rao, Shanti R.

2011-01-01

This software provides a programming interface for automating data collection with a PhaseCam interferometer from 4D Technology, and distributing the image-processing algorithm across a cluster of general-purpose computers. Multiple instances of 4Sight (4D Technology s proprietary software) run on a networked cluster of computers. Each connects to a single server (the controller) and waits for instructions. The controller directs the interferometer to several images, then assigns each image to a different computer for processing. When the image processing is finished, the server directs one of the computers to collate and combine the processed images, saving the resulting measurement in a file on a disk. The available software captures approximately 100 images and analyzes them immediately. This software separates the capture and analysis processes, so that analysis can be done at a different time and faster by running the algorithm in parallel across several processors. The PhaseCam family of interferometers can measure an optical system in milliseconds, but it takes many seconds to process the data so that it is usable. In characterizing an adaptive optics system, like the next generation of astronomical observatories, thousands of measurements are required, and the processing time quickly becomes excessive. A programming interface distributes data processing for a PhaseCam interferometer across a Windows computing cluster. A scriptable controller program coordinates data acquisition from the interferometer, storage on networked hard disks, and parallel processing. Idle time of the interferometer is minimized. This architecture is implemented in Python and JavaScript, and may be altered to fit a customer s needs.
78 FR 21930 - Aquenergy Systems, Inc.; Notice of Intent To File License Application, Filing of Pre-Application...

Federal Register 2010, 2011, 2012, 2013, 2014

2013-04-12

... Systems, Inc.; Notice of Intent To File License Application, Filing of Pre-Application Document, and Approving Use of the Traditional Licensing Process a. Type of Filing: Notice of Intent to File License...: November 11, 2012. d. Submitted by: Aquenergy Systems, Inc., a fully owned subsidiaries of Enel Green Power...
Registered File Support for Critical Operations Files at (Space Infrared Telescope Facility) SIRTF

NASA Technical Reports Server (NTRS)

Turek, G.; Handley, Tom; Jacobson, J.; Rector, J.

2001-01-01

The SIRTF Science Center's (SSC) Science Operations System (SOS) has to contend with nearly one hundred critical operations files via comprehensive file management services. The management is accomplished via the registered file system (otherwise known as TFS) which manages these files in a registered file repository composed of a virtual file system accessible via a TFS server and a file registration database. The TFS server provides controlled, reliable, and secure file transfer and storage by registering all file transactions and meta-data in the file registration database. An API is provided for application programs to communicate with TFS servers and the repository. A command line client implementing this API has been developed as a client tool. This paper describes the architecture, current implementation, but more importantly, the evolution of these services based on evolving community use cases and emerging information system technology.
Distributed Finite Element Analysis Using a Transputer Network

NASA Technical Reports Server (NTRS)

Watson, James; Favenesi, James; Danial, Albert; Tombrello, Joseph; Yang, Dabby; Reynolds, Brian; Turrentine, Ronald; Shephard, Mark; Baehmann, Peggy

1989-01-01

The principal objective of this research effort was to demonstrate the extraordinarily cost effective acceleration of finite element structural analysis problems using a transputer-based parallel processing network. This objective was accomplished in the form of a commercially viable parallel processing workstation. The workstation is a desktop size, low-maintenance computing unit capable of supercomputer performance yet costs two orders of magnitude less. To achieve the principal research objective, a transputer based structural analysis workstation termed XPFEM was implemented with linear static structural analysis capabilities resembling commercially available NASTRAN. Finite element model files, generated using the on-line preprocessing module or external preprocessing packages, are downloaded to a network of 32 transputers for accelerated solution. The system currently executes at about one third Cray X-MP24 speed but additional acceleration appears likely. For the NASA selected demonstration problem of a Space Shuttle main engine turbine blade model with about 1500 nodes and 4500 independent degrees of freedom, the Cray X-MP24 required 23.9 seconds to obtain a solution while the transputer network, operated from an IBM PC-AT compatible host computer, required 71.7 seconds. Consequently, the $80,000 transputer network demonstrated a cost-performance ratio about 60 times better than the $15,000,000 Cray X-MP24 system.
PRATHAM: Parallel Thermal Hydraulics Simulations using Advanced Mesoscopic Methods

DOE Office of Scientific and Technical Information (OSTI.GOV)

Joshi, Abhijit S; Jain, Prashant K; Mudrich, Jaime A

2012-01-01

At the Oak Ridge National Laboratory, efforts are under way to develop a 3D, parallel LBM code called PRATHAM (PaRAllel Thermal Hydraulic simulations using Advanced Mesoscopic Methods) to demonstrate the accuracy and scalability of LBM for turbulent flow simulations in nuclear applications. The code has been developed using FORTRAN-90, and parallelized using the message passing interface MPI library. Silo library is used to compact and write the data files, and VisIt visualization software is used to post-process the simulation data in parallel. Both the single relaxation time (SRT) and multi relaxation time (MRT) LBM schemes have been implemented in PRATHAM.more » To capture turbulence without prohibitively increasing the grid resolution requirements, an LES approach [5] is adopted allowing large scale eddies to be numerically resolved while modeling the smaller (subgrid) eddies. In this work, a Smagorinsky model has been used, which modifies the fluid viscosity by an additional eddy viscosity depending on the magnitude of the rate-of-strain tensor. In LBM, this is achieved by locally varying the relaxation time of the fluid.« less

McrEngine: A Scalable Checkpointing System Using Data-Aware Aggregation and Compression

DOE PAGES

Islam, Tanzima Zerin; Mohror, Kathryn; Bagchi, Saurabh; ...

2013-01-01

High performance computing (HPC) systems use checkpoint-restart to tolerate failures. Typically, applications store their states in checkpoints on a parallel file system (PFS). As applications scale up, checkpoint-restart incurs high overheads due to contention for PFS resources. The high overheads force large-scale applications to reduce checkpoint frequency, which means more compute time is lost in the event of failure. We alleviate this problem through a scalable checkpoint-restart system, mcrEngine. McrEngine aggregates checkpoints from multiple application processes with knowledge of the data semantics available through widely-used I/O libraries, e.g., HDF5 and netCDF, and compresses them. Our novel scheme improves compressibility ofmore » checkpoints up to 115% over simple concatenation and compression. Our evaluation with large-scale application checkpoints show that mcrEngine reduces checkpointing overhead by up to 87% and restart overhead by up to 62% over a baseline with no aggregation or compression.« less
The computerized OMAHA system in microsoft office excel.

PubMed

Lai, Xiaobin; Wong, Frances K Y; Zhang, Peiqiang; Leung, Carenx W Y; Lee, Lai H; Wong, Jessica S Y; Lo, Yim F; Ching, Shirley S Y

2014-01-01

The OMAHA System was adopted as the documentation system in an interventional study. To systematically record client care and facilitate data analysis, two Office Excel files were developed. The first Excel file (File A) was designed to record problems, care procedure, and outcomes for individual clients according to the OMAHA System. It was used by the intervention nurses in the study. The second Excel file (File B) was the summary of all clients that had been automatically extracted from File A. Data in File B can be analyzed directly in Excel or imported in PASW for further analysis. Both files have four parts to record basic information and the three parts of the OMAHA System. The computerized OMAHA System simplified the documentation procedure and facilitated the management and analysis of data.
Efficient system interrupt concept design at the microprogramming level

DOE Office of Scientific and Technical Information (OSTI.GOV)

Fakharzadeh, M.M.

1989-01-01

Over the past decade the demand for high speed super microcomputers has been tremendously increased. To satisfy this demand many high speed 32-bit microcomputers have been designed. However, the currently available 32-bit systems do not provide an adequate solution to many highly demanding problems such as in multitasking, and in interrupt driven applications, which both require context switching. Systems for these purposes usually incorporate sophisticated software. In order to be efficient, a high end microprocessor based system must satisfy stringent software demands. Although these microprocessors use the latest technology in the fabrication design and run at a very high speed,more » they still suffer from insufficient hardware support for such applications. All too often, this lack also is the premier cause of execution inefficiency. In this dissertation a micro-programmable control unit and operation unit is considered in an advanced design. An automaton controller is designed for high speed micro-level interrupt handling. Different stack models are designed for the single task and multitasking environment. The stacks are used for storage of various components of the processor during the interrupt calls, procedure calls, and task switching. A universal (as an example seven port) register file is designed for high speed parameter passing, and intertask communication in the multitasking environment. In addition, the register file provides a direct path between ALU and the peripheral data which is important in real-time control applications. The overall system is a highly parallel architecture, with no pipeline and internal cache memory, which allows the designer to be able to predict the processor's behavior during the critical times.« less
Permanent-File-Validation Utility Computer Program

NASA Technical Reports Server (NTRS)

Derry, Stephen D.

1988-01-01

Errors in files detected and corrected during operation. Permanent File Validation (PFVAL) utility computer program provides CDC CYBER NOS sites with mechanism to verify integrity of permanent file base. Locates and identifies permanent file errors in Mass Storage Table (MST) and Track Reservation Table (TRT), in permanent file catalog entries (PFC's) in permit sectors, and in disk sector linkage. All detected errors written to listing file and system and job day files. Program operates by reading system tables , catalog track, permit sectors, and disk linkage bytes to vaidate expected and actual file linkages. Used extensively to identify and locate errors in permanent files and enable online correction, reducing computer-system downtime.
76 FR 66695 - Privacy Act of 1974; System of Records

Federal Register 2010, 2011, 2012, 2013, 2014

2011-10-27

.... DWHS P04 System name: Reduction-In-Force Case Files (February 11, 2011, 76 FR 7825). Changes....'' * * * * * DWHS P04 System name: Reduction-In-Force Case Files. System location: Human Resources Directorate... system: Storage: Paper file folders. Retrievability: Filed alphabetically by last name. Safeguards...
Personal File Management for the Health Sciences.

ERIC Educational Resources Information Center

Apostle, Lynne

Written as an introduction to the concepts of creating a personal or reprint file, this workbook discusses both manual and computerized systems, with emphasis on the preliminary groundwork that needs to be done before starting any filing system. A file assessment worksheet is provided; considerations in developing a personal filing system are…
Survey of MapReduce frame operation in bioinformatics.

PubMed

Zou, Quan; Li, Xu-Bin; Jiang, Wen-Rui; Lin, Zi-Yu; Li, Gui-Lin; Chen, Ke

2014-07-01

Bioinformatics is challenged by the fact that traditional analysis tools have difficulty in processing large-scale data from high-throughput sequencing. The open source Apache Hadoop project, which adopts the MapReduce framework and a distributed file system, has recently given bioinformatics researchers an opportunity to achieve scalable, efficient and reliable computing performance on Linux clusters and on cloud computing services. In this article, we present MapReduce frame-based applications that can be employed in the next-generation sequencing and other biological domains. In addition, we discuss the challenges faced by this field as well as the future works on parallel computing in bioinformatics. © The Author 2013. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.
47 CFR 1.10008 - What are IBFS file numbers?

Code of Federal Regulations, 2013 CFR

2013-10-01

... Random Selection International Bureau Filing System § 1.10008 What are IBFS file numbers? (a) We assign...) For a description of file number information, see The International Bureau Filing System File Number... 47 Telecommunication 1 2013-10-01 2013-10-01 false What are IBFS file numbers? 1.10008 Section 1...
47 CFR 1.10008 - What are IBFS file numbers?

Code of Federal Regulations, 2010 CFR

2010-10-01

... Bureau Filing System § 1.10008 What are IBFS file numbers? (a) We assign file numbers to electronic... information, see The International Bureau Filing System File Number Format Public Notice, DA-04-568 (released... 47 Telecommunication 1 2010-10-01 2010-10-01 false What are IBFS file numbers? 1.10008 Section 1...
47 CFR 1.10008 - What are IBFS file numbers?

Code of Federal Regulations, 2012 CFR

2012-10-01

... Random Selection International Bureau Filing System § 1.10008 What are IBFS file numbers? (a) We assign...) For a description of file number information, see The International Bureau Filing System File Number... 47 Telecommunication 1 2012-10-01 2012-10-01 false What are IBFS file numbers? 1.10008 Section 1...
47 CFR 1.10008 - What are IBFS file numbers?

Code of Federal Regulations, 2011 CFR

2011-10-01

... Bureau Filing System § 1.10008 What are IBFS file numbers? (a) We assign file numbers to electronic... information, see The International Bureau Filing System File Number Format Public Notice, DA-04-568 (released... 47 Telecommunication 1 2011-10-01 2011-10-01 false What are IBFS file numbers? 1.10008 Section 1...
47 CFR 1.10008 - What are IBFS file numbers?

Code of Federal Regulations, 2014 CFR

2014-10-01

... Random Selection International Bureau Filing System § 1.10008 What are IBFS file numbers? (a) We assign...) For a description of file number information, see The International Bureau Filing System File Number... 47 Telecommunication 1 2014-10-01 2014-10-01 false What are IBFS file numbers? 1.10008 Section 1...
BerkeleyGW: A massively parallel computer package for the calculation of the quasiparticle and optical properties of materials and nanostructures

NASA Astrophysics Data System (ADS)

Deslippe, Jack; Samsonidze, Georgy; Strubbe, David A.; Jain, Manish; Cohen, Marvin L.; Louie, Steven G.

2012-06-01

BerkeleyGW is a massively parallel computational package for electron excited-state properties that is based on the many-body perturbation theory employing the ab initio GW and GW plus Bethe-Salpeter equation methodology. It can be used in conjunction with many density-functional theory codes for ground-state properties, including PARATEC, PARSEC, Quantum ESPRESSO, SIESTA, and Octopus. The package can be used to compute the electronic and optical properties of a wide variety of material systems from bulk semiconductors and metals to nanostructured materials and molecules. The package scales to 10 000s of CPUs and can be used to study systems containing up to 100s of atoms. Program summaryProgram title: BerkeleyGW Catalogue identifier: AELG_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AELG_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Open source BSD License. See code for licensing details. No. of lines in distributed program, including test data, etc.: 576 540 No. of bytes in distributed program, including test data, etc.: 110 608 809 Distribution format: tar.gz Programming language: Fortran 90, C, C++, Python, Perl, BASH Computer: Linux/UNIX workstations or clusters Operating system: Tested on a variety of Linux distributions in parallel and serial as well as AIX and Mac OSX RAM: (50-2000) MB per CPU (Highly dependent on system size) Classification: 7.2, 7.3, 16.2, 18 External routines: BLAS, LAPACK, FFTW, ScaLAPACK (optional), MPI (optional). All available under open-source licenses. Nature of problem: The excited state properties of materials involve the addition or subtraction of electrons as well as the optical excitations of electron-hole pairs. The excited particles interact strongly with other electrons in a material system. This interaction affects the electronic energies, wavefunctions and lifetimes. It is well known that ground-state theories, such as standard methods based on density-functional theory, fail to correctly capture this physics. Solution method: We construct and solve the Dyson's equation for the quasiparticle energies and wavefunctions within the GW approximation for the electron self-energy. We additionally construct and solve the Bethe-Salpeter equation for the correlated electron-hole (exciton) wavefunctions and excitation energies. Restrictions: The material size is limited in practice by the computational resources available. Materials with up to 500 atoms per periodic cell can be studied on large HPCs. Additional comments: The distribution file for this program is approximately 110 Mbytes and therefore is not delivered directly when download or E-mail is requested. Instead a html file giving details of how the program can be obtained is sent. Running time: 1-1000 minutes (depending greatly on system size and processor number).
MPgrafic: A parallel MPI version of Grafic-1

NASA Astrophysics Data System (ADS)

Prunet, Simon; Pichon, Christophe

2013-04-01

MPgrafic is a parallel MPI version of Grafic-1 which can produce large cosmological initial conditions on a cluster without requiring shared memory. The real Fourier transforms are carried in place using fftw while minimizing the amount of used memory (at the expense of performance) in the spirit of Grafic-1. The writing of the output file is also carried in parallel. In addition to the technical parallelization, it provides three extensions over Grafic-1: it can produce power spectra with baryon wiggles (DJ Eisenstein and W. Hu, Ap. J. 496);it has the optional ability to load a lower resolution noise map corresponding to the low frequency component which will fix the larger scale modes of the simulation (extra flag 0/1 at the end of the input process) in the spirit of Grafic-2;it can be used in conjunction with constrfield, which generates initial conditions phases from a list of local constraints on density, tidal field density gradient and velocity.
ParallelStructure: A R Package to Distribute Parallel Runs of the Population Genetics Program STRUCTURE on Multi-Core Computers

PubMed Central

Besnier, Francois; Glover, Kevin A.

2013-01-01

This software package provides an R-based framework to make use of multi-core computers when running analyses in the population genetics program STRUCTURE. It is especially addressed to those users of STRUCTURE dealing with numerous and repeated data analyses, and who could take advantage of an efficient script to automatically distribute STRUCTURE jobs among multiple processors. It also consists of additional functions to divide analyses among combinations of populations within a single data set without the need to manually produce multiple projects, as it is currently the case in STRUCTURE. The package consists of two main functions: MPI_structure() and parallel_structure() as well as an example data file. We compared the performance in computing time for this example data on two computer architectures and showed that the use of the present functions can result in several-fold improvements in terms of computation time. ParallelStructure is freely available at https://r-forge.r-project.org/projects/parallstructure/. PMID:23923012
Apollo: AN Automatic Procedure to Forecast Transport and Deposition of Tephra

NASA Astrophysics Data System (ADS)

Folch, A.; Costa, A.; Macedonio, G.

2007-05-01

Volcanic ash fallout represents a serious threat to communities around active volcanoes. Reliable short term predictions constitute a valuable support for to mitigate the effects of fallout on the surrounding area during an episode of crisis. We present a platform-independent automatic procedure aimed to daily forecast volcanic ash dispersal. The procedure builds on a series of programs and interfaces that allow an automatic data/results flow. Firstly the procedure downloads mesoscale meteorological forecasts for the region and period of interest, filters and converts data from its native format (typically GRIB format files), and sets up the CALMET diagnostic meteorological model to obtain hourly wind field and micro-meteorological variables on a finer mesh. Secondly a 1-D version of the buoyant plume equations assesses the distribution of mass along the eruptive column depending on the obtained wind field and on the conditions at the vent (granulometry, mass flow rate, etc.). All these data are used as input for the ash dispersion model(s). Any model able to face physical complexity and coupling processes with adequate solving times can be plugged into the system by means of an interface. Currently, the procedure contains the models HAZMAP, TEPHRA and FALL3D, the latter in both serial and parallel versions. Parallelization of FALL3D is done at two levels one for particle classes and one for spatial domain. The last step is to post-processes the model(s) outcomes to end up with homogeneous maps written on portable format files. Maps plot relevant quantities such as predicted ground load, expected deposit thickness or visual and flight safety concentration thresholds. Several applications are shown as examples.
A File Archival System

NASA Technical Reports Server (NTRS)

Fanselow, J. L.; Vavrus, J. L.

1984-01-01

ARCH, file archival system for DEC VAX, provides for easy offline storage and retrieval of arbitrary files on DEC VAX system. System designed to eliminate situations that tie up disk space and lead to confusion when different programers develop different versions of same programs and associated files.
Novel 3D Compression Methods for Geometry, Connectivity and Texture

NASA Astrophysics Data System (ADS)

Siddeq, M. M.; Rodrigues, M. A.

2016-06-01

A large number of applications in medical visualization, games, engineering design, entertainment, heritage, e-commerce and so on require the transmission of 3D models over the Internet or over local networks. 3D data compression is an important requirement for fast data storage, access and transmission within bandwidth limitations. The Wavefront OBJ (object) file format is commonly used to share models due to its clear simple design. Normally each OBJ file contains a large amount of data (e.g. vertices and triangulated faces, normals, texture coordinates and other parameters) describing the mesh surface. In this paper we introduce a new method to compress geometry, connectivity and texture coordinates by a novel Geometry Minimization Algorithm (GM-Algorithm) in connection with arithmetic coding. First, each vertex ( x, y, z) coordinates are encoded to a single value by the GM-Algorithm. Second, triangle faces are encoded by computing the differences between two adjacent vertex locations, which are compressed by arithmetic coding together with texture coordinates. We demonstrate the method on large data sets achieving compression ratios between 87 and 99 % without reduction in the number of reconstructed vertices and triangle faces. The decompression step is based on a Parallel Fast Matching Search Algorithm (Parallel-FMS) to recover the structure of the 3D mesh. A comparative analysis of compression ratios is provided with a number of commonly used 3D file formats such as VRML, OpenCTM and STL highlighting the performance and effectiveness of the proposed method.
75 FR 65467 - Combined Notice of Filings No. 1

Federal Register 2010, 2011, 2012, 2013, 2014

2010-10-25

...: Venice Gathering System, L.L.C. Description: Venice Gathering System, L.L.C. submits tariff filing per 154.203: Venice Gathering System Rate Settlement Compliance Filing to be effective 11/1/2010. Filed...
Register file soft error recovery

DOEpatents

Fleischer, Bruce M.; Fox, Thomas W.; Wait, Charles D.; Muff, Adam J.; Watson, III, Alfred T.

2013-10-15

Register file soft error recovery including a system that includes a first register file and a second register file that mirrors the first register file. The system also includes an arithmetic pipeline for receiving data read from the first register file, and error detection circuitry to detect whether the data read from the first register file includes corrupted data. The system further includes error recovery circuitry to insert an error recovery instruction into the arithmetic pipeline in response to detecting the corrupted data. The inserted error recovery instruction replaces the corrupted data in the first register file with a copy of the data from the second register file.

48 CFR 304.803-70 - Contract/order file organization and use of checklists.

Code of Federal Regulations, 2013 CFR

2013-10-01

... 48 Federal Acquisition Regulations System 4 2013-10-01 2013-10-01 false Contract/order file organization and use of checklists. 304.803-70 Section 304.803-70 Federal Acquisition Regulations System HEALTH... content of HHS contract and order files, OPDIVs shall use the folder filing system and accompanying file...
48 CFR 304.803-70 - Contract/order file organization and use of checklists.

Code of Federal Regulations, 2010 CFR

2010-10-01

... 48 Federal Acquisition Regulations System 4 2010-10-01 2010-10-01 false Contract/order file organization and use of checklists. 304.803-70 Section 304.803-70 Federal Acquisition Regulations System HEALTH... content of HHS contract and order files, OPDIVs shall use the folder filing system and accompanying file...
48 CFR 304.803-70 - Contract/order file organization and use of checklists.

Code of Federal Regulations, 2011 CFR

2011-10-01

... 48 Federal Acquisition Regulations System 4 2011-10-01 2011-10-01 false Contract/order file organization and use of checklists. 304.803-70 Section 304.803-70 Federal Acquisition Regulations System HEALTH... content of HHS contract and order files, OPDIVs shall use the folder filing system and accompanying file...
48 CFR 304.803-70 - Contract/order file organization and use of checklists.

Code of Federal Regulations, 2012 CFR

2012-10-01

... 48 Federal Acquisition Regulations System 4 2012-10-01 2012-10-01 false Contract/order file organization and use of checklists. 304.803-70 Section 304.803-70 Federal Acquisition Regulations System HEALTH... content of HHS contract and order files, OPDIVs shall use the folder filing system and accompanying file...
48 CFR 304.803-70 - Contract/order file organization and use of checklists.

Code of Federal Regulations, 2014 CFR

2014-10-01

... 48 Federal Acquisition Regulations System 4 2014-10-01 2014-10-01 false Contract/order file organization and use of checklists. 304.803-70 Section 304.803-70 Federal Acquisition Regulations System HEALTH... content of HHS contract and order files, OPDIVs shall use the folder filing system and accompanying file...
To evaluate and compare the efficacy, cleaning ability of hand and two rotary systems in root canal retreatment.

PubMed

Shivanand, Sunita; Patil, Chetan R; Thangala, Venugopal; Kumar, Pabbati Ravi; Sachdeva, Jyoti; Krishna, Akash

2013-05-01

To evaluate and compare the efficacy, cleaning ability of hand and two rotary systems in root canal retreatment. Sixty extracted premolars were retreated with following systems: Group -ProTaper Universal retreatment files, Group 2-ProFile system, Group 3-H-file. Specimens were split longitudinally and amount of remaining gutta-percha on the canal walls was assessed using direct visual scoring with the aid of stereomicroscope. Results were statistically analyzed using ANOVA test. Completely clean root canal walls were not achieved with any of the techniques investigated. However, all three systems proved to be effective for gutta-percha removal. Significant difference was found between ProTaper universal retreatment file and H-file, and also between ProFile and H-file. Under the conditions of the present study, ProTaper Universal retreatment files left significantly less guttapercha and sealer than ProFile and H-file. Rotary systems in combination with gutta-percha solvents can perform superiorly as compared to the time tested traditional hand instrumentation in root canal retreatment.
LAMMPS framework for dynamic bonding and an application modeling DNA

NASA Astrophysics Data System (ADS)

Svaneborg, Carsten

2012-08-01

We have extended the Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) to support directional bonds and dynamic bonding. The framework supports stochastic formation of new bonds, breakage of existing bonds, and conversion between bond types. Bond formation can be controlled to limit the maximal functionality of a bead with respect to various bond types. Concomitant with the bond dynamics, angular and dihedral interactions are dynamically introduced between newly connected triplets and quartets of beads, where the interaction type is determined from the local pattern of bead and bond types. When breaking bonds, all angular and dihedral interactions involving broken bonds are removed. The framework allows chemical reactions to be modeled, and use it to simulate a simplistic, coarse-grained DNA model. The resulting DNA dynamics illustrates the power of the present framework. Catalogue identifier: AEME_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEME_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: GNU General Public Licence No. of lines in distributed program, including test data, etc.: 2 243 491 No. of bytes in distributed program, including test data, etc.: 771 Distribution format: tar.gz Programming language: C++ Computer: Single and multiple core servers Operating system: Linux/Unix/Windows Has the code been vectorized or parallelized?: Yes. The code has been parallelized by the use of MPI directives. RAM: 1 Gb Classification: 16.11, 16.12 Nature of problem: Simulating coarse-grain models capable of chemistry e.g. DNA hybridization dynamics. Solution method: Extending LAMMPS to handle dynamic bonding and directional bonds. Unusual features: Allows bonds to be created and broken while angular and dihedral interactions are kept consistent. Additional comments: The distribution file for this program is approximately 36 Mbytes and therefore is not delivered directly when download or E-mail is requested. Instead an html file giving details of how the program can be obtained is sent. Running time: Hours to days. The examples provided in the distribution take just seconds to run.
A quantitative assessment of the Hadoop framework for analyzing massively parallel DNA sequencing data.

PubMed

Siretskiy, Alexey; Sundqvist, Tore; Voznesenskiy, Mikhail; Spjuth, Ola

2015-01-01

New high-throughput technologies, such as massively parallel sequencing, have transformed the life sciences into a data-intensive field. The most common e-infrastructure for analyzing this data consists of batch systems that are based on high-performance computing resources; however, the bioinformatics software that is built on this platform does not scale well in the general case. Recently, the Hadoop platform has emerged as an interesting option to address the challenges of increasingly large datasets with distributed storage, distributed processing, built-in data locality, fault tolerance, and an appealing programming methodology. In this work we introduce metrics and report on a quantitative comparison between Hadoop and a single node of conventional high-performance computing resources for the tasks of short read mapping and variant calling. We calculate efficiency as a function of data size and observe that the Hadoop platform is more efficient for biologically relevant data sizes in terms of computing hours for both split and un-split data files. We also quantify the advantages of the data locality provided by Hadoop for NGS problems, and show that a classical architecture with network-attached storage will not scale when computing resources increase in numbers. Measurements were performed using ten datasets of different sizes, up to 100 gigabases, using the pipeline implemented in Crossbow. To make a fair comparison, we implemented an improved preprocessor for Hadoop with better performance for splittable data files. For improved usability, we implemented a graphical user interface for Crossbow in a private cloud environment using the CloudGene platform. All of the code and data in this study are freely available as open source in public repositories. From our experiments we can conclude that the improved Hadoop pipeline scales better than the same pipeline on high-performance computing resources, we also conclude that Hadoop is an economically viable option for the common data sizes that are currently used in massively parallel sequencing. Given that datasets are expected to increase over time, Hadoop is a framework that we envision will have an increasingly important role in future biological data analysis.
High-Performance I/O: HDF5 for Lattice QCD

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kurth, Thorsten; Pochinsky, Andrew; Sarje, Abhinav

2015-01-01

Practitioners of lattice QCD/QFT have been some of the primary pioneer users of the state-of-the-art high-performance-computing systems, and contribute towards the stress tests of such new machines as soon as they become available. As with all aspects of high-performance-computing, I/O is becoming an increasingly specialized component of these systems. In order to take advantage of the latest available high-performance I/O infrastructure, to ensure reliability and backwards compatibility of data files, and to help unify the data structures used in lattice codes, we have incorporated parallel HDF5 I/O into the SciDAC supported USQCD software stack. Here we present the design andmore » implementation of this I/O framework. Our HDF5 implementation outperforms optimized QIO at the 10-20% level and leaves room for further improvement by utilizing appropriate dataset chunking.« less
High-Performance I/O: HDF5 for Lattice QCD

DOE PAGES

Kurth, Thorsten; Pochinsky, Andrew; Sarje, Abhinav; ...

2017-05-09

Practitioners of lattice QCD/QFT have been some of the primary pioneer users of the state-of-the-art high-performance-computing systems, and contribute towards the stress tests of such new machines as soon as they become available. As with all aspects of high-performance-computing, I/O is becoming an increasingly specialized component of these systems. In order to take advantage of the latest available high-performance I/O infrastructure, to ensure reliability and backwards compatibility of data files, and to help unify the data structures used in lattice codes, we have incorporated parallel HDF5 I/O into the SciDAC supported USQCD software stack. Here we present the design andmore » implementation of this I/O framework. Our HDF5 implementation outperforms optimized QIO at the 10-20% level and leaves room for further improvement by utilizing appropriate dataset chunking.« less
Archive Management of NASA Earth Observation Data to Support Cloud Analysis

NASA Technical Reports Server (NTRS)

Lynnes, Christopher; Baynes, Kathleen; McInerney, Mark A.

2017-01-01

NASA collects, processes and distributes petabytes of Earth Observation (EO) data from satellites, aircraft, in situ instruments and model output, with an order of magnitude increase expected by 2024. Cloud-based web object storage (WOS) of these data can simplify the execution of such an increase. More importantly, it can also facilitate user analysis of those volumes by making the data available to the massively parallel computing power in the cloud. However, storing EO data in cloud WOS has a ripple effect throughout the NASA archive system with unexpected challenges and opportunities. One challenge is modifying data servicing software (such as Web Coverage Service servers) to access and subset data that are no longer on a directly accessible file system, but rather in cloud WOS. Opportunities include refactoring of the archive software to a cloud-native architecture; virtualizing data products by computing on demand; and reorganizing data to be more analysis-friendly.
GNAQPMS v1.1: accelerating the Global Nested Air Quality Prediction Modeling System (GNAQPMS) on Intel Xeon Phi processors

NASA Astrophysics Data System (ADS)

Wang, Hui; Chen, Huansheng; Wu, Qizhong; Lin, Junmin; Chen, Xueshun; Xie, Xinwei; Wang, Rongrong; Tang, Xiao; Wang, Zifa

2017-08-01

The Global Nested Air Quality Prediction Modeling System (GNAQPMS) is the global version of the Nested Air Quality Prediction Modeling System (NAQPMS), which is a multi-scale chemical transport model used for air quality forecast and atmospheric environmental research. In this study, we present the porting and optimisation of GNAQPMS on a second-generation Intel Xeon Phi processor, codenamed Knights Landing (KNL). Compared with the first-generation Xeon Phi coprocessor (codenamed Knights Corner, KNC), KNL has many new hardware features such as a bootable processor, high-performance in-package memory and ISA compatibility with Intel Xeon processors. In particular, we describe the five optimisations we applied to the key modules of GNAQPMS, including the CBM-Z gas-phase chemistry, advection, convection and wet deposition modules. These optimisations work well on both the KNL 7250 processor and the Intel Xeon E5-2697 V4 processor. They include (1) updating the pure Message Passing Interface (MPI) parallel mode to the hybrid parallel mode with MPI and OpenMP in the emission, advection, convection and gas-phase chemistry modules; (2) fully employing the 512 bit wide vector processing units (VPUs) on the KNL platform; (3) reducing unnecessary memory access to improve cache efficiency; (4) reducing the thread local storage (TLS) in the CBM-Z gas-phase chemistry module to improve its OpenMP performance; and (5) changing the global communication from writing/reading interface files to MPI functions to improve the performance and the parallel scalability. These optimisations greatly improved the GNAQPMS performance. The same optimisations also work well for the Intel Xeon Broadwell processor, specifically E5-2697 v4. Compared with the baseline version of GNAQPMS, the optimised version was 3.51 × faster on KNL and 2.77 × faster on the CPU. Moreover, the optimised version ran at 26 % lower average power on KNL than on the CPU. With the combined performance and energy improvement, the KNL platform was 37.5 % more efficient on power consumption compared with the CPU platform. The optimisations also enabled much further parallel scalability on both the CPU cluster and the KNL cluster scaled to 40 CPU nodes and 30 KNL nodes, with a parallel efficiency of 70.4 and 42.2 %, respectively.
smallWig: parallel compression of RNA-seq WIG files.

PubMed

Wang, Zhiying; Weissman, Tsachy; Milenkovic, Olgica

2016-01-15

We developed a new lossless compression method for WIG data, named smallWig, offering the best known compression rates for RNA-seq data and featuring random access functionalities that enable visualization, summary statistics analysis and fast queries from the compressed files. Our approach results in order of magnitude improvements compared with bigWig and ensures compression rates only a fraction of those produced by cWig. The key features of the smallWig algorithm are statistical data analysis and a combination of source coding methods that ensure high flexibility and make the algorithm suitable for different applications. Furthermore, for general-purpose file compression, the compression rate of smallWig approaches the empirical entropy of the tested WIG data. For compression with random query features, smallWig uses a simple block-based compression scheme that introduces only a minor overhead in the compression rate. For archival or storage space-sensitive applications, the method relies on context mixing techniques that lead to further improvements of the compression rate. Implementations of smallWig can be executed in parallel on different sets of chromosomes using multiple processors, thereby enabling desirable scaling for future transcriptome Big Data platforms. The development of next-generation sequencing technologies has led to a dramatic decrease in the cost of DNA/RNA sequencing and expression profiling. RNA-seq has emerged as an important and inexpensive technology that provides information about whole transcriptomes of various species and organisms, as well as different organs and cellular communities. The vast volume of data generated by RNA-seq experiments has significantly increased data storage costs and communication bandwidth requirements. Current compression tools for RNA-seq data such as bigWig and cWig either use general-purpose compressors (gzip) or suboptimal compression schemes that leave significant room for improvement. To substantiate this claim, we performed a statistical analysis of expression data in different transform domains and developed accompanying entropy coding methods that bridge the gap between theoretical and practical WIG file compression rates. We tested different variants of the smallWig compression algorithm on a number of integer-and real- (floating point) valued RNA-seq WIG files generated by the ENCODE project. The results reveal that, on average, smallWig offers 18-fold compression rate improvements, up to 2.5-fold compression time improvements, and 1.5-fold decompression time improvements when compared with bigWig. On the tested files, the memory usage of the algorithm never exceeded 90 KB. When more elaborate context mixing compressors were used within smallWig, the obtained compression rates were as much as 23 times better than those of bigWig. For smallWig used in the random query mode, which also supports retrieval of the summary statistics, an overhead in the compression rate of roughly 3-17% was introduced depending on the chosen system parameters. An increase in encoding and decoding time of 30% and 55% represents an additional performance loss caused by enabling random data access. We also implemented smallWig using multi-processor programming. This parallelization feature decreases the encoding delay 2-3.4 times compared with that of a single-processor implementation, with the number of processors used ranging from 2 to 8; in the same parameter regime, the decoding delay decreased 2-5.2 times. The smallWig software can be downloaded from: http://stanford.edu/~zhiyingw/smallWig/smallwig.html, http://publish.illinois.edu/milenkovic/, http://web.stanford.edu/~tsachy/. zhiyingw@stanford.edu Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
47 CFR 1.10006 - Is electronic filing mandatory?

Code of Federal Regulations, 2011 CFR

2011-10-01

... 47 Telecommunication 1 2011-10-01 2011-10-01 false Is electronic filing mandatory? 1.10006 Section... International Bureau Filing System § 1.10006 Is electronic filing mandatory? Electronic filing is mandatory for... System (IBFS) form is available. Applications for which an electronic form is not available must be filed...
An Ephemeral Burst-Buffer File System for Scientific Applications

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wang, Teng; Moody, Adam; Yu, Weikuan

BurstFS is a distributed file system for node-local burst buffers on high performance computing systems. BurstFS presents a shared file system space across the burst buffers so that applications that use shared files can access the highly-scalable burst buffers without changing their applications.
5 CFR 293.504 - Composition of, and access to, the Employee Medical File System.

Code of Federal Regulations, 2013 CFR

2013-01-01

... Employee Medical File System. 293.504 Section 293.504 Administrative Personnel OFFICE OF PERSONNEL MANAGEMENT CIVIL SERVICE REGULATIONS PERSONNEL RECORDS Employee Medical File System Records § 293.504 Composition of, and access to, the Employee Medical File System. (a) All employee occupational medical records...
5 CFR 293.504 - Composition of, and access to, the Employee Medical File System.

Code of Federal Regulations, 2014 CFR

2014-01-01

... Employee Medical File System. 293.504 Section 293.504 Administrative Personnel OFFICE OF PERSONNEL MANAGEMENT CIVIL SERVICE REGULATIONS PERSONNEL RECORDS Employee Medical File System Records § 293.504 Composition of, and access to, the Employee Medical File System. (a) All employee occupational medical records...
5 CFR 293.504 - Composition of, and access to, the Employee Medical File System.

Code of Federal Regulations, 2010 CFR

2010-01-01

... Employee Medical File System. 293.504 Section 293.504 Administrative Personnel OFFICE OF PERSONNEL MANAGEMENT CIVIL SERVICE REGULATIONS PERSONNEL RECORDS Employee Medical File System Records § 293.504 Composition of, and access to, the Employee Medical File System. (a) All employee occupational medical records...
5 CFR 293.504 - Composition of, and access to, the Employee Medical File System.

Code of Federal Regulations, 2012 CFR

2012-01-01

... Employee Medical File System. 293.504 Section 293.504 Administrative Personnel OFFICE OF PERSONNEL MANAGEMENT CIVIL SERVICE REGULATIONS PERSONNEL RECORDS Employee Medical File System Records § 293.504 Composition of, and access to, the Employee Medical File System. (a) All employee occupational medical records...
5 CFR 293.504 - Composition of, and access to, the Employee Medical File System.

Code of Federal Regulations, 2011 CFR

2011-01-01

... Employee Medical File System. 293.504 Section 293.504 Administrative Personnel OFFICE OF PERSONNEL MANAGEMENT CIVIL SERVICE REGULATIONS PERSONNEL RECORDS Employee Medical File System Records § 293.504 Composition of, and access to, the Employee Medical File System. (a) All employee occupational medical records...

Quantitative evaluation of apically extruded debris with different single-file systems: Reciproc, F360 and OneShape versus Mtwo.

PubMed

Bürklein, S; Benten, S; Schäfer, E

2014-05-01

To assess in a laboratory setting the amount of apically extruded debris associated with different single-file nickel-titanium instrumentation systems compared to one multiple-file rotary system. Eighty human mandibular central incisors were randomly assigned to four groups (n = 20 teeth per group). The root canals were instrumented according to the manufacturers' instructions using the reciprocating single-file system Reciproc, the single-file rotary systems F360 and OneShape and the multiple-file rotary Mtwo instruments. The apically extruded debris was collected and dried in pre-weighed glass vials. The amount of debris was assessed with a micro balance and statistically analysed using anova and post hoc Student-Newman-Keuls test. The time required to prepare the canals with the different instruments was also recorded. Reciproc produced significantly more debris compared to all other systems (P < 0.05). No significant difference was noted between the two single-file rotary systems and the multiple-file rotary system (P > 0.05). Instrumentation with the three single-file systems was significantly faster than with Mtwo (P < 0.05). Under the condition of this study, all systems caused apical debris extrusion. Rotary instrumentation was associated with less debris extrusion compared to reciprocal instrumentation. © 2013 International Endodontic Journal. Published by John Wiley & Sons Ltd.
Usage analysis of user files in UNIX

NASA Technical Reports Server (NTRS)

Devarakonda, Murthy V.; Iyer, Ravishankar K.

1987-01-01

Presented is a user-oriented analysis of short term file usage in a 4.2 BSD UNIX environment. The key aspect of this analysis is a characterization of users and files, which is a departure from the traditional approach of analyzing file references. Two characterization measures are employed: accesses-per-byte (combining fraction of a file referenced and number of references) and file size. This new approach is shown to distinguish differences in files as well as users, which cam be used in efficient file system design, and in creating realistic test workloads for simulations. A multi-stage gamma distribution is shown to closely model the file usage measures. Even though overall file sharing is small, some files belonging to a bulletin board system are accessed by many users, simultaneously and otherwise. Over 50% of users referenced files owned by other users, and over 80% of all files were involved in such references. Based on the differences in files and users, suggestions to improve the system performance were also made.
48 CFR 204.802 - Contract files.

Code of Federal Regulations, 2014 CFR

2014-10-01

... 48 Federal Acquisition Regulations System 3 2014-10-01 2014-10-01 false Contract files. 204.802 Section 204.802 Federal Acquisition Regulations System DEFENSE ACQUISITION REGULATIONS SYSTEM, DEPARTMENT OF DEFENSE GENERAL ADMINISTRATIVE MATTERS Contract Files 204.802 Contract files. Official contract...
48 CFR 204.802 - Contract files.

Code of Federal Regulations, 2011 CFR

2011-10-01

... 48 Federal Acquisition Regulations System 3 2011-10-01 2011-10-01 false Contract files. 204.802 Section 204.802 Federal Acquisition Regulations System DEFENSE ACQUISITION REGULATIONS SYSTEM, DEPARTMENT OF DEFENSE GENERAL ADMINISTRATIVE MATTERS Contract Files 204.802 Contract files. Official contract...
48 CFR 204.802 - Contract files.

Code of Federal Regulations, 2010 CFR

2010-10-01

... 48 Federal Acquisition Regulations System 3 2010-10-01 2010-10-01 false Contract files. 204.802 Section 204.802 Federal Acquisition Regulations System DEFENSE ACQUISITION REGULATIONS SYSTEM, DEPARTMENT OF DEFENSE GENERAL ADMINISTRATIVE MATTERS Contract Files 204.802 Contract files. Official contract...
48 CFR 204.802 - Contract files.

Code of Federal Regulations, 2012 CFR

2012-10-01

... 48 Federal Acquisition Regulations System 3 2012-10-01 2012-10-01 false Contract files. 204.802 Section 204.802 Federal Acquisition Regulations System DEFENSE ACQUISITION REGULATIONS SYSTEM, DEPARTMENT OF DEFENSE GENERAL ADMINISTRATIVE MATTERS Contract Files 204.802 Contract files. Official contract...
48 CFR 204.802 - Contract files.

Code of Federal Regulations, 2013 CFR

2013-10-01

... 48 Federal Acquisition Regulations System 3 2013-10-01 2013-10-01 false Contract files. 204.802 Section 204.802 Federal Acquisition Regulations System DEFENSE ACQUISITION REGULATIONS SYSTEM, DEPARTMENT OF DEFENSE GENERAL ADMINISTRATIVE MATTERS Contract Files 204.802 Contract files. Official contract...
Twin-tailed fail-over for fileservers maintaining full performance in the presence of a failure

DOEpatents

Coteus, Paul W.; Gara, Alan G.; Giampapa, Mark E.; Heidelberger, Philip; Steinmacher-Burow, Burkhard D.

2008-02-12

A method for maintaining full performance of a file system in the presence of a failure is provided. The file system having N storage devices, where N is an integer greater than zero and N primary file servers where each file server is operatively connected to a corresponding storage device for accessing files therein. The file system further having a secondary file server operatively connected to at least one of the N storage devices. The method including: switching the connection of one of the N storage devices to the secondary file server upon a failure of one of the N primary file servers; and switching the connections of one or more of the remaining storage devices to a primary file server other than the failed file server as necessary so as to prevent a loss in performance and to provide each storage device with an operating file server.
10 CFR 110.89 - Filing and service.

Code of Federal Regulations, 2010 CFR

2010-01-01

...: Rulemakings and Adjudications Staff or via the E-Filing system, following the procedure set forth in 10 CFR 2.302. Filing by mail is complete upon deposit in the mail. Filing via the E-Filing system is completed... residence with some occupant of suitable age and discretion; (2) Following the requirements for E-Filing in...
75 FR 27986 - Electronic Filing System-Web (EFS-Web) Contingency Option

Federal Register 2010, 2011, 2012, 2013, 2014

2010-05-19

...] Electronic Filing System--Web (EFS-Web) Contingency Option AGENCY: United States Patent and Trademark Office... availability of its patent electronic filing system, Electronic Filing System--Web (EFS-Web) by providing a new contingency option when the primary portal to EFS-Web has an unscheduled outage. Previously, the entire EFS...
29 CFR 102.119 - Privacy Act Regulations: notification as to whether a system of records contains records...

Code of Federal Regulations, 2012 CFR

2012-07-01

... Regional Office Files (NLRB-25), Regional Advice and Injunction Litigation System (RAILS) and Associated Headquarters Files (NLRB-28), and Appeals Case Tracking System (ACTS) and Associated Headquarters Files (NLRB... Judicial Case Management Systems-Pending Case List (JCMS-PCL) and Associated Headquarters Files (NLRB-21...
29 CFR 102.119 - Privacy Act Regulations: notification as to whether a system of records contains records...

Code of Federal Regulations, 2011 CFR

2011-07-01

... Regional Office Files (NLRB-25), Regional Advice and Injunction Litigation System (RAILS) and Associated Headquarters Files (NLRB-28), and Appeals Case Tracking System (ACTS) and Associated Headquarters Files (NLRB... Judicial Case Management Systems-Pending Case List (JCMS-PCL) and Associated Headquarters Files (NLRB-21...
29 CFR 102.119 - Privacy Act Regulations: notification as to whether a system of records contains records...

Code of Federal Regulations, 2014 CFR

2014-07-01

... Regional Office Files (NLRB-25), Regional Advice and Injunction Litigation System (RAILS) and Associated Headquarters Files (NLRB-28), and Appeals Case Tracking System (ACTS) and Associated Headquarters Files (NLRB... Judicial Case Management Systems-Pending Case List (JCMS-PCL) and Associated Headquarters Files (NLRB-21...
29 CFR 102.119 - Privacy Act Regulations: notification as to whether a system of records contains records...

Code of Federal Regulations, 2013 CFR

2013-07-01

... Regional Office Files (NLRB-25), Regional Advice and Injunction Litigation System (RAILS) and Associated Headquarters Files (NLRB-28), and Appeals Case Tracking System (ACTS) and Associated Headquarters Files (NLRB... Judicial Case Management Systems-Pending Case List (JCMS-PCL) and Associated Headquarters Files (NLRB-21...
Deceit: A flexible distributed file system

NASA Technical Reports Server (NTRS)

Siegel, Alex; Birman, Kenneth; Marzullo, Keith

1989-01-01

Deceit, a distributed file system (DFS) being developed at Cornell, focuses on flexible file semantics in relation to efficiency, scalability, and reliability. Deceit servers are interchangeable and collectively provide the illusion of a single, large server machine to any clients of the Deceit service. Non-volatile replicas of each file are stored on a subset of the file servers. The user is able to set parameters on a file to achieve different levels of availability, performance, and one-copy serializability. Deceit also supports a file version control mechanism. In contrast with many recent DFS efforts, Deceit can behave like a plain Sun Network File System (NFS) server and can be used by any NFS client without modifying any client software. The current Deceit prototype uses the ISIS Distributed Programming Environment for all communication and process group management, an approach that reduces system complexity and increases system robustness.
48 CFR 204.805 - Disposal of contract files.

Code of Federal Regulations, 2010 CFR

2010-10-01

... 48 Federal Acquisition Regulations System 3 2010-10-01 2010-10-01 false Disposal of contract files. 204.805 Section 204.805 Federal Acquisition Regulations System DEFENSE ACQUISITION REGULATIONS SYSTEM, DEPARTMENT OF DEFENSE GENERAL ADMINISTRATIVE MATTERS Contract Files 204.805 Disposal of contract files. (1...
48 CFR 204.804 - Closeout of contract files.

Code of Federal Regulations, 2014 CFR

2014-10-01

... 48 Federal Acquisition Regulations System 3 2014-10-01 2014-10-01 false Closeout of contract files. 204.804 Section 204.804 Federal Acquisition Regulations System DEFENSE ACQUISITION REGULATIONS SYSTEM, DEPARTMENT OF DEFENSE GENERAL ADMINISTRATIVE MATTERS Contract Files 204.804 Closeout of contract files. (1...
48 CFR 204.804 - Closeout of contract files.

Code of Federal Regulations, 2011 CFR

2011-10-01

... 48 Federal Acquisition Regulations System 3 2011-10-01 2011-10-01 false Closeout of contract files. 204.804 Section 204.804 Federal Acquisition Regulations System DEFENSE ACQUISITION REGULATIONS SYSTEM, DEPARTMENT OF DEFENSE GENERAL ADMINISTRATIVE MATTERS Contract Files 204.804 Closeout of contract files...
48 CFR 204.805 - Disposal of contract files.

Code of Federal Regulations, 2014 CFR

2014-10-01

... 48 Federal Acquisition Regulations System 3 2014-10-01 2014-10-01 false Disposal of contract files. 204.805 Section 204.805 Federal Acquisition Regulations System DEFENSE ACQUISITION REGULATIONS SYSTEM, DEPARTMENT OF DEFENSE GENERAL ADMINISTRATIVE MATTERS Contract Files 204.805 Disposal of contract files. (1...
48 CFR 204.805 - Disposal of contract files.

Code of Federal Regulations, 2012 CFR

2012-10-01

... 48 Federal Acquisition Regulations System 3 2012-10-01 2012-10-01 false Disposal of contract files. 204.805 Section 204.805 Federal Acquisition Regulations System DEFENSE ACQUISITION REGULATIONS SYSTEM, DEPARTMENT OF DEFENSE GENERAL ADMINISTRATIVE MATTERS Contract Files 204.805 Disposal of contract files. (1...

48 CFR 204.805 - Disposal of contract files.

Code of Federal Regulations, 2013 CFR

2013-10-01

... 48 Federal Acquisition Regulations System 3 2013-10-01 2013-10-01 false Disposal of contract files. 204.805 Section 204.805 Federal Acquisition Regulations System DEFENSE ACQUISITION REGULATIONS SYSTEM, DEPARTMENT OF DEFENSE GENERAL ADMINISTRATIVE MATTERS Contract Files 204.805 Disposal of contract files. (1...
48 CFR 204.804 - Closeout of contract files.

Code of Federal Regulations, 2010 CFR

2010-10-01

... 48 Federal Acquisition Regulations System 3 2010-10-01 2010-10-01 false Closeout of contract files. 204.804 Section 204.804 Federal Acquisition Regulations System DEFENSE ACQUISITION REGULATIONS SYSTEM, DEPARTMENT OF DEFENSE GENERAL ADMINISTRATIVE MATTERS Contract Files 204.804 Closeout of contract files...
48 CFR 204.804 - Closeout of contract files.

Code of Federal Regulations, 2012 CFR

2012-10-01

... 48 Federal Acquisition Regulations System 3 2012-10-01 2012-10-01 false Closeout of contract files. 204.804 Section 204.804 Federal Acquisition Regulations System DEFENSE ACQUISITION REGULATIONS SYSTEM, DEPARTMENT OF DEFENSE GENERAL ADMINISTRATIVE MATTERS Contract Files 204.804 Closeout of contract files. (1...
48 CFR 204.805 - Disposal of contract files.

Code of Federal Regulations, 2011 CFR

2011-10-01

... 48 Federal Acquisition Regulations System 3 2011-10-01 2011-10-01 false Disposal of contract files. 204.805 Section 204.805 Federal Acquisition Regulations System DEFENSE ACQUISITION REGULATIONS SYSTEM, DEPARTMENT OF DEFENSE GENERAL ADMINISTRATIVE MATTERS Contract Files 204.805 Disposal of contract files. (1...
48 CFR 204.804 - Closeout of contract files.

Code of Federal Regulations, 2013 CFR

2013-10-01

... 48 Federal Acquisition Regulations System 3 2013-10-01 2013-10-01 false Closeout of contract files. 204.804 Section 204.804 Federal Acquisition Regulations System DEFENSE ACQUISITION REGULATIONS SYSTEM, DEPARTMENT OF DEFENSE GENERAL ADMINISTRATIVE MATTERS Contract Files 204.804 Closeout of contract files. (1...
EOS MLS Level 2 Data Processing Software Version 3

NASA Technical Reports Server (NTRS)

Livesey, Nathaniel J.; VanSnyder, Livesey W.; Read, William G.; Schwartz, Michael J.; Lambert, Alyn; Santee, Michelle L.; Nguyen, Honghanh T.; Froidevaux, Lucien; wang, Shuhui; Manney, Gloria L.;

2011-01-01

This software accepts the EOS MLS calibrated measurements of microwave radiances products and operational meteorological data, and produces a set of estimates of atmospheric temperature and composition. This version has been designed to be as flexible as possible. The software is controlled by a Level 2 Configuration File that controls all aspects of the software: defining the contents of state and measurement vectors, defining the configurations of the various forward models available, reading appropriate a priori spectroscopic and calibration data, performing retrievals, post-processing results, computing diagnostics, and outputting results in appropriate files. In production mode, the software operates in a parallel form, with one instance of the program acting as a master, coordinating the work of multiple slave instances on a cluster of computers, each computing the results for individual chunks of data. In addition, to do conventional retrieval calculations and producing geophysical products, the Level 2 Configuration File can instruct the software to produce files of simulated radiances based on a state vector formed from a set of geophysical product files taken as input. Combining both the retrieval and simulation tasks in a single piece of software makes it far easier to ensure that identical forward model algorithms and parameters are used in both tasks. This also dramatically reduces the complexity of the code maintenance effort.

Multiplexer/Demultiplexer Loading Tool (MDMLT)

NASA Technical Reports Server (NTRS)

Brewer, Lenox Allen; Hale, Elizabeth; Martella, Robert; Gyorfi, Ryan

2012-01-01

The purpose of the MDMLT is to improve the reliability and speed of loading multiplexers/demultiplexers (MDMs) in the Software Development and Integration Laboratory (SDIL) by automating the configuration management (CM) of the loads in the MDMs, automating the loading procedure, and providing the capability to load multiple or all MDMs concurrently. This loading may be accomplished in parallel, or single MDMs (remote). The MDMLT is a Web-based tool that is capable of loading the entire International Space Station (ISS) MDM configuration in parallel. It is able to load Flight Equivalent Units (FEUs), enhanced, standard, and prototype MDMs as well as both EEPROM (Electrically Erasable Programmable Read-Only Memory) and SSMMU (Solid State Mass Memory Unit) (MASS Memory). This software has extensive configuration management to track loading history, and the performance improvement means of loading the entire ISS MDM configuration of 49 MDMs in approximately 30 minutes, as opposed to 36 hours, which is what it took previously utilizing the flight method of S-Band uplink. The laptop version recently added to the MDMLT suite allows remote lab loading with the CM of information entered into a common database when it is reconnected to the network. This allows the program to reconfigure the test rigs quickly between shifts, allowing the lab to support a variety of onboard configurations during a single day, based on upcoming or current missions. The MDMLT Computer Software Configuration Item (CSCI) supports a Web-based command and control interface to the user. An interface to the SDIL File Transfer Protocol (FTP) server is supported to import Integrated Flight Loads (IFLs) and Internal Product Release Notes (IPRNs) into the database. An interface to the Monitor and Control System (MCS) is supported to control the power state, and to enable or disable the debug port of the MDMs to be loaded. Two direct interfaces to the MDM are supported: a serial interface (debug port) to receive MDM memory dump data and the calculated checksum, and the Small Computer System Interface (SCSI) to transfer load files to MDMs with hard disks. File transfer from the MDM Loading Tool to EEPROM within the MDM is performed via the MILSTD- 1553 bus, making use of the Real- Time Input/Output Processors (RTIOP) when using the rig-based MDMLT, and via a bus box when using the laptop MDMLT. The bus box is a cost-effective alternative to PC-1553 cards for the laptop. It is noted that this system can be modified and adapted to any avionic laboratory for spacecraft computer loading, ship avionics, or aircraft avionics where multiple configurations and strong configuration management of software/firmware loads are required.
Automated Concurrent Blackboard System Generation in C++

NASA Technical Reports Server (NTRS)

Kaplan, J. A.; McManus, J. W.; Bynum, W. L.

1999-01-01

In his 1992 Ph.D. thesis, "Design and Analysis Techniques for Concurrent Blackboard Systems", John McManus defined several performance metrics for concurrent blackboard systems and developed a suite of tools for creating and analyzing such systems. These tools allow a user to analyze a concurrent blackboard system design and predict the performance of the system before any code is written. The design can be modified until simulated performance is satisfactory. Then, the code generator can be invoked to generate automatically all of the code required for the concurrent blackboard system except for the code implementing the functionality of each knowledge source. We have completed the port of the source code generator and a simulator for a concurrent blackboard system. The source code generator generates the necessary C++ source code to implement the concurrent blackboard system using Parallel Virtual Machine (PVM) running on a heterogeneous network of UNIX(trademark) workstations. The concurrent blackboard simulator uses the blackboard specification file to predict the performance of the concurrent blackboard design. The only part of the source code for the concurrent blackboard system that the user must supply is the code implementing the functionality of the knowledge sources.
National Centers for Environmental Prediction

Science.gov Websites

the number of threads used. HWRF group cannot access Zeus and Jet for real-time data transfers from nodes used.). All single jobs will be run on one rack and will not share with parallel jobs. No official change the group when using tag_rstprod (-g option). autotag_rstprod is a script that tags all files. It
Apically extruded dentin debris by reciprocating single-file and multi-file rotary system.

PubMed

De-Deus, Gustavo; Neves, Aline; Silva, Emmanuel João; Mendonça, Thais Accorsi; Lourenço, Caroline; Calixto, Camila; Lima, Edson Jorge Moreira

2015-03-01

This study aims to evaluate the apical extrusion of debris by the two reciprocating single-file systems: WaveOne and Reciproc. Conventional multi-file rotary system was used as a reference for comparison. The hypotheses tested were (i) the reciprocating single-file systems extrude more than conventional multi-file rotary system and (ii) the reciprocating single-file systems extrude similar amounts of dentin debris. After solid selection criteria, 80 mesial roots of lower molars were included in the present study. The use of four different instrumentation techniques resulted in four groups (n = 20): G1 (hand-file technique), G2 (ProTaper), G3 (WaveOne), and G4 (Reciproc). The apparatus used to evaluate the collection of apically extruded debris was typical double-chamber collector. Statistical analysis was performed for multiple comparisons. No significant difference was found in the amount of the debris extruded between the two reciprocating systems. In contrast, conventional multi-file rotary system group extruded significantly more debris than both reciprocating groups. Hand instrumentation group extruded significantly more debris than all other groups. The present results yielded favorable input for both reciprocation single-file systems, inasmuch as they showed an improved control of apically extruded debris. Apical extrusion of debris has been studied extensively because of its clinical relevance, particularly since it may cause flare-ups, originated by the introduction of bacteria, pulpal tissue, and irrigating solutions into the periapical tissues.
Sacramento-Watt Avenue transit priority and mobility enhancement demonstration project: phase III evaluation report

DOT National Transportation Integrated Search

2001-02-01

The Minnesota data system includes the following basic files: Accident data (Accident File, Vehicle File, Occupant File); Roadlog File; Reference Post File; Traffic File; Intersection File; Bridge (Structures) File; and RR Grade Crossing File. For ea...
On-Board File Management and Its Application in Flight Operations

NASA Technical Reports Server (NTRS)

Kuo, N.

1998-01-01

In this paper, the author presents the minimum functions required for an on-board file management system. We explore file manipulation processes and demonstrate how the file transfer along with the file management system will be utilized to support flight operations and data delivery.
47 CFR 1.10006 - Is electronic filing mandatory?

Code of Federal Regulations, 2014 CFR

2014-10-01

... 47 Telecommunication 1 2014-10-01 2014-10-01 false Is electronic filing mandatory? 1.10006 Section... Random Selection International Bureau Filing System § 1.10006 Is electronic filing mandatory? Electronic... International Bureau Filing System (IBFS) form is available. Applications for which an electronic form is not...
47 CFR 1.10006 - Is electronic filing mandatory?

Code of Federal Regulations, 2012 CFR

2012-10-01

... 47 Telecommunication 1 2012-10-01 2012-10-01 false Is electronic filing mandatory? 1.10006 Section... Random Selection International Bureau Filing System § 1.10006 Is electronic filing mandatory? Electronic... International Bureau Filing System (IBFS) form is available. Applications for which an electronic form is not...
47 CFR 1.10006 - Is electronic filing mandatory?

Code of Federal Regulations, 2013 CFR

2013-10-01

... 47 Telecommunication 1 2013-10-01 2013-10-01 false Is electronic filing mandatory? 1.10006 Section... Random Selection International Bureau Filing System § 1.10006 Is electronic filing mandatory? Electronic... International Bureau Filing System (IBFS) form is available. Applications for which an electronic form is not...
10 CFR 2.302 - Filing of documents.

Code of Federal Regulations, 2010 CFR

2010-01-01

... this part shall be electronically transmitted through the E-Filing system, unless the Commission or... all methods of filing have been completed. (e) For filings by electronic transmission, the filer must... digital ID certificates, the NRC permits participants in the proceeding to access the E-Filing system to...
48 CFR 1404.805 - Disposal of contract files.

Code of Federal Regulations, 2013 CFR

2013-10-01

... 48 Federal Acquisition Regulations System 5 2013-10-01 2013-10-01 false Disposal of contract files. 1404.805 Section 1404.805 Federal Acquisition Regulations System DEPARTMENT OF THE INTERIOR GENERAL ADMINISTRATIVE MATTERS Contract Files 1404.805 Disposal of contract files. Disposition of files shall be...
48 CFR 1404.802 - Contract files.

Code of Federal Regulations, 2011 CFR

2011-10-01

... 48 Federal Acquisition Regulations System 5 2011-10-01 2011-10-01 false Contract files. 1404.802 Section 1404.802 Federal Acquisition Regulations System DEPARTMENT OF THE INTERIOR GENERAL ADMINISTRATIVE MATTERS Contract Files 1404.802 Contract files. In addition to the requirements in FAR 4.802, files shall...
48 CFR 1404.802 - Contract files.

Code of Federal Regulations, 2013 CFR

2013-10-01

... 48 Federal Acquisition Regulations System 5 2013-10-01 2013-10-01 false Contract files. 1404.802 Section 1404.802 Federal Acquisition Regulations System DEPARTMENT OF THE INTERIOR GENERAL ADMINISTRATIVE MATTERS Contract Files 1404.802 Contract files. In addition to the requirements in FAR 4.802, files shall...
48 CFR 1404.802 - Contract files.

Code of Federal Regulations, 2012 CFR

2012-10-01

... 48 Federal Acquisition Regulations System 5 2012-10-01 2012-10-01 false Contract files. 1404.802 Section 1404.802 Federal Acquisition Regulations System DEPARTMENT OF THE INTERIOR GENERAL ADMINISTRATIVE MATTERS Contract Files 1404.802 Contract files. In addition to the requirements in FAR 4.802, files shall...

48 CFR 1404.805 - Disposal of contract files.

Code of Federal Regulations, 2014 CFR

2014-10-01

... 48 Federal Acquisition Regulations System 5 2014-10-01 2014-10-01 false Disposal of contract files. 1404.805 Section 1404.805 Federal Acquisition Regulations System DEPARTMENT OF THE INTERIOR GENERAL ADMINISTRATIVE MATTERS Contract Files 1404.805 Disposal of contract files. Disposition of files shall be...
48 CFR 1404.805 - Disposal of contract files.

Code of Federal Regulations, 2010 CFR

2010-10-01

... 48 Federal Acquisition Regulations System 5 2010-10-01 2010-10-01 false Disposal of contract files. 1404.805 Section 1404.805 Federal Acquisition Regulations System DEPARTMENT OF THE INTERIOR GENERAL ADMINISTRATIVE MATTERS Contract Files 1404.805 Disposal of contract files. Disposition of files shall be...
48 CFR 1404.805 - Disposal of contract files.

Code of Federal Regulations, 2011 CFR

2011-10-01

... 48 Federal Acquisition Regulations System 5 2011-10-01 2011-10-01 false Disposal of contract files. 1404.805 Section 1404.805 Federal Acquisition Regulations System DEPARTMENT OF THE INTERIOR GENERAL ADMINISTRATIVE MATTERS Contract Files 1404.805 Disposal of contract files. Disposition of files shall be...
48 CFR 1404.802 - Contract files.

Code of Federal Regulations, 2014 CFR

2014-10-01

... 48 Federal Acquisition Regulations System 5 2014-10-01 2014-10-01 false Contract files. 1404.802 Section 1404.802 Federal Acquisition Regulations System DEPARTMENT OF THE INTERIOR GENERAL ADMINISTRATIVE MATTERS Contract Files 1404.802 Contract files. In addition to the requirements in FAR 4.802, files shall...
48 CFR 1404.805 - Disposal of contract files.

Code of Federal Regulations, 2012 CFR

2012-10-01

... 48 Federal Acquisition Regulations System 5 2012-10-01 2012-10-01 false Disposal of contract files. 1404.805 Section 1404.805 Federal Acquisition Regulations System DEPARTMENT OF THE INTERIOR GENERAL ADMINISTRATIVE MATTERS Contract Files 1404.805 Disposal of contract files. Disposition of files shall be...
48 CFR 1404.802 - Contract files.

Code of Federal Regulations, 2010 CFR

2010-10-01

... 48 Federal Acquisition Regulations System 5 2010-10-01 2010-10-01 false Contract files. 1404.802 Section 1404.802 Federal Acquisition Regulations System DEPARTMENT OF THE INTERIOR GENERAL ADMINISTRATIVE MATTERS Contract Files 1404.802 Contract files. In addition to the requirements in FAR 4.802, files shall...
Units of Instruction for Vocational Office Education. Volume 1. Filing, Office Machines, and General Office Clerical Occupations. Teacher's Guide.

ERIC Educational Resources Information Center

East Texas State Univ., Commerce. Occupational Curriculum Lab.

Nineteen units on filing, office machines, and general office clerical occupations are presented in this teacher's guide. The unit topics include indexing, alphabetizing, and filing (e.g., business names); labeling and positioning file folders and guides; establishing a correspondence filing system; utilizing charge-out and follow-up file systems;…
V&V of MCNP 6.1.1 Beta Against Intermediate and High-Energy Experimental Data

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mashnik, Stepan G

This report presents a set of validation and verification (V&V) MCNP 6.1.1 beta results calculated in parallel, with MPI, obtained using its event generators at intermediate and high-energies compared against various experimental data. It also contains several examples of results using the models at energies below 150 MeV, down to 10 MeV, where data libraries are normally used. This report can be considered as the forth part of a set of MCNP6 Testing Primers, after its first, LA-UR-11-05129, and second, LA-UR-11-05627, and third, LA-UR-26944, publications, but is devoted to V&V with the latest, 1.1 beta version of MCNP6. The MCNP6more » test-problems discussed here are presented in the /VALIDATION_CEM/and/VALIDATION_LAQGSM/subdirectories in the MCNP6/Testing/directory. README files that contain short descriptions of every input file, the experiment, the quantity of interest that the experiment measures and its description in the MCNP6 output files, and the publication reference of that experiment are presented for every test problem. Templates for plotting the corresponding results with xmgrace as well as pdf files with figures representing the final results of our V&V efforts are presented. Several technical “bugs” in MCNP 6.1.1 beta were discovered during our current V&V of MCNP6 while running it in parallel with MPI using its event generators. These “bugs” are to be fixed in the following version of MCNP6. Our results show that MCNP 6.1.1 beta using its CEM03.03, LAQGSM03.03, Bertini, and INCL+ABLA, event generators describes, as a rule, reasonably well different intermediate- and high-energy measured data. This primer isn’t meant to be read from cover to cover. Readers may skip some sections and go directly to any test problem in which they are interested.« less
A lightweight, flow-based toolkit for parallel and distributed bioinformatics pipelines

PubMed Central

2011-01-01

Background Bioinformatic analyses typically proceed as chains of data-processing tasks. A pipeline, or 'workflow', is a well-defined protocol, with a specific structure defined by the topology of data-flow interdependencies, and a particular functionality arising from the data transformations applied at each step. In computer science, the dataflow programming (DFP) paradigm defines software systems constructed in this manner, as networks of message-passing components. Thus, bioinformatic workflows can be naturally mapped onto DFP concepts. Results To enable the flexible creation and execution of bioinformatics dataflows, we have written a modular framework for parallel pipelines in Python ('PaPy'). A PaPy workflow is created from re-usable components connected by data-pipes into a directed acyclic graph, which together define nested higher-order map functions. The successive functional transformations of input data are evaluated on flexibly pooled compute resources, either local or remote. Input items are processed in batches of adjustable size, all flowing one to tune the trade-off between parallelism and lazy-evaluation (memory consumption). An add-on module ('NuBio') facilitates the creation of bioinformatics workflows by providing domain specific data-containers (e.g., for biomolecular sequences, alignments, structures) and functionality (e.g., to parse/write standard file formats). Conclusions PaPy offers a modular framework for the creation and deployment of parallel and distributed data-processing workflows. Pipelines derive their functionality from user-written, data-coupled components, so PaPy also can be viewed as a lightweight toolkit for extensible, flow-based bioinformatics data-processing. The simplicity and flexibility of distributed PaPy pipelines may help users bridge the gap between traditional desktop/workstation and grid computing. PaPy is freely distributed as open-source Python code at http://muralab.org/PaPy, and includes extensive documentation and annotated usage examples. PMID:21352538
A lightweight, flow-based toolkit for parallel and distributed bioinformatics pipelines.

PubMed

Cieślik, Marcin; Mura, Cameron

2011-02-25

Bioinformatic analyses typically proceed as chains of data-processing tasks. A pipeline, or 'workflow', is a well-defined protocol, with a specific structure defined by the topology of data-flow interdependencies, and a particular functionality arising from the data transformations applied at each step. In computer science, the dataflow programming (DFP) paradigm defines software systems constructed in this manner, as networks of message-passing components. Thus, bioinformatic workflows can be naturally mapped onto DFP concepts. To enable the flexible creation and execution of bioinformatics dataflows, we have written a modular framework for parallel pipelines in Python ('PaPy'). A PaPy workflow is created from re-usable components connected by data-pipes into a directed acyclic graph, which together define nested higher-order map functions. The successive functional transformations of input data are evaluated on flexibly pooled compute resources, either local or remote. Input items are processed in batches of adjustable size, all flowing one to tune the trade-off between parallelism and lazy-evaluation (memory consumption). An add-on module ('NuBio') facilitates the creation of bioinformatics workflows by providing domain specific data-containers (e.g., for biomolecular sequences, alignments, structures) and functionality (e.g., to parse/write standard file formats). PaPy offers a modular framework for the creation and deployment of parallel and distributed data-processing workflows. Pipelines derive their functionality from user-written, data-coupled components, so PaPy also can be viewed as a lightweight toolkit for extensible, flow-based bioinformatics data-processing. The simplicity and flexibility of distributed PaPy pipelines may help users bridge the gap between traditional desktop/workstation and grid computing. PaPy is freely distributed as open-source Python code at http://muralab.org/PaPy, and includes extensive documentation and annotated usage examples.
JUPITER: Joint Universal Parameter IdenTification and Evaluation of Reliability - An Application Programming Interface (API) for Model Analysis

USGS Publications Warehouse

Banta, Edward R.; Poeter, Eileen P.; Doherty, John E.; Hill, Mary C.

2006-01-01

he Joint Universal Parameter IdenTification and Evaluation of Reliability Application Programming Interface (JUPITER API) improves the computer programming resources available to those developing applications (computer programs) for model analysis.The JUPITER API consists of eleven Fortran-90 modules that provide for encapsulation of data and operations on that data. Each module contains one or more entities: data, data types, subroutines, functions, and generic interfaces. The modules do not constitute computer programs themselves; instead, they are used to construct computer programs. Such computer programs are called applications of the API. The API provides common modeling operations for use by a variety of computer applications.The models being analyzed are referred to here as process models, and may, for example, represent the physics, chemistry, and(or) biology of a field or laboratory system. Process models commonly are constructed using published models such as MODFLOW (Harbaugh et al., 2000; Harbaugh, 2005), MT3DMS (Zheng and Wang, 1996), HSPF (Bicknell et al., 1997), PRMS (Leavesley and Stannard, 1995), and many others. The process model may be accessed by a JUPITER API application as an external program, or it may be implemented as a subroutine within a JUPITER API application . In either case, execution of the model takes place in a framework designed by the application programmer. This framework can be designed to take advantage of any parallel processing capabilities possessed by the process model, as well as the parallel-processing capabilities of the JUPITER API.Model analyses for which the JUPITER API could be useful include, for example: Compare model results to observed values to determine how well the model reproduces system processes and characteristics.Use sensitivity analysis to determine the information provided by observations to parameters and predictions of interest.Determine the additional data needed to improve selected model predictions.Use calibration methods to modify parameter values and other aspects of the model.Compare predictions to regulatory limits.Quantify the uncertainty of predictions based on the results of one or many simulations using inferential or Monte Carlo methods.Determine how to manage the system to achieve stated objectives.The capabilities provided by the JUPITER API include, for example, communication with process models, parallel computations, compressed storage of matrices, and flexible input capabilities. The input capabilities use input blocks suitable for lists or arrays of data. The input blocks needed for one application can be included within one data file or distributed among many files. Data exchange between different JUPITER API applications or between applications and other programs is supported by data-exchange files.The JUPITER API has already been used to construct a number of applications. Three simple example applications are presented in this report. More complicated applications include the universal inverse code UCODE_2005 (Poeter et al., 2005), the multi-model analysis MMA (Eileen P. Poeter, Mary C. Hill, E.R. Banta, S.W. Mehl, and Steen Christensen, written commun., 2006), and a code named OPR_PPR (Matthew J. Tonkin, Claire R. Tiedeman, Mary C. Hill, and D. Matthew Ely, written communication, 2006).This report describes a set of underlying organizational concepts and complete specifics about the JUPITER API. While understanding the organizational concept presented is useful to understanding the modules, other organizational concepts can be used in applications constructed using the JUPITER API.
NASIS data base management system - IBM 360/370 OS MVT implementation. 6: NASIS message file

NASA Technical Reports Server (NTRS)

1973-01-01

The message file for the NASA Aerospace Safety Information System (NASIS) is discussed. The message file contains all the message and term explanations for the system. The data contained in the file can be broken down into three separate sections: (1) global terms, (2) local terms, and (3) system messages. The various terms are defined and their use within the system is explained.
NASIS data base management system: IBM 360 TSS implementation. Volume 6: NASIS message file

NASA Technical Reports Server (NTRS)

1973-01-01

The message file for the NASA Aerospace Safety Information System (NASIS) is discussed. The message file contains all the message and term explanations for the system. The data contained in the file can be broken down into three separate sections: (1) global terms, (2) local terms, and (3) system messages. The various terms are defined and their use within the system is explained.
PELE web server: atomistic study of biomolecular systems at your fingertips.

PubMed

Madadkar-Sobhani, Armin; Guallar, Victor

2013-07-01

PELE, Protein Energy Landscape Exploration, our novel technology based on protein structure prediction algorithms and a Monte Carlo sampling, is capable of modelling the all-atom protein-ligand dynamical interactions in an efficient and fast manner, with two orders of magnitude reduced computational cost when compared with traditional molecular dynamics techniques. PELE's heuristic approach generates trial moves based on protein and ligand perturbations followed by side chain sampling and global/local minimization. The collection of accepted steps forms a stochastic trajectory. Furthermore, several processors may be run in parallel towards a collective goal or defining several independent trajectories; the whole procedure has been parallelized using the Message Passing Interface. Here, we introduce the PELE web server, designed to make the whole process of running simulations easier and more practical by minimizing input file demand, providing user-friendly interface and producing abstract outputs (e.g. interactive graphs and tables). The web server has been implemented in C++ using Wt (http://www.webtoolkit.eu) and MySQL (http://www.mysql.com). The PELE web server, accessible at http://pele.bsc.es, is free and open to all users with no login requirement.
Post-extension shortening strains preserved in calcites of the Keweenawan rift

DOE Office of Scientific and Technical Information (OSTI.GOV)

Donnelly, K.; Craddock, J.; McGovern, M.

1993-02-01

The Keweenawan rift is part of failed triple junction system that underlies Lake Superior and the Michigan Basin. The rift experienced extensional stresses dating about 1.1 Ga, which were followed by compressional stresses from about 1,060 Ma to < 350 Ma. Associated with the rift are two thrust faults: the Douglas (dipping southeast) and the Keweenawan-Lake Owen (dipping northwest). To determine the direction of rifting, calcite twins were used to calculate strain ellipsoids (Groshong method) which are indicative of the intensity and direction of the stress applied to a rocks in a region at a given time. Rock samples whichmore » contain significant calcite within the zone of rifting were collected, slabbed, and made into thin sections. Calcite appears as amygdule, vein, and cement filings, as well as limestones. Analyses show that different calcite types show different stain orientations. Two principle directions of sub-horizontal shortening are present: one parallel to rift, and one normal to the rift, indicating that rifting motion varied out the time in which different calcite types were deposited. Shortening parallel to the rift is seen predominantly on the western margin while shortening normal to the rift is seen predominantly on the eastern margin.« less
A Fast Synthetic Aperture Radar Raw Data Simulation Using Cloud Computing.

PubMed

Li, Zhixin; Su, Dandan; Zhu, Haijiang; Li, Wei; Zhang, Fan; Li, Ruirui

2017-01-08

Synthetic Aperture Radar (SAR) raw data simulation is a fundamental problem in radar system design and imaging algorithm research. The growth of surveying swath and resolution results in a significant increase in data volume and simulation period, which can be considered to be a comprehensive data intensive and computing intensive issue. Although several high performance computing (HPC) methods have demonstrated their potential for accelerating simulation, the input/output (I/O) bottleneck of huge raw data has not been eased. In this paper, we propose a cloud computing based SAR raw data simulation algorithm, which employs the MapReduce model to accelerate the raw data computing and the Hadoop distributed file system (HDFS) for fast I/O access. The MapReduce model is designed for the irregular parallel accumulation of raw data simulation, which greatly reduces the parallel efficiency of graphics processing unit (GPU) based simulation methods. In addition, three kinds of optimization strategies are put forward from the aspects of programming model, HDFS configuration and scheduling. The experimental results show that the cloud computing based algorithm achieves 4_ speedup over the baseline serial approach in an 8-node cloud environment, and each optimization strategy can improve about 20%. This work proves that the proposed cloud algorithm is capable of solving the computing intensive and data intensive issues in SAR raw data simulation, and is easily extended to large scale computing to achieve higher acceleration.
A Filing System for Medical Literature

PubMed Central

Cumming, Millie

1988-01-01

The author reviews the types of systems available for personal literature files and makes specific recommendations for filing systems for family physicians. A personal filing system can be an integral part of family practice, and need not require time out of proportion to the worth of the system. Because it is a personal system, different types will suit different users; some systems, however, are more reliable than others for use in family practice. (Can Fam Physician 1988; 34:425-433.) PMID:21253062
An overview of the National Space Science data Center Standard Information Retrieval System (SIRS)

NASA Technical Reports Server (NTRS)

Shapiro, A.; Blecher, S.; Verson, E. E.; King, M. L. (Editor)

1974-01-01

A general overview is given of the National Space Science Data Center (NSSDC) Standard Information Retrieval System. A description, in general terms, the information system that contains the data files and the software system that processes and manipulates the files maintained at the Data Center. Emphasis is placed on providing users with an overview of the capabilities and uses of the NSSDC Standard Information Retrieval System (SIRS). Examples given are taken from the files at the Data Center. Detailed information about NSSDC data files is documented in a set of File Users Guides, with one user's guide prepared for each file processed by SIRS. Detailed information about SIRS is presented in the SIRS Users Guide.
29 CFR 4902.11 - Specific exemptions: Office of Inspector General Investigative File System.

Code of Federal Regulations, 2012 CFR

2012-07-01

... Investigative File System. 4902.11 Section 4902.11 Labor Regulations Relating to Labor (Continued) PENSION... General Investigative File System. (a) Criminal Law Enforcement. (1) Exemption. Under the authority... Inspector General Investigative File System—PBGC” from the provisions of 5 U.S.C. 552a (c)(3), (c)(4), (d)(1...
29 CFR 4902.11 - Specific exemptions: Office of Inspector General Investigative File System.

Code of Federal Regulations, 2010 CFR

2010-07-01

... Investigative File System. 4902.11 Section 4902.11 Labor Regulations Relating to Labor (Continued) PENSION... General Investigative File System. (a) Criminal Law Enforcement. (1) Exemption. Under the authority... Inspector General Investigative File System—PBGC” from the provisions of 5 U.S.C. 552a (c)(3), (c)(4), (d)(1...

29 CFR 4902.11 - Specific exemptions: Office of Inspector General Investigative File System.

Code of Federal Regulations, 2014 CFR

2014-07-01

... Investigative File System. 4902.11 Section 4902.11 Labor Regulations Relating to Labor (Continued) PENSION... General Investigative File System. (a) Criminal Law Enforcement. (1) Exemption. Under the authority... Inspector General Investigative File System—PBGC” from the provisions of 5 U.S.C. 552a (c)(3), (c)(4), (d)(1...
29 CFR 4902.11 - Specific exemptions: Office of Inspector General Investigative File System.

Code of Federal Regulations, 2013 CFR

2013-07-01

... Investigative File System. 4902.11 Section 4902.11 Labor Regulations Relating to Labor (Continued) PENSION... General Investigative File System. (a) Criminal Law Enforcement. (1) Exemption. Under the authority... Inspector General Investigative File System—PBGC” from the provisions of 5 U.S.C. 552a (c)(3), (c)(4), (d)(1...
29 CFR 4902.11 - Specific exemptions: Office of Inspector General Investigative File System.

Code of Federal Regulations, 2011 CFR

2011-07-01

... Investigative File System. 4902.11 Section 4902.11 Labor Regulations Relating to Labor (Continued) PENSION... General Investigative File System. (a) Criminal Law Enforcement. (1) Exemption. Under the authority... Inspector General Investigative File System—PBGC” from the provisions of 5 U.S.C. 552a (c)(3), (c)(4), (d)(1...
75 FR 61741 - Combined Notice of Filings No. 1

Federal Register 2010, 2011, 2012, 2013, 2014

2010-10-06

... Refund Report filings: Docket Numbers: RP10-1305-000. Applicants: Venice Gathering System, L.L.C. Description: Venice Gathering System, L.L.C. submits tariff filing per 154.203: NAESB 1.9 Compliance Filing to...
First Experiences with CMS Data Storage on the GEMSS System at the INFN-CNAF Tier-1

NASA Astrophysics Data System (ADS)

Andreotti, D.; Bonacorsi, D.; Cavalli, A.; Pra, S. Dal; Dell'Agnello, L.; Forti, Alberto; Grandi, C.; Gregori, D.; Gioi, L. Li; Martelli, B.; Prosperini, A.; Ricci, P. P.; Ronchieri, Elisabetta; Sapunenko, V.; Sartirana, A.; Vagnoni, V.; Zappi, Riccardo

A brand new Mass Storage System solution called "Grid-Enabled Mass Storage System" (GEMSS) -based on the Storage Resource Manager (StoRM) developed by INFN, on the General Parallel File System by IBM and on the Tivoli Storage Manager by IBM -has been tested and deployed at the INFNCNAF Tier-1 Computing Centre in Italy. After a successful stress test phase, the solution is now being used in production for the data custodiality of the CMS experiment at CNAF. All data previously recorded on the CASTOR system have been transferred to GEMSS. As final validation of the GEMSS system, some of the computing tests done in the context of the WLCG "Scale Test for the Experiment Program" (STEP'09) challenge were repeated in September-October 2009 and compared with the results previously obtained with CASTOR in June 2009. In this paper, the GEMSS system basics, the stress test activity and the deployment phase -as well as the reliability and performance of the system -are overviewed. The experiences in the use of GEMSS at CNAF in preparing for the first months of data taking of the CMS experiment at the Large Hadron Collider are also presented.
Efficacy of ProTaper universal retreatment files in removing filling materials during root canal retreatment.

PubMed

Giuliani, Valentina; Cocchetti, Roberto; Pagavino, Gabriella

2008-11-01

The aim of this study was to evaluate the efficacy of the ProTaper Universal System rotary retreatment system and of Profile 0.06 and hand instruments (K-file) in the removal of root filling materials. Forty-two extracted single-rooted anterior teeth were selected. The root canals were enlarged with nickel-titanium (NiTi) rotary files, filled with gutta-percha and sealer, and randomly divided into 3 experimental groups. The filling materials were removed with solvent in conjunction with one of the following devices and techniques: the ProTaper Universal System for retreatment, ProFile 0.06, and hand instruments (K-file). The roots were longitudinally sectioned, and the image of the root surface was photographed. The images were captured in JPEG format; the areas of the remaining filling materials and the time required for removing the gutta-percha and sealer were calculated by using the nonparametric one-way Kruskal-Wallis test and Tukey-Kramer tests, respectively. The group that showed better results for removing filling materials was the ProTaper Universal System for retreatment files, whereas the group of ProFile rotary instruments yielded better root canal cleanliness than the hand instruments, even though there was no statistically significant difference. The ProTaper Universal System for retreatment and ProFile rotary instruments worked significantly faster than the K-file. The ProTaper Universal System for retreatment files left cleaner root canal walls than the K-file hand instruments and the ProFile Rotary instruments, although none of the devices used guaranteed complete removal of the filling materials. The rotary NiTi system proved to be faster than hand instruments in removing root filling materials.
Improving File System Performance by Striping

NASA Technical Reports Server (NTRS)

Lam, Terance L.; Kutler, Paul (Technical Monitor)

1998-01-01

This document discusses the performance and advantages of striped file systems on the SGI AD workstations. Performance of several striped file system configurations are compared and guidelines for optimal striping are recommended.
Assessment of elemental composition, microstructure, and hardness of stainless steel endodontic files and reamers.

PubMed

Darabara, Myrsini; Bourithis, Lefteris; Zinelis, Spiros; Papadimitriou, George D

2004-07-01

The purpose of this study was to determine the elemental composition, microstructure, and hardness of commercially available reamers, K files, and H files. Five instruments of each type from different manufacturers (Antaeos, FKG, Maillefer, Mani, and Micromega) were embedded in epoxy resin along their longitudinal axis. After metallographic grinding and polishing, the specimens were chemically etched and their microstructure investigated under an incident light microscope. The specimens were studied under a scanning electron microscope, and their elemental compositions were determined by energy dispersive X-ray microanalysis. The same surfaces were repolished and X-ray diffraction was performed. The same specimen surface was used for the assessment of the Vickers hardness (HV200) by using a microhardness tester with a 200-g load and 20-s contact time. The hardness results were statistically analyzed with two-way ANOVA and Tukey's test (a = 0.05). All files demonstrated extensively elongated grains parallel to longitudinal file axis because of cold drawing. The elemental composition of Maillefer and Mani reamers, Antaeos K files, and Mani H files were found in the range of AISI 303 SS, whereas all the rest were determined as AISI 304 SS. Two different phases (austenite SSt and martensite SSt) were identified with X-ray diffraction for all files tested. The results of hardness classified reamers in the following decreasing order (HMV200): Micromega = 673 +/- 29, Mani = 662 +/- 24, Maillefer = 601 +/- 34, Antaeos = 586 +/- 18, FKG = 557 +/- 19, and the K files (HV200): FKG = 673 +/- 16, Mani = 647 +/- 19, Maillefer = 603 +/- 41, Antaeos = 566 +/- 21, Micromega = 555 +/- 15, and the H files (HMV200): Mani = 640 +/- 12, FKG = 583 +/- 31, Maillefer = 581 +/- 5, Antaeos = 573 +/- 3, Micromega = 546 +/- 14. Although only two stainless steel alloys were used for the production of endodontic files, the differences in hardness are independent to the alloys used, implying that other factors, such as the production method or the thermomechanical history of the alloys, play an important role on the mechanical properties of endodontic files.
Accuracy of a digital impression system based on parallel confocal laser technology for implants with consideration of operator experience and implant angulation and depth.

PubMed

Giménez, Beatriz; Özcan, Mutlu; Martínez-Rus, Francisco; Pradíes, Guillermo

2014-01-01

To evaluate the accuracy of a digital impression system based on parallel confocal red laser technology, taking into consideration clinical parameters such as operator experience and angulation and depth of implants. A maxillary master model with six implants (located bilaterally in the second molar, second premolar, and lateral incisor positions) was fitted with six polyether ether ketone scan bodies. One second premolar implant was placed with 30 degrees of mesial angulation; the opposite implant was positioned with 30 degrees of distal angulation. The lateral incisor implants were placed 2 or 4 mm subgingivally. Two experienced and two inexperienced operators performed intraoral scanning. Five different interimplant distances were then measured. The files obtained from the scans were imported with reverse-engineering software. Measurements were then made with a coordinate measurement machine, with values from the master model used as reference values. The deviations from the actual values were then calculated. The differences between experienced and inexperienced operators and the effects of different implant angulations and depths were compared statistically. Overall, operator 3 obtained significantly less accurate results. The angulated implants did not significantly influence accuracy compared to the parallel implants. Differences were found in the amount of error in the different quadrants. The second scanned quadrant had significantly worse results than the first scanned quadrant. Impressions of the implants placed at the tissue level were less accurate than implants placed 2 and 4 mm subgingivally. The operator affected the accuracy of measurements, but the performance of the operator was not necessarily dependent on experience. Angulated implants did not decrease the accuracy of the digital impression system tested. The scanned distance affected the predictability of the accuracy of the scanner, and the error increased with the increased length of the scanned section.
Reliable file sharing in distributed operating system using web RTC

NASA Astrophysics Data System (ADS)

Dukiya, Rajesh

2017-12-01

Since, the evolution of distributed operating system, distributed file system is come out to be important part in operating system. P2P is a reliable way in Distributed Operating System for file sharing. It was introduced in 1999, later it became a high research interest topic. Peer to Peer network is a type of network, where peers share network workload and other load related tasks. A P2P network can be a period of time connection, where a bunch of computers connected by a USB (Universal Serial Bus) port to transfer or enable disk sharing i.e. file sharing. Currently P2P requires special network that should be designed in P2P way. Nowadays, there is a big influence of browsers in our life. In this project we are going to study of file sharing mechanism in distributed operating system in web browsers, where we will try to find performance bottlenecks which our research will going to be an improvement in file sharing by performance and scalability in distributed file systems. Additionally, we will discuss the scope of Web Torrent file sharing and free-riding in peer to peer networks.
New algorithms for processing time-series big EEG data within mobile health monitoring systems.

PubMed

Serhani, Mohamed Adel; Menshawy, Mohamed El; Benharref, Abdelghani; Harous, Saad; Navaz, Alramzana Nujum

2017-10-01

Recent advances in miniature biomedical sensors, mobile smartphones, wireless communications, and distributed computing technologies provide promising techniques for developing mobile health systems. Such systems are capable of monitoring epileptic seizures reliably, which are classified as chronic diseases. Three challenging issues raised in this context with regard to the transformation, compression, storage, and visualization of big data, which results from a continuous recording of epileptic seizures using mobile devices. In this paper, we address the above challenges by developing three new algorithms to process and analyze big electroencephalography data in a rigorous and efficient manner. The first algorithm is responsible for transforming the standard European Data Format (EDF) into the standard JavaScript Object Notation (JSON) and compressing the transformed JSON data to decrease the size and time through the transfer process and to increase the network transfer rate. The second algorithm focuses on collecting and storing the compressed files generated by the transformation and compression algorithm. The collection process is performed with respect to the on-the-fly technique after decompressing files. The third algorithm provides relevant real-time interaction with signal data by prospective users. It particularly features the following capabilities: visualization of single or multiple signal channels on a smartphone device and query data segments. We tested and evaluated the effectiveness of our approach through a software architecture model implementing a mobile health system to monitor epileptic seizures. The experimental findings from 45 experiments are promising and efficiently satisfy the approach's objectives in a price of linearity. Moreover, the size of compressed JSON files and transfer times are reduced by 10% and 20%, respectively, while the average total time is remarkably reduced by 67% through all performed experiments. Our approach successfully develops efficient algorithms in terms of processing time, memory usage, and energy consumption while maintaining a high scalability of the proposed solution. Our approach efficiently supports data partitioning and parallelism relying on the MapReduce platform, which can help in monitoring and automatic detection of epileptic seizures. Copyright © 2017 Elsevier B.V. All rights reserved.
ms2: A molecular simulation tool for thermodynamic properties

NASA Astrophysics Data System (ADS)

Deublein, Stephan; Eckl, Bernhard; Stoll, Jürgen; Lishchuk, Sergey V.; Guevara-Carrion, Gabriela; Glass, Colin W.; Merker, Thorsten; Bernreuther, Martin; Hasse, Hans; Vrabec, Jadran

2011-11-01

This work presents the molecular simulation program ms2 that is designed for the calculation of thermodynamic properties of bulk fluids in equilibrium consisting of small electro-neutral molecules. ms2 features the two main molecular simulation techniques, molecular dynamics (MD) and Monte-Carlo. It supports the calculation of vapor-liquid equilibria of pure fluids and multi-component mixtures described by rigid molecular models on the basis of the grand equilibrium method. Furthermore, it is capable of sampling various classical ensembles and yields numerous thermodynamic properties. To evaluate the chemical potential, Widom's test molecule method and gradual insertion are implemented. Transport properties are determined by equilibrium MD simulations following the Green-Kubo formalism. ms2 is designed to meet the requirements of academia and industry, particularly achieving short response times and straightforward handling. It is written in Fortran90 and optimized for a fast execution on a broad range of computer architectures, spanning from single processor PCs over PC-clusters and vector computers to high-end parallel machines. The standard Message Passing Interface (MPI) is used for parallelization and ms2 is therefore easily portable to different computing platforms. Feature tools facilitate the interaction with the code and the interpretation of input and output files. The accuracy and reliability of ms2 has been shown for a large variety of fluids in preceding work. Program summaryProgram title:ms2 Catalogue identifier: AEJF_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEJF_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Special Licence supplied by the authors No. of lines in distributed program, including test data, etc.: 82 794 No. of bytes in distributed program, including test data, etc.: 793 705 Distribution format: tar.gz Programming language: Fortran90 Computer: The simulation tool ms2 is usable on a wide variety of platforms, from single processor machines over PC-clusters and vector computers to vector-parallel architectures. (Tested with Fortran compilers: gfortran, Intel, PathScale, Portland Group and Sun Studio.) Operating system: Unix/Linux, Windows Has the code been vectorized or parallelized?: Yes. Message Passing Interface (MPI) protocol Scalability. Excellent scalability up to 16 processors for molecular dynamics and >512 processors for Monte-Carlo simulations. RAM:ms2 runs on single processors with 512 MB RAM. The memory demand rises with increasing number of processors used per node and increasing number of molecules. Classification: 7.7, 7.9, 12 External routines: Message Passing Interface (MPI) Nature of problem: Calculation of application oriented thermodynamic properties for rigid electro-neutral molecules: vapor-liquid equilibria, thermal and caloric data as well as transport properties of pure fluids and multi-component mixtures. Solution method: Molecular dynamics, Monte-Carlo, various classical ensembles, grand equilibrium method, Green-Kubo formalism. Restrictions: No. The system size is user-defined. Typical problems addressed by ms2 can be solved by simulating systems containing typically 2000 molecules or less. Unusual features: Feature tools are available for creating input files, analyzing simulation results and visualizing molecular trajectories. Additional comments: Sample makefiles for multiple operation platforms are provided. Documentation is provided with the installation package and is available at http://www.ms-2.de. Running time: The running time of ms2 depends on the problem set, the system size and the number of processes used in the simulation. Running four processes on a "Nehalem" processor, simulations calculating VLE data take between two and twelve hours, calculating transport properties between six and 24 hours.
An alternative to sneakernet

DOE Office of Scientific and Technical Information (OSTI.GOV)

Orrell, S.; Ralstin, S.

1992-04-01

Many computer security plans specify that only a small percentage of the data processed will be classified. Thus, the bulk of the data on secure systems must be unclassified. Secure limited access sites operating approved classified computing systems sometimes also have a system ostensibly containing only unclassified files but operating within the secure environment. That system could be networked or otherwise connected to a classified system(s) in order that both be able to use common resources for file storage or computing power. Such a system must operate under the same rules as the secure classified systems. It is in themore » nature of unclassified files that they either came from, or will eventually migrate to, a non-secure system. Today, unclassified files are exported from systems within the secure environment typically by loading transport media and carrying them to an open system. Import of unclassified files is handled similarly. This media transport process, sometimes referred to as sneaker net, often is manually logged and controlled only by administrative procedures. A comprehensive system for secure bi-directional transfer of unclassified files between secure and open environments has yet to be developed. Any such secure file transport system should be required to meet several stringent criteria. It is the purpose of this document to begin a definition of these criteria.« less
An alternative to sneakernet

DOE Office of Scientific and Technical Information (OSTI.GOV)

Orrell, S.; Ralstin, S.

1992-01-01

Many computer security plans specify that only a small percentage of the data processed will be classified. Thus, the bulk of the data on secure systems must be unclassified. Secure limited access sites operating approved classified computing systems sometimes also have a system ostensibly containing only unclassified files but operating within the secure environment. That system could be networked or otherwise connected to a classified system(s) in order that both be able to use common resources for file storage or computing power. Such a system must operate under the same rules as the secure classified systems. It is in themore » nature of unclassified files that they either came from, or will eventually migrate to, a non-secure system. Today, unclassified files are exported from systems within the secure environment typically by loading transport media and carrying them to an open system. Import of unclassified files is handled similarly. This media transport process, sometimes referred to as sneaker net, often is manually logged and controlled only by administrative procedures. A comprehensive system for secure bi-directional transfer of unclassified files between secure and open environments has yet to be developed. Any such secure file transport system should be required to meet several stringent criteria. It is the purpose of this document to begin a definition of these criteria.« less
29 CFR 1602.43 - Commission's remedy for school systems' or districts' failure to file report.

Code of Federal Regulations, 2013 CFR

2013-07-01

...' failure to file report. Any school system or district failing or refusing to file report EEO-5 when... 29 Labor 4 2013-07-01 2013-07-01 false Commission's remedy for school systems' or districts' failure to file report. 1602.43 Section 1602.43 Labor Regulations Relating to Labor (Continued) EQUAL...
29 CFR 1602.43 - Commission's remedy for school systems' or districts' failure to file report.

Code of Federal Regulations, 2011 CFR

2011-07-01

...' failure to file report. Any school system or district failing or refusing to file report EEO-5 when... 29 Labor 4 2011-07-01 2011-07-01 false Commission's remedy for school systems' or districts' failure to file report. 1602.43 Section 1602.43 Labor Regulations Relating to Labor (Continued) EQUAL...
76 FR 9780 - Notification of Deletion of System of Records; EPA Parking Control Office File (EPA-10) and EPA...

Federal Register 2010, 2011, 2012, 2013, 2014

2011-02-22

... System of Records; EPA Parking Control Office File (EPA-10) and EPA Transit and Guaranteed Ride Home Program Files (EPA-35) AGENCY: Environmental Protection Agency (EPA). ACTION: Notice. SUMMARY: The Environmental Protection Agency (EPA) is deleting the systems of records for EPA Parking Control Office File...
29 CFR 1602.43 - Commission's remedy for school systems' or districts' failure to file report.

Code of Federal Regulations, 2012 CFR

2012-07-01

...' failure to file report. Any school system or district failing or refusing to file report EEO-5 when... 29 Labor 4 2012-07-01 2012-07-01 false Commission's remedy for school systems' or districts' failure to file report. 1602.43 Section 1602.43 Labor Regulations Relating to Labor (Continued) EQUAL...
29 CFR 1602.43 - Commission's remedy for school systems' or districts' failure to file report.

Code of Federal Regulations, 2014 CFR

2014-07-01

...' failure to file report. Any school system or district failing or refusing to file report EEO-5 when... 29 Labor 4 2014-07-01 2014-07-01 false Commission's remedy for school systems' or districts' failure to file report. 1602.43 Section 1602.43 Labor Regulations Relating to Labor (Continued) EQUAL...
29 CFR 1602.43 - Commission's remedy for school systems' or districts' failure to file report.

Code of Federal Regulations, 2010 CFR

2010-07-01

...' failure to file report. Any school system or district failing or refusing to file report EEO-5 when... 29 Labor 4 2010-07-01 2010-07-01 false Commission's remedy for school systems' or districts' failure to file report. 1602.43 Section 1602.43 Labor Regulations Relating to Labor (Continued) EQUAL...

VizieR Online Data Catalog: Dense cores in Taurus L1495 cloud (Marsh+, 2016)

NASA Astrophysics Data System (ADS)

Marsh, K. A.; Kirk, J. M.; Andre, P.; Griffin, M. J.; Konyves, V.; Palmeirim, P.; Men'shchikov, A.; Ward-Thompson, D.; Benedettini, M.; Bresnahan, D. W.; di, Francesco J.; Elia, D.; Motte, F.; Peretto, N.; Pezzuto, S.; Roy, A.; Sadavoy, S.; Schneider, N.; Spinoglio, L.; White, G. J.

2017-04-01

The observational data on which the present catalogue is based consists of a set of images of the L1495 cloud in the Taurus star-forming region, made as part of the HGBS (Andre et al. 2010). The data were taken using PACS at 70, 160, 250, 350 and 500 microns in fast-scanning (60"/s) parallel mode. (2 data files).
77 FR 14507 - Privacy Act of 1974 System of Records Notice

Federal Register 2010, 2011, 2012, 2013, 2014

2012-03-12

... Complaint and Reasonable Accommodation Files, from its inventory of record systems because the relevant... Employment Discrimination Complaint and Reasonable Accommodation Files. The Commission is retiring the system... government-wide system of records notice entitled OPM/GOV-10, Employee Medical File System Records (71 FR...
NASA Tech Briefs, June 2013

NASA Technical Reports Server (NTRS)

2013-01-01

Topics include: Cloud Absorption Radiometer Autonomous Navigation System - CANS, Software Method for Computed Tomography Cylinder Data Unwrapping, Re-slicing, and Analysis, Discrete Data Qualification System and Method Comprising Noise Series Fault Detection, Simple Laser Communications Terminal for Downlink from Earth Orbit at Rates Exceeding 10 Gb/s, Application Program Interface for the Orion Aerodynamics Database, Hyperspectral Imager-Tracker, Web Application Software for Ground Operations Planning Database (GOPDb) Management, Software Defined Radio with Parallelized Software Architecture, Compact Radar Transceiver with Included Calibration, Software Defined Radio with Parallelized Software Architecture, Phase Change Material Thermal Power Generator, The Thermal Hogan - A Means of Surviving the Lunar Night, Micromachined Active Magnetic Regenerator for Low-Temperature Magnetic Coolers, Nano-Ceramic Coated Plastics, Preparation of a Bimetal Using Mechanical Alloying for Environmental or Industrial Use, Phase Change Material for Temperature Control of Imager or Sounder on GOES Type Satellites in GEO, Dual-Compartment Inflatable Suitlock, Modular Connector Keying Concept, Genesis Ultrapure Water Megasonic Wafer Spin Cleaner, Piezoelectrically Initiated Pyrotechnic Igniter, Folding Elastic Thermal Surface - FETS, Multi-Pass Quadrupole Mass Analyzer, Lunar Sulfur Capture System, Environmental Qualification of a Single-Crystal Silicon Mirror for Spaceflight Use, Planar Superconducting Millimeter-Wave/Terahertz Channelizing Filter, Qualification of UHF Antenna for Extreme Martian Thermal Environments, Ensemble Eclipse: A Process for Prefab Development Environment for the Ensemble Project, ISS Live!, Space Operations Learning Center (SOLC) iPhone/iPad Application, Software to Compare NPP HDF5 Data Files, Planetary Data Systems (PDS) Imaging Node Atlas II, Automatic Calibration of an Airborne Imaging System to an Inertial Navigation Unit, Translating MAPGEN to ASPEN for MER, Support Routines for In Situ Image Processing, and Semi-Supervised Eigenbasis Novelty Detection.
48 CFR 4.804 - Closeout of contract files.

Code of Federal Regulations, 2011 CFR

2011-10-01

... 48 Federal Acquisition Regulations System 1 2011-10-01 2011-10-01 false Closeout of contract files. 4.804 Section 4.804 Federal Acquisition Regulations System FEDERAL ACQUISITION REGULATION GENERAL ADMINISTRATIVE MATTERS Government Contract Files 4.804 Closeout of contract files. ...
48 CFR 1404.804 - Closeout of contract files.

Code of Federal Regulations, 2013 CFR

2013-10-01

... 48 Federal Acquisition Regulations System 5 2013-10-01 2013-10-01 false Closeout of contract files. 1404.804 Section 1404.804 Federal Acquisition Regulations System DEPARTMENT OF THE INTERIOR GENERAL ADMINISTRATIVE MATTERS Contract Files 1404.804 Closeout of contract files. ...
48 CFR 904.804 - Closeout of contract files.

Code of Federal Regulations, 2011 CFR

2011-10-01

... 48 Federal Acquisition Regulations System 5 2011-10-01 2011-10-01 false Closeout of contract files. 904.804 Section 904.804 Federal Acquisition Regulations System DEPARTMENT OF ENERGY GENERAL ADMINISTRATIVE MATTERS Government Contract Files 904.804 Closeout of contract files. ...
48 CFR 904.804 - Closeout of contract files.

Code of Federal Regulations, 2010 CFR

2010-10-01

... 48 Federal Acquisition Regulations System 5 2010-10-01 2010-10-01 false Closeout of contract files. 904.804 Section 904.804 Federal Acquisition Regulations System DEPARTMENT OF ENERGY GENERAL ADMINISTRATIVE MATTERS Government Contract Files 904.804 Closeout of contract files. ...
48 CFR 904.804 - Closeout of contract files.

Code of Federal Regulations, 2014 CFR

2014-10-01

... 48 Federal Acquisition Regulations System 5 2014-10-01 2014-10-01 false Closeout of contract files. 904.804 Section 904.804 Federal Acquisition Regulations System DEPARTMENT OF ENERGY GENERAL ADMINISTRATIVE MATTERS Government Contract Files 904.804 Closeout of contract files. ...
48 CFR 1504.804 - Closeout of contract files.

Code of Federal Regulations, 2014 CFR

2014-10-01

... 48 Federal Acquisition Regulations System 6 2014-10-01 2014-10-01 false Closeout of contract files. 1504.804 Section 1504.804 Federal Acquisition Regulations System ENVIRONMENTAL PROTECTION AGENCY GENERAL ADMINISTRATIVE MATTERS Contract Files 1504.804 Closeout of contract files. ...
48 CFR 1304.804 - Closeout of contract files.

Code of Federal Regulations, 2014 CFR

2014-10-01

... 48 Federal Acquisition Regulations System 5 2014-10-01 2014-10-01 false Closeout of contract files. 1304.804 Section 1304.804 Federal Acquisition Regulations System DEPARTMENT OF COMMERCE GENERAL ADMINISTRATIVE MATTERS Government Contract Files 1304.804 Closeout of contract files. ...
48 CFR 1504.804 - Closeout of contract files.

Code of Federal Regulations, 2011 CFR

2011-10-01

... 48 Federal Acquisition Regulations System 6 2011-10-01 2011-10-01 false Closeout of contract files. 1504.804 Section 1504.804 Federal Acquisition Regulations System ENVIRONMENTAL PROTECTION AGENCY GENERAL ADMINISTRATIVE MATTERS Contract Files 1504.804 Closeout of contract files. ...
48 CFR 4.804 - Closeout of contract files.

Code of Federal Regulations, 2010 CFR

2010-10-01

... 48 Federal Acquisition Regulations System 1 2010-10-01 2010-10-01 false Closeout of contract files. 4.804 Section 4.804 Federal Acquisition Regulations System FEDERAL ACQUISITION REGULATION GENERAL ADMINISTRATIVE MATTERS Government Contract Files 4.804 Closeout of contract files. ...
48 CFR 4.804 - Closeout of contract files.

Code of Federal Regulations, 2013 CFR

2013-10-01

... 48 Federal Acquisition Regulations System 1 2013-10-01 2013-10-01 false Closeout of contract files. 4.804 Section 4.804 Federal Acquisition Regulations System FEDERAL ACQUISITION REGULATION GENERAL ADMINISTRATIVE MATTERS Government Contract Files 4.804 Closeout of contract files. ...
48 CFR 4.804 - Closeout of contract files.

Code of Federal Regulations, 2014 CFR

2014-10-01

... 48 Federal Acquisition Regulations System 1 2014-10-01 2014-10-01 false Closeout of contract files. 4.804 Section 4.804 Federal Acquisition Regulations System FEDERAL ACQUISITION REGULATION GENERAL ADMINISTRATIVE MATTERS Government Contract Files 4.804 Closeout of contract files. ...
48 CFR 1304.804 - Closeout of contract files.

Code of Federal Regulations, 2013 CFR

2013-10-01

... 48 Federal Acquisition Regulations System 5 2013-10-01 2013-10-01 false Closeout of contract files. 1304.804 Section 1304.804 Federal Acquisition Regulations System DEPARTMENT OF COMMERCE GENERAL ADMINISTRATIVE MATTERS Government Contract Files 1304.804 Closeout of contract files. ...
48 CFR 1404.804 - Closeout of contract files.

Code of Federal Regulations, 2010 CFR

2010-10-01

... 48 Federal Acquisition Regulations System 5 2010-10-01 2010-10-01 false Closeout of contract files. 1404.804 Section 1404.804 Federal Acquisition Regulations System DEPARTMENT OF THE INTERIOR GENERAL ADMINISTRATIVE MATTERS Contract Files 1404.804 Closeout of contract files. ...
48 CFR 1304.804 - Closeout of contract files.

Code of Federal Regulations, 2010 CFR

2010-10-01

... 48 Federal Acquisition Regulations System 5 2010-10-01 2010-10-01 false Closeout of contract files. 1304.804 Section 1304.804 Federal Acquisition Regulations System DEPARTMENT OF COMMERCE GENERAL ADMINISTRATIVE MATTERS Government Contract Files 1304.804 Closeout of contract files. ...
48 CFR 1404.804 - Closeout of contract files.

Code of Federal Regulations, 2014 CFR

2014-10-01

... 48 Federal Acquisition Regulations System 5 2014-10-01 2014-10-01 false Closeout of contract files. 1404.804 Section 1404.804 Federal Acquisition Regulations System DEPARTMENT OF THE INTERIOR GENERAL ADMINISTRATIVE MATTERS Contract Files 1404.804 Closeout of contract files. ...
48 CFR 1504.804 - Closeout of contract files.

Code of Federal Regulations, 2013 CFR

2013-10-01

... 48 Federal Acquisition Regulations System 6 2013-10-01 2013-10-01 false Closeout of contract files. 1504.804 Section 1504.804 Federal Acquisition Regulations System ENVIRONMENTAL PROTECTION AGENCY GENERAL ADMINISTRATIVE MATTERS Contract Files 1504.804 Closeout of contract files. ...
48 CFR 1304.804 - Closeout of contract files.

Code of Federal Regulations, 2011 CFR

2011-10-01

... 48 Federal Acquisition Regulations System 5 2011-10-01 2011-10-01 false Closeout of contract files. 1304.804 Section 1304.804 Federal Acquisition Regulations System DEPARTMENT OF COMMERCE GENERAL ADMINISTRATIVE MATTERS Government Contract Files 1304.804 Closeout of contract files. ...

48 CFR 1504.804 - Closeout of contract files.

Code of Federal Regulations, 2010 CFR

2010-10-01

... 48 Federal Acquisition Regulations System 6 2010-10-01 2010-10-01 true Closeout of contract files. 1504.804 Section 1504.804 Federal Acquisition Regulations System ENVIRONMENTAL PROTECTION AGENCY GENERAL ADMINISTRATIVE MATTERS Contract Files 1504.804 Closeout of contract files. ...
48 CFR 4.804 - Closeout of contract files.

Code of Federal Regulations, 2012 CFR

2012-10-01

... 48 Federal Acquisition Regulations System 1 2012-10-01 2012-10-01 false Closeout of contract files. 4.804 Section 4.804 Federal Acquisition Regulations System FEDERAL ACQUISITION REGULATION GENERAL ADMINISTRATIVE MATTERS Government Contract Files 4.804 Closeout of contract files. ...
48 CFR 1504.804 - Closeout of contract files.

Code of Federal Regulations, 2012 CFR

2012-10-01

... 48 Federal Acquisition Regulations System 6 2012-10-01 2012-10-01 false Closeout of contract files. 1504.804 Section 1504.804 Federal Acquisition Regulations System ENVIRONMENTAL PROTECTION AGENCY GENERAL ADMINISTRATIVE MATTERS Contract Files 1504.804 Closeout of contract files. ...
48 CFR 1304.804 - Closeout of contract files.

Code of Federal Regulations, 2012 CFR

2012-10-01

... 48 Federal Acquisition Regulations System 5 2012-10-01 2012-10-01 false Closeout of contract files. 1304.804 Section 1304.804 Federal Acquisition Regulations System DEPARTMENT OF COMMERCE GENERAL ADMINISTRATIVE MATTERS Government Contract Files 1304.804 Closeout of contract files. ...
48 CFR 904.804 - Closeout of contract files.

Code of Federal Regulations, 2013 CFR

2013-10-01

... 48 Federal Acquisition Regulations System 5 2013-10-01 2013-10-01 false Closeout of contract files. 904.804 Section 904.804 Federal Acquisition Regulations System DEPARTMENT OF ENERGY GENERAL ADMINISTRATIVE MATTERS Government Contract Files 904.804 Closeout of contract files. ...
48 CFR 1404.804 - Closeout of contract files.

Code of Federal Regulations, 2012 CFR

2012-10-01

... 48 Federal Acquisition Regulations System 5 2012-10-01 2012-10-01 false Closeout of contract files. 1404.804 Section 1404.804 Federal Acquisition Regulations System DEPARTMENT OF THE INTERIOR GENERAL ADMINISTRATIVE MATTERS Contract Files 1404.804 Closeout of contract files. ...
48 CFR 1404.804 - Closeout of contract files.

Code of Federal Regulations, 2011 CFR

2011-10-01

... 48 Federal Acquisition Regulations System 5 2011-10-01 2011-10-01 false Closeout of contract files. 1404.804 Section 1404.804 Federal Acquisition Regulations System DEPARTMENT OF THE INTERIOR GENERAL ADMINISTRATIVE MATTERS Contract Files 1404.804 Closeout of contract files. ...
48 CFR 904.804 - Closeout of contract files.

Code of Federal Regulations, 2012 CFR

2012-10-01

... 48 Federal Acquisition Regulations System 5 2012-10-01 2012-10-01 false Closeout of contract files. 904.804 Section 904.804 Federal Acquisition Regulations System DEPARTMENT OF ENERGY GENERAL ADMINISTRATIVE MATTERS Government Contract Files 904.804 Closeout of contract files. ...
Office automation: The administrative window into the integrated DBMS

NASA Technical Reports Server (NTRS)

Brock, G. H.

1985-01-01

In parallel to the evolution of Management Information Systems from simple data files to complex data bases, the stand-alone computer systems have been migrating toward fully integrated systems serving the work force. The next major productivity gain may very well be to make these highly sophisticated working level Data Base Management Systems (DMBS) serve all levels of management with reports of varying levels of detail. Most attempts by the DBMS development organization to provide useful information to management seem to bog down in the quagmire of competing working level requirements. Most large DBMS development organizations possess three to five year backlogs. Perhaps Office Automation is the vehicle that brings to pass the Management Information System that really serves management. A good office automation system manned by a team of facilitators seeking opportunities to serve end-users could go a long way toward defining a DBMS that serves management. This paper will briefly discuss the problems of the DBMS organization, alternative approaches to solving some of the major problems, a debate about problems that may have no solution, and finally how office automation fits into the development of the Manager's Management Information System.
Construction of the radiation oncology teaching files system for charged particle radiotherapy.

PubMed

Masami, Mukai; Yutaka, Ando; Yasuo, Okuda; Naoto, Takahashi; Yoshihisa, Yoda; Hiroshi, Tsuji; Tadashi, Kamada

2013-01-01

Our hospital started the charged particle therapy since 1996. New institutions for charged particle therapy are planned in the world. Our hospital are accepting many visitors from those newly planned medical institutions and having many opportunities to provide with the training to them. Based upon our experiences, we have developed the radiation oncology teaching files system for charged particle therapy. We adopted the PowerPoint of Microsoft as a basic framework of our teaching files system. By using our export function of the viewer any physician can create teaching files easily and effectively. Now our teaching file system has 33 cases for clinical and physics contents. We expect that we can improve the safety and accuracy of charged particle therapy by using our teaching files system substantially.
A price and performance comparison of three different storage architectures for data in cloud-based systems

NASA Astrophysics Data System (ADS)

Gallagher, J. H. R.; Jelenak, A.; Potter, N.; Fulker, D. W.; Habermann, T.

2017-12-01

Providing data services based on cloud computing technology that is equivalent to those developed for traditional computing and storage systems is critical for successful migration to cloud-based architectures for data production, scientific analysis and storage. OPeNDAP Web-service capabilities (comprising the Data Access Protocol (DAP) specification plus open-source software for realizing DAP in servers and clients) are among the most widely deployed means for achieving data-as-service functionality in the Earth sciences. OPeNDAP services are especially common in traditional data center environments where servers offer access to datasets stored in (very large) file systems, and a preponderance of the source data for these services is being stored in the Hierarchical Data Format Version 5 (HDF5). Three candidate architectures for serving NASA satellite Earth Science HDF5 data via Hyrax running on Amazon Web Services (AWS) were developed and their performance examined for a set of representative use cases. The performance was based both on runtime and incurred cost. The three architectures differ in how HDF5 files are stored in the Amazon Simple Storage Service (S3) and how the Hyrax server (as an EC2 instance) retrieves their data. The results for both the serial and parallel access to HDF5 data in the S3 will be presented. While the study focused on HDF5 data, OPeNDAP and the Hyrax data server, the architectures are generic and the analysis can be extrapolated to many different data formats, web APIs, and data servers.
Information retrieval system

NASA Technical Reports Server (NTRS)

Berg, R. F.; Holcomb, J. E.; Kelroy, E. A.; Levine, D. A.; Mee, C., III

1970-01-01

Generalized information storage and retrieval system capable of generating and maintaining a file, gathering statistics, sorting output, and generating final reports for output is reviewed. File generation and file maintenance programs written for the system are general purpose routines.
Final Report for File System Support for Burst Buffers on HPC Systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Yu, W.; Mohror, K.

Distributed burst buffers are a promising storage architecture for handling I/O workloads for exascale computing. As they are being deployed on more supercomputers, a file system that efficiently manages these burst buffers for fast I/O operations carries great consequence. Over the past year, FSU team has undertaken several efforts to design, prototype and evaluate distributed file systems for burst buffers on HPC systems. These include MetaKV: a Key-Value Store for Metadata Management of Distributed Burst Buffers, a user-level file system with multiple backends, and a specialized file system for large datasets of deep neural networks. Our progress for these respectivemore » efforts are elaborated further in this report.« less
SU-E-T-99: Design and Development of Isocenter Parameter System for CT Simulation Laser Based On DICOM RT

DOE Office of Scientific and Technical Information (OSTI.GOV)

Luo, G

2014-06-01

Purpose: In order to receive DICOM files from treatment planning system and generate patient isocenter positioning parameter file for CT laser system automatically, this paper presents a method for communication with treatment planning system and calculation of isocenter parameter for each radiation field. Methods: Coordinate transformation and laser positioning file formats were analyzed, isocenter parameter was calculated via data from DICOM CT Data and DICOM RTPLAN file. An in-house software-DicomGenie was developed based on the object-oriented program platform-Qt with DCMTK SDK (Germany OFFIS company DICOM SDK) . DicomGenie was tested for accuracy using Philips CT simulation plan system (Tumor LOC,more » Philips) and A2J CT positioning laser system (Thorigny Sur Marne, France). Results: DicomGenie successfully established DICOM communication between treatment planning system, DICOM files were received by DicomGenie and patient laser isocenter information was generated accurately. Patient laser parameter data files can be used for for CT laser system directly. Conclusion: In-house software DicomGenie received and extracted DICOM data, isocenter laser positioning data files were created by DicomGenie and can be use for A2J laser positioning system.« less
Comparative evaluation of effect of rotary and reciprocating single-file systems on pericervical dentin: A cone-beam computed tomography study.

PubMed

Zinge, Priyanka Ramdas; Patil, Jayaprakash

2017-01-01

The aim of this study is to evaluate and compare the effect of one shape, Neolix rotary single-file systems and WaveOne, Reciproc reciprocating single-file systems on pericervical dentin (PCD) using cone-beam computed tomography (CBCT). A total of 40 freshly extracted mandibular premolars were collected and divided into two groups, namely, Group A - Rotary: A 1 - Neolix and A 2 - OneShape and Group B - Reciprocating: B 1 - WaveOne and B 2 - Reciproc. Preoperative scans of each were taken followed by conventional access cavity preparation and working length determination with 10-k file. Instrumentation of the canal was done according to the respective file system, and postinstrumentation CBCT scans of teeth were obtained. 90 μm thick slices were obtained 4 mm apical and coronal to the cementoenamel junction. The PCD thickness was calculated as the shortest distance from the canal outline to the closest adjacent root surface, which was measured in four surfaces, i.e., facial, lingual, mesial, and distal for all the groups in the two obtained scans. There was no significant difference found between rotary single-file systems and reciprocating single-file systems in their effect on PCD, but in Group B 2 , there was most significant loss of tooth structure in the mesial, lingual, and distal surface ( P < 0.05). Reciproc single-file system removes more PCD as compared to other experimental groups, whereas Neolix single file system had the least effect on PCD.
Design and implementation of a reliable and cost-effective cloud computing infrastructure: the INFN Napoli experience

NASA Astrophysics Data System (ADS)

Capone, V.; Esposito, R.; Pardi, S.; Taurino, F.; Tortone, G.

2012-12-01

Over the last few years we have seen an increasing number of services and applications needed to manage and maintain cloud computing facilities. This is particularly true for computing in high energy physics, which often requires complex configurations and distributed infrastructures. In this scenario a cost effective rationalization and consolidation strategy is the key to success in terms of scalability and reliability. In this work we describe an IaaS (Infrastructure as a Service) cloud computing system, with high availability and redundancy features, which is currently in production at INFN-Naples and ATLAS Tier-2 data centre. The main goal we intended to achieve was a simplified method to manage our computing resources and deliver reliable user services, reusing existing hardware without incurring heavy costs. A combined usage of virtualization and clustering technologies allowed us to consolidate our services on a small number of physical machines, reducing electric power costs. As a result of our efforts we developed a complete solution for data and computing centres that can be easily replicated using commodity hardware. Our architecture consists of 2 main subsystems: a clustered storage solution, built on top of disk servers running GlusterFS file system, and a virtual machines execution environment. GlusterFS is a network file system able to perform parallel writes on multiple disk servers, providing this way live replication of data. High availability is also achieved via a network configuration using redundant switches and multiple paths between hypervisor hosts and disk servers. We also developed a set of management scripts to easily perform basic system administration tasks such as automatic deployment of new virtual machines, adaptive scheduling of virtual machines on hypervisor hosts, live migration and automated restart in case of hypervisor failures.
75 FR 3493 - Notice of Acceptance for Docketing of the Application, Notice of Opportunity for Hearing for...

Federal Register 2010, 2011, 2012, 2013, 2014

2010-01-21

...), Rockville, Maryland 20852 and is accessible from the NRC's Agencywide Documents Access and Management System... receipt of the document. The E-Filing system also distributes an e-mail notice that provides access to the... intervene is filed so that they can obtain access to the document via the E-Filing system. A person filing...
75 FR 42462 - Notice of Acceptance for Docketing of the Application and Notice of Opportunity for Hearing...

Federal Register 2010, 2011, 2012, 2013, 2014

2010-07-21

... NRC's Agencywide Documents Access and Management System (ADAMS) Public Electronic Reading Room on the... receipt of the document. The E-Filing system also distributes an e-mail notice that provides access to the... intervene is filed so that they can obtain access to the document via the E-Filing system. A person filing...
Air Traffic Complexity Measurement Environment (ACME): Software User's Guide

NASA Technical Reports Server (NTRS)

1996-01-01

A user's guide for the Air Traffic Complexity Measurement Environment (ACME) software is presented. The ACME consists of two major components, a complexity analysis tool and user interface. The Complexity Analysis Tool (CAT) analyzes complexity off-line, producing data files which may be examined interactively via the Complexity Data Analysis Tool (CDAT). The Complexity Analysis Tool is composed of three independently executing processes that communicate via PVM (Parallel Virtual Machine) and Unix sockets. The Runtime Data Management and Control process (RUNDMC) extracts flight plan and track information from a SAR input file, and sends the information to GARP (Generate Aircraft Routes Process) and CAT (Complexity Analysis Task). GARP in turn generates aircraft trajectories, which are utilized by CAT to calculate sector complexity. CAT writes flight plan, track and complexity data to an output file, which can be examined interactively. The Complexity Data Analysis Tool (CDAT) provides an interactive graphic environment for examining the complexity data produced by the Complexity Analysis Tool (CAT). CDAT can also play back track data extracted from System Analysis Recording (SAR) tapes. The CDAT user interface consists of a primary window, a controls window, and miscellaneous pop-ups. Aircraft track and position data is displayed in the main viewing area of the primary window. The controls window contains miscellaneous control and display items. Complexity data is displayed in pop-up windows. CDAT plays back sector complexity and aircraft track and position data as a function of time. Controls are provided to start and stop playback, adjust the playback rate, and reposition the display to a specified time.
Central Satellite Data Repository Supporting Research and Development

NASA Astrophysics Data System (ADS)

Han, W.; Brust, J.

2015-12-01

Near real-time satellite data is critical to many research and development activities of atmosphere, land, and ocean processes. Acquiring and managing huge volumes of satellite data without (or with less) latency in an organization is always a challenge in the big data age. An organization level data repository is a practical solution to meeting this challenge. The STAR (Center for Satellite Applications and Research of NOAA) Central Data Repository (SCDR) is a scalable, stable, and reliable repository to acquire, manipulate, and disseminate various types of satellite data in an effective and efficient manner. SCDR collects more than 200 data products, which are commonly used by multiple groups in STAR, from NOAA, GOES, Metop, Suomi NPP, Sentinel, Himawari, and other satellites. The processes of acquisition, recording, retrieval, organization, and dissemination are performed in parallel. Multiple data access interfaces, like FTP, FTPS, HTTP, HTTPS, and RESTful, are supported in the SCDR to obtain satellite data from their providers through high speed internet. The original satellite data in various raster formats can be parsed in the respective adapter to retrieve data information. The data information is ingested to the corresponding partitioned tables in the central database. All files are distributed equally on the Network File System (NFS) disks to balance the disk load. SCDR provides consistent interfaces (including Perl utility, portal, and RESTful Web service) to locate files of interest easily and quickly and access them directly by over 200 compute servers via NFS. SCDR greatly improves collection and integration of near real-time satellite data, addresses satellite data requirements of scientists and researchers, and facilitates their primary research and development activities.

Introduction of a Data System at the Universite Paul Sabatier, Toulouse (France). Programme on Institutional Management in Higher Education.

ERIC Educational Resources Information Center

Prineau, J. P.

The data system and its branches, computerized in 1970, provide information from the following: student records file, accountancy file, an experimental-stage personnel file, and a planning-stage facilities file. The files not only cope with the university's daily management duties but also supply the French Ministry with statistics. Two types of…
76 FR 11465 - Privacy Act of 1974; System of Records

Federal Register 2010, 2011, 2012, 2013, 2014

2011-03-02

... separate systems of records: ``FHFA-OIG Audit Files Database,'' ``FHFA-OIG Investigative & Evaluative Files Database,'' ``FHFA-OIG Investigative & Evaluative MIS Database,'' and ``FHFA-OIG Hotline Database.'' These... Audit Files Database. FHFA-OIG-2: FHFA-OIG Investigative & Evaluative Files Database. FHFA-OIG-3: FHFA...
Generation and use of the Goddard trajectory determination system SLP ephemeris files

NASA Technical Reports Server (NTRS)

Armstrong, M. G.; Tomaszewski, I. B.

1973-01-01

Information is presented to acquaint users of the Goddard Trajectory Determination System Solar/Lunar/Planetary ephemeris files with the details connected with the generation and use of these files. In particular, certain sections constitute a user's manual for the ephemeris files.
National Automotive Sampling System (NASS) Crashworthiness Data System : analytical user's manual 2007 file

DOT National Transportation Integrated Search

2008-01-01

Comparing the 1988-2007 files with files from years prior to 1988 is not recommended. The : principal attributes of the NASS CDS 1988-2007 files include: focusing on crashes involving : automobiles and automobile derivatives, light trucks and vans wi...
Digital Libraries: The Next Generation in File System Technology.

ERIC Educational Resources Information Center

Bowman, Mic; Camargo, Bill

1998-01-01

Examines file sharing within corporations that use wide-area, distributed file systems. Applications and user interactions strongly suggest that the addition of services typically associated with digital libraries (content-based file location, strongly typed objects, representation of complex relationships between documents, and extrinsic…
Summary of Documentation for DYNA3D-ParaDyn's Software Quality Assurance Regression Test Problems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zywicz, Edward

The Software Quality Assurance (SQA) regression test suite for DYNA3D (Zywicz and Lin, 2015) and ParaDyn (DeGroot, et al., 2015) currently contains approximately 600 problems divided into 21 suites, and is a required component of ParaDyn’s SQA plan (Ferencz and Oliver, 2013). The regression suite allows developers to ensure that software modifications do not unintentionally alter the code response. The entire regression suite is run prior to permanently incorporating any software modification or addition. When code modifications alter test problem results, the specific cause must be determined and fully understood before the software changes and revised test answers can bemore » incorporated. The regression suite is executed on LLNL platforms using a Python script and an associated data file. The user specifies the DYNA3D or ParaDyn executable, number of processors to use, test problems to run, and other options to the script. The data file details how each problem and its answer extraction scripts are executed. For each problem in the regression suite there exists an input deck, an eight-processor partition file, an answer file, and various extraction scripts. These scripts assemble a temporary answer file in a specific format from the simulation results. The temporary and stored answer files are compared to a specific level of numerical precision, and when differences are detected the test problem is flagged as failed. Presently, numerical results are stored and compared to 16 digits. At this accuracy level different processor types, compilers, number of partitions, etc. impact the results to various degrees. Thus, for consistency purposes the regression suite is run with ParaDyn using 8 processors on machines with a specific processor type (currently the Intel Xeon E5530 processor). For non-parallel regression problems, i.e., the two XFEM problems, DYNA3D is used instead. When environments or platforms change, executables using the current source code and the new resource are created and the regression suite is run. If differences in answers arise, the new answers are retained provided that the differences are inconsequential. This bootstrap approach allows the test suite answers to evolve in a controlled manner with a high level of confidence. Developers also run the entire regression suite with (serial) DYNA3D. While these results normally differ from the stored (parallel) answers, abnormal termination or wildly different values are strong indicators of potential issues.« less
A new approach to the film library: time-unit filing.

PubMed

Palmucci, J A

2000-01-01

The installation of a new radiology information system (RIS) at Children's Hospital Medical Center of Akron in Akron, Ohio, took the radiology department into a new world of technology, but raised issues we never anticipated. The major problem the new RIS forced the department to overcome was how to eliminate the film file's reliance on a proprietary radiology numbering system. Previously, the department had used its own numbering system--a proprietary x-ray number--to file film jackets and had used the hospital-issued medical record number to access patient and payer information from the hospital information system. It became clear that we should use a single number--the medical record number--to access all data, but we wondered how that would affect our film file room. An RIS consultant suggested that we consider filing films by last date of service, a system called "time-unit filing." Time-unit filing means keeping the most recent two-weeks worth of films in the main file room. They are organized by gender in blue or pink jackets and marked alphabetically by the patient's last name in a way that makes mis-files easy to see. If a patient's film jacket is activated again, it is refiled in the current two-week time unit. Inactive jackets remain in their two-week time unit indefinitely. Time-unit filing has had many benefits for the radiology department at Children's Hospital Medical Center of Akron: fewer mis-files, less time needed for filing and searching, and successful implementation of the new RIS.
An operating system for future aerospace vehicle computer systems

NASA Technical Reports Server (NTRS)

Foudriat, E. C.; Berman, W. J.; Will, R. W.; Bynum, W. L.

1984-01-01

The requirements for future aerospace vehicle computer operating systems are examined in this paper. The computer architecture is assumed to be distributed with a local area network connecting the nodes. Each node is assumed to provide a specific functionality. The network provides for communication so that the overall tasks of the vehicle are accomplished. The O/S structure is based upon the concept of objects. The mechanisms for integrating node unique objects with node common objects in order to implement both the autonomy and the cooperation between nodes is developed. The requirements for time critical performance and reliability and recovery are discussed. Time critical performance impacts all parts of the distributed operating system; e.g., its structure, the functional design of its objects, the language structure, etc. Throughout the paper the tradeoffs - concurrency, language structure, object recovery, binding, file structure, communication protocol, programmer freedom, etc. - are considered to arrive at a feasible, maximum performance design. Reliability of the network system is considered. A parallel multipath bus structure is proposed for the control of delivery time for time critical messages. The architecture also supports immediate recovery for the time critical message system after a communication failure.
Evaluation of Apache Hadoop for parallel data analysis with ROOT

NASA Astrophysics Data System (ADS)

Lehrack, S.; Duckeck, G.; Ebke, J.

2014-06-01

The Apache Hadoop software is a Java based framework for distributed processing of large data sets across clusters of computers, using the Hadoop file system (HDFS) for data storage and backup and MapReduce as a processing platform. Hadoop is primarily designed for processing large textual data sets which can be processed in arbitrary chunks, and must be adapted to the use case of processing binary data files which cannot be split automatically. However, Hadoop offers attractive features in terms of fault tolerance, task supervision and control, multi-user functionality and job management. For this reason, we evaluated Apache Hadoop as an alternative approach to PROOF for ROOT data analysis. Two alternatives in distributing analysis data were discussed: either the data was stored in HDFS and processed with MapReduce, or the data was accessed via a standard Grid storage system (dCache Tier-2) and MapReduce was used only as execution back-end. The focus in the measurements were on the one hand to safely store analysis data on HDFS with reasonable data rates and on the other hand to process data fast and reliably with MapReduce. In the evaluation of the HDFS, read/write data rates from local Hadoop cluster have been measured and compared to standard data rates from the local NFS installation. In the evaluation of MapReduce, realistic ROOT analyses have been used and event rates have been compared to PROOF.
Final Report: CNC Micromachines LDRD No.10793

DOE Office of Scientific and Technical Information (OSTI.GOV)

JOKIEL JR., BERNHARD; BENAVIDES, GILBERT L.; BIEG, LOTHAR F.

2003-04-01

The three-year LDRD ''CNC Micromachines'' was successfully completed at the end of FY02. The project had four major breakthroughs in spatial motion control in MEMS: (1) A unified method for designing scalable planar and spatial on-chip motion control systems was developed. The method relies on the use of parallel kinematic mechanisms (PKMs) that when properly designed provide different types of motion on-chip without the need for post-fabrication assembly, (2) A new type of actuator was developed--the linear stepping track drive (LSTD) that provides open loop linear position control that is scalable in displacement, output force and step size. Several versionsmore » of this actuator were designed, fabricated and successfully tested. (3) Different versions of XYZ translation only and PTT motion stages were designed, successfully fabricated and successfully tested demonstrating absolutely that on-chip spatial motion control systems are not only possible, but are a reality. (4) Control algorithms, software and infrastructure based on MATLAB were created and successfully implemented to drive the XYZ and PTT motion platforms in a controlled manner. The control software is capable of reading an M/G code machine tool language file, decode the instructions and correctly calculate and apply position and velocity trajectories to the motion devices linear drive inputs to position the device platform along the trajectory as specified by the input file. A full and detailed account of design methodology, theory and experimental results (failures and successes) is provided.« less
78 FR 64488 - Combined Notice of Filings #2

Federal Register 2010, 2011, 2012, 2013, 2014

2013-10-29

... DEPARTMENT OF ENERGY Federal Energy Regulatory Commission Combined Notice of Filings 2 Take notice that the Commission received the following electric rate filings: Docket Numbers: ER13-1999-001. Applicants: Midcontinent Independent System Operator, Inc. Description: Midcontinent Independent System Operator, Inc. submits tariff filing per 35: 10...
48 CFR 3004.804-5 - Procedures for closing out contract files.

Code of Federal Regulations, 2012 CFR

2012-10-01

... 48 Federal Acquisition Regulations System 7 2012-10-01 2012-10-01 false Procedures for closing out contract files. 3004.804-5 Section 3004.804-5 Federal Acquisition Regulations System DEPARTMENT OF HOMELAND... Contract Files 3004.804-5 Procedures for closing out contract files. ...
48 CFR 904.803 - Contents of contract files.

Code of Federal Regulations, 2014 CFR

2014-10-01

... 48 Federal Acquisition Regulations System 5 2014-10-01 2014-10-01 false Contents of contract files. 904.803 Section 904.803 Federal Acquisition Regulations System DEPARTMENT OF ENERGY GENERAL ADMINISTRATIVE MATTERS Government Contract Files 904.803 Contents of contract files. (a) (29) The record copy of...
48 CFR 3004.804-5 - Procedures for closing out contract files.

Code of Federal Regulations, 2014 CFR

2014-10-01

... 48 Federal Acquisition Regulations System 7 2014-10-01 2014-10-01 false Procedures for closing out contract files. 3004.804-5 Section 3004.804-5 Federal Acquisition Regulations System DEPARTMENT OF HOMELAND... Contract Files 3004.804-5 Procedures for closing out contract files. ...
48 CFR 4.803 - Contents of contract files.

Code of Federal Regulations, 2014 CFR

2014-10-01

... 48 Federal Acquisition Regulations System 1 2014-10-01 2014-10-01 false Contents of contract files. 4.803 Section 4.803 Federal Acquisition Regulations System FEDERAL ACQUISITION REGULATION GENERAL ADMINISTRATIVE MATTERS Government Contract Files 4.803 Contents of contract files. The following are examples of...
48 CFR 3004.804-5 - Procedures for closing out contract files.

Code of Federal Regulations, 2013 CFR

2013-10-01

... 48 Federal Acquisition Regulations System 7 2013-10-01 2012-10-01 true Procedures for closing out contract files. 3004.804-5 Section 3004.804-5 Federal Acquisition Regulations System DEPARTMENT OF HOMELAND... Contract Files 3004.804-5 Procedures for closing out contract files. ...
48 CFR 4.803 - Contents of contract files.

Code of Federal Regulations, 2013 CFR

2013-10-01

... 48 Federal Acquisition Regulations System 1 2013-10-01 2013-10-01 false Contents of contract files. 4.803 Section 4.803 Federal Acquisition Regulations System FEDERAL ACQUISITION REGULATION GENERAL ADMINISTRATIVE MATTERS Government Contract Files 4.803 Contents of contract files. The following are examples of...
48 CFR 904.803 - Contents of contract files.

Code of Federal Regulations, 2010 CFR

2010-10-01

... 48 Federal Acquisition Regulations System 5 2010-10-01 2010-10-01 false Contents of contract files. 904.803 Section 904.803 Federal Acquisition Regulations System DEPARTMENT OF ENERGY GENERAL ADMINISTRATIVE MATTERS Government Contract Files 904.803 Contents of contract files. (a) (29) The record copy of...
48 CFR 3004.804-5 - Procedures for closing out contract files.

Code of Federal Regulations, 2010 CFR

2010-10-01

... 48 Federal Acquisition Regulations System 7 2010-10-01 2010-10-01 false Procedures for closing out contract files. 3004.804-5 Section 3004.804-5 Federal Acquisition Regulations System DEPARTMENT OF HOMELAND... Contract Files 3004.804-5 Procedures for closing out contract files. ...
48 CFR 3004.804-5 - Procedures for closing out contract files.

Code of Federal Regulations, 2011 CFR

2011-10-01

... 48 Federal Acquisition Regulations System 7 2011-10-01 2011-10-01 false Procedures for closing out contract files. 3004.804-5 Section 3004.804-5 Federal Acquisition Regulations System DEPARTMENT OF HOMELAND... Contract Files 3004.804-5 Procedures for closing out contract files. ...

48 CFR 904.803 - Contents of contract files.

Code of Federal Regulations, 2012 CFR

2012-10-01

... 48 Federal Acquisition Regulations System 5 2012-10-01 2012-10-01 false Contents of contract files. 904.803 Section 904.803 Federal Acquisition Regulations System DEPARTMENT OF ENERGY GENERAL ADMINISTRATIVE MATTERS Government Contract Files 904.803 Contents of contract files. (a) (29) The record copy of...
A Parallel Pipelined Renderer for the Time-Varying Volume Data

NASA Technical Reports Server (NTRS)

Chiueh, Tzi-Cker; Ma, Kwan-Liu

1997-01-01

This paper presents a strategy for efficiently rendering time-varying volume data sets on a distributed-memory parallel computer. Time-varying volume data take large storage space and visualizing them requires reading large files continuously or periodically throughout the course of the visualization process. Instead of using all the processors to collectively render one volume at a time, a pipelined rendering process is formed by partitioning processors into groups to render multiple volumes concurrently. In this way, the overall rendering time may be greatly reduced because the pipelined rendering tasks are overlapped with the I/O required to load each volume into a group of processors; moreover, parallelization overhead may be reduced as a result of partitioning the processors. We modify an existing parallel volume renderer to exploit various levels of rendering parallelism and to study how the partitioning of processors may lead to optimal rendering performance. Two factors which are important to the overall execution time are re-source utilization efficiency and pipeline startup latency. The optimal partitioning configuration is the one that balances these two factors. Tests on Intel Paragon computers show that in general optimal partitionings do exist for a given rendering task and result in 40-50% saving in overall rendering time.
TIMEDELN: A programme for the detection and parametrization of overlapping resonances using the time-delay method

NASA Astrophysics Data System (ADS)

Little, Duncan A.; Tennyson, Jonathan; Plummer, Martin; Noble, Clifford J.; Sunderland, Andrew G.

2017-06-01

TIMEDELN implements the time-delay method of determining resonance parameters from the characteristic Lorentzian form displayed by the largest eigenvalues of the time-delay matrix. TIMEDELN constructs the time-delay matrix from input K-matrices and analyses its eigenvalues. This new version implements multi-resonance fitting and may be run serially or as a high performance parallel code with three levels of parallelism. TIMEDELN takes K-matrices from a scattering calculation, either read from a file or calculated on a dynamically adjusted grid, and calculates the time-delay matrix. This is then diagonalized, with the largest eigenvalue representing the longest time-delay experienced by the scattering particle. A resonance shows up as a characteristic Lorentzian form in the time-delay: the programme searches the time-delay eigenvalues for maxima and traces resonances when they pass through different eigenvalues, separating overlapping resonances. It also performs the fitting of the calculated data to the Lorentzian form and outputs resonance positions and widths. Any remaining overlapping resonances can be fitted jointly. The branching ratios of decay into the open channels can also be found. The programme may be run serially or in parallel with three levels of parallelism. The parallel code modules are abstracted from the main physics code and can be used independently.
Solving data-at-rest for the storage and retrieval of files in ad hoc networks

NASA Astrophysics Data System (ADS)

Knobler, Ron; Scheffel, Peter; Williams, Jonathan; Gaj, Kris; Kaps, Jens-Peter

2013-05-01

Based on current trends for both military and commercial applications, the use of mobile devices (e.g. smartphones and tablets) is greatly increasing. Several military applications consist of secure peer to peer file sharing without a centralized authority. For these military applications, if one or more of these mobile devices are lost or compromised, sensitive files can be compromised by adversaries, since COTS devices and operating systems are used. Complete system files cannot be stored on a device, since after compromising a device, an adversary can attack the data at rest, and eventually obtain the original file. Also after a device is compromised, the existing peer to peer system devices must still be able to access all system files. McQ has teamed with the Cryptographic Engineering Research Group at George Mason University to develop a custom distributed file sharing system to provide a complete solution to the data at rest problem for resource constrained embedded systems and mobile devices. This innovative approach scales very well to a large number of network devices, without a single point of failure. We have implemented the approach on representative mobile devices as well as developed an extensive system simulator to benchmark expected system performance based on detailed modeling of the network/radio characteristics, CONOPS, and secure distributed file system functionality. The simulator is highly customizable for the purpose of determining expected system performance for other network topologies and CONOPS.
Design of a steganographic virtual operating system

NASA Astrophysics Data System (ADS)

Ashendorf, Elan; Craver, Scott

2015-03-01

A steganographic file system is a secure file system whose very existence on a disk is concealed. Customarily, these systems hide an encrypted volume within unused disk blocks, slack space, or atop conventional encrypted volumes. These file systems are far from undetectable, however: aside from their ciphertext footprint, they require a software or driver installation whose presence can attract attention and then targeted surveillance. We describe a new steganographic operating environment that requires no visible software installation, launching instead from a concealed bootstrap program that can be extracted and invoked with a chain of common Unix commands. Our system conceals its payload within innocuous files that typically contain high-entropy data, producing a footprint that is far less conspicuous than existing methods. The system uses a local web server to provide a file system, user interface and applications through a web architecture.
77 FR 43592 - System Energy Resources, Inc.; Notice of Filing

Federal Register 2010, 2011, 2012, 2013, 2014

2012-07-25

... DEPARTMENT OF ENERGY Federal Energy Regulatory Commission [Docket No. EL12-52-001] System Energy Resources, Inc.; Notice of Filing Take notice that on July 18, 2012, System Energy Resources, Inc. (System Energy Resources), submitted a supplement to its petition filed on March 28, 2012 (March 28 petition...
14 CFR 1212.501 - Record systems determined to be exempt.

Code of Federal Regulations, 2013 CFR

2013-01-01

... Act from which exempted. (i) The Inspector General Investigations Case Files system of records is... criminal laws. (ii) To the extent that noncriminal investigative files may exist within this system of records, the Inspector General Investigations Case Files system of records is exempt from the following...
14 CFR 1212.501 - Record systems determined to be exempt.

Code of Federal Regulations, 2012 CFR

2012-01-01

... Act from which exempted. (i) The Inspector General Investigations Case Files system of records is... extent that there may exist noncriminal investigative files within this system of records, the Inspector General Investigations Case Files system of records is exempt from the following sections of the Privacy...
14 CFR 1212.501 - Record systems determined to be exempt.

Code of Federal Regulations, 2010 CFR

2010-01-01

... Act from which exempted. (i) The Inspector General Investigations Case Files system of records is... extent that there may exist noncriminal investigative files within this system of records, the Inspector General Investigations Case Files system of records is exempt from the following sections of the Privacy...
14 CFR § 1212.501 - Record systems determined to be exempt.

Code of Federal Regulations, 2014 CFR

2014-01-01

... Act from which exempted. (i) The Inspector General Investigations Case Files system of records is... criminal laws. (ii) To the extent that noncriminal investigative files may exist within this system of records, the Inspector General Investigations Case Files system of records is exempt from the following...
P2P watch: personal health information detection in peer-to-peer file-sharing networks.

PubMed

Sokolova, Marina; El Emam, Khaled; Arbuckle, Luk; Neri, Emilio; Rose, Sean; Jonker, Elizabeth

2012-07-09

Users of peer-to-peer (P2P) file-sharing networks risk the inadvertent disclosure of personal health information (PHI). In addition to potentially causing harm to the affected individuals, this can heighten the risk of data breaches for health information custodians. Automated PHI detection tools that crawl the P2P networks can identify PHI and alert custodians. While there has been previous work on the detection of personal information in electronic health records, there has been a dearth of research on the automated detection of PHI in heterogeneous user files. To build a system that accurately detects PHI in files sent through P2P file-sharing networks. The system, which we call P2P Watch, uses a pipeline of text processing techniques to automatically detect PHI in files exchanged through P2P networks. P2P Watch processes unstructured texts regardless of the file format, document type, and content. We developed P2P Watch to extract and analyze PHI in text files exchanged on P2P networks. We labeled texts as PHI if they contained identifiable information about a person (eg, name and date of birth) and specifics of the person's health (eg, diagnosis, prescriptions, and medical procedures). We evaluated the system's performance through its efficiency and effectiveness on 3924 files gathered from three P2P networks. P2P Watch successfully processed 3924 P2P files of unknown content. A manual examination of 1578 randomly selected files marked by the system as non-PHI confirmed that these files indeed did not contain PHI, making the false-negative detection rate equal to zero. Of 57 files marked by the system as PHI, all contained both personally identifiable information and health information: 11 files were PHI disclosures, and 46 files contained organizational materials such as unfilled insurance forms, job applications by medical professionals, and essays. PHI can be successfully detected in free-form textual files exchanged through P2P networks. Once the files with PHI are detected, affected individuals or data custodians can be alerted to take remedial action.
78 FR 38710 - Combined Notice of Filings #2

Federal Register 2010, 2011, 2012, 2013, 2014

2013-06-27

... DEPARTMENT OF ENERGY Federal Energy Regulatory Commission Combined Notice of Filings 2 Take notice that the Commission received the following electric rate filings: Docket Numbers: ER12-360-003. Applicants: New York Independent System Operator, Inc. Description: New York Independent System Operator, Inc. submits NYISO compliance filing in...
78 FR 67137 - Combined Notice of Filings #2

Federal Register 2010, 2011, 2012, 2013, 2014

2013-11-08

... DEPARTMENT OF ENERGY Federal Energy Regulatory Commission Combined Notice of Filings 2 Take notice that the Commission received the following electric rate filings: Docket Numbers: ER14-263-000. Applicants: Midcontinent Independent System Operator, Inc. Description: Midcontinent Independent System Operator, Inc. submits tariff filing per 35.13(a...
48 CFR 2904.800-70 - Contents of contract files.

Code of Federal Regulations, 2013 CFR

2013-10-01

... 48 Federal Acquisition Regulations System 7 2013-10-01 2012-10-01 true Contents of contract files. 2904.800-70 Section 2904.800-70 Federal Acquisition Regulations System DEPARTMENT OF LABOR GENERAL ADMINISTRATIVE MATTERS Government Contract Files 2904.800-70 Contents of contract files. (a) The reports listed...
48 CFR 2904.800-70 - Contents of contract files.

Code of Federal Regulations, 2011 CFR

2011-10-01

... 48 Federal Acquisition Regulations System 7 2011-10-01 2011-10-01 false Contents of contract files. 2904.800-70 Section 2904.800-70 Federal Acquisition Regulations System DEPARTMENT OF LABOR GENERAL ADMINISTRATIVE MATTERS Government Contract Files 2904.800-70 Contents of contract files. (a) The reports listed...
48 CFR 2904.800-70 - Contents of contract files.

Code of Federal Regulations, 2010 CFR

2010-10-01

... 48 Federal Acquisition Regulations System 7 2010-10-01 2010-10-01 false Contents of contract files. 2904.800-70 Section 2904.800-70 Federal Acquisition Regulations System DEPARTMENT OF LABOR GENERAL ADMINISTRATIVE MATTERS Government Contract Files 2904.800-70 Contents of contract files. (a) The reports listed...
47 CFR 76.1716 - Subscriber records and public inspection file.

Code of Federal Regulations, 2014 CFR

2014-10-01

... § 76.1716 Subscriber records and public inspection file. The operator of a cable television system shall make the system, its public inspection file, and its records of subscribers available for... 47 Telecommunication 4 2014-10-01 2014-10-01 false Subscriber records and public inspection file...
47 CFR 76.1716 - Subscriber records and public inspection file.

Code of Federal Regulations, 2013 CFR

2013-10-01

... § 76.1716 Subscriber records and public inspection file. The operator of a cable television system shall make the system, its public inspection file, and its records of subscribers available for... 47 Telecommunication 4 2013-10-01 2013-10-01 false Subscriber records and public inspection file...
48 CFR 2904.800-70 - Contents of contract files.

Code of Federal Regulations, 2014 CFR

2014-10-01

... 48 Federal Acquisition Regulations System 7 2014-10-01 2014-10-01 false Contents of contract files. 2904.800-70 Section 2904.800-70 Federal Acquisition Regulations System DEPARTMENT OF LABOR GENERAL ADMINISTRATIVE MATTERS Government Contract Files 2904.800-70 Contents of contract files. (a) The reports listed...
48 CFR 2904.800-70 - Contents of contract files.

Code of Federal Regulations, 2012 CFR

2012-10-01

... 48 Federal Acquisition Regulations System 7 2012-10-01 2012-10-01 false Contents of contract files. 2904.800-70 Section 2904.800-70 Federal Acquisition Regulations System DEPARTMENT OF LABOR GENERAL ADMINISTRATIVE MATTERS Government Contract Files 2904.800-70 Contents of contract files. (a) The reports listed...

Some links on this page may take you to non-federal websites. Their policies may differ from this site.