A survey of compiler optimization techniques
NASA Technical Reports Server (NTRS)
Schneck, P. B.
1972-01-01
Major optimization techniques of compilers are described and grouped into three categories: machine dependent, architecture dependent, and architecture independent. Machine-dependent optimizations tend to be local and are performed upon short spans of generated code by using particular properties of an instruction set to reduce the time or space required by a program. Architecture-dependent optimizations are global and are performed while generating code. These optimizations consider the structure of a computer, but not its detailed instruction set. Architecture independent optimizations are also global but are based on analysis of the program flow graph and the dependencies among statements of source program. A conceptual review of a universal optimizer that performs architecture-independent optimizations at source-code level is also presented.
Design and optimization of a portable LQCD Monte Carlo code using OpenACC
NASA Astrophysics Data System (ADS)
Bonati, Claudio; Coscetti, Simone; D'Elia, Massimo; Mesiti, Michele; Negro, Francesco; Calore, Enrico; Schifano, Sebastiano Fabio; Silvi, Giorgio; Tripiccione, Raffaele
The present panorama of HPC architectures is extremely heterogeneous, ranging from traditional multi-core CPU processors, supporting a wide class of applications but delivering moderate computing performance, to many-core Graphics Processor Units (GPUs), exploiting aggressive data-parallelism and delivering higher performances for streaming computing applications. In this scenario, code portability (and performance portability) become necessary for easy maintainability of applications; this is very relevant in scientific computing where code changes are very frequent, making it tedious and prone to error to keep different code versions aligned. In this work, we present the design and optimization of a state-of-the-art production-level LQCD Monte Carlo application, using the directive-based OpenACC programming model. OpenACC abstracts parallel programming to a descriptive level, relieving programmers from specifying how codes should be mapped onto the target architecture. We describe the implementation of a code fully written in OpenAcc, and show that we are able to target several different architectures, including state-of-the-art traditional CPUs and GPUs, with the same code. We also measure performance, evaluating the computing efficiency of our OpenACC code on several architectures, comparing with GPU-specific implementations and showing that a good level of performance-portability can be reached.
Connecting Architecture and Implementation
NASA Astrophysics Data System (ADS)
Buchgeher, Georg; Weinreich, Rainer
Software architectures are still typically defined and described independently from implementation. To avoid architectural erosion and drift, architectural representation needs to be continuously updated and synchronized with system implementation. Existing approaches for architecture representation like informal architecture documentation, UML diagrams, and Architecture Description Languages (ADLs) provide only limited support for connecting architecture descriptions and implementations. Architecture management tools like Lattix, SonarJ, and Sotoarc and UML-tools tackle this problem by extracting architecture information directly from code. This approach works for low-level architectural abstractions like classes and interfaces in object-oriented systems but fails to support architectural abstractions not found in programming languages. In this paper we present an approach for linking and continuously synchronizing a formalized architecture representation to an implementation. The approach is a synthesis of functionality provided by code-centric architecture management and UML tools and higher-level architecture analysis approaches like ADLs.
Object-oriented approach for gas turbine engine simulation
NASA Technical Reports Server (NTRS)
Curlett, Brian P.; Felder, James L.
1995-01-01
An object-oriented gas turbine engine simulation program was developed. This program is a prototype for a more complete, commercial grade engine performance program now being proposed as part of the Numerical Propulsion System Simulator (NPSS). This report discusses architectural issues of this complex software system and the lessons learned from developing the prototype code. The prototype code is a fully functional, general purpose engine simulation program, however, only the component models necessary to model a transient compressor test rig have been written. The production system will be capable of steady state and transient modeling of almost any turbine engine configuration. Chief among the architectural considerations for this code was the framework in which the various software modules will interact. These modules include the equation solver, simulation code, data model, event handler, and user interface. Also documented in this report is the component based design of the simulation module and the inter-component communication paradigm. Object class hierarchies for some of the code modules are given.
Cave, John W; Xia, Li; Caudy, Michael
2011-01-01
In Drosophila melanogaster, achaete (ac) and m8 are model basic helix-loop-helix activator (bHLH A) and repressor genes, respectively, that have the opposite cell expression pattern in proneural clusters during Notch signaling. Previous studies have shown that activation of m8 transcription in specific cells within proneural clusters by Notch signaling is programmed by a "combinatorial" and "architectural" DNA transcription code containing binding sites for the Su(H) and proneural bHLH A proteins. Here we show the novel result that the ac promoter contains a similar combinatorial code of Su(H) and bHLH A binding sites but contains a different Su(H) site architectural code that does not mediate activation during Notch signaling, thus programming a cell expression pattern opposite that of m8 in proneural clusters.
Effects of cacheing on multitasking efficiency and programming strategy on an ELXSI 6400
DOE Office of Scientific and Technical Information (OSTI.GOV)
Montry, G.R.; Benner, R.E.
1985-12-01
The impact of a cache/shared memory architecture, and, in particular, the cache coherency problem, upon concurrent algorithm and program development is discussed. In this context, a simple set of programming strategies are proposed which streamline code development and improve code performance when multitasking in a cache/shared memory or distributed memory environment.
Planning for Pre-Exascale Platform Environment (Fiscal Year 2015 Level 2 Milestone 5216)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Springmeyer, R.; Lang, M.; Noe, J.
This Plan for ASC Pre-Exascale Platform Environments document constitutes the deliverable for the fiscal year 2015 (FY15) Advanced Simulation and Computing (ASC) Program Level 2 milestone Planning for Pre-Exascale Platform Environment. It acknowledges and quantifies challenges and recognized gaps for moving the ASC Program towards effective use of exascale platforms and recommends strategies to address these gaps. This document also presents an update to the concerns, strategies, and plans presented in the FY08 predecessor document that dealt with the upcoming (at the time) petascale high performance computing (HPC) platforms. With the looming push towards exascale systems, a review of themore » earlier document was appropriate in light of the myriad architectural choices currently under consideration. The ASC Program believes the platforms to be fielded in the 2020s will be fundamentally different systems that stress ASC’s ability to modify codes to take full advantage of new or unique features. In addition, the scale of components will increase the difficulty of maintaining an errorfree system, thus driving new approaches to resilience and error detection/correction. The code revamps of the past, from serial- to vector-centric code to distributed memory to threaded implementations, will be revisited as codes adapt to a new message passing interface (MPI) plus “x” or more advanced and dynamic programming models based on architectural specifics. Development efforts are already underway in some cases, and more difficult or uncertain aspects of the new architectures will require research and analysis that may inform future directions for program choices. In addition, the potential diversity of system architectures may require parallel if not duplicative efforts to analyze and modify environments, codes, subsystems, libraries, debugging tools, and performance analysis techniques as well as exploring new monitoring methodologies. It is difficult if not impossible to selectively eliminate some of these activities until more information is available through simulations of potential architectures, analysis of systems designs, and informed study of commodity technologies that will be the constituent parts of future platforms.« less
Developing Information Power Grid Based Algorithms and Software
NASA Technical Reports Server (NTRS)
Dongarra, Jack
1998-01-01
This exploratory study initiated our effort to understand performance modeling on parallel systems. The basic goal of performance modeling is to understand and predict the performance of a computer program or set of programs on a computer system. Performance modeling has numerous applications, including evaluation of algorithms, optimization of code implementations, parallel library development, comparison of system architectures, parallel system design, and procurement of new systems. Our work lays the basis for the construction of parallel libraries that allow for the reconstruction of application codes on several distinct architectures so as to assure performance portability. Following our strategy, once the requirements of applications are well understood, one can then construct a library in a layered fashion. The top level of this library will consist of architecture-independent geometric, numerical, and symbolic algorithms that are needed by the sample of applications. These routines should be written in a language that is portable across the targeted architectures.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Neely, J. R.; Hornung, R.; Black, A.
This document serves as a detailed companion to the powerpoint slides presented as part of the ASC L2 milestone review for Integrated Codes milestone #4782 titled “Assess Newly Emerging Programming and Memory Models for Advanced Architectures on Integrated Codes”, due on 9/30/2014, and presented for formal program review on 9/12/2014. The program review committee is represented by Mike Zika (A Program Project Lead for Kull), Brian Pudliner (B Program Project Lead for Ares), Scott Futral (DEG Group Lead in LC), and Mike Glass (Sierra Project Lead at Sandia). This document, along with the presentation materials, and a letter of completionmore » signed by the review committee will act as proof of completion for this milestone.« less
An Object-Oriented Network-Centric Software Architecture for Physical Computing
NASA Astrophysics Data System (ADS)
Palmer, Richard
1997-08-01
Recent developments in object-oriented computer languages and infrastructure such as the Internet, Web browsers, and the like provide an opportunity to define a more productive computational environment for scientific programming that is based more closely on the underlying mathematics describing physics than traditional programming languages such as FORTRAN or C++. In this talk I describe an object-oriented software architecture for representing physical problems that includes classes for such common mathematical objects as geometry, boundary conditions, partial differential and integral equations, discretization and numerical solution methods, etc. In practice, a scientific program written using this architecture looks remarkably like the mathematics used to understand the problem, is typically an order of magnitude smaller than traditional FORTRAN or C++ codes, and hence easier to understand, debug, describe, etc. All objects in this architecture are ``network-enabled,'' which means that components of a software solution to a physical problem can be transparently loaded from anywhere on the Internet or other global network. The architecture is expressed as an ``API,'' or application programmers interface specification, with reference embeddings in Java, Python, and C++. A C++ class library for an early version of this API has been implemented for machines ranging from PC's to the IBM SP2, meaning that phidentical codes run on all architectures.
Applying a Service-Oriented Architecture to Operational Flight Program Development
2007-09-01
using two Java 2 Enterprise Edition (J2EE) Web servers. The weapon models were accessed using a SUN Microsystems Java Web Services Development Pack...Oriented Architectures 22 CROSSTALK The Journal of Defense Software Engineering September 2007 tion, and Spring/ Hibernate to provide the data access...tion since a major coding effort was avoided. The majority of the effort was tweaking pre-existing Java source code and editing of eXtensible Markup
Supporting shared data structures on distributed memory architectures
NASA Technical Reports Server (NTRS)
Koelbel, Charles; Mehrotra, Piyush; Vanrosendale, John
1990-01-01
Programming nonshared memory systems is more difficult than programming shared memory systems, since there is no support for shared data structures. Current programming languages for distributed memory architectures force the user to decompose all data structures into separate pieces, with each piece owned by one of the processors in the machine, and with all communication explicitly specified by low-level message-passing primitives. A new programming environment is presented for distributed memory architectures, providing a global name space and allowing direct access to remote parts of data values. The analysis and program transformations required to implement this environment are described, and the efficiency of the resulting code on the NCUBE/7 and IPSC/2 hypercubes are described.
Portable LQCD Monte Carlo code using OpenACC
NASA Astrophysics Data System (ADS)
Bonati, Claudio; Calore, Enrico; Coscetti, Simone; D'Elia, Massimo; Mesiti, Michele; Negro, Francesco; Fabio Schifano, Sebastiano; Silvi, Giorgio; Tripiccione, Raffaele
2018-03-01
Varying from multi-core CPU processors to many-core GPUs, the present scenario of HPC architectures is extremely heterogeneous. In this context, code portability is increasingly important for easy maintainability of applications; this is relevant in scientific computing where code changes are numerous and frequent. In this talk we present the design and optimization of a state-of-the-art production level LQCD Monte Carlo application, using the OpenACC directives model. OpenACC aims to abstract parallel programming to a descriptive level, where programmers do not need to specify the mapping of the code on the target machine. We describe the OpenACC implementation and show that the same code is able to target different architectures, including state-of-the-art CPUs and GPUs.
Hardware-Independent Proofs of Numerical Programs
NASA Technical Reports Server (NTRS)
Boldo, Sylvie; Nguyen, Thi Minh Tuyen
2010-01-01
On recent architectures, a numerical program may give different answers depending on the execution hardware and the compilation. Our goal is to formally prove properties about numerical programs that are true for multiple architectures and compilers. We propose an approach that states the rounding error of each floating-point computation whatever the environment. This approach is implemented in the Frama-C platform for static analysis of C code. Small case studies using this approach are entirely and automatically proved
MILC Code Performance on High End CPU and GPU Supercomputer Clusters
NASA Astrophysics Data System (ADS)
DeTar, Carleton; Gottlieb, Steven; Li, Ruizi; Toussaint, Doug
2018-03-01
With recent developments in parallel supercomputing architecture, many core, multi-core, and GPU processors are now commonplace, resulting in more levels of parallelism, memory hierarchy, and programming complexity. It has been necessary to adapt the MILC code to these new processors starting with NVIDIA GPUs, and more recently, the Intel Xeon Phi processors. We report on our efforts to port and optimize our code for the Intel Knights Landing architecture. We consider performance of the MILC code with MPI and OpenMP, and optimizations with QOPQDP and QPhiX. For the latter approach, we concentrate on the staggered conjugate gradient and gauge force. We also consider performance on recent NVIDIA GPUs using the QUDA library.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hoekstra, Robert J.; Hammond, Simon David; Richards, David
2017-09-01
This milestone is a tri-lab deliverable supporting ongoing Co-Design efforts impacting applications in the Integrated Codes (IC) program element Advanced Technology Development and Mitigation (ATDM) program element. In FY14, the trilabs looked at porting proxy application to technologies of interest for ATS procurements. In FY15, a milestone was completed evaluating proxy applications in multiple programming models and in FY16, a milestone was completed focusing on the migration of lessons learned back into production code development. This year, the co-design milestone focuses on extracting the knowledge gained and/or code revisions back into production applications.
STGT program: Ada coding and architecture lessons learned
NASA Technical Reports Server (NTRS)
Usavage, Paul; Nagurney, Don
1992-01-01
STGT (Second TDRSS Ground Terminal) is currently halfway through the System Integration Test phase (Level 4 Testing). To date, many software architecture and Ada language issues have been encountered and solved. This paper, which is the transcript of a presentation at the 3 Dec. meeting, attempts to define these lessons plus others learned regarding software project management and risk management issues, training, performance, reuse, and reliability. Observations are included regarding the use of particular Ada coding constructs, software architecture trade-offs during the prototyping, development and testing stages of the project, and dangers inherent in parallel or concurrent systems, software, hardware, and operations engineering.
Exploring Asynchronous Many-Task Runtime Systems toward Extreme Scales
DOE Office of Scientific and Technical Information (OSTI.GOV)
Knight, Samuel; Baker, Gavin Matthew; Gamell, Marc
2015-10-01
Major exascale computing reports indicate a number of software challenges to meet the dramatic change of system architectures in near future. While several-orders-of-magnitude increase in parallelism is the most commonly cited of those, hurdles also include performance heterogeneity of compute nodes across the system, increased imbalance between computational capacity and I/O capabilities, frequent system interrupts, and complex hardware architectures. Asynchronous task-parallel programming models show a great promise in addressing these issues, but are not yet fully understood nor developed su ciently for computational science and engineering application codes. We address these knowledge gaps through quantitative and qualitative exploration of leadingmore » candidate solutions in the context of engineering applications at Sandia. In this poster, we evaluate MiniAero code ported to three leading candidate programming models (Charm++, Legion and UINTAH) to examine the feasibility of these models that permits insertion of new programming model elements into an existing code base.« less
A visual programming environment for the Navier-Stokes computer
NASA Technical Reports Server (NTRS)
Tomboulian, Sherryl; Crockett, Thomas W.; Middleton, David
1988-01-01
The Navier-Stokes computer is a high-performance, reconfigurable, pipelined machine designed to solve large computational fluid dynamics problems. Due to the complexity of the architecture, development of effective, high-level language compilers for the system appears to be a very difficult task. Consequently, a visual programming methodology has been developed which allows users to program the system at an architectural level by constructing diagrams of the pipeline configuration. These schematic program representations can then be checked for validity and automatically translated into machine code. The visual environment is illustrated by using a prototype graphical editor to program an example problem.
Copy Hiding Application Interface
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jones, Holger; Poliakoff, David; Robinson, Peter
2016-10-06
CHAI is a light-weight framework which abstracts the automated movement of data (e.g. to/from Host/Device) via RAJA like performance portability programming model constructs. It can be viewed as a utility framework and an adjunct to FAJA (A Performance Portability Framework). Performance Portability is a technique that abstracts the complexities of modern Heterogeneous Architectures while allowing the original program to undergo incremental minimally invasive code changes in order to adapt to the newer architectures.
NASA Technical Reports Server (NTRS)
Waheed, Abdul; Yan, Jerry
1998-01-01
This paper presents a model to evaluate the performance and overhead of parallelizing sequential code using compiler directives for multiprocessing on distributed shared memory (DSM) systems. With increasing popularity of shared address space architectures, it is essential to understand their performance impact on programs that benefit from shared memory multiprocessing. We present a simple model to characterize the performance of programs that are parallelized using compiler directives for shared memory multiprocessing. We parallelized the sequential implementation of NAS benchmarks using native Fortran77 compiler directives for an Origin2000, which is a DSM system based on a cache-coherent Non Uniform Memory Access (ccNUMA) architecture. We report measurement based performance of these parallelized benchmarks from four perspectives: efficacy of parallelization process; scalability; parallelization overhead; and comparison with hand-parallelized and -optimized version of the same benchmarks. Our results indicate that sequential programs can conveniently be parallelized for DSM systems using compiler directives but realizing performance gains as predicted by the performance model depends primarily on minimizing architecture-specific data locality overhead.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Draeger, E. W.
The Advanced Architecture and Portability Specialists team (AAPS) worked with a select set of LLNL application teams to develop and/or implement a portability strategy for next-generation architectures. The team also investigated new and updated programming models and helped develop programming abstractions targeting maintainability and performance portability. Significant progress was made on both fronts in FY17, resulting in multiple applications being significantly more prepared for the nextgeneration machines than before.
Porting plasma physics simulation codes to modern computing architectures using the
NASA Astrophysics Data System (ADS)
Germaschewski, Kai; Abbott, Stephen
2015-11-01
Available computing power has continued to grow exponentially even after single-core performance satured in the last decade. The increase has since been driven by more parallelism, both using more cores and having more parallelism in each core, e.g. in GPUs and Intel Xeon Phi. Adapting existing plasma physics codes is challenging, in particular as there is no single programming model that covers current and future architectures. We will introduce the open-source
MODTRAN6: a major upgrade of the MODTRAN radiative transfer code
NASA Astrophysics Data System (ADS)
Berk, Alexander; Conforti, Patrick; Kennett, Rosemary; Perkins, Timothy; Hawes, Frederick; van den Bosch, Jeannette
2014-06-01
The MODTRAN6 radiative transfer (RT) code is a major advancement over earlier versions of the MODTRAN atmospheric transmittance and radiance model. This version of the code incorporates modern software ar- chitecture including an application programming interface, enhanced physics features including a line-by-line algorithm, a supplementary physics toolkit, and new documentation. The application programming interface has been developed for ease of integration into user applications. The MODTRAN code has been restructured towards a modular, object-oriented architecture to simplify upgrades as well as facilitate integration with other developers' codes. MODTRAN now includes a line-by-line algorithm for high resolution RT calculations as well as coupling to optical scattering codes for easy implementation of custom aerosols and clouds.
PLAYGROUND: Preparing Students for the Cyber Battleground
ERIC Educational Resources Information Center
Nielson, Seth James
2017-01-01
Attempting to educate practitioners of computer security can be difficult if for no other reason than the breadth of knowledge required today. The security profession includes widely diverse subfields including cryptography, network architectures, programming, programming languages, design, coding practices, software testing, pattern recognition,…
Automation of Data Traffic Control on DSM Architecture
NASA Technical Reports Server (NTRS)
Frumkin, Michael; Jin, Hao-Qiang; Yan, Jerry
2001-01-01
The design of distributed shared memory (DSM) computers liberates users from the duty to distribute data across processors and allows for the incremental development of parallel programs using, for example, OpenMP or Java threads. DSM architecture greatly simplifies the development of parallel programs having good performance on a few processors. However, to achieve a good program scalability on DSM computers requires that the user understand data flow in the application and use various techniques to avoid data traffic congestions. In this paper we discuss a number of such techniques, including data blocking, data placement, data transposition and page size control and evaluate their efficiency on the NAS (NASA Advanced Supercomputing) Parallel Benchmarks. We also present a tool which automates the detection of constructs causing data congestions in Fortran array oriented codes and advises the user on code transformations for improving data traffic in the application.
NASA Lewis Steady-State Heat Pipe Code Architecture
NASA Technical Reports Server (NTRS)
Mi, Ye; Tower, Leonard K.
2013-01-01
NASA Glenn Research Center (GRC) has developed the LERCHP code. The PC-based LERCHP code can be used to predict the steady-state performance of heat pipes, including the determination of operating temperature and operating limits which might be encountered under specified conditions. The code contains a vapor flow algorithm which incorporates vapor compressibility and axially varying heat input. For the liquid flow in the wick, Darcy s formula is employed. Thermal boundary conditions and geometric structures can be defined through an interactive input interface. A variety of fluid and material options as well as user defined options can be chosen for the working fluid, wick, and pipe materials. This report documents the current effort at GRC to update the LERCHP code for operating in a Microsoft Windows (Microsoft Corporation) environment. A detailed analysis of the model is presented. The programming architecture for the numerical calculations is explained and flowcharts of the key subroutines are given
Program optimizations: The interplay between power, performance, and energy
Leon, Edgar A.; Karlin, Ian; Grant, Ryan E.; ...
2016-05-16
Practical considerations for future supercomputer designs will impose limits on both instantaneous power consumption and total energy consumption. Working within these constraints while providing the maximum possible performance, application developers will need to optimize their code for speed alongside power and energy concerns. This paper analyzes the effectiveness of several code optimizations including loop fusion, data structure transformations, and global allocations. A per component measurement and analysis of different architectures is performed, enabling the examination of code optimizations on different compute subsystems. Using an explicit hydrodynamics proxy application from the U.S. Department of Energy, LULESH, we show how code optimizationsmore » impact different computational phases of the simulation. This provides insight for simulation developers into the best optimizations to use during particular simulation compute phases when optimizing code for future supercomputing platforms. Here, we examine and contrast both x86 and Blue Gene architectures with respect to these optimizations.« less
Programming for 1.6 Millon cores: Early experiences with IBM's BG/Q SMP architecture
NASA Astrophysics Data System (ADS)
Glosli, James
2013-03-01
With the stall in clock cycle improvements a decade ago, the drive for computational performance has continues along a path of increasing core counts on a processor. The multi-core evolution has been expressed in both a symmetric multi processor (SMP) architecture and cpu/GPU architecture. Debates rage in the high performance computing (HPC) community which architecture best serves HPC. In this talk I will not attempt to resolve that debate but perhaps fuel it. I will discuss the experience of exploiting Sequoia, a 98304 node IBM Blue Gene/Q SMP at Lawrence Livermore National Laboratory. The advantages and challenges of leveraging the computational power BG/Q will be detailed through the discussion of two applications. The first application is a Molecular Dynamics code called ddcMD. This is a code developed over the last decade at LLNL and ported to BG/Q. The second application is a cardiac modeling code called Cardioid. This is a code that was recently designed and developed at LLNL to exploit the fine scale parallelism of BG/Q's SMP architecture. Through the lenses of these efforts I'll illustrate the need to rethink how we express and implement our computational approaches. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.
The Design and Evaluation of "CAPTools"--A Computer Aided Parallelization Toolkit
NASA Technical Reports Server (NTRS)
Yan, Jerry; Frumkin, Michael; Hribar, Michelle; Jin, Haoqiang; Waheed, Abdul; Johnson, Steve; Cross, Jark; Evans, Emyr; Ierotheou, Constantinos; Leggett, Pete;
1998-01-01
Writing applications for high performance computers is a challenging task. Although writing code by hand still offers the best performance, it is extremely costly and often not very portable. The Computer Aided Parallelization Tools (CAPTools) are a toolkit designed to help automate the mapping of sequential FORTRAN scientific applications onto multiprocessors. CAPTools consists of the following major components: an inter-procedural dependence analysis module that incorporates user knowledge; a 'self-propagating' data partitioning module driven via user guidance; an execution control mask generation and optimization module for the user to fine tune parallel processing of individual partitions; a program transformation/restructuring facility for source code clean up and optimization; a set of browsers through which the user interacts with CAPTools at each stage of the parallelization process; and a code generator supporting multiple programming paradigms on various multiprocessors. Besides describing the rationale behind the architecture of CAPTools, the parallelization process is illustrated via case studies involving structured and unstructured meshes. The programming process and the performance of the generated parallel programs are compared against other programming alternatives based on the NAS Parallel Benchmarks, ARC3D and other scientific applications. Based on these results, a discussion on the feasibility of constructing architectural independent parallel applications is presented.
Guidelines for developing vectorizable computer programs
NASA Technical Reports Server (NTRS)
Miner, E. W.
1982-01-01
Some fundamental principles for developing computer programs which are compatible with array-oriented computers are presented. The emphasis is on basic techniques for structuring computer codes which are applicable in FORTRAN and do not require a special programming language or exact a significant penalty on a scalar computer. Researchers who are using numerical techniques to solve problems in engineering can apply these basic principles and thus develop transportable computer programs (in FORTRAN) which contain much vectorizable code. The vector architecture of the ASC is discussed so that the requirements of array processing can be better appreciated. The "vectorization" of a finite-difference viscous shock-layer code is used as an example to illustrate the benefits and some of the difficulties involved. Increases in computing speed with vectorization are illustrated with results from the viscous shock-layer code and from a finite-element shock tube code. The applicability of these principles was substantiated through running programs on other computers with array-associated computing characteristics, such as the Hewlett-Packard (H-P) 1000-F.
The BLAZE language: A parallel language for scientific programming
NASA Technical Reports Server (NTRS)
Mehrotra, P.; Vanrosendale, J.
1985-01-01
A Pascal-like scientific programming language, Blaze, is described. Blaze contains array arithmetic, forall loops, and APL-style accumulation operators, which allow natural expression of fine grained parallelism. It also employs an applicative or functional procedure invocation mechanism, which makes it easy for compilers to extract coarse grained parallelism using machine specific program restructuring. Thus Blaze should allow one to achieve highly parallel execution on multiprocessor architectures, while still providing the user with onceptually sequential control flow. A central goal in the design of Blaze is portability across a broad range of parallel architectures. The multiple levels of parallelism present in Blaze code, in principle, allow a compiler to extract the types of parallelism appropriate for the given architecture while neglecting the remainder. The features of Blaze are described and shows how this language would be used in typical scientific programming.
The Visualization Toolkit (VTK): Rewriting the rendering code for modern graphics cards
NASA Astrophysics Data System (ADS)
Hanwell, Marcus D.; Martin, Kenneth M.; Chaudhary, Aashish; Avila, Lisa S.
2015-09-01
The Visualization Toolkit (VTK) is an open source, permissively licensed, cross-platform toolkit for scientific data processing, visualization, and data analysis. It is over two decades old, originally developed for a very different graphics card architecture. Modern graphics cards feature fully programmable, highly parallelized architectures with large core counts. VTK's rendering code was rewritten to take advantage of modern graphics cards, maintaining most of the toolkit's programming interfaces. This offers the opportunity to compare the performance of old and new rendering code on the same systems/cards. Significant improvements in rendering speeds and memory footprints mean that scientific data can be visualized in greater detail than ever before. The widespread use of VTK means that these improvements will reap significant benefits.
NASA Technical Reports Server (NTRS)
OKeefe, Matthew (Editor); Kerr, Christopher L. (Editor)
1998-01-01
This report contains the abstracts and technical papers from the Second International Workshop on Software Engineering and Code Design in Parallel Meteorological and Oceanographic Applications, held June 15-18, 1998, in Scottsdale, Arizona. The purpose of the workshop is to bring together software developers in meteorology and oceanography to discuss software engineering and code design issues for parallel architectures, including Massively Parallel Processors (MPP's), Parallel Vector Processors (PVP's), Symmetric Multi-Processors (SMP's), Distributed Shared Memory (DSM) multi-processors, and clusters. Issues to be discussed include: (1) code architectures for current parallel models, including basic data structures, storage allocation, variable naming conventions, coding rules and styles, i/o and pre/post-processing of data; (2) designing modular code; (3) load balancing and domain decomposition; (4) techniques that exploit parallelism efficiently yet hide the machine-related details from the programmer; (5) tools for making the programmer more productive; and (6) the proliferation of programming models (F--, OpenMP, MPI, and HPF).
Federal Register 2010, 2011, 2012, 2013, 2014
2012-06-20
... analysis and design, and computer software design and coding. Given the fact that over $500 million were... acoustic algorithms, computer architecture, and source code that dated to the 1970s. Since that time... 2012. Version 3.0 is an entirely new, state-of-the-art computer program used for predicting noise...
New Developments in Modeling MHD Systems on High Performance Computing Architectures
NASA Astrophysics Data System (ADS)
Germaschewski, K.; Raeder, J.; Larson, D. J.; Bhattacharjee, A.
2009-04-01
Modeling the wide range of time and length scales present even in fluid models of plasmas like MHD and X-MHD (Extended MHD including two fluid effects like Hall term, electron inertia, electron pressure gradient) is challenging even on state-of-the-art supercomputers. In the last years, HPC capacity has continued to grow exponentially, but at the expense of making the computer systems more and more difficult to program in order to get maximum performance. In this paper, we will present a new approach to managing the complexity caused by the need to write efficient codes: Separating the numerical description of the problem, in our case a discretized right hand side (r.h.s.), from the actual implementation of efficiently evaluating it. An automatic code generator is used to describe the r.h.s. in a quasi-symbolic form while leaving the translation into efficient and parallelized code to a computer program itself. We implemented this approach for OpenGGCM (Open General Geospace Circulation Model), a model of the Earth's magnetosphere, which was accelerated by a factor of three on regular x86 architecture and a factor of 25 on the Cell BE architecture (commonly known for its deployment in Sony's PlayStation 3).
NASA Technical Reports Server (NTRS)
Nosenchuck, D. M.; Littman, M. G.
1986-01-01
The Navier-Stokes computer (NSC) has been developed for solving problems in fluid mechanics involving complex flow simulations that require more speed and capacity than provided by current and proposed Class VI supercomputers. The machine is a parallel processing supercomputer with several new architectural elements which can be programmed to address a wide range of problems meeting the following criteria: (1) the problem is numerically intensive, and (2) the code makes use of long vectors. A simulation of two-dimensional nonsteady viscous flows is presented to illustrate the architecture, programming, and some of the capabilities of the NSC.
The BLAZE language - A parallel language for scientific programming
NASA Technical Reports Server (NTRS)
Mehrotra, Piyush; Van Rosendale, John
1987-01-01
A Pascal-like scientific programming language, BLAZE, is described. BLAZE contains array arithmetic, forall loops, and APL-style accumulation operators, which allow natural expression of fine grained parallelism. It also employs an applicative or functional procedure invocation mechanism, which makes it easy for compilers to extract coarse grained parallelism using machine specific program restructuring. Thus BLAZE should allow one to achieve highly parallel execution on multiprocessor architectures, while still providing the user with conceptually sequential control flow. A central goal in the design of BLAZE is portability across a broad range of parallel architectures. The multiple levels of parallelism present in BLAZE code, in principle, allow a compiler to extract the types of parallelism appropriate for the given architecture while neglecting the remainder. The features of BLAZE are described and it is shown how this language would be used in typical scientific programming.
Interfacing Computer Aided Parallelization and Performance Analysis
NASA Technical Reports Server (NTRS)
Jost, Gabriele; Jin, Haoqiang; Labarta, Jesus; Gimenez, Judit; Biegel, Bryan A. (Technical Monitor)
2003-01-01
When porting sequential applications to parallel computer architectures, the program developer will typically go through several cycles of source code optimization and performance analysis. We have started a project to develop an environment where the user can jointly navigate through program structure and performance data information in order to make efficient optimization decisions. In a prototype implementation we have interfaced the CAPO computer aided parallelization tool with the Paraver performance analysis tool. We describe both tools and their interface and give an example for how the interface helps within the program development cycle of a benchmark code.
Federal Register 2010, 2011, 2012, 2013, 2014
2012-06-22
... analysis and design, and computer software design and coding. Given the fact that over $500 million were... acoustic algorithms, computer architecture, and source code that dated to the 1970s. Since that time... towards the end of 2012. Version 3.0 is an entirely new, state-of-the-art computer program used for...
A Review of Lightweight Thread Approaches for High Performance Computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Castello, Adrian; Pena, Antonio J.; Seo, Sangmin
High-level, directive-based solutions are becoming the programming models (PMs) of the multi/many-core architectures. Several solutions relying on operating system (OS) threads perfectly work with a moderate number of cores. However, exascale systems will spawn hundreds of thousands of threads in order to exploit their massive parallel architectures and thus conventional OS threads are too heavy for that purpose. Several lightweight thread (LWT) libraries have recently appeared offering lighter mechanisms to tackle massive concurrency. In order to examine the suitability of LWTs in high-level runtimes, we develop a set of microbenchmarks consisting of commonlyfound patterns in current parallel codes. Moreover, wemore » study the semantics offered by some LWT libraries in order to expose the similarities between different LWT application programming interfaces. This study reveals that a reduced set of LWT functions can be sufficient to cover the common parallel code patterns and that those LWT libraries perform better than OS threads-based solutions in cases where task and nested parallelism are becoming more popular with new architectures.« less
Rosetta3: An Object-Oriented Software Suite for the Simulation and Design of Macromolecules
Leaver-Fay, Andrew; Tyka, Michael; Lewis, Steven M.; Lange, Oliver F.; Thompson, James; Jacak, Ron; Kaufman, Kristian; Renfrew, P. Douglas; Smith, Colin A.; Sheffler, Will; Davis, Ian W.; Cooper, Seth; Treuille, Adrien; Mandell, Daniel J.; Richter, Florian; Ban, Yih-En Andrew; Fleishman, Sarel J.; Corn, Jacob E.; Kim, David E.; Lyskov, Sergey; Berrondo, Monica; Mentzer, Stuart; Popović, Zoran; Havranek, James J.; Karanicolas, John; Das, Rhiju; Meiler, Jens; Kortemme, Tanja; Gray, Jeffrey J.; Kuhlman, Brian; Baker, David; Bradley, Philip
2013-01-01
We have recently completed a full re-architecturing of the Rosetta molecular modeling program, generalizing and expanding its existing functionality. The new architecture enables the rapid prototyping of novel protocols by providing easy to use interfaces to powerful tools for molecular modeling. The source code of this rearchitecturing has been released as Rosetta3 and is freely available for academic use. At the time of its release, it contained 470,000 lines of code. Counting currently unpublished protocols at the time of this writing, the source includes 1,285,000 lines. Its rapid growth is a testament to its ease of use. This document describes the requirements for our new architecture, justifies the design decisions, sketches out central classes, and highlights a few of the common tasks that the new software can perform. PMID:21187238
Some Problems and Solutions in Transferring Ecosystem Simulation Codes to Supercomputers
NASA Technical Reports Server (NTRS)
Skiles, J. W.; Schulbach, C. H.
1994-01-01
Many computer codes for the simulation of ecological systems have been developed in the last twenty-five years. This development took place initially on main-frame computers, then mini-computers, and more recently, on micro-computers and workstations. Recent recognition of ecosystem science as a High Performance Computing and Communications Program Grand Challenge area emphasizes supercomputers (both parallel and distributed systems) as the next set of tools for ecological simulation. Transferring ecosystem simulation codes to such systems is not a matter of simply compiling and executing existing code on the supercomputer since there are significant differences in the system architectures of sequential, scalar computers and parallel and/or vector supercomputers. To more appropriately match the application to the architecture (necessary to achieve reasonable performance), the parallelism (if it exists) of the original application must be exploited. We discuss our work in transferring a general grassland simulation model (developed on a VAX in the FORTRAN computer programming language) to a Cray Y-MP. We show the Cray shared-memory vector-architecture, and discuss our rationale for selecting the Cray. We describe porting the model to the Cray and executing and verifying a baseline version, and we discuss the changes we made to exploit the parallelism in the application and to improve code execution. As a result, the Cray executed the model 30 times faster than the VAX 11/785 and 10 times faster than a Sun 4 workstation. We achieved an additional speed-up of approximately 30 percent over the original Cray run by using the compiler's vectorizing capabilities and the machine's ability to put subroutines and functions "in-line" in the code. With the modifications, the code still runs at only about 5% of the Cray's peak speed because it makes ineffective use of the vector processing capabilities of the Cray. We conclude with a discussion and future plans.
A New Internet Tool for Automatic Evaluation in Control Systems and Programming
ERIC Educational Resources Information Center
Munoz de la Pena, D.; Gomez-Estern, F.; Dormido, S.
2012-01-01
In this paper we present a web-based innovative education tool designed for automating the collection, evaluation and error detection in practical exercises assigned to computer programming and control engineering students. By using a student/instructor code-fusion architecture, the conceptual limits of multiple-choice tests are overcome by far.…
SU (2) lattice gauge theory simulations on Fermi GPUs
NASA Astrophysics Data System (ADS)
Cardoso, Nuno; Bicudo, Pedro
2011-05-01
In this work we explore the performance of CUDA in quenched lattice SU (2) simulations. CUDA, NVIDIA Compute Unified Device Architecture, is a hardware and software architecture developed by NVIDIA for computing on the GPU. We present an analysis and performance comparison between the GPU and CPU in single and double precision. Analyses with multiple GPUs and two different architectures (G200 and Fermi architectures) are also presented. In order to obtain a high performance, the code must be optimized for the GPU architecture, i.e., an implementation that exploits the memory hierarchy of the CUDA programming model. We produce codes for the Monte Carlo generation of SU (2) lattice gauge configurations, for the mean plaquette, for the Polyakov Loop at finite T and for the Wilson loop. We also present results for the potential using many configurations (50,000) without smearing and almost 2000 configurations with APE smearing. With two Fermi GPUs we have achieved an excellent performance of 200× the speed over one CPU, in single precision, around 110 Gflops/s. We also find that, using the Fermi architecture, double precision computations for the static quark-antiquark potential are not much slower (less than 2× slower) than single precision computations.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ebert, D.
1997-07-01
This is a report on the CSNI Workshop on Transient Thermal-Hydraulic and Neutronic Codes Requirements held at Annapolis, Maryland, USA November 5-8, 1996. This experts` meeting consisted of 140 participants from 21 countries; 65 invited papers were presented. The meeting was divided into five areas: (1) current and prospective plans of thermal hydraulic codes development; (2) current and anticipated uses of thermal-hydraulic codes; (3) advances in modeling of thermal-hydraulic phenomena and associated additional experimental needs; (4) numerical methods in multi-phase flows; and (5) programming language, code architectures and user interfaces. The workshop consensus identified the following important action items tomore » be addressed by the international community in order to maintain and improve the calculational capability: (a) preserve current code expertise and institutional memory, (b) preserve the ability to use the existing investment in plant transient analysis codes, (c) maintain essential experimental capabilities, (d) develop advanced measurement capabilities to support future code validation work, (e) integrate existing analytical capabilities so as to improve performance and reduce operating costs, (f) exploit the proven advances in code architecture, numerics, graphical user interfaces, and modularization in order to improve code performance and scrutibility, and (g) more effectively utilize user experience in modifying and improving the codes.« less
Implementation of context independent code on a new array processor: The Super-65
NASA Technical Reports Server (NTRS)
Colbert, R. O.; Bowhill, S. A.
1981-01-01
The feasibility of rewriting standard uniprocessor programs into code which contains no context-dependent branches is explored. Context independent code (CIC) would contain no branches that might require different processing elements to branch different ways. In order to investigate the possibilities and restrictions of CIC, several programs were recoded into CIC and a four-element array processor was built. This processor (the Super-65) consisted of three 6502 microprocessors and the Apple II microcomputer. The results obtained were somewhat dependent upon the specific architecture of the Super-65 but within bounds, the throughput of the array processor was found to increase linearly with the number of processing elements (PEs). The slope of throughput versus PEs is highly dependent on the program and varied from 0.33 to 1.00 for the sample programs.
The Automated Instrumentation and Monitoring System (AIMS): Design and Architecture. 3.2
NASA Technical Reports Server (NTRS)
Yan, Jerry C.; Schmidt, Melisa; Schulbach, Cathy; Bailey, David (Technical Monitor)
1997-01-01
Whether a researcher is designing the 'next parallel programming paradigm', another 'scalable multiprocessor' or investigating resource allocation algorithms for multiprocessors, a facility that enables parallel program execution to be captured and displayed is invaluable. Careful analysis of such information can help computer and software architects to capture, and therefore, exploit behavioral variations among/within various parallel programs to take advantage of specific hardware characteristics. A software tool-set that facilitates performance evaluation of parallel applications on multiprocessors has been put together at NASA Ames Research Center under the sponsorship of NASA's High Performance Computing and Communications Program over the past five years. The Automated Instrumentation and Monitoring Systematic has three major software components: a source code instrumentor which automatically inserts active event recorders into program source code before compilation; a run-time performance monitoring library which collects performance data; and a visualization tool-set which reconstructs program execution based on the data collected. Besides being used as a prototype for developing new techniques for instrumenting, monitoring and presenting parallel program execution, AIMS is also being incorporated into the run-time environments of various hardware testbeds to evaluate their impact on user productivity. Currently, the execution of FORTRAN and C programs on the Intel Paragon and PALM workstations can be automatically instrumented and monitored. Performance data thus collected can be displayed graphically on various workstations. The process of performance tuning with AIMS will be illustrated using various NAB Parallel Benchmarks. This report includes a description of the internal architecture of AIMS and a listing of the source code.
1991-05-31
benchmarks ............ .... . .. .. . . .. 220 Appendix G : Source code of the Aquarius Prolog compiler ........ . 224 Chapter I Introduction "You’re given...notation, a tool that is used throughout the compiler’s implementation. Appendix F lists the source code of the C and Prolog benchmarks. Appendix G lists the...source code of the compilcr. 5 "- standard form Prolog / a-sfomadon / head umrvln Convert to tmeikernel Prol g vrans~fonaon 1symbolic execution
Towards Implementation of a Generalized Architecture for High-Level Quantum Programming Language
NASA Astrophysics Data System (ADS)
Ameen, El-Mahdy M.; Ali, Hesham A.; Salem, Mofreh M.; Badawy, Mahmoud
2017-08-01
This paper investigates a novel architecture to the problem of quantum computer programming. A generalized architecture for a high-level quantum programming language has been proposed. Therefore, the programming evolution from the complicated quantum-based programming to the high-level quantum independent programming will be achieved. The proposed architecture receives the high-level source code and, automatically transforms it into the equivalent quantum representation. This architecture involves two layers which are the programmer layer and the compilation layer. These layers have been implemented in the state of the art of three main stages; pre-classification, classification, and post-classification stages respectively. The basic building block of each stage has been divided into subsequent phases. Each phase has been implemented to perform the required transformations from one representation to another. A verification process was exposed using a case study to investigate the ability of the compiler to perform all transformation processes. Experimental results showed that the efficacy of the proposed compiler achieves a correspondence correlation coefficient about R ≈ 1 between outputs and the targets. Also, an obvious achievement has been utilized with respect to the consumed time in the optimization process compared to other techniques. In the online optimization process, the consumed time has increased exponentially against the amount of accuracy needed. However, in the proposed offline optimization process has increased gradually.
NASA Astrophysics Data System (ADS)
Nishiura, Daisuke; Furuichi, Mikito; Sakaguchi, Hide
2015-09-01
The computational performance of a smoothed particle hydrodynamics (SPH) simulation is investigated for three types of current shared-memory parallel computer devices: many integrated core (MIC) processors, graphics processing units (GPUs), and multi-core CPUs. We are especially interested in efficient shared-memory allocation methods for each chipset, because the efficient data access patterns differ between compute unified device architecture (CUDA) programming for GPUs and OpenMP programming for MIC processors and multi-core CPUs. We first introduce several parallel implementation techniques for the SPH code, and then examine these on our target computer architectures to determine the most effective algorithms for each processor unit. In addition, we evaluate the effective computing performance and power efficiency of the SPH simulation on each architecture, as these are critical metrics for overall performance in a multi-device environment. In our benchmark test, the GPU is found to produce the best arithmetic performance as a standalone device unit, and gives the most efficient power consumption. The multi-core CPU obtains the most effective computing performance. The computational speed of the MIC processor on Xeon Phi approached that of two Xeon CPUs. This indicates that using MICs is an attractive choice for existing SPH codes on multi-core CPUs parallelized by OpenMP, as it gains computational acceleration without the need for significant changes to the source code.
Architectural Visualization of C/C++ Source Code for Program Comprehension
DOE Office of Scientific and Technical Information (OSTI.GOV)
Panas, T; Epperly, T W; Quinlan, D
2006-09-01
Structural and behavioral visualization of large-scale legacy systems to aid program comprehension is still a major challenge. The challenge is even greater when applications are implemented in flexible and expressive languages such as C and C++. In this paper, we consider visualization of static and dynamic aspects of large-scale scientific C/C++ applications. For our investigation, we reuse and integrate specialized analysis and visualization tools. Furthermore, we present a novel layout algorithm that permits a compressive architectural view of a large-scale software system. Our layout is unique in that it allows traditional program visualizations, i.e., graph structures, to be seen inmore » relation to the application's file structure.« less
Telerobotic rendezvous and docking vision system architecture
NASA Technical Reports Server (NTRS)
Gravely, Ben; Myers, Donald; Moody, David
1992-01-01
This research program has successfully demonstrated a new target label architecture that allows a microcomputer to determine the position, orientation, and identity of an object. It contains a CAD-like database with specific geometric information about the object for approach, grasping, and docking maneuvers. Successful demonstrations were performed selecting and docking an ORU box with either of two ORU receptacles. Small, but significant differences were seen in the two camera types used in the program, and camera sensitive program elements have been identified. The software has been formatted into a new co-autonomy system which provides various levels of operator interaction and promises to allow effective application of telerobotic systems while code improvements are continuing.
Comparing the OpenMP, MPI, and Hybrid Programming Paradigm on an SMP Cluster
NASA Technical Reports Server (NTRS)
Jost, Gabriele; Jin, Haoqiang; anMey, Dieter; Hatay, Ferhat F.
2003-01-01
With the advent of parallel hardware and software technologies users are faced with the challenge to choose a programming paradigm best suited for the underlying computer architecture. With the current trend in parallel computer architectures towards clusters of shared memory symmetric multi-processors (SMP), parallel programming techniques have evolved to support parallelism beyond a single level. Which programming paradigm is the best will depend on the nature of the given problem, the hardware architecture, and the available software. In this study we will compare different programming paradigms for the parallelization of a selected benchmark application on a cluster of SMP nodes. We compare the timings of different implementations of the same CFD benchmark application employing the same numerical algorithm on a cluster of Sun Fire SMP nodes. The rest of the paper is structured as follows: In section 2 we briefly discuss the programming models under consideration. We describe our compute platform in section 3. The different implementations of our benchmark code are described in section 4 and the performance results are presented in section 5. We conclude our study in section 6.
NASA Astrophysics Data System (ADS)
Work, Paul R.
1991-12-01
This thesis investigates the parallelization of existing serial programs in computational electromagnetics for use in a parallel environment. Existing algorithms for calculating the radar cross section of an object are covered, and a ray-tracing code is chosen for implementation on a parallel machine. Current parallel architectures are introduced and a suitable parallel machine is selected for the implementation of the chosen ray-tracing algorithm. The standard techniques for the parallelization of serial codes are discussed, including load balancing and decomposition considerations, and appropriate methods for the parallelization effort are selected. A load balancing algorithm is modified to increase the efficiency of the application, and a high level design of the structure of the serial program is presented. A detailed design of the modifications for the parallel implementation is also included, with both the high level and the detailed design specified in a high level design language called UNITY. The correctness of the design is proven using UNITY and standard logic operations. The theoretical and empirical results show that it is possible to achieve an efficient parallel application for a serial computational electromagnetic program where the characteristics of the algorithm and the target architecture critically influence the development of such an implementation.
NASA Astrophysics Data System (ADS)
Wallace, William; Miller, Jared; Diallo, Ahmed
2015-11-01
MultiPoint Thomson Scattering (MPTS) is an established, accurate method of finding the temperature, density, and pressure of a magnetically confined plasma. Two Nd:YAG (1064 nm) lasers are fired into the plasma with a effective frequency of 60 Hz, and the light is Doppler shifted by Thomson scattering. Polychromators on the NSTX-U midplane collect the scattered photons at various radii/scattering angles, and the avalanche photodiode voltages are saved to an MDSplus tree for later analysis. IDL code is then used to determine plasma temperature, pressure, and density from the captured polychromator measurements via Selden formulas. [1] Previous work [2] converted the single-processor IDL code into Python code, and prepared a new architecture for multiprocessing MPTS in parallel. However, that work was not completed to the generation of output data and curve fits that match with the previous IDL. This project refactored the Python code into a object-oriented architecture, and created a software test suite for the new architecture which allowed identification of the code which generated the difference in output. Another effort currently underway is to display the Thomson data in an intuitive, interactive format. This work was supported in part by the U.S. Department of Energy, Office of Science, Office of Workforce Development for Teachers and Scientists (WDTS) under the Community College Internship (CCI) program.
NASA Astrophysics Data System (ADS)
Fabien-Ouellet, Gabriel; Gloaguen, Erwan; Giroux, Bernard
2017-03-01
Full Waveform Inversion (FWI) aims at recovering the elastic parameters of the Earth by matching recordings of the ground motion with the direct solution of the wave equation. Modeling the wave propagation for realistic scenarios is computationally intensive, which limits the applicability of FWI. The current hardware evolution brings increasing parallel computing power that can speed up the computations in FWI. However, to take advantage of the diversity of parallel architectures presently available, new programming approaches are required. In this work, we explore the use of OpenCL to develop a portable code that can take advantage of the many parallel processor architectures now available. We present a program called SeisCL for 2D and 3D viscoelastic FWI in the time domain. The code computes the forward and adjoint wavefields using finite-difference and outputs the gradient of the misfit function given by the adjoint state method. To demonstrate the code portability on different architectures, the performance of SeisCL is tested on three different devices: Intel CPUs, NVidia GPUs and Intel Xeon PHI. Results show that the use of GPUs with OpenCL can speed up the computations by nearly two orders of magnitudes over a single threaded application on the CPU. Although OpenCL allows code portability, we show that some device-specific optimization is still required to get the best performance out of a specific architecture. Using OpenCL in conjunction with MPI allows the domain decomposition of large models on several devices located on different nodes of a cluster. For large enough models, the speedup of the domain decomposition varies quasi-linearly with the number of devices. Finally, we investigate two different approaches to compute the gradient by the adjoint state method and show the significant advantages of using OpenCL for FWI.
Performance and Architecture Lab Modeling Tool
DOE Office of Scientific and Technical Information (OSTI.GOV)
2014-06-19
Analytical application performance models are critical for diagnosing performance-limiting resources, optimizing systems, and designing machines. Creating models, however, is difficult. Furthermore, models are frequently expressed in forms that are hard to distribute and validate. The Performance and Architecture Lab Modeling tool, or Palm, is a modeling tool designed to make application modeling easier. Palm provides a source code modeling annotation language. Not only does the modeling language divide the modeling task into sub problems, it formally links an application's source code with its model. This link is important because a model's purpose is to capture application behavior. Furthermore, this linkmore » makes it possible to define rules for generating models according to source code organization. Palm generates hierarchical models according to well-defined rules. Given an application, a set of annotations, and a representative execution environment, Palm will generate the same model. A generated model is a an executable program whose constituent parts directly correspond to the modeled application. Palm generates models by combining top-down (human-provided) semantic insight with bottom-up static and dynamic analysis. A model's hierarchy is defined by static and dynamic source code structure. Because Palm coordinates models and source code, Palm's models are 'first-class' and reproducible. Palm automates common modeling tasks. For instance, Palm incorporates measurements to focus attention, represent constant behavior, and validate models. Palm's workflow is as follows. The workflow's input is source code annotated with Palm modeling annotations. The most important annotation models an instance of a block of code. Given annotated source code, the Palm Compiler produces executables and the Palm Monitor collects a representative performance profile. The Palm Generator synthesizes a model based on the static and dynamic mapping of annotations to program behavior. The model -- an executable program -- is a hierarchical composition of annotation functions, synthesized functions, statistics for runtime values, and performance measurements.« less
Mission Systems Open Architecture Science and Technology (MOAST) program
NASA Astrophysics Data System (ADS)
Littlejohn, Kenneth; Rajabian-Schwart, Vahid; Kovach, Nicholas; Satterthwaite, Charles P.
2017-04-01
The Mission Systems Open Architecture Science and Technology (MOAST) program is an AFRL effort that is developing and demonstrating Open System Architecture (OSA) component prototypes, along with methods and tools, to strategically evolve current OSA standards and technical approaches, promote affordable capability evolution, reduce integration risk, and address emerging challenges [1]. Within the context of open architectures, the program is conducting advanced research and concept development in the following areas: (1) Evolution of standards; (2) Cyber-Resiliency; (3) Emerging Concepts and Technologies; (4) Risk Reduction Studies and Experimentation; and (5) Advanced Technology Demonstrations. Current research includes the development of methods, tools, and techniques to characterize the performance of OMS data interconnection methods for representative mission system applications. Of particular interest are the OMS Critical Abstraction Layer (CAL), the Avionics Service Bus (ASB), and the Bulk Data Transfer interconnects, as well as to develop and demonstrate cybersecurity countermeasures techniques to detect and mitigate cyberattacks against open architecture based mission systems and ensure continued mission operations. Focus is on cybersecurity techniques that augment traditional cybersecurity controls and those currently defined within the Open Mission System and UCI standards. AFRL is also developing code generation tools and simulation tools to support evaluation and experimentation of OSA-compliant implementations.
Portable multi-node LQCD Monte Carlo simulations using OpenACC
NASA Astrophysics Data System (ADS)
Bonati, Claudio; Calore, Enrico; D'Elia, Massimo; Mesiti, Michele; Negro, Francesco; Sanfilippo, Francesco; Schifano, Sebastiano Fabio; Silvi, Giorgio; Tripiccione, Raffaele
This paper describes a state-of-the-art parallel Lattice QCD Monte Carlo code for staggered fermions, purposely designed to be portable across different computer architectures, including GPUs and commodity CPUs. Portability is achieved using the OpenACC parallel programming model, used to develop a code that can be compiled for several processor architectures. The paper focuses on parallelization on multiple computing nodes using OpenACC to manage parallelism within the node, and OpenMPI to manage parallelism among the nodes. We first discuss the available strategies to be adopted to maximize performances, we then describe selected relevant details of the code, and finally measure the level of performance and scaling-performance that we are able to achieve. The work focuses mainly on GPUs, which offer a significantly high level of performances for this application, but also compares with results measured on other processors.
SU (2) lattice gauge theory simulations on Fermi GPUs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cardoso, Nuno, E-mail: nunocardoso@cftp.ist.utl.p; Bicudo, Pedro, E-mail: bicudo@ist.utl.p
2011-05-10
In this work we explore the performance of CUDA in quenched lattice SU (2) simulations. CUDA, NVIDIA Compute Unified Device Architecture, is a hardware and software architecture developed by NVIDIA for computing on the GPU. We present an analysis and performance comparison between the GPU and CPU in single and double precision. Analyses with multiple GPUs and two different architectures (G200 and Fermi architectures) are also presented. In order to obtain a high performance, the code must be optimized for the GPU architecture, i.e., an implementation that exploits the memory hierarchy of the CUDA programming model. We produce codes formore » the Monte Carlo generation of SU (2) lattice gauge configurations, for the mean plaquette, for the Polyakov Loop at finite T and for the Wilson loop. We also present results for the potential using many configurations (50,000) without smearing and almost 2000 configurations with APE smearing. With two Fermi GPUs we have achieved an excellent performance of 200x the speed over one CPU, in single precision, around 110 Gflops/s. We also find that, using the Fermi architecture, double precision computations for the static quark-antiquark potential are not much slower (less than 2x slower) than single precision computations.« less
Building Automatic Grading Tools for Basic of Programming Lab in an Academic Institution
NASA Astrophysics Data System (ADS)
Harimurti, Rina; Iwan Nurhidayat, Andi; Asmunin
2018-04-01
The skills of computer programming is a core competency that must be mastered by students majoring in computer sciences. The best way to improve this skill is through the practice of writing many programs to solve various problems from simple to complex. It takes hard work and a long time to check and evaluate the results of student labs one by one, especially if the number of students a lot. Based on these constrain, web proposes Automatic Grading Tools (AGT), the application that can evaluate and deeply check the source code in C, C++. The application architecture consists of students, web-based applications, compilers, and operating systems. Automatic Grading Tools (AGT) is implemented MVC Architecture and using open source software, such as laravel framework version 5.4, PostgreSQL 9.6, Bootstrap 3.3.7, and jquery library. Automatic Grading Tools has also been tested for real problems by submitting source code in C/C++ language and then compiling. The test results show that the AGT application has been running well.
A portable approach for PIC on emerging architectures
NASA Astrophysics Data System (ADS)
Decyk, Viktor
2016-03-01
A portable approach for designing Particle-in-Cell (PIC) algorithms on emerging exascale computers, is based on the recognition that 3 distinct programming paradigms are needed. They are: low level vector (SIMD) processing, middle level shared memory parallel programing, and high level distributed memory programming. In addition, there is a memory hierarchy associated with each level. Such algorithms can be initially developed using vectorizing compilers, OpenMP, and MPI. This is the approach recommended by Intel for the Phi processor. These algorithms can then be translated and possibly specialized to other programming models and languages, as needed. For example, the vector processing and shared memory programming might be done with CUDA instead of vectorizing compilers and OpenMP, but generally the algorithm itself is not greatly changed. The UCLA PICKSC web site at http://www.idre.ucla.edu/ contains example open source skeleton codes (mini-apps) illustrating each of these three programming models, individually and in combination. Fortran2003 now supports abstract data types, and design patterns can be used to support a variety of implementations within the same code base. Fortran2003 also supports interoperability with C so that implementations in C languages are also easy to use. Finally, main codes can be translated into dynamic environments such as Python, while still taking advantage of high performing compiled languages. Parallel languages are still evolving with interesting developments in co-Array Fortran, UPC, and OpenACC, among others, and these can also be supported within the same software architecture. Work supported by NSF and DOE Grants.
NASA Technical Reports Server (NTRS)
McGuire, Tim
1998-01-01
In this paper, we report the results of our recent research on the application of a multiprocessor Cray T916 supercomputer in modeling super-thermal electron transport in the earth's magnetic field. In general, this mathematical model requires numerical solution of a system of partial differential equations. The code we use for this model is moderately vectorized. By using Amdahl's Law for vector processors, it can be verified that the code is about 60% vectorized on a Cray computer. Speedup factors on the order of 2.5 were obtained compared to the unvectorized code. In the following sections, we discuss the methodology of improving the code. In addition to our goal of optimizing the code for solution on the Cray computer, we had the goal of scalability in mind. Scalability combines the concepts of portabilty with near-linear speedup. Specifically, a scalable program is one whose performance is portable across many different architectures with differing numbers of processors for many different problem sizes. Though we have access to a Cray at this time, the goal was to also have code which would run well on a variety of architectures.
Shared Memory Parallelization of an Implicit ADI-type CFD Code
NASA Technical Reports Server (NTRS)
Hauser, Th.; Huang, P. G.
1999-01-01
A parallelization study designed for ADI-type algorithms is presented using the OpenMP specification for shared-memory multiprocessor programming. Details of optimizations specifically addressed to cache-based computer architectures are described and performance measurements for the single and multiprocessor implementation are summarized. The paper demonstrates that optimization of memory access on a cache-based computer architecture controls the performance of the computational algorithm. A hybrid MPI/OpenMP approach is proposed for clusters of shared memory machines to further enhance the parallel performance. The method is applied to develop a new LES/DNS code, named LESTool. A preliminary DNS calculation of a fully developed channel flow at a Reynolds number of 180, Re(sub tau) = 180, has shown good agreement with existing data.
Replica Exchange Molecular Dynamics in the Age of Heterogeneous Architectures
NASA Astrophysics Data System (ADS)
Roitberg, Adrian
2014-03-01
The rise of GPU-based codes has allowed MD to reach timescales only dreamed of only 5 years ago. Even within this new paradigm there is still need for advanced sampling techniques. Modern supercomputers (e.g. Blue Waters, Titan, Keeneland) have made available to users a significant number of GPUS and CPUS, which in turn translate into amazing opportunities for dream calculations. Replica-exchange based methods can optimally use tis combination of codes and architectures to explore conformational variabilities in large systems. I will show our recent work in porting the program Amber to GPUS, and the support for replica exchange methods, where the replicated dimension could be Temperature, pH, Hamiltonian, Umbrella windows and combinations of those schemes.
NASA Technical Reports Server (NTRS)
McCurdy, David R.; Roche, Joseph M.
2004-01-01
In support of NASA's Next Generation Launch Technology (NGLT) program, the Andrews Gryphon booster was studied. The Andrews Gryphon concept is a horizontal lift-off, two-stage-to-orbit, reusable launch vehicle that uses an air collection and enrichment system (ACES). The purpose of the ACES is to collect atmospheric oxygen during a subsonic flight loiter phase and cool it to cryogenic temperature, ultimately resulting in a reduced initial take-off weight To study the performance and size of an air-collection based booster, an initial airplane like shape was established as a baseline and modeled in a vehicle sizing code. The code, SIZER, contains a general series of volume, surface area, and fuel fraction relationships that tie engine and ACES performance with propellant requirements and volumetric constraints in order to establish vehicle closure for the given mission. A key element of system level weight optimization is the use of the SIZER program that provides rapid convergence and a great deal of flexibility for different tank architectures and material suites in order to study their impact on gross lift-off weight. This paper discusses important elements of the sizing code architecture followed by highlights of the baseline booster study.
An efficient and portable SIMD algorithm for charge/current deposition in Particle-In-Cell codes
Vincenti, H.; Lobet, M.; Lehe, R.; ...
2016-09-19
In current computer architectures, data movement (from die to network) is by far the most energy consuming part of an algorithm (≈20pJ/word on-die to ≈10,000 pJ/word on the network). To increase memory locality at the hardware level and reduce energy consumption related to data movement, future exascale computers tend to use many-core processors on each compute nodes that will have a reduced clock speed to allow for efficient cooling. To compensate for frequency decrease, machine vendors are making use of long SIMD instruction registers that are able to process multiple data with one arithmetic operator in one clock cycle. SIMD registermore » length is expected to double every four years. As a consequence, Particle-In-Cell (PIC) codes will have to achieve good vectorization to fully take advantage of these upcoming architectures. In this paper, we present a new algorithm that allows for efficient and portable SIMD vectorization of current/charge deposition routines that are, along with the field gathering routines, among the most time consuming parts of the PIC algorithm. Our new algorithm uses a particular data structure that takes into account memory alignment constraints and avoids gather/scat;ter instructions that can significantly affect vectorization performances on current CPUs. The new algorithm was successfully implemented in the 3D skeleton PIC code PICSAR and tested on Haswell Xeon processors (AVX2-256 bits wide data registers). Results show a factor of ×2 to ×2.5 speed-up in double precision for particle shape factor of orders 1–3. The new algorithm can be applied as is on future KNL (Knights Landing) architectures that will include AVX-512 instruction sets with 512 bits register lengths (8 doubles/16 singles). Program summary Program Title: vec_deposition Program Files doi:http://dx.doi.org/10.17632/nh77fv9k8c.1 Licensing provisions: BSD 3-Clause Programming language: Fortran 90 External routines/libraries: OpenMP > 4.0 Nature of problem: Exascale architectures will have many-core processors per node with long vector data registers capable of performing one single instruction on multiple data during one clock cycle. Data register lengths are expected to double every four years and this pushes for new portable solutions for efficiently vectorizing Particle-In-Cell codes on these future many-core architectures. One of the main hotspot routines of the PIC algorithm is the current/charge deposition for which there is no efficient and portable vector algorithm. Solution method: Here we provide an efficient and portable vector algorithm of current/charge deposition routines that uses a new data structure, which significantly reduces gather/scatter operations. Vectorization is controlled using OpenMP 4.0 compiler directives for vectorization which ensures portability across different architectures. Restrictions: Here we do not provide the full PIC algorithm with an executable but only vector routines for current/charge deposition. These scalar/vector routines can be used as library routines in your 3D Particle-In-Cell code. However, to get the best performances out of vector routines you have to satisfy the two following requirements: (1) Your code should implement particle tiling (as explained in the manuscript) to allow for maximized cache reuse and reduce memory accesses that can hinder vector performances. The routines can be used directly on each particle tile. (2) You should compile your code with a Fortran 90 compiler (e.g Intel, gnu or cray) and provide proper alignment flags and compiler alignment directives (more details in README file).« less
An efficient and portable SIMD algorithm for charge/current deposition in Particle-In-Cell codes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vincenti, H.; Lobet, M.; Lehe, R.
In current computer architectures, data movement (from die to network) is by far the most energy consuming part of an algorithm (≈20pJ/word on-die to ≈10,000 pJ/word on the network). To increase memory locality at the hardware level and reduce energy consumption related to data movement, future exascale computers tend to use many-core processors on each compute nodes that will have a reduced clock speed to allow for efficient cooling. To compensate for frequency decrease, machine vendors are making use of long SIMD instruction registers that are able to process multiple data with one arithmetic operator in one clock cycle. SIMD registermore » length is expected to double every four years. As a consequence, Particle-In-Cell (PIC) codes will have to achieve good vectorization to fully take advantage of these upcoming architectures. In this paper, we present a new algorithm that allows for efficient and portable SIMD vectorization of current/charge deposition routines that are, along with the field gathering routines, among the most time consuming parts of the PIC algorithm. Our new algorithm uses a particular data structure that takes into account memory alignment constraints and avoids gather/scat;ter instructions that can significantly affect vectorization performances on current CPUs. The new algorithm was successfully implemented in the 3D skeleton PIC code PICSAR and tested on Haswell Xeon processors (AVX2-256 bits wide data registers). Results show a factor of ×2 to ×2.5 speed-up in double precision for particle shape factor of orders 1–3. The new algorithm can be applied as is on future KNL (Knights Landing) architectures that will include AVX-512 instruction sets with 512 bits register lengths (8 doubles/16 singles). Program summary Program Title: vec_deposition Program Files doi:http://dx.doi.org/10.17632/nh77fv9k8c.1 Licensing provisions: BSD 3-Clause Programming language: Fortran 90 External routines/libraries: OpenMP > 4.0 Nature of problem: Exascale architectures will have many-core processors per node with long vector data registers capable of performing one single instruction on multiple data during one clock cycle. Data register lengths are expected to double every four years and this pushes for new portable solutions for efficiently vectorizing Particle-In-Cell codes on these future many-core architectures. One of the main hotspot routines of the PIC algorithm is the current/charge deposition for which there is no efficient and portable vector algorithm. Solution method: Here we provide an efficient and portable vector algorithm of current/charge deposition routines that uses a new data structure, which significantly reduces gather/scatter operations. Vectorization is controlled using OpenMP 4.0 compiler directives for vectorization which ensures portability across different architectures. Restrictions: Here we do not provide the full PIC algorithm with an executable but only vector routines for current/charge deposition. These scalar/vector routines can be used as library routines in your 3D Particle-In-Cell code. However, to get the best performances out of vector routines you have to satisfy the two following requirements: (1) Your code should implement particle tiling (as explained in the manuscript) to allow for maximized cache reuse and reduce memory accesses that can hinder vector performances. The routines can be used directly on each particle tile. (2) You should compile your code with a Fortran 90 compiler (e.g Intel, gnu or cray) and provide proper alignment flags and compiler alignment directives (more details in README file).« less
NASA Astrophysics Data System (ADS)
Mielikainen, Jarno; Huang, Bormin; Huang, Allen H.-L.
2015-05-01
Intel Many Integrated Core (MIC) ushers in a new era of supercomputing speed, performance, and compatibility. It allows the developers to run code at trillions of calculations per second using the familiar programming model. In this paper, we present our results of optimizing the updated Goddard shortwave radiation Weather Research and Forecasting (WRF) scheme on Intel Many Integrated Core Architecture (MIC) hardware. The Intel Xeon Phi coprocessor is the first product based on Intel MIC architecture, and it consists of up to 61 cores connected by a high performance on-die bidirectional interconnect. The co-processor supports all important Intel development tools. Thus, the development environment is familiar one to a vast number of CPU developers. Although, getting a maximum performance out of Xeon Phi will require using some novel optimization techniques. Those optimization techniques are discusses in this paper. The results show that the optimizations improved performance of the original code on Xeon Phi 7120P by a factor of 1.3x.
A portable platform for accelerated PIC codes and its application to GPUs using OpenACC
NASA Astrophysics Data System (ADS)
Hariri, F.; Tran, T. M.; Jocksch, A.; Lanti, E.; Progsch, J.; Messmer, P.; Brunner, S.; Gheller, C.; Villard, L.
2016-10-01
We present a portable platform, called PIC_ENGINE, for accelerating Particle-In-Cell (PIC) codes on heterogeneous many-core architectures such as Graphic Processing Units (GPUs). The aim of this development is efficient simulations on future exascale systems by allowing different parallelization strategies depending on the application problem and the specific architecture. To this end, this platform contains the basic steps of the PIC algorithm and has been designed as a test bed for different algorithmic options and data structures. Among the architectures that this engine can explore, particular attention is given here to systems equipped with GPUs. The study demonstrates that our portable PIC implementation based on the OpenACC programming model can achieve performance closely matching theoretical predictions. Using the Cray XC30 system, Piz Daint, at the Swiss National Supercomputing Centre (CSCS), we show that PIC_ENGINE running on an NVIDIA Kepler K20X GPU can outperform the one on an Intel Sandy bridge 8-core CPU by a factor of 3.4.
Computing element evolution towards Exascale and its impact on legacy simulation codes
NASA Astrophysics Data System (ADS)
Colin de Verdière, Guillaume J. L.
2015-12-01
In the light of the current race towards the Exascale, this article highlights the main features of the forthcoming computing elements that will be at the core of next generations of supercomputers. The market analysis, underlying this work, shows that computers are facing a major evolution in terms of architecture. As a consequence, it is important to understand the impacts of those evolutions on legacy codes or programming methods. The problems of dissipated power and memory access are discussed and will lead to a vision of what should be an exascale system. To survive, programming languages had to respond to the hardware evolutions either by evolving or with the creation of new ones. From the previous elements, we elaborate why vectorization, multithreading, data locality awareness and hybrid programming will be the key to reach the exascale, implying that it is time to start rewriting codes.
The path toward HEP High Performance Computing
NASA Astrophysics Data System (ADS)
Apostolakis, John; Brun, René; Carminati, Federico; Gheata, Andrei; Wenzel, Sandro
2014-06-01
High Energy Physics code has been known for making poor use of high performance computing architectures. Efforts in optimising HEP code on vector and RISC architectures have yield limited results and recent studies have shown that, on modern architectures, it achieves a performance between 10% and 50% of the peak one. Although several successful attempts have been made to port selected codes on GPUs, no major HEP code suite has a "High Performance" implementation. With LHC undergoing a major upgrade and a number of challenging experiments on the drawing board, HEP cannot any longer neglect the less-than-optimal performance of its code and it has to try making the best usage of the hardware. This activity is one of the foci of the SFT group at CERN, which hosts, among others, the Root and Geant4 project. The activity of the experiments is shared and coordinated via a Concurrency Forum, where the experience in optimising HEP code is presented and discussed. Another activity is the Geant-V project, centred on the development of a highperformance prototype for particle transport. Achieving a good concurrency level on the emerging parallel architectures without a complete redesign of the framework can only be done by parallelizing at event level, or with a much larger effort at track level. Apart the shareable data structures, this typically implies a multiplication factor in terms of memory consumption compared to the single threaded version, together with sub-optimal handling of event processing tails. Besides this, the low level instruction pipelining of modern processors cannot be used efficiently to speedup the program. We have implemented a framework that allows scheduling vectors of particles to an arbitrary number of computing resources in a fine grain parallel approach. The talk will review the current optimisation activities within the SFT group with a particular emphasis on the development perspectives towards a simulation framework able to profit best from the recent technology evolution in computing.
Cognitive Architectures for Multimedia Learning
ERIC Educational Resources Information Center
Reed, Stephen K.
2006-01-01
This article provides a tutorial overview of cognitive architectures that can form a theoretical foundation for designing multimedia instruction. Cognitive architectures include a description of memory stores, memory codes, and cognitive operations. Architectures that are relevant to multimedia learning include Paivio's dual coding theory,…
Multidisciplinary High-Fidelity Analysis and Optimization of Aerospace Vehicles. Part 1; Formulation
NASA Technical Reports Server (NTRS)
Walsh, J. L.; Townsend, J. C.; Salas, A. O.; Samareh, J. A.; Mukhopadhyay, V.; Barthelemy, J.-F.
2000-01-01
An objective of the High Performance Computing and Communication Program at the NASA Langley Research Center is to demonstrate multidisciplinary shape and sizing optimization of a complete aerospace vehicle configuration by using high-fidelity, finite element structural analysis and computational fluid dynamics aerodynamic analysis in a distributed, heterogeneous computing environment that includes high performance parallel computing. A software system has been designed and implemented to integrate a set of existing discipline analysis codes, some of them computationally intensive, into a distributed computational environment for the design of a highspeed civil transport configuration. The paper describes the engineering aspects of formulating the optimization by integrating these analysis codes and associated interface codes into the system. The discipline codes are integrated by using the Java programming language and a Common Object Request Broker Architecture (CORBA) compliant software product. A companion paper presents currently available results.
Building Interactive Simulations in Web Pages without Programming.
Mailen Kootsey, J; McAuley, Grant; Bernal, Julie
2005-01-01
A software system is described for building interactive simulations and other numerical calculations in Web pages. The system is based on a new Java-based software architecture named NumberLinX (NLX) that isolates each function required to build the simulation so that a library of reusable objects could be assembled. The NLX objects are integrated into a commercial Web design program for coding-free page construction. The model description is entered through a wizard-like utility program that also functions as a model editor. The complete system permits very rapid construction of interactive simulations without coding. A wide range of applications are possible with the system beyond interactive calculations, including remote data collection and processing and collaboration over a network.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ortiz-Rodriguez, J. M.; Reyes Alfaro, A.; Reyes Haro, A.
In this work a neutron spectrum unfolding code, based on artificial intelligence technology is presented. The code called ''Neutron Spectrometry and Dosimetry with Artificial Neural Networks and two Bonner spheres'', (NSDann2BS), was designed in a graphical user interface under the LabVIEW programming environment. The main features of this code are to use an embedded artificial neural network architecture optimized with the ''Robust design of artificial neural networks methodology'' and to use two Bonner spheres as the only piece of information. In order to build the code here presented, once the net topology was optimized and properly trained, knowledge stored atmore » synaptic weights was extracted and using a graphical framework build on the LabVIEW programming environment, the NSDann2BS code was designed. This code is friendly, intuitive and easy to use for the end user. The code is freely available upon request to authors. To demonstrate the use of the neural net embedded in the NSDann2BS code, the rate counts of {sup 252}Cf, {sup 241}AmBe and {sup 239}PuBe neutron sources measured with a Bonner spheres system.« less
NASA Astrophysics Data System (ADS)
Ortiz-Rodríguez, J. M.; Reyes Alfaro, A.; Reyes Haro, A.; Solís Sánches, L. O.; Miranda, R. Castañeda; Cervantes Viramontes, J. M.; Vega-Carrillo, H. R.
2013-07-01
In this work a neutron spectrum unfolding code, based on artificial intelligence technology is presented. The code called "Neutron Spectrometry and Dosimetry with Artificial Neural Networks and two Bonner spheres", (NSDann2BS), was designed in a graphical user interface under the LabVIEW programming environment. The main features of this code are to use an embedded artificial neural network architecture optimized with the "Robust design of artificial neural networks methodology" and to use two Bonner spheres as the only piece of information. In order to build the code here presented, once the net topology was optimized and properly trained, knowledge stored at synaptic weights was extracted and using a graphical framework build on the LabVIEW programming environment, the NSDann2BS code was designed. This code is friendly, intuitive and easy to use for the end user. The code is freely available upon request to authors. To demonstrate the use of the neural net embedded in the NSDann2BS code, the rate counts of 252Cf, 241AmBe and 239PuBe neutron sources measured with a Bonner spheres system.
NASA Astrophysics Data System (ADS)
Vilardy, Juan M.; Giacometto, F.; Torres, C. O.; Mattos, L.
2011-01-01
The two-dimensional Fast Fourier Transform (FFT 2D) is an essential tool in the two-dimensional discrete signals analysis and processing, which allows developing a large number of applications. This article shows the description and synthesis in VHDL code of the FFT 2D with fixed point binary representation using the programming tool Simulink HDL Coder of Matlab; showing a quick and easy way to handle overflow, underflow and the creation registers, adders and multipliers of complex data in VHDL and as well as the generation of test bench for verification of the codes generated in the ModelSim tool. The main objective of development of the hardware architecture of the FFT 2D focuses on the subsequent completion of the following operations applied to images: frequency filtering, convolution and correlation. The description and synthesis of the hardware architecture uses the XC3S1200E family Spartan 3E FPGA from Xilinx Manufacturer.
Runtime and Architecture Support for Efficient Data Exchange in Multi-Accelerator Applications.
Cabezas, Javier; Gelado, Isaac; Stone, John E; Navarro, Nacho; Kirk, David B; Hwu, Wen-Mei
2015-05-01
Heterogeneous parallel computing applications often process large data sets that require multiple GPUs to jointly meet their needs for physical memory capacity and compute throughput. However, the lack of high-level abstractions in previous heterogeneous parallel programming models force programmers to resort to multiple code versions, complex data copy steps and synchronization schemes when exchanging data between multiple GPU devices, which results in high software development cost, poor maintainability, and even poor performance. This paper describes the HPE runtime system, and the associated architecture support, which enables a simple, efficient programming interface for exchanging data between multiple GPUs through either interconnects or cross-node network interfaces. The runtime and architecture support presented in this paper can also be used to support other types of accelerators. We show that the simplified programming interface reduces programming complexity. The research presented in this paper started in 2009. It has been implemented and tested extensively in several generations of HPE runtime systems as well as adopted into the NVIDIA GPU hardware and drivers for CUDA 4.0 and beyond since 2011. The availability of real hardware that support key HPE features gives rise to a rare opportunity for studying the effectiveness of the hardware support by running important benchmarks on real runtime and hardware. Experimental results show that in a exemplar heterogeneous system, peer DMA and double-buffering, pinned buffers, and software techniques can improve the inter-accelerator data communication bandwidth by 2×. They can also improve the execution speed by 1.6× for a 3D finite difference, 2.5× for 1D FFT, and 1.6× for merge sort, all measured on real hardware. The proposed architecture support enables the HPE runtime to transparently deploy these optimizations under simple portable user code, allowing system designers to freely employ devices of different capabilities. We further argue that simple interfaces such as HPE are needed for most applications to benefit from advanced hardware features in practice.
Runtime and Architecture Support for Efficient Data Exchange in Multi-Accelerator Applications
Cabezas, Javier; Gelado, Isaac; Stone, John E.; Navarro, Nacho; Kirk, David B.; Hwu, Wen-mei
2014-01-01
Heterogeneous parallel computing applications often process large data sets that require multiple GPUs to jointly meet their needs for physical memory capacity and compute throughput. However, the lack of high-level abstractions in previous heterogeneous parallel programming models force programmers to resort to multiple code versions, complex data copy steps and synchronization schemes when exchanging data between multiple GPU devices, which results in high software development cost, poor maintainability, and even poor performance. This paper describes the HPE runtime system, and the associated architecture support, which enables a simple, efficient programming interface for exchanging data between multiple GPUs through either interconnects or cross-node network interfaces. The runtime and architecture support presented in this paper can also be used to support other types of accelerators. We show that the simplified programming interface reduces programming complexity. The research presented in this paper started in 2009. It has been implemented and tested extensively in several generations of HPE runtime systems as well as adopted into the NVIDIA GPU hardware and drivers for CUDA 4.0 and beyond since 2011. The availability of real hardware that support key HPE features gives rise to a rare opportunity for studying the effectiveness of the hardware support by running important benchmarks on real runtime and hardware. Experimental results show that in a exemplar heterogeneous system, peer DMA and double-buffering, pinned buffers, and software techniques can improve the inter-accelerator data communication bandwidth by 2×. They can also improve the execution speed by 1.6× for a 3D finite difference, 2.5× for 1D FFT, and 1.6× for merge sort, all measured on real hardware. The proposed architecture support enables the HPE runtime to transparently deploy these optimizations under simple portable user code, allowing system designers to freely employ devices of different capabilities. We further argue that simple interfaces such as HPE are needed for most applications to benefit from advanced hardware features in practice. PMID:26180487
NASA Technical Reports Server (NTRS)
Korzennik, Sylvain
1997-01-01
Under the direction of Dr. Rhodes, and the technical supervision of Dr. Korzennik, the data assimilation of high spatial resolution solar dopplergrams has been carried out throughout the program on the Intel Delta Touchstone supercomputer. With the help of a research assistant, partially supported by this grant, and under the supervision of Dr. Korzennik, code development was carried out at SAO, using various available resources. To ensure cross-platform portability, PVM was selected as the message passing library. A parallel implementation of power spectra computation for helioseismology data reduction, using PVM was successfully completed. It was successfully ported to SMP architectures (i.e. SUN), and to some MPP architectures (i.e. the CM5). Due to limitation of the implementation of PVM on the Cray T3D, the port to that architecture was not completed at the time.
MPI implementation of PHOENICS: A general purpose computational fluid dynamics code
NASA Astrophysics Data System (ADS)
Simunovic, S.; Zacharia, T.; Baltas, N.; Spalding, D. B.
1995-03-01
PHOENICS is a suite of computational analysis programs that are used for simulation of fluid flow, heat transfer, and dynamical reaction processes. The parallel version of the solver EARTH for the Computational Fluid Dynamics (CFD) program PHOENICS has been implemented using Message Passing Interface (MPI) standard. Implementation of MPI version of PHOENICS makes this computational tool portable to a wide range of parallel machines and enables the use of high performance computing for large scale computational simulations. MPI libraries are available on several parallel architectures making the program usable across different architectures as well as on heterogeneous computer networks. The Intel Paragon NX and MPI versions of the program have been developed and tested on massively parallel supercomputers Intel Paragon XP/S 5, XP/S 35, and Kendall Square Research, and on the multiprocessor SGI Onyx computer at Oak Ridge National Laboratory. The preliminary testing results of the developed program have shown scalable performance for reasonably sized computational domains.
MPI implementation of PHOENICS: A general purpose computational fluid dynamics code
DOE Office of Scientific and Technical Information (OSTI.GOV)
Simunovic, S.; Zacharia, T.; Baltas, N.
1995-04-01
PHOENICS is a suite of computational analysis programs that are used for simulation of fluid flow, heat transfer, and dynamical reaction processes. The parallel version of the solver EARTH for the Computational Fluid Dynamics (CFD) program PHOENICS has been implemented using Message Passing Interface (MPI) standard. Implementation of MPI version of PHOENICS makes this computational tool portable to a wide range of parallel machines and enables the use of high performance computing for large scale computational simulations. MPI libraries are available on several parallel architectures making the program usable across different architectures as well as on heterogeneous computer networks. Themore » Intel Paragon NX and MPI versions of the program have been developed and tested on massively parallel supercomputers Intel Paragon XP/S 5, XP/S 35, and Kendall Square Research, and on the multiprocessor SGI Onyx computer at Oak Ridge National Laboratory. The preliminary testing results of the developed program have shown scalable performance for reasonably sized computational domains.« less
Kokkos: Enabling manycore performance portability through polymorphic memory access patterns
Carter Edwards, H.; Trott, Christian R.; Sunderland, Daniel
2014-07-22
The manycore revolution can be characterized by increasing thread counts, decreasing memory per thread, and diversity of continually evolving manycore architectures. High performance computing (HPC) applications and libraries must exploit increasingly finer levels of parallelism within their codes to sustain scalability on these devices. We found that a major obstacle to performance portability is the diverse and conflicting set of constraints on memory access patterns across devices. Contemporary portable programming models address manycore parallelism (e.g., OpenMP, OpenACC, OpenCL) but fail to address memory access patterns. The Kokkos C++ library enables applications and domain libraries to achieve performance portability on diversemore » manycore architectures by unifying abstractions for both fine-grain data parallelism and memory access patterns. In this paper we describe Kokkos’ abstractions, summarize its application programmer interface (API), present performance results for unit-test kernels and mini-applications, and outline an incremental strategy for migrating legacy C++ codes to Kokkos. Furthermore, the Kokkos library is under active research and development to incorporate capabilities from new generations of manycore architectures, and to address a growing list of applications and domain libraries.« less
NASA Technical Reports Server (NTRS)
Lavelle, Tom
2003-01-01
The objective is to increase the usability of the current NPSS code/architecture by incorporating an advanced space transportation propulsion system capability into the existing NPSS code and begin defining advanced capabilities for NPSS and provide an enhancement for the NPSS code/architecture.
Code Parallelization with CAPO: A User Manual
NASA Technical Reports Server (NTRS)
Jin, Hao-Qiang; Frumkin, Michael; Yan, Jerry; Biegel, Bryan (Technical Monitor)
2001-01-01
A software tool has been developed to assist the parallelization of scientific codes. This tool, CAPO, extends an existing parallelization toolkit, CAPTools developed at the University of Greenwich, to generate OpenMP parallel codes for shared memory architectures. This is an interactive toolkit to transform a serial Fortran application code to an equivalent parallel version of the software - in a small fraction of the time normally required for a manual parallelization. We first discuss the way in which loop types are categorized and how efficient OpenMP directives can be defined and inserted into the existing code using the in-depth interprocedural analysis. The use of the toolkit on a number of application codes ranging from benchmark to real-world application codes is presented. This will demonstrate the great potential of using the toolkit to quickly parallelize serial programs as well as the good performance achievable on a large number of toolkit to quickly parallelize serial programs as well as the good performance achievable on a large number of processors. The second part of the document gives references to the parameters and the graphic user interface implemented in the toolkit. Finally a set of tutorials is included for hands-on experiences with this toolkit.
WOMBAT: A Scalable and High-performance Astrophysical Magnetohydrodynamics Code
NASA Astrophysics Data System (ADS)
Mendygral, P. J.; Radcliffe, N.; Kandalla, K.; Porter, D.; O'Neill, B. J.; Nolting, C.; Edmon, P.; Donnert, J. M. F.; Jones, T. W.
2017-02-01
We present a new code for astrophysical magnetohydrodynamics specifically designed and optimized for high performance and scaling on modern and future supercomputers. We describe a novel hybrid OpenMP/MPI programming model that emerged from a collaboration between Cray, Inc. and the University of Minnesota. This design utilizes MPI-RMA optimized for thread scaling, which allows the code to run extremely efficiently at very high thread counts ideal for the latest generation of multi-core and many-core architectures. Such performance characteristics are needed in the era of “exascale” computing. We describe and demonstrate our high-performance design in detail with the intent that it may be used as a model for other, future astrophysical codes intended for applications demanding exceptional performance.
Bosse, Stefan
2015-01-01
Multi-agent systems (MAS) can be used for decentralized and self-organizing data processing in a distributed system, like a resource-constrained sensor network, enabling distributed information extraction, for example, based on pattern recognition and self-organization, by decomposing complex tasks in simpler cooperative agents. Reliable MAS-based data processing approaches can aid the material-integration of structural-monitoring applications, with agent processing platforms scaled to the microchip level. The agent behavior, based on a dynamic activity-transition graph (ATG) model, is implemented with program code storing the control and the data state of an agent, which is novel. The program code can be modified by the agent itself using code morphing techniques and is capable of migrating in the network between nodes. The program code is a self-contained unit (a container) and embeds the agent data, the initialization instructions and the ATG behavior implementation. The microchip agent processing platform used for the execution of the agent code is a standalone multi-core stack machine with a zero-operand instruction format, leading to a small-sized agent program code, low system complexity and high system performance. The agent processing is token-queue-based, similar to Petri-nets. The agent platform can be implemented in software, too, offering compatibility at the operational and code level, supporting agent processing in strong heterogeneous networks. In this work, the agent platform embedded in a large-scale distributed sensor network is simulated at the architectural level by using agent-based simulation techniques. PMID:25690550
Bosse, Stefan
2015-02-16
Multi-agent systems (MAS) can be used for decentralized and self-organizing data processing in a distributed system, like a resource-constrained sensor network, enabling distributed information extraction, for example, based on pattern recognition and self-organization, by decomposing complex tasks in simpler cooperative agents. Reliable MAS-based data processing approaches can aid the material-integration of structural-monitoring applications, with agent processing platforms scaled to the microchip level. The agent behavior, based on a dynamic activity-transition graph (ATG) model, is implemented with program code storing the control and the data state of an agent, which is novel. The program code can be modified by the agent itself using code morphing techniques and is capable of migrating in the network between nodes. The program code is a self-contained unit (a container) and embeds the agent data, the initialization instructions and the ATG behavior implementation. The microchip agent processing platform used for the execution of the agent code is a standalone multi-core stack machine with a zero-operand instruction format, leading to a small-sized agent program code, low system complexity and high system performance. The agent processing is token-queue-based, similar to Petri-nets. The agent platform can be implemented in software, too, offering compatibility at the operational and code level, supporting agent processing in strong heterogeneous networks. In this work, the agent platform embedded in a large-scale distributed sensor network is simulated at the architectural level by using agent-based simulation techniques.
First experience of vectorizing electromagnetic physics models for detector simulation
NASA Astrophysics Data System (ADS)
Amadio, G.; Apostolakis, J.; Bandieramonte, M.; Bianchini, C.; Bitzes, G.; Brun, R.; Canal, P.; Carminati, F.; de Fine Licht, J.; Duhem, L.; Elvira, D.; Gheata, A.; Jun, S. Y.; Lima, G.; Novak, M.; Presbyterian, M.; Shadura, O.; Seghal, R.; Wenzel, S.
2015-12-01
The recent emergence of hardware architectures characterized by many-core or accelerated processors has opened new opportunities for concurrent programming models taking advantage of both SIMD and SIMT architectures. The GeantV vector prototype for detector simulations has been designed to exploit both the vector capability of mainstream CPUs and multi-threading capabilities of coprocessors including NVidia GPUs and Intel Xeon Phi. The characteristics of these architectures are very different in terms of the vectorization depth, parallelization needed to achieve optimal performance or memory access latency and speed. An additional challenge is to avoid the code duplication often inherent to supporting heterogeneous platforms. In this paper we present the first experience of vectorizing electromagnetic physics models developed for the GeantV project.
NASA Technical Reports Server (NTRS)
Tick, Evan
1987-01-01
This note describes an efficient software emulator for the Warren Abstract Machine (WAM) Prolog architecture. The version of the WAM implemented is called Lcode. The Lcode emulator, written in C, executes the 'naive reverse' benchmark at 3900 LIPS. The emulator is one of a set of tools used to measure the memory-referencing characteristics and performance of Prolog programs. These tools include a compiler, assembler, and memory simulators. An overview of the Lcode architecture is given here, followed by a description and listing of the emulator code implementing each Lcode instruction. This note will be of special interest to those studying the WAM and its performance characteristics. In general, this note will be of interest to those creating efficient software emulators for abstract machine architectures.
The Influence of Building Codes on Recreation Facility Design.
ERIC Educational Resources Information Center
Morrison, Thomas A.
1989-01-01
Implications of building codes upon design and construction of recreation facilities are investigated (national building codes, recreation facility standards, and misperceptions of design requirements). Recreation professionals can influence architectural designers to correct past deficiencies, but they must understand architectural and…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sharma, Vishal C.; Gopalakrishnan, Ganesh; Krishnamoorthy, Sriram
The systems resilience research community has developed methods to manually insert additional source-program level assertions to trap errors, and also devised tools to conduct fault injection studies for scalar program codes. In this work, we contribute the first vector oriented LLVM-level fault injector VULFI to help study the effects of faults in vector architectures that are of growing importance, especially for vectorizing loops. Using VULFI, we conduct a resiliency study of nine real-world vector benchmarks using Intel’s AVX and SSE extensions as the target vector instruction sets, and offer the first reported understanding of how faults affect vector instruction sets.more » We take this work further toward automating the insertion of resilience assertions during compilation. This is based on our observation that during intermediate (e.g., LLVM-level) code generation to handle full and partial vectorization, modern compilers exploit (and explicate in their code-documentation) critical invariants. These invariants are turned into error-checking code. We confirm the efficacy of these automatically inserted low-overhead error detectors for vectorized for-loops.« less
User-Defined Data Distributions in High-Level Programming Languages
NASA Technical Reports Server (NTRS)
Diaconescu, Roxana E.; Zima, Hans P.
2006-01-01
One of the characteristic features of today s high performance computing systems is a physically distributed memory. Efficient management of locality is essential for meeting key performance requirements for these architectures. The standard technique for dealing with this issue has involved the extension of traditional sequential programming languages with explicit message passing, in the context of a processor-centric view of parallel computation. This has resulted in complex and error-prone assembly-style codes in which algorithms and communication are inextricably interwoven. This paper presents a high-level approach to the design and implementation of data distributions. Our work is motivated by the need to improve the current parallel programming methodology by introducing a paradigm supporting the development of efficient and reusable parallel code. This approach is currently being implemented in the context of a new programming language called Chapel, which is designed in the HPCS project Cascade.
Developing CORBA-Based Distributed Scientific Applications From Legacy Fortran Programs
NASA Technical Reports Server (NTRS)
Sang, Janche; Kim, Chan; Lopez, Isaac
2000-01-01
An efficient methodology is presented for integrating legacy applications written in Fortran into a distributed object framework. Issues and strategies regarding the conversion and decomposition of Fortran codes into Common Object Request Broker Architecture (CORBA) objects are discussed. Fortran codes are modified as little as possible as they are decomposed into modules and wrapped as objects. A new conversion tool takes the Fortran application as input and generates the C/C++ header file and Interface Definition Language (IDL) file. In addition, the performance of the client server computing is evaluated.
SMT-Aware Instantaneous Footprint Optimization
DOE Office of Scientific and Technical Information (OSTI.GOV)
Roy, Probir; Liu, Xu; Song, Shuaiwen
Modern architectures employ simultaneous multithreading (SMT) to increase thread-level parallelism. SMT threads share many functional units and the whole memory hierarchy of a physical core. Without a careful code design, SMT threads can easily contend with each other for these shared resources, causing severe performance degradation. Minimizing SMT thread contention for HPC applications running on dedicated platforms is very challenging, because they usually spawn threads within Single Program Multiple Data (SPMD) models. To address this important issue, we introduce a simple scheme for SMT-aware code optimization, which aims to reduce the memory contention across SMT threads.
Nonlinear Wave Simulation on the Xeon Phi Knights Landing Processor
NASA Astrophysics Data System (ADS)
Hristov, Ivan; Goranov, Goran; Hristova, Radoslava
2018-02-01
We consider an interesting from computational point of view standing wave simulation by solving coupled 2D perturbed Sine-Gordon equations. We make an OpenMP realization which explores both thread and SIMD levels of parallelism. We test the OpenMP program on two different energy equivalent Intel architectures: 2× Xeon E5-2695 v2 processors, (code-named "Ivy Bridge-EP") in the Hybrilit cluster, and Xeon Phi 7250 processor (code-named "Knights Landing" (KNL). The results show 2 times better performance on KNL processor.
1977-10-01
APPROVED DATE FUNCTION APPROVED jDATE WRITER J . K-olanek 2/6/76 REVISIONS CHK DESCRIPTION REV CHK DESCRIPTION IREV REVISION jJ ~ ~ ~~~ _ II SHEET NO...DOCUMENT (CDBDD) 45 5.5 COMPUTER PROGRAM PACKAGE (CPP)- j 45 5.6 COMPUTER PROGRAM OPERATOR’S MANUAL (CPOM) 45 5.7 COMPUTER PROGRAM TEST PLAN (CPTPL) 45...I LIST OF FIGURES Number Page 1 JEWS Simplified Block Diagram 4 2 System Controller Architecture 5 SIZE CODE IDENT NO DRAWING NO. A 49956 SCALE REV J
BGen: A UML Behavior Network Generator Tool
NASA Technical Reports Server (NTRS)
Huntsberger, Terry; Reder, Leonard J.; Balian, Harry
2010-01-01
BGen software was designed for autogeneration of code based on a graphical representation of a behavior network used for controlling automatic vehicles. A common format used for describing a behavior network, such as that used in the JPL-developed behavior-based control system, CARACaS ["Control Architecture for Robotic Agent Command and Sensing" (NPO-43635), NASA Tech Briefs, Vol. 32, No. 10 (October 2008), page 40] includes a graph with sensory inputs flowing through the behaviors in order to generate the signals for the actuators that drive and steer the vehicle. A computer program to translate Unified Modeling Language (UML) Freeform Implementation Diagrams into a legacy C implementation of Behavior Network has been developed in order to simplify the development of C-code for behavior-based control systems. UML is a popular standard developed by the Object Management Group (OMG) to model software architectures graphically. The C implementation of a Behavior Network is functioning as a decision tree.
Geospace simulations using modern accelerator processor technology
NASA Astrophysics Data System (ADS)
Germaschewski, K.; Raeder, J.; Larson, D. J.
2009-12-01
OpenGGCM (Open Geospace General Circulation Model) is a well-established numerical code simulating the Earth's space environment. The most computing intensive part is the MHD (magnetohydrodynamics) solver that models the plasma surrounding Earth and its interaction with Earth's magnetic field and the solar wind flowing in from the sun. Like other global magnetosphere codes, OpenGGCM's realism is currently limited by computational constraints on grid resolution. OpenGGCM has been ported to make use of the added computational powerof modern accelerator based processor architectures, in particular the Cell processor. The Cell architecture is a novel inhomogeneous multicore architecture capable of achieving up to 230 GFLops on a single chip. The University of New Hampshire recently acquired a PowerXCell 8i based computing cluster, and here we will report initial performance results of OpenGGCM. Realizing the high theoretical performance of the Cell processor is a programming challenge, though. We implemented the MHD solver using a multi-level parallelization approach: On the coarsest level, the problem is distributed to processors based upon the usual domain decomposition approach. Then, on each processor, the problem is divided into 3D columns, each of which is handled by the memory limited SPEs (synergistic processing elements) slice by slice. Finally, SIMD instructions are used to fully exploit the SIMD FPUs in each SPE. Memory management needs to be handled explicitly by the code, using DMA to move data from main memory to the per-SPE local store and vice versa. We use a modern technique, automatic code generation, which shields the application programmer from having to deal with all of the implementation details just described, keeping the code much more easily maintainable. Our preliminary results indicate excellent performance, a speed-up of a factor of 30 compared to the unoptimized version.
ATLAS, an integrated structural analysis and design system. Volume 2: System design document
NASA Technical Reports Server (NTRS)
Erickson, W. J. (Editor)
1979-01-01
ATLAS is a structural analysis and design system, operational on the Control Data Corporation 6600/CYBER computers. The overall system design, the design of the individual program modules, and the routines in the ATLAS system library are described. The overall design is discussed in terms of system architecture, executive function, data base structure, user program interfaces and operational procedures. The program module sections include detailed code description, common block usage and random access file usage. The description of the ATLAS program library includes all information needed to use these general purpose routines.
HACC: Simulating sky surveys on state-of-the-art supercomputing architectures
NASA Astrophysics Data System (ADS)
Habib, Salman; Pope, Adrian; Finkel, Hal; Frontiere, Nicholas; Heitmann, Katrin; Daniel, David; Fasel, Patricia; Morozov, Vitali; Zagaris, George; Peterka, Tom; Vishwanath, Venkatram; Lukić, Zarija; Sehrish, Saba; Liao, Wei-keng
2016-01-01
Current and future surveys of large-scale cosmic structure are associated with a massive and complex datastream to study, characterize, and ultimately understand the physics behind the two major components of the 'Dark Universe', dark energy and dark matter. In addition, the surveys also probe primordial perturbations and carry out fundamental measurements, such as determining the sum of neutrino masses. Large-scale simulations of structure formation in the Universe play a critical role in the interpretation of the data and extraction of the physics of interest. Just as survey instruments continue to grow in size and complexity, so do the supercomputers that enable these simulations. Here we report on HACC (Hardware/Hybrid Accelerated Cosmology Code), a recently developed and evolving cosmology N-body code framework, designed to run efficiently on diverse computing architectures and to scale to millions of cores and beyond. HACC can run on all current supercomputer architectures and supports a variety of programming models and algorithms. It has been demonstrated at scale on Cell- and GPU-accelerated systems, standard multi-core node clusters, and Blue Gene systems. HACC's design allows for ease of portability, and at the same time, high levels of sustained performance on the fastest supercomputers available. We present a description of the design philosophy of HACC, the underlying algorithms and code structure, and outline implementation details for several specific architectures. We show selected accuracy and performance results from some of the largest high resolution cosmological simulations so far performed, including benchmarks evolving more than 3.6 trillion particles.
HACC: Simulating sky surveys on state-of-the-art supercomputing architectures
DOE Office of Scientific and Technical Information (OSTI.GOV)
Habib, Salman; Pope, Adrian; Finkel, Hal
2016-01-01
Current and future surveys of large-scale cosmic structure are associated with a massive and complex datastream to study, characterize, and ultimately understand the physics behind the two major components of the ‘Dark Universe’, dark energy and dark matter. In addition, the surveys also probe primordial perturbations and carry out fundamental measurements, such as determining the sum of neutrino masses. Large-scale simulations of structure formation in the Universe play a critical role in the interpretation of the data and extraction of the physics of interest. Just as survey instruments continue to grow in size and complexity, so do the supercomputers thatmore » enable these simulations. Here we report on HACC (Hardware/Hybrid Accelerated Cosmology Code), a recently developed and evolving cosmology N-body code framework, designed to run efficiently on diverse computing architectures and to scale to millions of cores and beyond. HACC can run on all current supercomputer architectures and supports a variety of programming models and algorithms. It has been demonstrated at scale on Cell- and GPU-accelerated systems, standard multi-core node clusters, and Blue Gene systems. HACC’s design allows for ease of portability, and at the same time, high levels of sustained performance on the fastest supercomputers available. We present a description of the design philosophy of HACC, the underlying algorithms and code structure, and outline implementation details for several specific architectures. We show selected accuracy and performance results from some of the largest high resolution cosmological simulations so far performed, including benchmarks evolving more than 3.6 trillion particles.« less
High-speed architecture for the decoding of trellis-coded modulation
NASA Technical Reports Server (NTRS)
Osborne, William P.
1992-01-01
Since 1971, when the Viterbi Algorithm was introduced as the optimal method of decoding convolutional codes, improvements in circuit technology, especially VLSI, have steadily increased its speed and practicality. Trellis-Coded Modulation (TCM) combines convolutional coding with higher level modulation (non-binary source alphabet) to provide forward error correction and spectral efficiency. For binary codes, the current stare-of-the-art is a 64-state Viterbi decoder on a single CMOS chip, operating at a data rate of 25 Mbps. Recently, there has been an interest in increasing the speed of the Viterbi Algorithm by improving the decoder architecture, or by reducing the algorithm itself. Designs employing new architectural techniques are now in existence, however these techniques are currently applied to simpler binary codes, not to TCM. The purpose of this report is to discuss TCM architectural considerations in general, and to present the design, at the logic gate level, or a specific TCM decoder which applies these considerations to achieve high-speed decoding.
The National Transport Code Collaboration Module Library
NASA Astrophysics Data System (ADS)
Kritz, A. H.; Bateman, G.; Kinsey, J.; Pankin, A.; Onjun, T.; Redd, A.; McCune, D.; Ludescher, C.; Pletzer, A.; Andre, R.; Zakharov, L.; Lodestro, L.; Pearlstein, L. D.; Jong, R.; Houlberg, W.; Strand, P.; Wiley, J.; Valanju, P.; John, H. St.; Waltz, R.; Mandrekas, J.; Mau, T. K.; Carlsson, J.; Braams, B.
2004-12-01
This paper reports on the progress in developing a library of code modules under the auspices of the National Transport Code Collaboration (NTCC). Code modules are high quality, fully documented software packages with a clearly defined interface. The modules provide a variety of functions, such as implementing numerical physics models; performing ancillary functions such as I/O or graphics; or providing tools for dealing with common issues in scientific programming such as portability of Fortran codes. Researchers in the plasma community submit code modules, and a review procedure is followed to insure adherence to programming and documentation standards. The review process is designed to provide added confidence with regard to the use of the modules and to allow users and independent reviews to validate the claims of the modules' authors. All modules include source code; clear instructions for compilation of binaries on a variety of target architectures; and test cases with well-documented input and output. All the NTCC modules and ancillary information, such as current standards and documentation, are available from the NTCC Module Library Website http://w3.pppl.gov/NTCC. The goal of the project is to develop a resource of value to builders of integrated modeling codes and to plasma physics researchers generally. Currently, there are more than 40 modules in the module library.
Imran, Noreen; Seet, Boon-Chong; Fong, A C M
2015-01-01
Distributed video coding (DVC) is a relatively new video coding architecture originated from two fundamental theorems namely, Slepian-Wolf and Wyner-Ziv. Recent research developments have made DVC attractive for applications in the emerging domain of wireless video sensor networks (WVSNs). This paper reviews the state-of-the-art DVC architectures with a focus on understanding their opportunities and gaps in addressing the operational requirements and application needs of WVSNs.
A Scalable Architecture of a Structured LDPC Decoder
NASA Technical Reports Server (NTRS)
Lee, Jason Kwok-San; Lee, Benjamin; Thorpe, Jeremy; Andrews, Kenneth; Dolinar, Sam; Hamkins, Jon
2004-01-01
We present a scalable decoding architecture for a certain class of structured LDPC codes. The codes are designed using a small (n,r) protograph that is replicated Z times to produce a decoding graph for a (Z x n, Z x r) code. Using this architecture, we have implemented a decoder for a (4096,2048) LDPC code on a Xilinx Virtex-II 2000 FPGA, and achieved decoding speeds of 31 Mbps with 10 fixed iterations. The implemented message-passing algorithm uses an optimized 3-bit non-uniform quantizer that operates with 0.2dB implementation loss relative to a floating point decoder.
Proof Compression and the Mobius PCC Architecture for Embedded Devices
NASA Technical Reports Server (NTRS)
Jensen, Thomas
2009-01-01
The EU Mobius project has been concerned with the security of Java applications, and of mobile devices such as smart phones that execute such applications. In this talk, I'll give a brief overview of the results obtained on on-device checking of various security-related program properties. I'll then describe in more detail how the concept of certified abstract interpretation and abstraction-carrying code can be applied to polyhedral-based analysis of Java byte code in order to verify properties pertaining to the usage of resources of a down-loaded application. Particular emphasis has been on finding ways of reducing the size of the certificates that accompany a piece of code.
NASA Astrophysics Data System (ADS)
Nitadori, Keigo; Makino, Junichiro; Hut, Piet
2006-12-01
The main performance bottleneck of gravitational N-body codes is the force calculation between two particles. We have succeeded in speeding up this pair-wise force calculation by factors between 2 and 10, depending on the code and the processor on which the code is run. These speed-ups were obtained by writing highly fine-tuned code for x86_64 microprocessors. Any existing N-body code, running on these chips, can easily incorporate our assembly code programs. In the current paper, we present an outline of our overall approach, which we illustrate with one specific example: the use of a Hermite scheme for a direct N2 type integration on a single 2.0 GHz Athlon 64 processor, for which we obtain an effective performance of 4.05 Gflops, for double-precision accuracy. In subsequent papers, we will discuss other variations, including the combinations of N log N codes, single-precision implementations, and performance on other microprocessors.
WOMBAT: A Scalable and High-performance Astrophysical Magnetohydrodynamics Code
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mendygral, P. J.; Radcliffe, N.; Kandalla, K.
2017-02-01
We present a new code for astrophysical magnetohydrodynamics specifically designed and optimized for high performance and scaling on modern and future supercomputers. We describe a novel hybrid OpenMP/MPI programming model that emerged from a collaboration between Cray, Inc. and the University of Minnesota. This design utilizes MPI-RMA optimized for thread scaling, which allows the code to run extremely efficiently at very high thread counts ideal for the latest generation of multi-core and many-core architectures. Such performance characteristics are needed in the era of “exascale” computing. We describe and demonstrate our high-performance design in detail with the intent that it maymore » be used as a model for other, future astrophysical codes intended for applications demanding exceptional performance.« less
A direct-execution parallel architecture for the Advanced Continuous Simulation Language (ACSL)
NASA Technical Reports Server (NTRS)
Carroll, Chester C.; Owen, Jeffrey E.
1988-01-01
A direct-execution parallel architecture for the Advanced Continuous Simulation Language (ACSL) is presented which overcomes the traditional disadvantages of simulations executed on a digital computer. The incorporation of parallel processing allows the mapping of simulations into a digital computer to be done in the same inherently parallel manner as they are currently mapped onto an analog computer. The direct-execution format maximizes the efficiency of the executed code since the need for a high level language compiler is eliminated. Resolution is greatly increased over that which is available with an analog computer without the sacrifice in execution speed normally expected with digitial computer simulations. Although this report covers all aspects of the new architecture, key emphasis is placed on the processing element configuration and the microprogramming of the ACLS constructs. The execution times for all ACLS constructs are computed using a model of a processing element based on the AMD 29000 CPU and the AMD 29027 FPU. The increase in execution speed provided by parallel processing is exemplified by comparing the derived execution times of two ACSL programs with the execution times for the same programs executed on a similar sequential architecture.
AltiVec performance increases for autonomous robotics for the MARSSCAPE architecture program
NASA Astrophysics Data System (ADS)
Gothard, Benny M.
2002-02-01
One of the main tall poles that must be overcome to develop a fully autonomous vehicle is the inability of the computer to understand its surrounding environment to a level that is required for the intended task. The military mission scenario requires a robot to interact in a complex, unstructured, dynamic environment. Reference A High Fidelity Multi-Sensor Scene Understanding System for Autonomous Navigation The Mobile Autonomous Robot Software Self Composing Adaptive Programming Environment (MarsScape) perception research addresses three aspects of the problem; sensor system design, processing architectures, and algorithm enhancements. A prototype perception system has been demonstrated on robotic High Mobility Multi-purpose Wheeled Vehicle and All Terrain Vehicle testbeds. This paper addresses the tall pole of processing requirements and the performance improvements based on the selected MarsScape Processing Architecture. The processor chosen is the Motorola Altivec-G4 Power PC(PPC) (1998 Motorola, Inc.), a highly parallized commercial Single Instruction Multiple Data processor. Both derived perception benchmarks and actual perception subsystems code will be benchmarked and compared against previous Demo II-Semi-autonomous Surrogate Vehicle processing architectures along with desktop Personal Computers(PC). Performance gains are highlighted with progress to date, and lessons learned and future directions are described.
2007-10-01
Architecture ................................................................................ 14 Figure 2. Eclipse Java Model...16 Figure 3. Eclipse Java Model at the Source Code Level...24 Figure 9. Java Source Code
Performance optimization of Qbox and WEST on Intel Knights Landing
NASA Astrophysics Data System (ADS)
Zheng, Huihuo; Knight, Christopher; Galli, Giulia; Govoni, Marco; Gygi, Francois
We present the optimization of electronic structure codes Qbox and WEST targeting the Intel®Xeon Phi™processor, codenamed Knights Landing (KNL). Qbox is an ab-initio molecular dynamics code based on plane wave density functional theory (DFT) and WEST is a post-DFT code for excited state calculations within many-body perturbation theory. Both Qbox and WEST employ highly scalable algorithms which enable accurate large-scale electronic structure calculations on leadership class supercomputer platforms beyond 100,000 cores, such as Mira and Theta at the Argonne Leadership Computing Facility. In this work, features of the KNL architecture (e.g. hierarchical memory) are explored to achieve higher performance in key algorithms of the Qbox and WEST codes and to develop a road-map for further development targeting next-generation computing architectures. In particular, the optimizations of the Qbox and WEST codes on the KNL platform will target efficient large-scale electronic structure calculations of nanostructured materials exhibiting complex structures and prediction of their electronic and thermal properties for use in solar and thermal energy conversion device. This work was supported by MICCoM, as part of Comp. Mats. Sci. Program funded by the U.S. DOE, Office of Sci., BES, MSE Division. This research used resources of the ALCF, which is a DOE Office of Sci. User Facility under Contract DE-AC02-06CH11357.
High-throughput sample adaptive offset hardware architecture for high-efficiency video coding
NASA Astrophysics Data System (ADS)
Zhou, Wei; Yan, Chang; Zhang, Jingzhi; Zhou, Xin
2018-03-01
A high-throughput hardware architecture for a sample adaptive offset (SAO) filter in the high-efficiency video coding video coding standard is presented. First, an implementation-friendly and simplified bitrate estimation method of rate-distortion cost calculation is proposed to reduce the computational complexity in the mode decision of SAO. Then, a high-throughput VLSI architecture for SAO is presented based on the proposed bitrate estimation method. Furthermore, multiparallel VLSI architecture for in-loop filters, which integrates both deblocking filter and SAO filter, is proposed. Six parallel strategies are applied in the proposed in-loop filters architecture to improve the system throughput and filtering speed. Experimental results show that the proposed in-loop filters architecture can achieve up to 48% higher throughput in comparison with prior work. The proposed architecture can reach a high-operating clock frequency of 297 MHz with TSMC 65-nm library and meet the real-time requirement of the in-loop filters for 8 K × 4 K video format at 132 fps.
GRADSPMHD: A parallel MHD code based on the SPH formalism
NASA Astrophysics Data System (ADS)
Vanaverbeke, S.; Keppens, R.; Poedts, S.
2014-03-01
We present GRADSPMHD, a completely Lagrangian parallel magnetohydrodynamics code based on the SPH formalism. The implementation of the equations of SPMHD in the “GRAD-h” formalism assembles known results, including the derivation of the discretized MHD equations from a variational principle, the inclusion of time-dependent artificial viscosity, resistivity and conductivity terms, as well as the inclusion of a mixed hyperbolic/parabolic correction scheme for satisfying the ∇ṡB→ constraint on the magnetic field. The code uses a tree-based formalism for neighbor finding and can optionally use the tree code for computing the self-gravity of the plasma. The structure of the code closely follows the framework of our parallel GRADSPH FORTRAN 90 code which we added previously to the CPC program library. We demonstrate the capabilities of GRADSPMHD by running 1, 2, and 3 dimensional standard benchmark tests and we find good agreement with previous work done by other researchers. The code is also applied to the problem of simulating the magnetorotational instability in 2.5D shearing box tests as well as in global simulations of magnetized accretion disks. We find good agreement with available results on this subject in the literature. Finally, we discuss the performance of the code on a parallel supercomputer with distributed memory architecture. Catalogue identifier: AERP_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AERP_v1_0.html Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 620503 No. of bytes in distributed program, including test data, etc.: 19837671 Distribution format: tar.gz Programming language: FORTRAN 90/MPI. Computer: HPC cluster. Operating system: Unix. Has the code been vectorized or parallelized?: Yes, parallelized using MPI. RAM: ˜30 MB for a Sedov test including 15625 particles on a single CPU. Classification: 12. Nature of problem: Evolution of a plasma in the ideal MHD approximation. Solution method: The equations of magnetohydrodynamics are solved using the SPH method. Running time: The test provided takes approximately 20 min using 4 processors.
NASA Astrophysics Data System (ADS)
Bird, Robert; Nystrom, David; Albright, Brian
2017-10-01
The ability of scientific simulations to effectively deliver performant computation is increasingly being challenged by successive generations of high-performance computing architectures. Code development to support efficient computation on these modern architectures is both expensive, and highly complex; if it is approached without due care, it may also not be directly transferable between subsequent hardware generations. Previous works have discussed techniques to support the process of adapting a legacy code for modern hardware generations, but despite the breakthroughs in the areas of mini-app development, portable-performance, and cache oblivious algorithms the problem still remains largely unsolved. In this work we demonstrate how a focus on platform agnostic modern code-development can be applied to Particle-in-Cell (PIC) simulations to facilitate effective scientific delivery. This work builds directly on our previous work optimizing VPIC, in which we replaced intrinsic based vectorisation with compile generated auto-vectorization to improve the performance and portability of VPIC. In this work we present the use of a specialized SIMD queue for processing some particle operations, and also preview a GPU capable OpenMP variant of VPIC. Finally we include a lessons learnt. Work performed under the auspices of the U.S. Dept. of Energy by the Los Alamos National Security, LLC Los Alamos National Laboratory under contract DE-AC52-06NA25396 and supported by the LANL LDRD program.
Parallel hyperbolic PDE simulation on clusters: Cell versus GPU
NASA Astrophysics Data System (ADS)
Rostrup, Scott; De Sterck, Hans
2010-12-01
Increasingly, high-performance computing is looking towards data-parallel computational devices to enhance computational performance. Two technologies that have received significant attention are IBM's Cell Processor and NVIDIA's CUDA programming model for graphics processing unit (GPU) computing. In this paper we investigate the acceleration of parallel hyperbolic partial differential equation simulation on structured grids with explicit time integration on clusters with Cell and GPU backends. The message passing interface (MPI) is used for communication between nodes at the coarsest level of parallelism. Optimizations of the simulation code at the several finer levels of parallelism that the data-parallel devices provide are described in terms of data layout, data flow and data-parallel instructions. Optimized Cell and GPU performance are compared with reference code performance on a single x86 central processing unit (CPU) core in single and double precision. We further compare the CPU, Cell and GPU platforms on a chip-to-chip basis, and compare performance on single cluster nodes with two CPUs, two Cell processors or two GPUs in a shared memory configuration (without MPI). We finally compare performance on clusters with 32 CPUs, 32 Cell processors, and 32 GPUs using MPI. Our GPU cluster results use NVIDIA Tesla GPUs with GT200 architecture, but some preliminary results on recently introduced NVIDIA GPUs with the next-generation Fermi architecture are also included. This paper provides computational scientists and engineers who are considering porting their codes to accelerator environments with insight into how structured grid based explicit algorithms can be optimized for clusters with Cell and GPU accelerators. It also provides insight into the speed-up that may be gained on current and future accelerator architectures for this class of applications. Program summaryProgram title: SWsolver Catalogue identifier: AEGY_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEGY_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: GPL v3 No. of lines in distributed program, including test data, etc.: 59 168 No. of bytes in distributed program, including test data, etc.: 453 409 Distribution format: tar.gz Programming language: C, CUDA Computer: Parallel Computing Clusters. Individual compute nodes may consist of x86 CPU, Cell processor, or x86 CPU with attached NVIDIA GPU accelerator. Operating system: Linux Has the code been vectorised or parallelized?: Yes. Tested on 1-128 x86 CPU cores, 1-32 Cell Processors, and 1-32 NVIDIA GPUs. RAM: Tested on Problems requiring up to 4 GB per compute node. Classification: 12 External routines: MPI, CUDA, IBM Cell SDK Nature of problem: MPI-parallel simulation of Shallow Water equations using high-resolution 2D hyperbolic equation solver on regular Cartesian grids for x86 CPU, Cell Processor, and NVIDIA GPU using CUDA. Solution method: SWsolver provides 3 implementations of a high-resolution 2D Shallow Water equation solver on regular Cartesian grids, for CPU, Cell Processor, and NVIDIA GPU. Each implementation uses MPI to divide work across a parallel computing cluster. Additional comments: Sub-program numdiff is used for the test run.
Canbay, Ferhat; Levent, Vecdi Emre; Serbes, Gorkem; Ugurdag, H. Fatih; Goren, Sezer
2016-01-01
The authors aimed to develop an application for producing different architectures to implement dual tree complex wavelet transform (DTCWT) having near shift-invariance property. To obtain a low-cost and portable solution for implementing the DTCWT in multi-channel real-time applications, various embedded-system approaches are realised. For comparison, the DTCWT was implemented in C language on a personal computer and on a PIC microcontroller. However, in the former approach portability and in the latter desired speed performance properties cannot be achieved. Hence, implementation of the DTCWT on a reconfigurable platform such as field programmable gate array, which provides portable, low-cost, low-power, and high-performance computing, is considered as the most feasible solution. At first, they used the system generator DSP design tool of Xilinx for algorithm design. However, the design implemented by using such tools is not optimised in terms of area and power. To overcome all these drawbacks mentioned above, they implemented the DTCWT algorithm by using Verilog Hardware Description Language, which has its own difficulties. To overcome these difficulties, simplify the usage of proposed algorithms and the adaptation procedures, a code generator program that can produce different architectures is proposed. PMID:27733925
Canbay, Ferhat; Levent, Vecdi Emre; Serbes, Gorkem; Ugurdag, H Fatih; Goren, Sezer; Aydin, Nizamettin
2016-09-01
The authors aimed to develop an application for producing different architectures to implement dual tree complex wavelet transform (DTCWT) having near shift-invariance property. To obtain a low-cost and portable solution for implementing the DTCWT in multi-channel real-time applications, various embedded-system approaches are realised. For comparison, the DTCWT was implemented in C language on a personal computer and on a PIC microcontroller. However, in the former approach portability and in the latter desired speed performance properties cannot be achieved. Hence, implementation of the DTCWT on a reconfigurable platform such as field programmable gate array, which provides portable, low-cost, low-power, and high-performance computing, is considered as the most feasible solution. At first, they used the system generator DSP design tool of Xilinx for algorithm design. However, the design implemented by using such tools is not optimised in terms of area and power. To overcome all these drawbacks mentioned above, they implemented the DTCWT algorithm by using Verilog Hardware Description Language, which has its own difficulties. To overcome these difficulties, simplify the usage of proposed algorithms and the adaptation procedures, a code generator program that can produce different architectures is proposed.
User's and test case manual for FEMATS
NASA Technical Reports Server (NTRS)
Chatterjee, Arindam; Volakis, John; Nurnberger, Mike; Natzke, John
1995-01-01
The FEMATS program incorporates first-order edge-based finite elements and vector absorbing boundary conditions into the scattered field formulation for computation of the scattering from three-dimensional geometries. The code has been validated extensively for a large class of geometries containing inhomogeneities and satisfying transition conditions. For geometries that are too large for the workstation environment, the FEMATS code has been optimized to run on various supercomputers. Currently, FEMATS has been configured to run on the HP 9000 workstation, vectorized for the Cray Y-MP, and parallelized to run on the Kendall Square Research (KSR) architecture and the Intel Paragon.
User's guide for FRMOD, a zero dimensional FRM burn code
DOE Office of Scientific and Technical Information (OSTI.GOV)
Driemeryer, D.; Miley, G.H.
1979-10-15
The zero-dimensional FRM plasma burn code, FRMOD is written in the FORTRAN language and is currently available on the Control Data Corporation (CDC) 7600 computer at the Magnetic Fusion Energy Computer Center (MFECC), sponsored by the US Department of Energy, in Livermore, CA. This guide assumes that the user is familiar with the system architecture and some of the utility programs available on the MFE-7600 machine, since online documentation is available for system routines through the use of the DOCUMENT utility. Users may therefore refer to it for answers to system related questions.
Parallel Computation of the Jacobian Matrix for Nonlinear Equation Solvers Using MATLAB
NASA Technical Reports Server (NTRS)
Rose, Geoffrey K.; Nguyen, Duc T.; Newman, Brett A.
2017-01-01
Demonstrating speedup for parallel code on a multicore shared memory PC can be challenging in MATLAB due to underlying parallel operations that are often opaque to the user. This can limit potential for improvement of serial code even for the so-called embarrassingly parallel applications. One such application is the computation of the Jacobian matrix inherent to most nonlinear equation solvers. Computation of this matrix represents the primary bottleneck in nonlinear solver speed such that commercial finite element (FE) and multi-body-dynamic (MBD) codes attempt to minimize computations. A timing study using MATLAB's Parallel Computing Toolbox was performed for numerical computation of the Jacobian. Several approaches for implementing parallel code were investigated while only the single program multiple data (spmd) method using composite objects provided positive results. Parallel code speedup is demonstrated but the goal of linear speedup through the addition of processors was not achieved due to PC architecture.
Coding coarse grained polymer model for LAMMPS and its application to polymer crystallization
NASA Astrophysics Data System (ADS)
Luo, Chuanfu; Sommer, Jens-Uwe
2009-08-01
We present a patch code for LAMMPS to implement a coarse grained (CG) model of poly(vinyl alcohol) (PVA). LAMMPS is a powerful molecular dynamics (MD) simulator developed at Sandia National Laboratories. Our patch code implements tabulated angular potential and Lennard-Jones-9-6 (LJ96) style interaction for PVA. Benefited from the excellent parallel efficiency of LAMMPS, our patch code is suitable for large-scale simulations. This CG-PVA code is used to study polymer crystallization, which is a long-standing unsolved problem in polymer physics. By using parallel computing, cooling and heating processes for long chains are simulated. The results show that chain-folded structures resembling the lamellae of polymer crystals are formed during the cooling process. The evolution of the static structure factor during the crystallization transition indicates that long-range density order appears before local crystalline packing. This is consistent with some experimental observations by small/wide angle X-ray scattering (SAXS/WAXS). During the heating process, it is found that the crystalline regions are still growing until they are fully melted, which can be confirmed by the evolution both of the static structure factor and average stem length formed by the chains. This two-stage behavior indicates that melting of polymer crystals is far from thermodynamic equilibrium. Our results concur with various experiments. It is the first time that such growth/reorganization behavior is clearly observed by MD simulations. Our code can be easily used to model other type of polymers by providing a file containing the tabulated angle potential data and a set of appropriate parameters. Program summaryProgram title: lammps-cgpva Catalogue identifier: AEDE_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEDE_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: GNU's GPL No. of lines in distributed program, including test data, etc.: 940 798 No. of bytes in distributed program, including test data, etc.: 12 536 245 Distribution format: tar.gz Programming language: C++/MPI Computer: Tested on Intel-x86 and AMD64 architectures. Should run on any architecture providing a C++ compiler Operating system: Tested under Linux. Any other OS with C++ compiler and MPI library should suffice Has the code been vectorized or parallelized?: Yes RAM: Depends on system size and how many CPUs are used Classification: 7.7 External routines: LAMMPS ( http://lammps.sandia.gov/), FFTW ( http://www.fftw.org/) Nature of problem: Implementing special tabular angle potentials and Lennard-Jones-9-6 style interactions of a coarse grained polymer model for LAMMPS code. Solution method: Cubic spline interpolation of input tabulated angle potential data. Restrictions: The code is based on a former version of LAMMPS. Unusual features.: Any special angular potential can be used if it can be tabulated. Running time: Seconds to weeks, depending on system size, speed of CPU and how many CPUs are used. The test run provided with the package takes about 5 minutes on 4 AMD's opteron (2.6 GHz) CPUs. References:D. Reith, H. Meyer, F. Müller-Plathe, Macromolecules 34 (2001) 2335-2345. H. Meyer, F. Müller-Plathe, J. Chem. Phys. 115 (2001) 7807. H. Meyer, F. Müller-Plathe, Macromolecules 35 (2002) 1241-1252.
Microgravity Materials Research and Code U ISRU
NASA Technical Reports Server (NTRS)
Curreri, Peter A.; Sibille, Laurent
2004-01-01
The NASA microgravity research program, simply put, has the goal of doing science (which is essentially finding out something previously unknown about nature) utilizing the unique long-term microgravity environment in Earth orbit. Since 1997 Code U has in addition funded scientific basic research that enables safe and economical capabilities to enable humans to live, work and do science beyond Earth orbit. This research has been integrated with the larger NASA missions (Code M and S). These new exploration research focus areas include Radiation Shielding Materials, Macromolecular Research on Bone and Muscle Loss, In Space Fabrication and Repair, and Low Gravity ISRU. The latter two focus on enabling materials processing in space for use in space. The goal of this program is to provide scientific and technical research resulting in proof-of-concept experiments feeding into the larger NASA program to provide humans in space with an energy rich, resource rich, self sustaining infrastructure at the earliest possible time and with minimum risk, launch mass and program cost. President Bush's Exploration Vision (1/14/04) gives a new urgency for the development of ISRU concepts into the exploration architecture. This will require an accelerated One NASA approach utilizing NASA's partners in academia, and industry.
Compiling global name-space programs for distributed execution
NASA Technical Reports Server (NTRS)
Koelbel, Charles; Mehrotra, Piyush
1990-01-01
Distributed memory machines do not provide hardware support for a global address space. Thus programmers are forced to partition the data across the memories of the architecture and use explicit message passing to communicate data between processors. The compiler support required to allow programmers to express their algorithms using a global name-space is examined. A general method is presented for analysis of a high level source program and along with its translation to a set of independently executing tasks communicating via messages. If the compiler has enough information, this translation can be carried out at compile-time. Otherwise run-time code is generated to implement the required data movement. The analysis required in both situations is described and the performance of the generated code on the Intel iPSC/2 is presented.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hong, Tianzhen; Buhl, Fred; Haves, Philip
2008-09-20
EnergyPlus is a new generation building performance simulation program offering many new modeling capabilities and more accurate performance calculations integrating building components in sub-hourly time steps. However, EnergyPlus runs much slower than the current generation simulation programs. This has become a major barrier to its widespread adoption by the industry. This paper analyzed EnergyPlus run time from comprehensive perspectives to identify key issues and challenges of speeding up EnergyPlus: studying the historical trends of EnergyPlus run time based on the advancement of computers and code improvements to EnergyPlus, comparing EnergyPlus with DOE-2 to understand and quantify the run time differences,more » identifying key simulation settings and model features that have significant impacts on run time, and performing code profiling to identify which EnergyPlus subroutines consume the most amount of run time. This paper provides recommendations to improve EnergyPlus run time from the modeler?s perspective and adequate computing platforms. Suggestions of software code and architecture changes to improve EnergyPlus run time based on the code profiling results are also discussed.« less
Parallel Subspace Subcodes of Reed-Solomon Codes for Magnetic Recording Channels
ERIC Educational Resources Information Center
Wang, Han
2010-01-01
Read channel architectures based on a single low-density parity-check (LDPC) code are being considered for the next generation of hard disk drives. However, LDPC-only solutions suffer from the error floor problem, which may compromise reliability, if not handled properly. Concatenated architectures using an LDPC code plus a Reed-Solomon (RS) code…
The language parallel Pascal and other aspects of the massively parallel processor
NASA Technical Reports Server (NTRS)
Reeves, A. P.; Bruner, J. D.
1982-01-01
A high level language for the Massively Parallel Processor (MPP) was designed. This language, called Parallel Pascal, is described in detail. A description of the language design, a description of the intermediate language, Parallel P-Code, and details for the MPP implementation are included. Formal descriptions of Parallel Pascal and Parallel P-Code are given. A compiler was developed which converts programs in Parallel Pascal into the intermediate Parallel P-Code language. The code generator to complete the compiler for the MPP is being developed independently. A Parallel Pascal to Pascal translator was also developed. The architecture design for a VLSI version of the MPP was completed with a description of fault tolerant interconnection networks. The memory arrangement aspects of the MPP are discussed and a survey of other high level languages is given.
Invocation oriented architecture for agile code and agile data
NASA Astrophysics Data System (ADS)
Verma, Dinesh; Chan, Kevin; Leung, Kin; Gkelias, Athanasios
2017-05-01
In order to address the unique requirements of sensor information fusion in a tactical coalition environment, we are proposing a new architecture - one based on the concept of invocations. An invocation is a combination of a software code and a piece of data, both managed using techniques from Information Centric networking. This paper will discuss limitations of current approaches, present the architecture for an invocation oriented architecture, illustrate how it works with an example scenario, and provide reasons for its suitability in a coalition environment.
NASA Astrophysics Data System (ADS)
Guilfoyle, Peter S.; Stone, Richard V.
1991-12-01
OptiComp is currently completing a 32-bit, fully programmable digital optical computer (DOC II) that is designed to operate in a UNIX environment running RISC microcode. OptiComp's DOC II architecture is focused toward parallel microcode implementation where data is input in a dual rail format. By exploiting the physical principals inherent to optics (speed and low power consumption), an architectural balance of optical interconnects and software code efficiency can be achieved including high fan-in and fan-out. OptiComp's DOC II program is jointly sponsored by the Office of Naval Research (ONR), the Strategic Defense Initiative Office (SDIO), NASA space station group and Rome Laboratory (USAF). This paper not only describes the motivational basis behind DOC II but also provides an optical overview and architectural summary of the device that allows the emulation of any digital instruction set.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Harms, Kevin; Oral, H. Sarp; Atchley, Scott
The Oak Ridge and Argonne Leadership Computing Facilities are both receiving new systems under the Collaboration of Oak Ridge, Argonne, and Livermore (CORAL) program. Because they are both part of the INCITE program, applications need to be portable between these two facilities. However, the Summit and Aurora systems will be vastly different architectures, including their I/O subsystems. While both systems will have POSIX-compliant parallel file systems, their Burst Buffer technologies will be different. This difference may pose challenges to application portability between facilities. Application developers need to pay attention to specific burst buffer implementations to maximize code portability.
A language comparison for scientific computing on MIMD architectures
NASA Technical Reports Server (NTRS)
Jones, Mark T.; Patrick, Merrell L.; Voigt, Robert G.
1989-01-01
Choleski's method for solving banded symmetric, positive definite systems is implemented on a multiprocessor computer using three FORTRAN based parallel programming languages, the Force, PISCES and Concurrent FORTRAN. The capabilities of the language for expressing parallelism and their user friendliness are discussed, including readability of the code, debugging assistance offered, and expressiveness of the languages. The performance of the different implementations is compared. It is argued that PISCES, using the Force for medium-grained parallelism, is the appropriate choice for programming Choleski's method on the multiprocessor computer, Flex/32.
A personal computer-based, multitasking data acquisition system
NASA Technical Reports Server (NTRS)
Bailey, Steven A.
1990-01-01
A multitasking, data acquisition system was written to simultaneously collect meteorological radar and telemetry data from two sources. This system is based on the personal computer architecture. Data is collected via two asynchronous serial ports and is deposited to disk. The system is written in both the C programming language and assembler. It consists of three parts: a multitasking kernel for data collection, a shell with pull down windows as user interface, and a graphics processor for editing data and creating coded messages. An explanation of both system principles and program structure is presented.
NASTRAN as a resource in code development
NASA Technical Reports Server (NTRS)
Stanton, E. L.; Crain, L. M.; Neu, T. F.
1975-01-01
A case history is presented in which the NASTRAN system provided both guidelines and working software for use in the development of a discrete element program, PATCHES-111. To avoid duplication and to take advantage of the wide spread user familiarity with NASTRAN, the PATCHES-111 system uses NASTRAN bulk data syntax, NASTRAN matrix utilities, and the NASTRAN linkage editor. Problems in developing the program are discussed along with details on the architecture of the PATCHES-111 parametric cubic modeling system. The system includes model construction procedures, checkpoint/restart strategies, and other features.
NASA's mobile satellite development program
NASA Technical Reports Server (NTRS)
Rafferty, William; Dessouky, Khaled; Sue, Miles
1988-01-01
A Mobile Satellite System (MSS) will provide data and voice communications over a vast geographical area to a large population of mobile users. A technical overview is given of the extensive research and development studies and development performed under NASA's mobile satellite program (MSAT-X) in support of the introduction of a U.S. MSS. The critical technologies necessary to enable such a system are emphasized: vehicle antennas, modulation and coding, speech coders, networking and propagation characterization. Also proposed is a first, and future generation MSS architecture based upon realized ground segment equipment and advanced space segment studies.
Manyscale Computing for Sensor Processing in Support of Space Situational Awareness
NASA Astrophysics Data System (ADS)
Schmalz, M.; Chapman, W.; Hayden, E.; Sahni, S.; Ranka, S.
2014-09-01
Increasing image and signal data burden associated with sensor data processing in support of space situational awareness implies continuing computational throughput growth beyond the petascale regime. In addition to growing applications data burden and diversity, the breadth, diversity and scalability of high performance computing architectures and their various organizations challenge the development of a single, unifying, practicable model of parallel computation. Therefore, models for scalable parallel processing have exploited architectural and structural idiosyncrasies, yielding potential misapplications when legacy programs are ported among such architectures. In response to this challenge, we have developed a concise, efficient computational paradigm and software called Manyscale Computing to facilitate efficient mapping of annotated application codes to heterogeneous parallel architectures. Our theory, algorithms, software, and experimental results support partitioning and scheduling of application codes for envisioned parallel architectures, in terms of work atoms that are mapped (for example) to threads or thread blocks on computational hardware. Because of the rigor, completeness, conciseness, and layered design of our manyscale approach, application-to-architecture mapping is feasible and scalable for architectures at petascales, exascales, and above. Further, our methodology is simple, relying primarily on a small set of primitive mapping operations and support routines that are readily implemented on modern parallel processors such as graphics processing units (GPUs) and hybrid multi-processors (HMPs). In this paper, we overview the opportunities and challenges of manyscale computing for image and signal processing in support of space situational awareness applications. We discuss applications in terms of a layered hardware architecture (laboratory > supercomputer > rack > processor > component hierarchy). Demonstration applications include performance analysis and results in terms of execution time as well as storage, power, and energy consumption for bus-connected and/or networked architectures. The feasibility of the manyscale paradigm is demonstrated by addressing four principal challenges: (1) architectural/structural diversity, parallelism, and locality, (2) masking of I/O and memory latencies, (3) scalability of design as well as implementation, and (4) efficient representation/expression of parallel applications. Examples will demonstrate how manyscale computing helps solve these challenges efficiently on real-world computing systems.
Performance study of a data flow architecture
NASA Technical Reports Server (NTRS)
Adams, George
1985-01-01
Teams of scientists studied data flow concepts, static data flow machine architecture, and the VAL language. Each team mapped its application onto the machine and coded it in VAL. The principal findings of the study were: (1) Five of the seven applications used the full power of the target machine. The galactic simulation and multigrid fluid flow teams found that a significantly smaller version of the machine (16 processing elements) would suffice. (2) A number of machine design parameters including processing element (PE) function unit numbers, array memory size and bandwidth, and routing network capability were found to be crucial for optimal machine performance. (3) The study participants readily acquired VAL programming skills. (4) Participants learned that application-based performance evaluation is a sound method of evaluating new computer architectures, even those that are not fully specified. During the course of the study, participants developed models for using computers to solve numerical problems and for evaluating new architectures. These models form the bases for future evaluation studies.
Activity-Centric Approach to Distributed Programming
NASA Technical Reports Server (NTRS)
Levy, Renato; Satapathy, Goutam; Lang, Jun
2004-01-01
The first phase of an effort to develop a NASA version of the Cybele software system has been completed. To give meaning to even a highly abbreviated summary of the modifications to be embodied in the NASA version, it is necessary to present the following background information on Cybele: Cybele is a proprietary software infrastructure for use by programmers in developing agent-based application programs [complex application programs that contain autonomous, interacting components (agents)]. Cybele provides support for event handling from multiple sources, multithreading, concurrency control, migration, and load balancing. A Cybele agent follows a programming paradigm, called activity-centric programming, that enables an abstraction over system-level thread mechanisms. Activity centric programming relieves application programmers of the complex tasks of thread management, concurrency control, and event management. In order to provide such functionality, activity-centric programming demands support of other layers of software. This concludes the background information. In the first phase of the present development, a new architecture for Cybele was defined. In this architecture, Cybele follows a modular service-based approach to coupling of the programming and service layers of software architecture. In a service-based approach, the functionalities supported by activity-centric programming are apportioned, according to their characteristics, among several groups called services. A well-defined interface among all such services serves as a path that facilitates the maintenance and enhancement of such services without adverse effect on the whole software framework. The activity-centric application-program interface (API) is part of a kernel. The kernel API calls the services by use of their published interface. This approach makes it possible for any application code written exclusively under the API to be portable for any configuration of Cybele.
Do plant cell walls have a code?
Tavares, Eveline Q P; Buckeridge, Marcos S
2015-12-01
A code is a set of rules that establish correspondence between two worlds, signs (consisting of encrypted information) and meaning (of the decrypted message). A third element, the adaptor, connects both worlds, assigning meaning to a code. We propose that a Glycomic Code exists in plant cell walls where signs are represented by monosaccharides and phenylpropanoids and meaning is cell wall architecture with its highly complex association of polymers. Cell wall biosynthetic mechanisms, structure, architecture and properties are addressed according to Code Biology perspective, focusing on how they oppose to cell wall deconstruction. Cell wall hydrolysis is mainly focused as a mechanism of decryption of the Glycomic Code. Evidence for encoded information in cell wall polymers fine structure is highlighted and the implications of the existence of the Glycomic Code are discussed. Aspects related to fine structure are responsible for polysaccharide packing and polymer-polymer interactions, affecting the final cell wall architecture. The question whether polymers assembly within a wall display similar properties as other biological macromolecules (i.e. proteins, DNA, histones) is addressed, i.e. do they display a code? Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
A distributed version of the NASA Engine Performance Program
NASA Technical Reports Server (NTRS)
Cours, Jeffrey T.; Curlett, Brian P.
1993-01-01
Distributed NEPP, a version of the NASA Engine Performance Program, uses the original NEPP code but executes it in a distributed computer environment. Multiple workstations connected by a network increase the program's speed and, more importantly, the complexity of the cases it can handle in a reasonable time. Distributed NEPP uses the public domain software package, called Parallel Virtual Machine, allowing it to execute on clusters of machines containing many different architectures. It includes the capability to link with other computers, allowing them to process NEPP jobs in parallel. This paper discusses the design issues and granularity considerations that entered into programming Distributed NEPP and presents the results of timing runs.
NASA Astrophysics Data System (ADS)
Brandelik, Andreas
2009-07-01
CALCMIN, an open source Visual Basic program, was implemented in EXCEL™. The program was primarily developed to support geoscientists in their routine task of calculating structural formulae of minerals on the basis of chemical analysis mainly obtained by electron microprobe (EMP) techniques. Calculation programs for various minerals are already included in the form of sub-routines. These routines are arranged in separate modules containing a minimum of code. The architecture of CALCMIN allows the user to easily develop new calculation routines or modify existing routines with little knowledge of programming techniques. By means of a simple mouse-click, the program automatically generates a rudimentary framework of code using the object model of the Visual Basic Editor (VBE). Within this framework simple commands and functions, which are provided by the program, can be used, for example, to perform various normalization procedures or to output the results of the computations. For the clarity of the code, element symbols are used as variables initialized by the program automatically. CALCMIN does not set any boundaries in complexity of the code used, resulting in a wide range of possible applications. Thus, matrix and optimization methods can be included, for instance, to determine end member contents for subsequent thermodynamic calculations. Diverse input procedures are provided, such as the automated read-in of output files created by the EMP. Furthermore, a subsequent filter routine enables the user to extract specific analyses in order to use them for a corresponding calculation routine. An event-driven, interactive operating mode was selected for easy application of the program. CALCMIN leads the user from the beginning to the end of the calculation process.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Heroux, Michael; Lethin, Richard
Programming models and environments play the essential roles in high performance computing of enabling the conception, design, implementation and execution of science and engineering application codes. Programmer productivity is strongly influenced by the effectiveness of our programming models and environments, as is software sustainability since our codes have lifespans measured in decades, so the advent of new computing architectures, increased concurrency, concerns for resilience, and the increasing demands for high-fidelity, multi-physics, multi-scale and data-intensive computations mean that we have new challenges to address as part of our fundamental R&D requirements. Fortunately, we also have new tools and environments that makemore » design, prototyping and delivery of new programming models easier than ever. The combination of new and challenging requirements and new, powerful toolsets enables significant synergies for the next generation of programming models and environments R&D. This report presents the topics discussed and results from the 2014 DOE Office of Science Advanced Scientific Computing Research (ASCR) Programming Models & Environments Summit, and subsequent discussions among the summit participants and contributors to topics in this report.« less
Cross-Platform Development Techniques for Mobile Devices
2013-09-01
many other platforms including Windows, Blackberry , and Symbian. Each of these platforms has their own distinct architecture and programming language...sales of iPhones and the increasing use of Android-based devices have forced less successful competitors such as Microsoft, Blackberry , and Symbian... Blackberry and Windows Phone are planned [12] in this tool’s attempt to reuse code with a unified JavaScript API while at the same time supporting unique
A Data Warehouse to Support Condition Based Maintenance (CBM)
2005-05-01
Application ( VBA ) code sequence to import the original MAST-generated CSV and then create a single output table in DBASE IV format. The DBASE IV format...database architecture (Oracle, Sybase, MS- SQL , etc). This design includes table definitions, comments, specification of table attributes, primary and foreign...built queries and applications. Needs the application developers to construct data views. No SQL programming experience. b. Power Database User - knows
Architectural design of an Algol interpreter
NASA Technical Reports Server (NTRS)
Jackson, C. K.
1971-01-01
The design of a syntax-directed interpreter for a subset of Algol is described. It is a conceptual design with sufficient details and completeness but as much independence of implementation as possible. The design includes a detailed description of a scanner, an analyzer described in the Floyd-Evans productions, a hash-coded symbol table, and an executor. Interpretation of sample programs is also provided to show how the interpreter functions.
Geospace simulations on the Cell BE processor
NASA Astrophysics Data System (ADS)
Germaschewski, K.; Raeder, J.; Larson, D.
2008-12-01
OpenGGCM (Open Geospace General circulation Model) is an established numerical code that simulates the Earth's space environment. The most computing intensive part is the MHD (magnetohydrodynamics) solver that models the plasma surrounding Earth and its interaction with Earth's magnetic field and the solar wind flowing in from the sun. Like other global magnetosphere codes, OpenGGCM's realism is limited by computational constraints on grid resolution. We investigate porting of the MHD solver to the Cell BE architecture, a novel inhomogeneous multicore architecture capable of up to 230 GFlops per processor. Realizing this high performance on the Cell processor is a programming challenge, though. We implemented the MHD solver using a multi-level parallel approach: On the coarsest level, the problem is distributed to processors based upon the usual domain decomposition approach. Then, on each processor, the problem is divided into 3D columns, each of which is handled by the memory limited SPEs (synergistic processing elements) slice by slice. Finally, SIMD instructions are used to fully exploit the vector/SIMD FPUs in each SPE. Memory management needs to be handled explicitly by the code, using DMA to move data from main memory to the per-SPE local store and vice versa. We obtained excellent performance numbers, a speed-up of a factor of 25 compared to just using the main processor, while still keeping the numerical implementation details of the code maintainable.
Particle In Cell Codes on Highly Parallel Architectures
NASA Astrophysics Data System (ADS)
Tableman, Adam
2014-10-01
We describe strategies and examples of Particle-In-Cell Codes running on Nvidia GPU and Intel Phi architectures. This includes basic implementations in skeletons codes and full-scale development versions (encompassing 1D, 2D, and 3D codes) in Osiris. Both the similarities and differences between Intel's and Nvidia's hardware will be examined. Work supported by grants NSF ACI 1339893, DOE DE SC 000849, DOE DE SC 0008316, DOE DE NA 0001833, and DOE DE FC02 04ER 54780.
By Hand or Not By-Hand: A Case Study of Alternative Approaches to Parallelize CFD Applications
NASA Technical Reports Server (NTRS)
Yan, Jerry C.; Bailey, David (Technical Monitor)
1997-01-01
While parallel processing promises to speed up applications by several orders of magnitude, the performance achieved still depends upon several factors, including the multiprocessor architecture, system software, data distribution and alignment, as well as the methods used for partitioning the application and mapping its components onto the architecture. The existence of the Gorden Bell Prize given out at Supercomputing every year suggests that while good performance can be attained for real applications on general purpose multiprocessors, the large investment in man-power and time still has to be repeated for each application-machine combination. As applications and machine architectures become more complex, the cost and time-delays for obtaining performance by hand will become prohibitive. Computer users today can turn to three possible avenues for help: parallel libraries, parallel languages and compilers, interactive parallelization tools. The success of these methodologies, in turn, depends on proper application of data dependency analysis, program structure recognition and transformation, performance prediction as well as exploitation of user supplied knowledge. NASA has been developing multidisciplinary applications on highly parallel architectures under the High Performance Computing and Communications Program. Over the past six years, the transition of underlying hardware and system software have forced the scientists to spend a large effort to migrate and recede their applications. Various attempts to exploit software tools to automate the parallelization process have not produced favorable results. In this paper, we report our most recent experience with CAPTOOL, a package developed at Greenwich University. We have chosen CAPTOOL for three reasons: 1. CAPTOOL accepts a FORTRAN 77 program as input. This suggests its potential applicability to a large collection of legacy codes currently in use. 2. CAPTOOL employs domain decomposition to obtain parallelism. Although the fact that not all kinds of parallelism are handled may seem unappealing, many NASA applications in computational aerosciences as well as earth and space sciences are amenable to domain decomposition. 3. CAPTOOL generates code for a large variety of environments employed across NASA centers: MPI/PVM on network of workstations to the IBS/SP2 and CRAY/T3D.
Production Level CFD Code Acceleration for Hybrid Many-Core Architectures
NASA Technical Reports Server (NTRS)
Duffy, Austen C.; Hammond, Dana P.; Nielsen, Eric J.
2012-01-01
In this work, a novel graphics processing unit (GPU) distributed sharing model for hybrid many-core architectures is introduced and employed in the acceleration of a production-level computational fluid dynamics (CFD) code. The latest generation graphics hardware allows multiple processor cores to simultaneously share a single GPU through concurrent kernel execution. This feature has allowed the NASA FUN3D code to be accelerated in parallel with up to four processor cores sharing a single GPU. For codes to scale and fully use resources on these and the next generation machines, codes will need to employ some type of GPU sharing model, as presented in this work. Findings include the effects of GPU sharing on overall performance. A discussion of the inherent challenges that parallel unstructured CFD codes face in accelerator-based computing environments is included, with considerations for future generation architectures. This work was completed by the author in August 2010, and reflects the analysis and results of the time.
A Specification for a Godunov-type Eulerian 2-D Hydrocode, Revision 0
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nystrom, William D; Robey, Jonathan M
2012-05-01
The purpose of this code specification is to describe an algorithm for solving the Euler equations of hydrodynamics in a 2D rectangular region in sufficient detail to allow a software developer to produce an implementation on their target platform using their programming language of choice without requiring detailed knowledge and experience in the field of computational fluid dynamics. It should be possible for a software developer who is proficient in the programming language of choice and is knowledgable of the target hardware to produce an efficient implementation of this specification if they also possess a thorough working knowledge of parallelmore » programming and have some experience in scientific programming using fields and meshes. On modern architectures, it will be important to focus on issues related to the exploitation of the fine grain parallelism and data locality present in this algorithm. This specification aims to make that task easier by presenting the essential details of the algorithm in a systematic and language neutral manner while also avoiding the inclusion of implementation details that would likely be specific to a particular type of programming paradigm or platform architecture.« less
Parallel computing for probabilistic fatigue analysis
NASA Technical Reports Server (NTRS)
Sues, Robert H.; Lua, Yuan J.; Smith, Mark D.
1993-01-01
This paper presents the results of Phase I research to investigate the most effective parallel processing software strategies and hardware configurations for probabilistic structural analysis. We investigate the efficiency of both shared and distributed-memory architectures via a probabilistic fatigue life analysis problem. We also present a parallel programming approach, the virtual shared-memory paradigm, that is applicable across both types of hardware. Using this approach, problems can be solved on a variety of parallel configurations, including networks of single or multiprocessor workstations. We conclude that it is possible to effectively parallelize probabilistic fatigue analysis codes; however, special strategies will be needed to achieve large-scale parallelism to keep large number of processors busy and to treat problems with the large memory requirements encountered in practice. We also conclude that distributed-memory architecture is preferable to shared-memory for achieving large scale parallelism; however, in the future, the currently emerging hybrid-memory architectures will likely be optimal.
Development of an object-oriented ORIGEN for advanced nuclear fuel modeling applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Skutnik, S.; Havloej, F.; Lago, D.
2013-07-01
The ORIGEN package serves as the core depletion and decay calculation module within the SCALE code system. A recent major re-factor to the ORIGEN code architecture as part of an overall modernization of the SCALE code system has both greatly enhanced its maintainability as well as afforded several new capabilities useful for incorporating depletion analysis into other code frameworks. This paper will present an overview of the improved ORIGEN code architecture (including the methods and data structures introduced) as well as current and potential future applications utilizing the new ORIGEN framework. (authors)
A Low-Complexity Euclidean Orthogonal LDPC Architecture for Low Power Applications.
Revathy, M; Saravanan, R
2015-01-01
Low-density parity-check (LDPC) codes have been implemented in latest digital video broadcasting, broadband wireless access (WiMax), and fourth generation of wireless standards. In this paper, we have proposed a high efficient low-density parity-check code (LDPC) decoder architecture for low power applications. This study also considers the design and analysis of check node and variable node units and Euclidean orthogonal generator in LDPC decoder architecture. The Euclidean orthogonal generator is used to reduce the error rate of the proposed LDPC architecture, which can be incorporated between check and variable node architecture. This proposed decoder design is synthesized on Xilinx 9.2i platform and simulated using Modelsim, which is targeted to 45 nm devices. Synthesis report proves that the proposed architecture greatly reduces the power consumption and hardware utilizations on comparing with different conventional architectures.
Instruction-level performance modeling and characterization of multimedia applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Luo, Y.; Cameron, K.W.
1999-06-01
One of the challenges for characterizing and modeling realistic multimedia applications is the lack of access to source codes. On-chip performance counters effectively resolve this problem by monitoring run-time behaviors at the instruction-level. This paper presents a novel technique of characterizing and modeling workloads at the instruction level for realistic multimedia applications using hardware performance counters. A variety of instruction counts are collected from some multimedia applications, such as RealPlayer, GSM Vocoder, MPEG encoder/decoder, and speech synthesizer. These instruction counts can be used to form a set of abstract characteristic parameters directly related to a processor`s architectural features. Based onmore » microprocessor architectural constraints and these calculated abstract parameters, the architectural performance bottleneck for a specific application can be estimated. Meanwhile, the bottleneck estimation can provide suggestions about viable architectural/functional improvement for certain workloads. The biggest advantage of this new characterization technique is a better understanding of processor utilization efficiency and architectural bottleneck for each application. This technique also provides predictive insight of future architectural enhancements and their affect on current codes. In this paper the authors also attempt to model architectural effect on processor utilization without memory influence. They derive formulas for calculating CPI{sub 0}, CPI without memory effect, and they quantify utilization of architectural parameters. These equations are architecturally diagnostic and predictive in nature. Results provide promise in code characterization, and empirical/analytical modeling.« less
Proceedings of the second SISAL users` conference
DOE Office of Scientific and Technical Information (OSTI.GOV)
Feo, J T; Frerking, C; Miller, P J
1992-12-01
This report contains papers on the following topics: A sisal code for computing the fourier transform on S{sub N}; five ways to fill your knapsack; simulating material dislocation motion in sisal; candis as an interface for sisal; parallelisation and performance of the burg algorithm on a shared-memory multiprocessor; use of genetic algorithm in sisal to solve the file design problem; implementing FFT`s in sisal; programming and evaluating the performance of signal processing applications in the sisal programming environment; sisal and Von Neumann-based languages: translation and intercommunication; an IF2 code generator for ADAM architecture; program partitioning for NUMA multiprocessor computer systems;more » mapping functional parallelism on distributed memory machines; implicit array copying: prevention is better than cure ; mathematical syntax for sisal; an approach for optimizing recursive functions; implementing arrays in sisal 2.0; Fol: an object oriented extension to the sisal language; twine: a portable, extensible sisal execution kernel; and investigating the memory performance of the optimizing sisal compiler.« less
Debugging and Logging Services for Defence Service Oriented Architectures
2012-02-01
Service A software component and callable end point that provides a logically related set of operations, each of which perform a logical step in a...important to note that in some cases when the fault is identified to lie in uneditable code such as program libraries, or outsourced software services ...debugging is limited to characterisation of the fault, reporting it to the software or service provider and development of work-arounds and management
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brun, B.
1997-07-01
Computer technology has improved tremendously during the last years with larger media capacity, memory and more computational power. Visual computing with high-performance graphic interface and desktop computational power have changed the way engineers accomplish everyday tasks, development and safety studies analysis. The emergence of parallel computing will permit simulation over a larger domain. In addition, new development methods, languages and tools have appeared in the last several years.
Message Passing vs. Shared Address Space on a Cluster of SMPs
NASA Technical Reports Server (NTRS)
Shan, Hongzhang; Singh, Jaswinder Pal; Oliker, Leonid; Biswas, Rupak
2000-01-01
The convergence of scalable computer architectures using clusters of PCs (or PC-SMPs) with commodity networking has become an attractive platform for high end scientific computing. Currently, message-passing and shared address space (SAS) are the two leading programming paradigms for these systems. Message-passing has been standardized with MPI, and is the most common and mature programming approach. However message-passing code development can be extremely difficult, especially for irregular structured computations. SAS offers substantial ease of programming, but may suffer from performance limitations due to poor spatial locality, and high protocol overhead. In this paper, we compare the performance of and programming effort, required for six applications under both programming models on a 32 CPU PC-SMP cluster. Our application suite consists of codes that typically do not exhibit high efficiency under shared memory programming. due to their high communication to computation ratios and complex communication patterns. Results indicate that SAS can achieve about half the parallel efficiency of MPI for most of our applications: however, on certain classes of problems SAS performance is competitive with MPI. We also present new algorithms for improving the PC cluster performance of MPI collective operations.
Federal Register 2010, 2011, 2012, 2013, 2014
2012-01-23
... Engineering, Architectural Services, Design Policies and Construction Standards AGENCY: Rural Utilities..., engineering services and architectural services for transactions above the established threshold dollar levels... Code of Federal Regulations as follows: PART 1724--ELECTRIC ENGINEERING, ARCHITECTURAL SERVICES AND...
Smart photonic networks and computer security for image data
NASA Astrophysics Data System (ADS)
Campello, Jorge; Gill, John T.; Morf, Martin; Flynn, Michael J.
1998-02-01
Work reported here is part of a larger project on 'Smart Photonic Networks and Computer Security for Image Data', studying the interactions of coding and security, switching architecture simulations, and basic technologies. Coding and security: coding methods that are appropriate for data security in data fusion networks were investigated. These networks have several characteristics that distinguish them form other currently employed networks, such as Ethernet LANs or the Internet. The most significant characteristics are very high maximum data rates; predominance of image data; narrowcasting - transmission of data form one source to a designated set of receivers; data fusion - combining related data from several sources; simple sensor nodes with limited buffering. These characteristics affect both the lower level network design and the higher level coding methods.Data security encompasses privacy, integrity, reliability, and availability. Privacy, integrity, and reliability can be provided through encryption and coding for error detection and correction. Availability is primarily a network issue; network nodes must be protected against failure or routed around in the case of failure. One of the more promising techniques is the use of 'secret sharing'. We consider this method as a special case of our new space-time code diversity based algorithms for secure communication. These algorithms enable us to exploit parallelism and scalable multiplexing schemes to build photonic network architectures. A number of very high-speed switching and routing architectures and their relationships with very high performance processor architectures were studied. Indications are that routers for very high speed photonic networks can be designed using the very robust and distributed TCP/IP protocol, if suitable processor architecture support is available.
A Low-Complexity Euclidean Orthogonal LDPC Architecture for Low Power Applications
Revathy, M.; Saravanan, R.
2015-01-01
Low-density parity-check (LDPC) codes have been implemented in latest digital video broadcasting, broadband wireless access (WiMax), and fourth generation of wireless standards. In this paper, we have proposed a high efficient low-density parity-check code (LDPC) decoder architecture for low power applications. This study also considers the design and analysis of check node and variable node units and Euclidean orthogonal generator in LDPC decoder architecture. The Euclidean orthogonal generator is used to reduce the error rate of the proposed LDPC architecture, which can be incorporated between check and variable node architecture. This proposed decoder design is synthesized on Xilinx 9.2i platform and simulated using Modelsim, which is targeted to 45 nm devices. Synthesis report proves that the proposed architecture greatly reduces the power consumption and hardware utilizations on comparing with different conventional architectures. PMID:26065017
NASA Technical Reports Server (NTRS)
Walsh, J. L.; Weston, R. P.; Samareh, J. A.; Mason, B. H.; Green, L. L.; Biedron, R. T.
2000-01-01
An objective of the High Performance Computing and Communication Program at the NASA Langley Research Center is to demonstrate multidisciplinary shape and sizing optimization of a complete aerospace vehicle configuration by using high-fidelity finite-element structural analysis and computational fluid dynamics aerodynamic analysis in a distributed, heterogeneous computing environment that includes high performance parallel computing. A software system has been designed and implemented to integrate a set of existing discipline analysis codes, some of them computationally intensive, into a distributed computational environment for the design of a high-speed civil transport configuration. The paper describes both the preliminary results from implementing and validating the multidisciplinary analysis and the results from an aerodynamic optimization. The discipline codes are integrated by using the Java programming language and a Common Object Request Broker Architecture compliant software product. A companion paper describes the formulation of the multidisciplinary analysis and optimization system.
Discrete sensitivity derivatives of the Navier-Stokes equations with a parallel Krylov solver
NASA Technical Reports Server (NTRS)
Ajmani, Kumud; Taylor, Arthur C., III
1994-01-01
This paper solves an 'incremental' form of the sensitivity equations derived by differentiating the discretized thin-layer Navier Stokes equations with respect to certain design variables of interest. The equations are solved with a parallel, preconditioned Generalized Minimal RESidual (GMRES) solver on a distributed-memory architecture. The 'serial' sensitivity analysis code is parallelized by using the Single Program Multiple Data (SPMD) programming model, domain decomposition techniques, and message-passing tools. Sensitivity derivatives are computed for low and high Reynolds number flows over a NACA 1406 airfoil on a 32-processor Intel Hypercube, and found to be identical to those computed on a single-processor Cray Y-MP. It is estimated that the parallel sensitivity analysis code has to be run on 40-50 processors of the Intel Hypercube in order to match the single-processor processing time of a Cray Y-MP.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Singleton, Jr., Robert; Israel, Daniel M.; Doebling, Scott William
For code verification, one compares the code output against known exact solutions. There are many standard test problems used in this capacity, such as the Noh and Sedov problems. ExactPack is a utility that integrates many of these exact solution codes into a common API (application program interface), and can be used as a stand-alone code or as a python package. ExactPack consists of python driver scripts that access a library of exact solutions written in Fortran or Python. The spatial profiles of the relevant physical quantities, such as the density, fluid velocity, sound speed, or internal energy, are returnedmore » at a time specified by the user. The solution profiles can be viewed and examined by a command line interface or a graphical user interface, and a number of analysis tools and unit tests are also provided. We have documented the physics of each problem in the solution library, and provided complete documentation on how to extend the library to include additional exact solutions. ExactPack’s code architecture makes it easy to extend the solution-code library to include additional exact solutions in a robust, reliable, and maintainable manner.« less
NASA Technical Reports Server (NTRS)
Cross, James H., II
1990-01-01
The study, formulation, and generation of structures for Ada (GRASP/Ada) are discussed in this second phase report of a three phase effort. Various graphical representations that can be extracted or generated from source code are described and categorized with focus on reverse engineering. The overall goal is to provide the foundation for a CASE (computer-aided software design) environment in which reverse engineering and forward engineering (development) are tightly coupled. Emphasis is on a subset of architectural diagrams that can be generated automatically from source code with the control structure diagram (CSD) included for completeness.
An iconic programming language for sensor-based robots
NASA Technical Reports Server (NTRS)
Gertz, Matthew; Stewart, David B.; Khosla, Pradeep K.
1993-01-01
In this paper we describe an iconic programming language called Onika for sensor-based robotic systems. Onika is both modular and reconfigurable and can be used with any system architecture and real-time operating system. Onika is also a multi-level programming environment wherein tasks are built by connecting a series of icons which, in turn, can be defined in terms of other icons at the lower levels. Expert users are also allowed to use control block form to define servo tasks. The icons in Onika are both shape and color coded, like the pieces of a jigsaw puzzle, thus providing a form of error control in the development of high level applications.
Architecture-driven reuse of code in KASE
NASA Technical Reports Server (NTRS)
Bhansali, Sanjay
1993-01-01
In order to support the synthesis of large, complex software systems, we need to focus on issues pertaining to the architectural design of a system in addition to algorithm and data structure design. An approach that is based on abstracting the architectural design of a set of problems in the form of a generic architecture, and providing tools that can be used to instantiate the generic architecture for specific problem instances is presented. Such an approach also facilitates reuse of code between different systems belonging to the same problem class. An application of our approach on a realistic problem is described; the results of the exercise are presented; and how our approach compares to other work in this area is discussed.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Draeger, Erik W.
This report documents the fact that the work in creating a strategic plan and beginning customer engagements has been completed. The description of milestone is: The newly formed advanced architecture and portability specialists (AAPS) team will develop a strategic plan to meet the goals of 1) sharing knowledge and experience with code teams to ensure that ASC codes run well on new architectures, and 2) supplying skilled computational scientists to put the strategy into practice. The plan will be delivered to ASC management in the first quarter. By the fourth quarter, the team will identify their first customers within PEMmore » and IC, perform an initial assessment and scalability and performance bottleneck for next-generation architectures, and embed AAPS team members with customer code teams to assist with initial portability development within standalone kernels or proxy applications.« less
Low Power LDPC Code Decoder Architecture Based on Intermediate Message Compression Technique
NASA Astrophysics Data System (ADS)
Shimizu, Kazunori; Togawa, Nozomu; Ikenaga, Takeshi; Goto, Satoshi
Reducing the power dissipation for LDPC code decoder is a major challenging task to apply it to the practical digital communication systems. In this paper, we propose a low power LDPC code decoder architecture based on an intermediate message-compression technique which features as follows: (i) An intermediate message compression technique enables the decoder to reduce the required memory capacity and write power dissipation. (ii) A clock gated shift register based intermediate message memory architecture enables the decoder to decompress the compressed messages in a single clock cycle while reducing the read power dissipation. The combination of the above two techniques enables the decoder to reduce the power dissipation while keeping the decoding throughput. The simulation results show that the proposed architecture improves the power efficiency up to 52% and 18% compared to that of the decoder based on the overlapped schedule and the rapid convergence schedule without the proposed techniques respectively.
Lattice Boltzmann Simulation Optimization on Leading Multicore Platforms
DOE Office of Scientific and Technical Information (OSTI.GOV)
Williams, Samuel; Carter, Jonathan; Oliker, Leonid
2008-02-01
We present an auto-tuning approach to optimize application performance on emerging multicore architectures. The methodology extends the idea of search-based performance optimizations, popular in linear algebra and FFT libraries, to application-specific computational kernels. Our work applies this strategy to a lattice Boltzmann application (LBMHD) that historically has made poor use of scalar microprocessors due to its complex data structures and memory access patterns. We explore one of the broadest sets of multicore architectures in the HPC literature, including the Intel Clovertown, AMD Opteron X2, Sun Niagara2, STI Cell, as well as the single core Intel Itanium2. Rather than hand-tuning LBMHDmore » for each system, we develop a code generator that allows us identify a highly optimized version for each platform, while amortizing the human programming effort. Results show that our auto-tuned LBMHD application achieves up to a 14x improvement compared with the original code. Additionally, we present detailed analysis of each optimization, which reveal surprising hardware bottlenecks and software challenges for future multicore systems and applications.« less
Lattice Boltzmann simulation optimization on leading multicore platforms
DOE Office of Scientific and Technical Information (OSTI.GOV)
Williams, S.; Carter, J.; Oliker, L.
2008-01-01
We present an auto-tuning approach to optimize application performance on emerging multicore architectures. The methodology extends the idea of searchbased performance optimizations, popular in linear algebra and FFT libraries, to application-specific computational kernels. Our work applies this strategy to a lattice Boltzmann application (LBMHD) that historically has made poor use of scalar microprocessors due to its complex data structures and memory access patterns. We explore one of the broadest sets of multicore architectures in the HPC literature, including the Intel Clovertown, AMD Opteron X2, Sun Niagara2, STI Cell, as well as the single core Intel Itanium2. Rather than hand-tuning LBMHDmore » for each system, we develop a code generator that allows us identify a highly optimized version for each platform, while amortizing the human programming effort. Results show that our autotuned LBMHD application achieves up to a 14 improvement compared with the original code. Additionally, we present detailed analysis of each optimization, which reveal surprising hardware bottlenecks and software challenges for future multicore systems and applications.« less
PERI - Auto-tuning Memory Intensive Kernels for Multicore
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bailey, David H; Williams, Samuel; Datta, Kaushik
2008-06-24
We present an auto-tuning approach to optimize application performance on emerging multicore architectures. The methodology extends the idea of search-based performance optimizations, popular in linear algebra and FFT libraries, to application-specific computational kernels. Our work applies this strategy to Sparse Matrix Vector Multiplication (SpMV), the explicit heat equation PDE on a regular grid (Stencil), and a lattice Boltzmann application (LBMHD). We explore one of the broadest sets of multicore architectures in the HPC literature, including the Intel Xeon Clovertown, AMD Opteron Barcelona, Sun Victoria Falls, and the Sony-Toshiba-IBM (STI) Cell. Rather than hand-tuning each kernel for each system, we developmore » a code generator for each kernel that allows us to identify a highly optimized version for each platform, while amortizing the human programming effort. Results show that our auto-tuned kernel applications often achieve a better than 4X improvement compared with the original code. Additionally, we analyze a Roofline performance model for each platform to reveal hardware bottlenecks and software challenges for future multicore systems and applications.« less
Large liquid rocket engine transient performance simulation system
NASA Technical Reports Server (NTRS)
Mason, J. R.; Southwick, R. D.
1989-01-01
Phase 1 of the Rocket Engine Transient Simulation (ROCETS) program consists of seven technical tasks: architecture; system requirements; component and submodel requirements; submodel implementation; component implementation; submodel testing and verification; and subsystem testing and verification. These tasks were completed. Phase 2 of ROCETS consists of two technical tasks: Technology Test Bed Engine (TTBE) model data generation; and system testing verification. During this period specific coding of the system processors was begun and the engineering representations of Phase 1 were expanded to produce a simple model of the TTBE. As the code was completed, some minor modifications to the system architecture centering on the global variable common, GLOBVAR, were necessary to increase processor efficiency. The engineering modules completed during Phase 2 are listed: INJTOO - main injector; MCHBOO - main chamber; NOZLOO - nozzle thrust calculations; PBRNOO - preburner; PIPE02 - compressible flow without inertia; PUMPOO - polytropic pump; ROTROO - rotor torque balance/speed derivative; and TURBOO - turbine. Detailed documentation of these modules is in the Appendix. In addition to the engineering modules, several submodules were also completed. These submodules include combustion properties, component performance characteristics (maps), and specific utilities. Specific coding was begun on the system configuration processor. All functions necessary for multiple module operation were completed but the SOLVER implementation is still under development. This system, the Verification Checkout Facility (VCF) allows interactive comparison of module results to store data as well as provides an intermediate checkout of the processor code. After validation using the VCF, the engineering modules and submodules were used to build a simple TTBE.
External Dependencies-Driven Architecture Discovery and Analysis of Implemented Systems
NASA Technical Reports Server (NTRS)
Ganesan, Dharmalingam; Lindvall, Mikael; Ron, Monica
2014-01-01
A method for architecture discovery and analysis of implemented systems (AIS) is disclosed. The premise of the method is that architecture decisions are inspired and influenced by the external entities that the software system makes use of. Examples of such external entities are COTS components, frameworks, and ultimately even the programming language itself and its libraries. Traces of these architecture decisions can thus be found in the implemented software and is manifested in the way software systems use such external entities. While this fact is often ignored in contemporary reverse engineering methods, the AIS method actively leverages and makes use of the dependencies to external entities as a starting point for the architecture discovery. The AIS method is demonstrated using the NASA's Space Network Access System (SNAS). The results show that, with abundant evidence, the method offers reusable and repeatable guidelines for discovering the architecture and locating potential risks (e.g. low testability, decreased performance) that are hidden deep in the implementation. The analysis is conducted by using external dependencies to identify, classify and review a minimal set of key source code files. Given the benefits of analyzing external dependencies as a way to discover architectures, it is argued that external dependencies deserve to be treated as first-class citizens during reverse engineering. The current structure of a knowledge base of external entities and analysis questions with strategies for getting answers is also discussed.
Batagov, Arsen O; Yarmishyn, Aliaksandr A; Jenjaroenpun, Piroon; Tan, Jovina Z; Nishida, Yuichiro; Kurochkin, Igor V
2013-10-16
Mammalian genomes are extensively transcribed producing thousands of long non-protein-coding RNAs (lncRNAs). The biological significance and function of the vast majority of lncRNAs remain unclear. Recent studies have implicated several lncRNAs as playing important roles in embryonic development and cancer progression. LncRNAs are characterized with different genomic architectures in relationship with their associated protein-coding genes. Our study aimed at bridging lncRNA architecture with dynamical patterns of their expression using differentiating human neuroblastoma cells model. LncRNA expression was studied in a 120-hours timecourse of differentiation of human neuroblastoma SH-SY5Y cells into neurons upon treatment with retinoic acid (RA), the compound used for the treatment of neuroblastoma. A custom microarray chip was utilized to interrogate expression levels of 9,267 lncRNAs in the course of differentiation. We categorized lncRNAs into 19 architecture classes according to their position relatively to protein-coding genes. For each architecture class, dynamics of expression of lncRNAs was studied in association with their protein-coding partners. It allowed us to demonstrate positive correlation of lncRNAs with their associated protein-coding genes at bidirectional promoters and for sense-antisense transcript pairs. In contrast, lncRNAs located in the introns and downstream of the protein-coding genes were characterized with negative correlation modes. We further classified the lncRNAs by the temporal patterns of their expression dynamics. We found that intronic and bidirectional promoter architectures are associated with rapid RA-dependent induction or repression of the corresponding lncRNAs, followed by their constant expression. At the same time, lncRNAs expressed downstream of protein-coding genes are characterized by rapid induction, followed by transcriptional repression. Quantitative RT-PCR analysis confirmed the discovered functional modes for several selected lncRNAs associated with proteins involved in cancer and embryonic development. This is the first report detailing dynamical changes of multiple lncRNAs during RA-induced neuroblastoma differentiation. Integration of genomic and transcriptomic levels of information allowed us to demonstrate specific behavior of lncRNAs organized in different genomic architectures. This study also provides a list of lncRNAs with possible roles in neuroblastoma.
The Reed-Solomon encoders: Conventional versus Berlekamp's architecture
NASA Technical Reports Server (NTRS)
Perlman, M.; Lee, J. J.
1982-01-01
Concatenated coding was adopted for interplanetary space missions. Concatenated coding was employed with a convolutional inner code and a Reed-Solomon (RS) outer code for spacecraft telemetry. Conventional RS encoders are compared with those that incorporate two architectural features which approximately halve the number of multiplications of a set of fixed arguments by any RS codeword symbol. The fixed arguments and the RS symbols are taken from a nonbinary finite field. Each set of multiplications is bit-serially performed and completed during one (bit-serial) symbol shift. All firmware employed by conventional RS encoders is eliminated.
Fault-tolerance in Two-dimensional Topological Systems
NASA Astrophysics Data System (ADS)
Anderson, Jonas T.
This thesis is a collection of ideas with the general goal of building, at least in the abstract, a local fault-tolerant quantum computer. The connection between quantum information and topology has proven to be an active area of research in several fields. The introduction of the toric code by Alexei Kitaev demonstrated the usefulness of topology for quantum memory and quantum computation. Many quantum codes used for quantum memory are modeled by spin systems on a lattice, with operators that extract syndrome information placed on vertices or faces of the lattice. It is natural to wonder whether the useful codes in such systems can be classified. This thesis presents work that leverages ideas from topology and graph theory to explore the space of such codes. Homological stabilizer codes are introduced and it is shown that, under a set of reasonable assumptions, any qubit homological stabilizer code is equivalent to either a toric code or a color code. Additionally, the toric code and the color code correspond to distinct classes of graphs. Many systems have been proposed as candidate quantum computers. It is very desirable to design quantum computing architectures with two-dimensional layouts and low complexity in parity-checking circuitry. Kitaev's surface codes provided the first example of codes satisfying this property. They provided a new route to fault tolerance with more modest overheads and thresholds approaching 1%. The recently discovered color codes share many properties with the surface codes, such as the ability to perform syndrome extraction locally in two dimensions. Some families of color codes admit a transversal implementation of the entire Clifford group. This work investigates color codes on the 4.8.8 lattice known as triangular codes. I develop a fault-tolerant error-correction strategy for these codes in which repeated syndrome measurements on this lattice generate a three-dimensional space-time combinatorial structure. I then develop an integer program that analyzes this structure and determines the most likely set of errors consistent with the observed syndrome values. I implement this integer program to find the threshold for depolarizing noise on small versions of these triangular codes. Because the threshold for magic-state distillation is likely to be higher than this value and because logical
Multiplier Architecture for Coding Circuits
NASA Technical Reports Server (NTRS)
Wang, C. C.; Truong, T. K.; Shao, H. M.; Deutsch, L. J.
1986-01-01
Multipliers based on new algorithm for Galois-field (GF) arithmetic regular and expandable. Pipeline structures used for computing both multiplications and inverses. Designs suitable for implementation in very-large-scale integrated (VLSI) circuits. This general type of inverter and multiplier architecture especially useful in performing finite-field arithmetic of Reed-Solomon error-correcting codes and of some cryptographic algorithms.
Scout: high-performance heterogeneous computing made simple
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jablin, James; Mc Cormick, Patrick; Herlihy, Maurice
2011-01-26
Researchers must often write their own simulation and analysis software. During this process they simultaneously confront both computational and scientific problems. Current strategies for aiding the generation of performance-oriented programs do not abstract the software development from the science. Furthermore, the problem is becoming increasingly complex and pressing with the continued development of many-core and heterogeneous (CPU-GPU) architectures. To acbieve high performance, scientists must expertly navigate both software and hardware. Co-design between computer scientists and research scientists can alleviate but not solve this problem. The science community requires better tools for developing, optimizing, and future-proofing codes, allowing scientists to focusmore » on their research while still achieving high computational performance. Scout is a parallel programming language and extensible compiler framework targeting heterogeneous architectures. It provides the abstraction required to buffer scientists from the constantly-shifting details of hardware while still realizing higb-performance by encapsulating software and hardware optimization within a compiler framework.« less
A parallel and modular deformable cell Car-Parrinello code
NASA Astrophysics Data System (ADS)
Cavazzoni, Carlo; Chiarotti, Guido L.
1999-12-01
We have developed a modular parallel code implementing the Car-Parrinello [Phys. Rev. Lett. 55 (1985) 2471] algorithm including the variable cell dynamics [Europhys. Lett. 36 (1994) 345; J. Phys. Chem. Solids 56 (1995) 510]. Our code is written in Fortran 90, and makes use of some new programming concepts like encapsulation, data abstraction and data hiding. The code has a multi-layer hierarchical structure with tree like dependences among modules. The modules include not only the variables but also the methods acting on them, in an object oriented fashion. The modular structure allows easier code maintenance, develop and debugging procedures, and is suitable for a developer team. The layer structure permits high portability. The code displays an almost linear speed-up in a wide range of number of processors independently of the architecture. Super-linear speed up is obtained with a "smart" Fast Fourier Transform (FFT) that uses the available memory on the single node (increasing for a fixed problem with the number of processing elements) as temporary buffer to store wave function transforms. This code has been used to simulate water and ammonia at giant planet conditions for systems as large as 64 molecules for ˜50 ps.
VIRTUAL FRAME BUFFER INTERFACE
NASA Technical Reports Server (NTRS)
Wolfe, T. L.
1994-01-01
Large image processing systems use multiple frame buffers with differing architectures and vendor supplied user interfaces. This variety of architectures and interfaces creates software development, maintenance, and portability problems for application programs. The Virtual Frame Buffer Interface program makes all frame buffers appear as a generic frame buffer with a specified set of characteristics, allowing programmers to write code which will run unmodified on all supported hardware. The Virtual Frame Buffer Interface converts generic commands to actual device commands. The virtual frame buffer consists of a definition of capabilities and FORTRAN subroutines that are called by application programs. The virtual frame buffer routines may be treated as subroutines, logical functions, or integer functions by the application program. Routines are included that allocate and manage hardware resources such as frame buffers, monitors, video switches, trackballs, tablets and joysticks; access image memory planes; and perform alphanumeric font or text generation. The subroutines for the various "real" frame buffers are in separate VAX/VMS shared libraries allowing modification, correction or enhancement of the virtual interface without affecting application programs. The Virtual Frame Buffer Interface program was developed in FORTRAN 77 for a DEC VAX 11/780 or a DEC VAX 11/750 under VMS 4.X. It supports ADAGE IK3000, DEANZA IP8500, Low Resolution RAMTEK 9460, and High Resolution RAMTEK 9460 Frame Buffers. It has a central memory requirement of approximately 150K. This program was developed in 1985.
NASA Astrophysics Data System (ADS)
Mielikainen, Jarno; Huang, Bormin; Huang, Allen H.
2014-10-01
The Goddard cloud microphysics scheme is a sophisticated cloud microphysics scheme in the Weather Research and Forecasting (WRF) model. The WRF is a widely used weather prediction system in the world. It development is a done in collaborative around the globe. The Goddard microphysics scheme is very suitable for massively parallel computation as there are no interactions among horizontal grid points. Compared to the earlier microphysics schemes, the Goddard scheme incorporates a large number of improvements. Thus, we have optimized the code of this important part of WRF. In this paper, we present our results of optimizing the Goddard microphysics scheme on Intel Many Integrated Core Architecture (MIC) hardware. The Intel Xeon Phi coprocessor is the first product based on Intel MIC architecture, and it consists of up to 61 cores connected by a high performance on-die bidirectional interconnect. The Intel MIC is capable of executing a full operating system and entire programs rather than just kernels as the GPU do. The MIC coprocessor supports all important Intel development tools. Thus, the development environment is familiar one to a vast number of CPU developers. Although, getting a maximum performance out of MICs will require using some novel optimization techniques. Those optimization techniques are discusses in this paper. The results show that the optimizations improved performance of the original code on Xeon Phi 7120P by a factor of 4.7x. Furthermore, the same optimizations improved performance on a dual socket Intel Xeon E5-2670 system by a factor of 2.8x compared to the original code.
A Study on Architecture of Malicious Code Blocking Scheme with White List in Smartphone Environment
NASA Astrophysics Data System (ADS)
Lee, Kijeong; Tolentino, Randy S.; Park, Gil-Cheol; Kim, Yong-Tae
Recently, the interest and demands for mobile communications are growing so fast because of the increasing prevalence of smartphones around the world. In addition, the existing feature phones were replaced by smartphones and it has widely improved while using the explosive growth of Internet users using smartphones, e-commerce enabled Internet banking transactions and the importance of protecting personal information. Therefore, the development of smartphones antivirus products was developed and launched in order to prevent malicious code or virus infection. In this paper, we proposed a new scheme to protect the smartphone from malicious codes and malicious applications that are element of security threats in mobile environment and to prevent information leakage from malicious code infection. The proposed scheme is based on the white list smartphone application which only allows installing authorized applications and to prevent the installation of malicious and untrusted mobile applications which can possibly infect the applications and programs of smartphones.
PAREMD: A parallel program for the evaluation of momentum space properties of atoms and molecules
NASA Astrophysics Data System (ADS)
Meena, Deep Raj; Gadre, Shridhar R.; Balanarayan, P.
2018-03-01
The present work describes a code for evaluating the electron momentum density (EMD), its moments and the associated Shannon information entropy for a multi-electron molecular system. The code works specifically for electronic wave functions obtained from traditional electronic structure packages such as GAMESS and GAUSSIAN. For the momentum space orbitals, the general expression for Gaussian basis sets in position space is analytically Fourier transformed to momentum space Gaussian basis functions. The molecular orbital coefficients of the wave function are taken as an input from the output file of the electronic structure calculation. The analytic expressions of EMD are evaluated over a fine grid and the accuracy of the code is verified by a normalization check and a numerical kinetic energy evaluation which is compared with the analytic kinetic energy given by the electronic structure package. Apart from electron momentum density, electron density in position space has also been integrated into this package. The program is written in C++ and is executed through a Shell script. It is also tuned for multicore machines with shared memory through OpenMP. The program has been tested for a variety of molecules and correlated methods such as CISD, Møller-Plesset second order (MP2) theory and density functional methods. For correlated methods, the PAREMD program uses natural spin orbitals as an input. The program has been benchmarked for a variety of Gaussian basis sets for different molecules showing a linear speedup on a parallel architecture.
Parallel computation with the force
NASA Technical Reports Server (NTRS)
Jordan, H. F.
1985-01-01
A methodology, called the force, supports the construction of programs to be executed in parallel by a force of processes. The number of processes in the force is unspecified, but potentially very large. The force idea is embodied in a set of macros which produce multiproceossor FORTRAN code and has been studied on two shared memory multiprocessors of fairly different character. The method has simplified the writing of highly parallel programs within a limited class of parallel algorithms and is being extended to cover a broader class. The individual parallel constructs which comprise the force methodology are discussed. Of central concern are their semantics, implementation on different architectures and performance implications.
Intel NX to PVM 3.2 message passing conversion library
NASA Technical Reports Server (NTRS)
Arthur, Trey; Nelson, Michael L.
1993-01-01
NASA Langley Research Center has developed a library that allows Intel NX message passing codes to be executed under the more popular and widely supported Parallel Virtual Machine (PVM) message passing library. PVM was developed at Oak Ridge National Labs and has become the defacto standard for message passing. This library will allow the many programs that were developed on the Intel iPSC/860 or Intel Paragon in a Single Program Multiple Data (SPMD) design to be ported to the numerous architectures that PVM (version 3.2) supports. Also, the library adds global operations capability to PVM. A familiarity with Intel NX and PVM message passing is assumed.
Scheduling Operations for Massive Heterogeneous Clusters
NASA Technical Reports Server (NTRS)
Humphrey, John; Spagnoli, Kyle
2013-01-01
High-performance computing (HPC) programming has become increasingly difficult with the advent of hybrid supercomputers consisting of multicore CPUs and accelerator boards such as the GPU. Manual tuning of software to achieve high performance on this type of machine has been performed by programmers. This is needlessly difficult and prone to being invalidated by new hardware, new software, or changes in the underlying code. A system was developed for task-based representation of programs, which when coupled with a scheduler and runtime system, allows for many benefits, including higher performance and utilization of computational resources, easier programming and porting, and adaptations of code during runtime. The system consists of a method of representing computer algorithms as a series of data-dependent tasks. The series forms a graph, which can be scheduled for execution on many nodes of a supercomputer efficiently by a computer algorithm. The schedule is executed by a dispatch component, which is tailored to understand all of the hardware types that may be available within the system. The scheduler is informed by a cluster mapping tool, which generates a topology of available resources and their strengths and communication costs. Software is decoupled from its hardware, which aids in porting to future architectures. A computer algorithm schedules all operations, which for systems of high complexity (i.e., most NASA codes), cannot be performed optimally by a human. The system aids in reducing repetitive code, such as communication code, and aids in the reduction of redundant code across projects. It adds new features to code automatically, such as recovering from a lost node or the ability to modify the code while running. In this project, the innovators at the time of this reporting intend to develop two distinct technologies that build upon each other and both of which serve as building blocks for more efficient HPC usage. First is the scheduling and dynamic execution framework, and the second is scalable linear algebra libraries that are built directly on the former.
NASA Astrophysics Data System (ADS)
Furukawa, Tatsuya; Aoki, Noriyuki; Ohchi, Masashi; Nakao, Masaki
The image proccessing has become a useful and important technology in various reserch and development fields. According to such demands for engineering problems, we have designed and implemented the educational support system for that using a Java Applet technology. However in the conventional system, it required the tedious procedure for the end user to code his own programs. Therefore, in this study, we have improved the defect in the previous system by using a Java Servlet technology. The new system will make it possible for novice user to experience a practical digital image proccessing and an advanced programming with ease. We will describe the architecture of the proposed system function, that has been introduced to facilitate the client-side programming.
38 CFR 39.63 - Architectural design standards.
Code of Federal Regulations, 2014 CFR
2014-07-01
... 5000, Building Construction and Safety Code, and the 2002 edition of the National Electrical Code, NFPA... 5000, Building Construction and Safety Code. Where the adopted codes state conflicting requirements... the standards set forth in this section, all applicable local and State building codes and regulations...
38 CFR 39.63 - Architectural design standards.
Code of Federal Regulations, 2012 CFR
2012-07-01
... 5000, Building Construction and Safety Code, and the 2002 edition of the National Electrical Code, NFPA... 5000, Building Construction and Safety Code. Where the adopted codes state conflicting requirements... the standards set forth in this section, all applicable local and State building codes and regulations...
38 CFR 39.63 - Architectural design standards.
Code of Federal Regulations, 2013 CFR
2013-07-01
... 5000, Building Construction and Safety Code, and the 2002 edition of the National Electrical Code, NFPA... 5000, Building Construction and Safety Code. Where the adopted codes state conflicting requirements... the standards set forth in this section, all applicable local and State building codes and regulations...
An efficient interpolation filter VLSI architecture for HEVC standard
NASA Astrophysics Data System (ADS)
Zhou, Wei; Zhou, Xin; Lian, Xiaocong; Liu, Zhenyu; Liu, Xiaoxiang
2015-12-01
The next-generation video coding standard of High-Efficiency Video Coding (HEVC) is especially efficient for coding high-resolution video such as 8K-ultra-high-definition (UHD) video. Fractional motion estimation in HEVC presents a significant challenge in clock latency and area cost as it consumes more than 40 % of the total encoding time and thus results in high computational complexity. With aims at supporting 8K-UHD video applications, an efficient interpolation filter VLSI architecture for HEVC is proposed in this paper. Firstly, a new interpolation filter algorithm based on the 8-pixel interpolation unit is proposed in this paper. It can save 19.7 % processing time on average with acceptable coding quality degradation. Based on the proposed algorithm, an efficient interpolation filter VLSI architecture, composed of a reused data path of interpolation, an efficient memory organization, and a reconfigurable pipeline interpolation filter engine, is presented to reduce the implement hardware area and achieve high throughput. The final VLSI implementation only requires 37.2k gates in a standard 90-nm CMOS technology at an operating frequency of 240 MHz. The proposed architecture can be reused for either half-pixel interpolation or quarter-pixel interpolation, which can reduce the area cost for about 131,040 bits RAM. The processing latency of our proposed VLSI architecture can support the real-time processing of 4:2:0 format 7680 × 4320@78fps video sequences.
The Tera Multithreaded Architecture and Unstructured Meshes
NASA Technical Reports Server (NTRS)
Bokhari, Shahid H.; Mavriplis, Dimitri J.
1998-01-01
The Tera Multithreaded Architecture (MTA) is a new parallel supercomputer currently being installed at San Diego Supercomputing Center (SDSC). This machine has an architecture quite different from contemporary parallel machines. The computational processor is a custom design and the machine uses hardware to support very fine grained multithreading. The main memory is shared, hardware randomized and flat. These features make the machine highly suited to the execution of unstructured mesh problems, which are difficult to parallelize on other architectures. We report the results of a study carried out during July-August 1998 to evaluate the execution of EUL3D, a code that solves the Euler equations on an unstructured mesh, on the 2 processor Tera MTA at SDSC. Our investigation shows that parallelization of an unstructured code is extremely easy on the Tera. We were able to get an existing parallel code (designed for a shared memory machine), running on the Tera by changing only the compiler directives. Furthermore, a serial version of this code was compiled to run in parallel on the Tera by judicious use of directives to invoke the "full/empty" tag bits of the machine to obtain synchronization. This version achieves 212 and 406 Mflop/s on one and two processors respectively, and requires no attention to partitioning or placement of data issues that would be of paramount importance in other parallel architectures.
System, methods and apparatus for program optimization for multi-threaded processor architectures
Bastoul, Cedric; Lethin, Richard A; Leung, Allen K; Meister, Benoit J; Szilagyi, Peter; Vasilache, Nicolas T; Wohlford, David E
2015-01-06
Methods, apparatus and computer software product for source code optimization are provided. In an exemplary embodiment, a first custom computing apparatus is used to optimize the execution of source code on a second computing apparatus. In this embodiment, the first custom computing apparatus contains a memory, a storage medium and at least one processor with at least one multi-stage execution unit. The second computing apparatus contains at least two multi-stage execution units that allow for parallel execution of tasks. The first custom computing apparatus optimizes the code for parallelism, locality of operations and contiguity of memory accesses on the second computing apparatus. This Abstract is provided for the sole purpose of complying with the Abstract requirement rules. This Abstract is submitted with the explicit understanding that it will not be used to interpret or to limit the scope or the meaning of the claims.
NASA Astrophysics Data System (ADS)
Schmieschek, S.; Shamardin, L.; Frijters, S.; Krüger, T.; Schiller, U. D.; Harting, J.; Coveney, P. V.
2017-08-01
We introduce the lattice-Boltzmann code LB3D, version 7.1. Building on a parallel program and supporting tools which have enabled research utilising high performance computing resources for nearly two decades, LB3D version 7 provides a subset of the research code functionality as an open source project. Here, we describe the theoretical basis of the algorithm as well as computational aspects of the implementation. The software package is validated against simulations of meso-phases resulting from self-assembly in ternary fluid mixtures comprising immiscible and amphiphilic components such as water-oil-surfactant systems. The impact of the surfactant species on the dynamics of spinodal decomposition are tested and quantitative measurement of the permeability of a body centred cubic (BCC) model porous medium for a simple binary mixture is described. Single-core performance and scaling behaviour of the code are reported for simulations on current supercomputer architectures.
RELAP-7 Software Verification and Validation Plan
DOE Office of Scientific and Technical Information (OSTI.GOV)
Smith, Curtis L.; Choi, Yong-Joon; Zou, Ling
This INL plan comprehensively describes the software for RELAP-7 and documents the software, interface, and software design requirements for the application. The plan also describes the testing-based software verification and validation (SV&V) process—a set of specially designed software models used to test RELAP-7. The RELAP-7 (Reactor Excursion and Leak Analysis Program) code is a nuclear reactor system safety analysis code being developed at Idaho National Laboratory (INL). The code is based on the INL’s modern scientific software development framework – MOOSE (Multi-Physics Object-Oriented Simulation Environment). The overall design goal of RELAP-7 is to take advantage of the previous thirty yearsmore » of advancements in computer architecture, software design, numerical integration methods, and physical models. The end result will be a reactor systems analysis capability that retains and improves upon RELAP5’s capability and extends the analysis capability for all reactor system simulation scenarios.« less
Multi-Kepler GPU vs. multi-Intel MIC for spin systems simulations
NASA Astrophysics Data System (ADS)
Bernaschi, M.; Bisson, M.; Salvadore, F.
2014-10-01
We present and compare the performances of two many-core architectures: the Nvidia Kepler and the Intel MIC both in a single system and in cluster configuration for the simulation of spin systems. As a benchmark we consider the time required to update a single spin of the 3D Heisenberg spin glass model by using the Over-relaxation algorithm. We present data also for a traditional high-end multi-core architecture: the Intel Sandy Bridge. The results show that although on the two Intel architectures it is possible to use basically the same code, the performances of a Intel MIC change dramatically depending on (apparently) minor details. Another issue is that to obtain a reasonable scalability with the Intel Phi coprocessor (Phi is the coprocessor that implements the MIC architecture) in a cluster configuration it is necessary to use the so-called offload mode which reduces the performances of the single system. As to the GPU, the Kepler architecture offers a clear advantage with respect to the previous Fermi architecture maintaining exactly the same source code. Scalability of the multi-GPU implementation remains very good by using the CPU as a communication co-processor of the GPU. All source codes are provided for inspection and for double-checking the results.
2013-06-01
Radio is a software development toolkit that provides signal processing blocks to drive the SDR. GNU Radio has many strong points – it is actively...maintained with a large user base, new capabilities are constantly being added, and compiled C code is fast for many real-time applications such as...programming interface (API) makes learning the architecture a daunting task, even for the experienced software developer. This requirement poses many
A Code Generation Approach for Auto-Vectorization in the Spade Compiler
NASA Astrophysics Data System (ADS)
Wang, Huayong; Andrade, Henrique; Gedik, Buğra; Wu, Kun-Lung
We describe an auto-vectorization approach for the Spade stream processing programming language, comprising two ideas. First, we provide support for vectors as a primitive data type. Second, we provide a C++ library with architecture-specific implementations of a large number of pre-vectorized operations as the means to support language extensions. We evaluate our approach with several stream processing operators, contrasting Spade's auto-vectorization with the native auto-vectorization provided by the GNU gcc and Intel icc compilers.
The Environment-Power System Analysis Tool development program. [for spacecraft power supplies
NASA Technical Reports Server (NTRS)
Jongeward, Gary A.; Kuharski, Robert A.; Kennedy, Eric M.; Wilcox, Katherine G.; Stevens, N. John; Putnam, Rand M.; Roche, James C.
1989-01-01
The Environment Power System Analysis Tool (EPSAT) is being developed to provide engineers with the ability to assess the effects of a broad range of environmental interactions on space power systems. A unique user-interface-data-dictionary code architecture oversees a collection of existing and future environmental modeling codes (e.g., neutral density) and physical interaction models (e.g., sheath ionization). The user-interface presents the engineer with tables, graphs, and plots which, under supervision of the data dictionary, are automatically updated in response to parameter change. EPSAT thus provides the engineer with a comprehensive and responsive environmental assessment tool and the scientist with a framework into which new environmental or physical models can be easily incorporated.
Certifying Domain-Specific Policies
NASA Technical Reports Server (NTRS)
Lowry, Michael; Pressburger, Thomas; Rosu, Grigore; Koga, Dennis (Technical Monitor)
2001-01-01
Proof-checking code for compliance to safety policies potentially enables a product-oriented approach to certain aspects of software certification. To date, previous research has focused on generic, low-level programming-language properties such as memory type safety. In this paper we consider proof-checking higher-level domain -specific properties for compliance to safety policies. The paper first describes a framework related to abstract interpretation in which compliance to a class of certification policies can be efficiently calculated Membership equational logic is shown to provide a rich logic for carrying out such calculations, including partiality, for certification. The architecture for a domain-specific certifier is described, followed by an implemented case study. The case study considers consistency of abstract variable attributes in code that performs geometric calculations in Aerospace systems.
DataRocket: Interactive Visualisation of Data Structures
NASA Astrophysics Data System (ADS)
Parkes, Steve; Ramsay, Craig
2010-08-01
CodeRocket is a software engineering tool that provides cognitive support to the software engineer for reasoning about a method or procedure and for documenting the resulting code [1]. DataRocket is a software engineering tool designed to support visualisation and reasoning about program data structures. DataRocket is part of the CodeRocket family of software tools developed by Rapid Quality Systems [2] a spin-out company from the Space Technology Centre at the University of Dundee. CodeRocket and DataRocket integrate seamlessly with existing architectural design and coding tools and provide extensive documentation with little or no effort on behalf of the software engineer. Comprehensive, abstract, detailed design documentation is available early on in a project so that it can be used for design reviews with project managers and non expert stakeholders. Code and documentation remain fully synchronised even when changes are implemented in the code without reference to the existing documentation. At the end of a project the press of a button suffices to produce the detailed design document. Existing legacy code can be easily imported into CodeRocket and DataRocket to reverse engineer detailed design documentation making legacy code more manageable and adding substantially to its value. This paper introduces CodeRocket. It then explains the rationale for DataRocket and describes the key features of this new tool. Finally the major benefits of DataRocket for different stakeholders are considered.
Numerical Propulsion System Simulation Architecture
NASA Technical Reports Server (NTRS)
Naiman, Cynthia G.
2004-01-01
The Numerical Propulsion System Simulation (NPSS) is a framework for performing analysis of complex systems. Because the NPSS was developed using the object-oriented paradigm, the resulting architecture is an extensible and flexible framework that is currently being used by a diverse set of participants in government, academia, and the aerospace industry. NPSS is being used by over 15 different institutions to support rockets, hypersonics, power and propulsion, fuel cells, ground based power, and aerospace. Full system-level simulations as well as subsystems may be modeled using NPSS. The NPSS architecture enables the coupling of analyses at various levels of detail, which is called numerical zooming. The middleware used to enable zooming and distributed simulations is the Common Object Request Broker Architecture (CORBA). The NPSS Developer's Kit offers tools for the developer to generate CORBA-based components and wrap codes. The Developer's Kit enables distributed multi-fidelity and multi-discipline simulations, preserves proprietary and legacy codes, and facilitates addition of customized codes. The platforms supported are PC, Linux, HP, Sun, and SGI.
Validation of Framework Code Approach to a Life Prediction System for Fiber Reinforced Composites
NASA Technical Reports Server (NTRS)
Gravett, Phillip
1997-01-01
The grant was conducted by the MMC Life Prediction Cooperative, an industry/government collaborative team, Ohio Aerospace Institute (OAI) acted as the prime contractor on behalf of the Cooperative for this grant effort. See Figure I for the organization and responsibilities of team members. The technical effort was conducted during the period August 7, 1995 to June 30, 1996 in cooperation with Erwin Zaretsky, the LERC Program Monitor. Phil Gravett of Pratt & Whitney was the principal technical investigator. Table I documents all meeting-related coordination memos during this period. The effort under this grant was closely coordinated with an existing USAF sponsored program focused on putting into practice a life prediction system for turbine engine components made of metal matrix composites (MMC). The overall architecture of the NMC life prediction system was defined in the USAF sponsored program (prior to this grant). The efforts of this grant were focussed on implementing and tailoring of the life prediction system, the framework code within it and the damage modules within it to meet the specific requirements of the Cooperative. T'he tailoring of the life prediction system provides the basis for pervasive and continued use of this capability by the industry/government cooperative. The outputs of this grant are: 1. Definition of the framework code to analysis modules interfaces, 2. Definition of the interface between the materials database and the finite element model, and 3. Definition of the integration of the framework code into an FEM design tool.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lee, Seyong; Kim, Jungwon; Vetter, Jeffrey S
This paper presents a directive-based, high-level programming framework for high-performance reconfigurable computing. It takes a standard, portable OpenACC C program as input and generates a hardware configuration file for execution on FPGAs. We implemented this prototype system using our open-source OpenARC compiler; it performs source-to-source translation and optimization of the input OpenACC program into an OpenCL code, which is further compiled into a FPGA program by the backend Altera Offline OpenCL compiler. Internally, the design of OpenARC uses a high- level intermediate representation that separates concerns of program representation from underlying architectures, which facilitates portability of OpenARC. In fact, thismore » design allowed us to create the OpenACC-to-FPGA translation framework with minimal extensions to our existing system. In addition, we show that our proposed FPGA-specific compiler optimizations and novel OpenACC pragma extensions assist the compiler in generating more efficient FPGA hardware configuration files. Our empirical evaluation on an Altera Stratix V FPGA with eight OpenACC benchmarks demonstrate the benefits of our strategy. To demonstrate the portability of OpenARC, we show results for the same benchmarks executing on other heterogeneous platforms, including NVIDIA GPUs, AMD GPUs, and Intel Xeon Phis. This initial evidence helps support the goal of using a directive-based, high-level programming strategy for performance portability across heterogeneous HPC architectures.« less
NASA Technical Reports Server (NTRS)
Pindera, Marek-Jerzy; Salzar, Robert S.
1996-01-01
The objective of this work was the development of efficient, user-friendly computer codes for optimizing fabrication-induced residual stresses in metal matrix composites through the use of homogeneous and heterogeneous interfacial layer architectures and processing parameter variation. To satisfy this objective, three major computer codes have been developed and delivered to the NASA-Lewis Research Center, namely MCCM, OPTCOMP, and OPTCOMP2. MCCM is a general research-oriented code for investigating the effects of microstructural details, such as layered morphology of SCS-6 SiC fibers and multiple homogeneous interfacial layers, on the inelastic response of unidirectional metal matrix composites under axisymmetric thermomechanical loading. OPTCOMP and OPTCOMP2 combine the major analysis module resident in MCCM with a commercially-available optimization algorithm and are driven by user-friendly interfaces which facilitate input data construction and program execution. OPTCOMP enables the user to identify those dimensions, geometric arrangements and thermoelastoplastic properties of homogeneous interfacial layers that minimize thermal residual stresses for the specified set of constraints. OPTCOMP2 provides additional flexibility in the residual stress optimization through variation of the processing parameters (time, temperature, external pressure and axial load) as well as the microstructure of the interfacial region which is treated as a heterogeneous two-phase composite. Overviews of the capabilities of these codes are provided together with a summary of results that addresses the effects of various microstructural details of the fiber, interfacial layers and matrix region on the optimization of fabrication-induced residual stresses in metal matrix composites.
Early MIMD experience on the CRAY X-MP
NASA Astrophysics Data System (ADS)
Rhoades, Clifford E.; Stevens, K. G.
1985-07-01
This paper describes some early experience with converting four physics simulation programs to the CRAY X-MP, a current Multiple Instruction, Multiple Data (MIMD) computer consisting of two processors each with an architecture similar to that of the CRAY-1. As a multi-processor, the CRAY X-MP together with the high speed Solid-state Storage Device (SSD) in an ideal machine upon which to study MIMD algorithms for solving the equations of mathematical physics because it is fast enough to run real problems. The computer programs used in this study are all FORTRAN versions of original production codes. They range in sophistication from a one-dimensional numerical simulation of collisionless plasma to a two-dimensional hydrodynamics code with heat flow to a couple of three-dimensional fluid dynamics codes with varying degrees of viscous modeling. Early research with a dual processor configuration has shown speed-ups ranging from 1.55 to 1.98. It has been observed that a few simple extensions to FORTRAN allow a typical programmer to achieve a remarkable level of efficiency. These extensions involve the concept of memory local to a concurrent subprogram and memory common to all concurrent subprograms.
Federal Register 2010, 2011, 2012, 2013, 2014
2010-06-21
... Architectural and Industrial Maintenance Coatings Regulations AGENCY: Environmental Protection Agency (EPA... and Architectural and Industrial Maintenance Coatings Regulations. The revision amends 25 Pa. Code Chapter 130, Subchapters B and C (relating to consumer products and architectural and industrial...
Weight-4 Parity Checks on a Surface Code Sublattice with Superconducting Qubits
NASA Astrophysics Data System (ADS)
Takita, Maika; Corcoles, Antonio; Magesan, Easwar; Bronn, Nicholas; Hertzberg, Jared; Gambetta, Jay; Steffen, Matthias; Chow, Jerry
We present a superconducting qubit quantum processor design amenable to the surface code architecture. In such architecture, parity checks on the data qubits, performed by measuring their X- and Z- syndrome qubits, constitute a critical aspect. Here we show fidelities and outcomes of X- and Z-parity measurements done on a syndrome qubit in a full plaquette consisting of one syndrome qubit coupled via bus resonators to four code qubits. Parities are measured after four code qubits are prepared into sixteen initial states in each basis. Results show strong dependence on ZZ between qubits on the same bus resonators. This work is supported by IARPA under Contract W911NF-10-1-0324.
SequenceL: Automated Parallel Algorithms Derived from CSP-NT Computational Laws
NASA Technical Reports Server (NTRS)
Cooke, Daniel; Rushton, Nelson
2013-01-01
With the introduction of new parallel architectures like the cell and multicore chips from IBM, Intel, AMD, and ARM, as well as the petascale processing available for highend computing, a larger number of programmers will need to write parallel codes. Adding the parallel control structure to the sequence, selection, and iterative control constructs increases the complexity of code development, which often results in increased development costs and decreased reliability. SequenceL is a high-level programming language that is, a programming language that is closer to a human s way of thinking than to a machine s. Historically, high-level languages have resulted in decreased development costs and increased reliability, at the expense of performance. In recent applications at JSC and in industry, SequenceL has demonstrated the usual advantages of high-level programming in terms of low cost and high reliability. SequenceL programs, however, have run at speeds typically comparable with, and in many cases faster than, their counterparts written in C and C++ when run on single-core processors. Moreover, SequenceL is able to generate parallel executables automatically for multicore hardware, gaining parallel speedups without any extra effort from the programmer beyond what is required to write the sequen tial/singlecore code. A SequenceL-to-C++ translator has been developed that automatically renders readable multithreaded C++ from a combination of a SequenceL program and sample data input. The SequenceL language is based on two fundamental computational laws, Consume-Simplify- Produce (CSP) and Normalize-Trans - pose (NT), which enable it to automate the creation of parallel algorithms from high-level code that has no annotations of parallelism whatsoever. In our anecdotal experience, SequenceL development has been in every case less costly than development of the same algorithm in sequential (that is, single-core, single process) C or C++, and an order of magnitude less costly than development of comparable parallel code. Moreover, SequenceL not only automatically parallelizes the code, but since it is based on CSP-NT, it is provably race free, thus eliminating the largest quality challenge the parallelized software developer faces.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fernandes, Ana; Pereira, Rita C.; Sousa, Jorge
The Instituto de Plasmas e Fusao Nuclear (IPFN) has developed dedicated re-configurable modules based on field programmable gate array (FPGA) devices for several nuclear fusion machines worldwide. Moreover, new Advanced Telecommunication Computing Architecture (ATCA) based modules developed by IPFN are already included in the ITER catalogue. One of the requirements for re-configurable modules operating in future nuclear environments including ITER is the remote update capability. Accordingly, this work presents an alternative method for FPGA remote programing to be implemented in new ATCA based re-configurable modules. FPGAs are volatile devices and their programming code is usually stored in dedicated flash memoriesmore » for properly configuration during module power-on. The presented method is capable to store new FPGA codes in Serial Peripheral Interface (SPI) flash memories using the PCIexpress (PCIe) network established on the ATCA back-plane, linking data acquisition endpoints and the data switch blades. The method is based on the Xilinx Quick Boot application note, adapted to PCIe protocol and ATCA based modules. (authors)« less
MOCAT: A Metagenomics Assembly and Gene Prediction Toolkit
Li, Junhua; Chen, Weineng; Chen, Hua; Mende, Daniel R.; Arumugam, Manimozhiyan; Pan, Qi; Liu, Binghang; Qin, Junjie; Wang, Jun; Bork, Peer
2012-01-01
MOCAT is a highly configurable, modular pipeline for fast, standardized processing of single or paired-end sequencing data generated by the Illumina platform. The pipeline uses state-of-the-art programs to quality control, map, and assemble reads from metagenomic samples sequenced at a depth of several billion base pairs, and predict protein-coding genes on assembled metagenomes. Mapping against reference databases allows for read extraction or removal, as well as abundance calculations. Relevant statistics for each processing step can be summarized into multi-sheet Excel documents and queryable SQL databases. MOCAT runs on UNIX machines and integrates seamlessly with the SGE and PBS queuing systems, commonly used to process large datasets. The open source code and modular architecture allow users to modify or exchange the programs that are utilized in the various processing steps. Individual processing steps and parameters were benchmarked and tested on artificial, real, and simulated metagenomes resulting in an improvement of selected quality metrics. MOCAT can be freely downloaded at http://www.bork.embl.de/mocat/. PMID:23082188
S2PLOT: Three-dimensional (3D) Plotting Library
NASA Astrophysics Data System (ADS)
Barnes, D. G.; Fluke, C. J.; Bourke, P. D.; Parry, O. T.
2011-03-01
We present a new, three-dimensional (3D) plotting library with advanced features, and support for standard and enhanced display devices. The library - S2PLOT - is written in C and can be used by C, C++ and FORTRAN programs on GNU/Linux and Apple/OSX systems. S2PLOT draws objects in a 3D (x,y,z) Cartesian space and the user interactively controls how this space is rendered at run time. With a PGPLOT inspired interface, S2PLOT provides astronomers with elegant techniques for displaying and exploring 3D data sets directly from their program code, and the potential to use stereoscopic and dome display devices. The S2PLOT architecture supports dynamic geometry and can be used to plot time-evolving data sets, such as might be produced by simulation codes. In this paper, we introduce S2PLOT to the astronomical community, describe its potential applications, and present some example uses of the library.
An Advanced, Three-Dimensional Plotting Library for Astronomy
NASA Astrophysics Data System (ADS)
Barnes, David G.; Fluke, Christopher J.; Bourke, Paul D.; Parry, Owen T.
2006-07-01
We present a new, three-dimensional (3D) plotting library with advanced features, and support for standard and enhanced display devices. The library - s2plot - is written in c and can be used by c, c++, and fortran programs on GNU/Linux and Apple/OSX systems. s2plot draws objects in a 3D (x,y,z) Cartesian space and the user interactively controls how this space is rendered at run time. With a pgplot-inspired interface, s2plot provides astronomers with elegant techniques for displaying and exploring 3D data sets directly from their program code, and the potential to use stereoscopic and dome display devices. The s2plot architecture supports dynamic geometry and can be used to plot time-evolving data sets, such as might be produced by simulation codes. In this paper, we introduce s2plot to the astronomical community, describe its potential applications, and present some example uses of the library.
GPU-accelerated phase-field simulation of dendritic solidification in a binary alloy
NASA Astrophysics Data System (ADS)
Yamanaka, Akinori; Aoki, Takayuki; Ogawa, Satoi; Takaki, Tomohiro
2011-03-01
The phase-field simulation for dendritic solidification of a binary alloy has been accelerated by using a graphic processing unit (GPU). To perform the phase-field simulation of the alloy solidification on GPU, a program code was developed with computer unified device architecture (CUDA). In this paper, the implementation technique of the phase-field model on GPU is presented. Also, we evaluated the acceleration performance of the three-dimensional solidification simulation by using a single NVIDIA TESLA C1060 GPU and the developed program code. The results showed that the GPU calculation for 5763 computational grids achieved the performance of 170 GFLOPS by utilizing the shared memory as a software-managed cache. Furthermore, it can be demonstrated that the computation with the GPU is 100 times faster than that with a single CPU core. From the obtained results, we confirmed the feasibility of realizing a real-time full three-dimensional phase-field simulation of microstructure evolution on a personal desktop computer.
MOCAT: a metagenomics assembly and gene prediction toolkit.
Kultima, Jens Roat; Sunagawa, Shinichi; Li, Junhua; Chen, Weineng; Chen, Hua; Mende, Daniel R; Arumugam, Manimozhiyan; Pan, Qi; Liu, Binghang; Qin, Junjie; Wang, Jun; Bork, Peer
2012-01-01
MOCAT is a highly configurable, modular pipeline for fast, standardized processing of single or paired-end sequencing data generated by the Illumina platform. The pipeline uses state-of-the-art programs to quality control, map, and assemble reads from metagenomic samples sequenced at a depth of several billion base pairs, and predict protein-coding genes on assembled metagenomes. Mapping against reference databases allows for read extraction or removal, as well as abundance calculations. Relevant statistics for each processing step can be summarized into multi-sheet Excel documents and queryable SQL databases. MOCAT runs on UNIX machines and integrates seamlessly with the SGE and PBS queuing systems, commonly used to process large datasets. The open source code and modular architecture allow users to modify or exchange the programs that are utilized in the various processing steps. Individual processing steps and parameters were benchmarked and tested on artificial, real, and simulated metagenomes resulting in an improvement of selected quality metrics. MOCAT can be freely downloaded at http://www.bork.embl.de/mocat/.
Optimization of a Lattice Boltzmann Computation on State-of-the-Art Multicore Platforms
DOE Office of Scientific and Technical Information (OSTI.GOV)
Williams, Samuel; Carter, Jonathan; Oliker, Leonid
2009-04-10
We present an auto-tuning approach to optimize application performance on emerging multicore architectures. The methodology extends the idea of search-based performance optimizations, popular in linear algebra and FFT libraries, to application-specific computational kernels. Our work applies this strategy to a lattice Boltzmann application (LBMHD) that historically has made poor use of scalar microprocessors due to its complex data structures and memory access patterns. We explore one of the broadest sets of multicore architectures in the HPC literature, including the Intel Xeon E5345 (Clovertown), AMD Opteron 2214 (Santa Rosa), AMD Opteron 2356 (Barcelona), Sun T5140 T2+ (Victoria Falls), as well asmore » a QS20 IBM Cell Blade. Rather than hand-tuning LBMHD for each system, we develop a code generator that allows us to identify a highly optimized version for each platform, while amortizing the human programming effort. Results show that our auto-tuned LBMHD application achieves up to a 15x improvement compared with the original code at a given concurrency. Additionally, we present detailed analysis of each optimization, which reveal surprising hardware bottlenecks and software challenges for future multicore systems and applications.« less
Investigating the impact of the cielo cray XE6 architecture on scientific application codes.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rajan, Mahesh; Barrett, Richard; Pedretti, Kevin Thomas Tauke
2010-12-01
Cielo, a Cray XE6, is the Department of Energy NNSA Advanced Simulation and Computing (ASC) campaign's newest capability machine. Rated at 1.37 PFLOPS, it consists of 8,944 dual-socket oct-core AMD Magny-Cours compute nodes, linked using Cray's Gemini interconnect. Its primary mission objective is to enable a suite of the ASC applications implemented using MPI to scale to tens of thousands of cores. Cielo is an evolutionary improvement to a successful architecture previously available to many of our codes, thus enabling a basis for understanding the capabilities of this new architecture. Using three codes strategically important to the ASC campaign, andmore » supplemented with some micro-benchmarks that expose the fundamental capabilities of the XE6, we report on the performance characteristics and capabilities of Cielo.« less
Spectral-element Seismic Wave Propagation on CUDA/OpenCL Hardware Accelerators
NASA Astrophysics Data System (ADS)
Peter, D. B.; Videau, B.; Pouget, K.; Komatitsch, D.
2015-12-01
Seismic wave propagation codes are essential tools to investigate a variety of wave phenomena in the Earth. Furthermore, they can now be used for seismic full-waveform inversions in regional- and global-scale adjoint tomography. Although these seismic wave propagation solvers are crucial ingredients to improve the resolution of tomographic images to answer important questions about the nature of Earth's internal processes and subsurface structure, their practical application is often limited due to high computational costs. They thus need high-performance computing (HPC) facilities to improving the current state of knowledge. At present, numerous large HPC systems embed many-core architectures such as graphics processing units (GPUs) to enhance numerical performance. Such hardware accelerators can be programmed using either the CUDA programming environment or the OpenCL language standard. CUDA software development targets NVIDIA graphic cards while OpenCL was adopted by additional hardware accelerators, like e.g. AMD graphic cards, ARM-based processors as well as Intel Xeon Phi coprocessors. For seismic wave propagation simulations using the open-source spectral-element code package SPECFEM3D_GLOBE, we incorporated an automatic source-to-source code generation tool (BOAST) which allows us to use meta-programming of all computational kernels for forward and adjoint runs. Using our BOAST kernels, we generate optimized source code for both CUDA and OpenCL languages within the source code package. Thus, seismic wave simulations are able now to fully utilize CUDA and OpenCL hardware accelerators. We show benchmarks of forward seismic wave propagation simulations using SPECFEM3D_GLOBE on CUDA/OpenCL GPUs, validating results and comparing performances for different simulations and hardware usages.
Automated Discovery of Machine-Specific Code Improvements
1984-12-01
operation of the source language. Additional analysis may reveal special features of the target architecture that may be exploited to generate efficient...Additional analysis may reveal special features of the target architecture that may be exploited to generate efficient code. Such analysis is optional...incorporate knowledge of the source language, but do not refer to features of the target machine. These early phases are sometimes referred to as the
Architectural Terms for Educational Planners.
ERIC Educational Resources Information Center
1997
This booklet is designed to facilitate open, clear communication between educational facility planners and the architects hired to oversee building design and construction. It provides a list of architectural, electrical, plumbing, and topographical symbols; a glossary of architectural terms; and a list of public agencies and relevant codes and…
FEDEF: A High Level Architecture Federate Development Framework
2010-09-01
require code changes for operability between HLA specifications. Configuration of federate requirements such as publications, subscriptions, time ... management , and management protocol should occur outside of federate source code, allowing for federate reusability without code modification and re
GPU COMPUTING FOR PARTICLE TRACKING
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nishimura, Hiroshi; Song, Kai; Muriki, Krishna
2011-03-25
This is a feasibility study of using a modern Graphics Processing Unit (GPU) to parallelize the accelerator particle tracking code. To demonstrate the massive parallelization features provided by GPU computing, a simplified TracyGPU program is developed for dynamic aperture calculation. Performances, issues, and challenges from introducing GPU are also discussed. General purpose Computation on Graphics Processing Units (GPGPU) bring massive parallel computing capabilities to numerical calculation. However, the unique architecture of GPU requires a comprehensive understanding of the hardware and programming model to be able to well optimize existing applications. In the field of accelerator physics, the dynamic aperture calculationmore » of a storage ring, which is often the most time consuming part of the accelerator modeling and simulation, can benefit from GPU due to its embarrassingly parallel feature, which fits well with the GPU programming model. In this paper, we use the Tesla C2050 GPU which consists of 14 multi-processois (MP) with 32 cores on each MP, therefore a total of 448 cores, to host thousands ot threads dynamically. Thread is a logical execution unit of the program on GPU. In the GPU programming model, threads are grouped into a collection of blocks Within each block, multiple threads share the same code, and up to 48 KB of shared memory. Multiple thread blocks form a grid, which is executed as a GPU kernel. A simplified code that is a subset of Tracy++ [2] is developed to demonstrate the possibility of using GPU to speed up the dynamic aperture calculation by having each thread track a particle.« less
Use Computer-Aided Tools to Parallelize Large CFD Applications
NASA Technical Reports Server (NTRS)
Jin, H.; Frumkin, M.; Yan, J.
2000-01-01
Porting applications to high performance parallel computers is always a challenging task. It is time consuming and costly. With rapid progressing in hardware architectures and increasing complexity of real applications in recent years, the problem becomes even more sever. Today, scalability and high performance are mostly involving handwritten parallel programs using message-passing libraries (e.g. MPI). However, this process is very difficult and often error-prone. The recent reemergence of shared memory parallel (SMP) architectures, such as the cache coherent Non-Uniform Memory Access (ccNUMA) architecture used in the SGI Origin 2000, show good prospects for scaling beyond hundreds of processors. Programming on an SMP is simplified by working in a globally accessible address space. The user can supply compiler directives, such as OpenMP, to parallelize the code. As an industry standard for portable implementation of parallel programs for SMPs, OpenMP is a set of compiler directives and callable runtime library routines that extend Fortran, C and C++ to express shared memory parallelism. It promises an incremental path for parallel conversion of existing software, as well as scalability and performance for a complete rewrite or an entirely new development. Perhaps the main disadvantage of programming with directives is that inserted directives may not necessarily enhance performance. In the worst cases, it can create erroneous results. While vendors have provided tools to perform error-checking and profiling, automation in directive insertion is very limited and often failed on large programs, primarily due to the lack of a thorough enough data dependence analysis. To overcome the deficiency, we have developed a toolkit, CAPO, to automatically insert OpenMP directives in Fortran programs and apply certain degrees of optimization. CAPO is aimed at taking advantage of detailed inter-procedural dependence analysis provided by CAPTools, developed by the University of Greenwich, to reduce potential errors made by users. Earlier tests on NAS Benchmarks and ARC3D have demonstrated good success of this tool. In this study, we have applied CAPO to parallelize three large applications in the area of computational fluid dynamics (CFD): OVERFLOW, TLNS3D and INS3D. These codes are widely used for solving Navier-Stokes equations with complicated boundary conditions and turbulence model in multiple zones. Each one comprises of from 50K to 1,00k lines of FORTRAN77. As an example, CAPO took 77 hours to complete the data dependence analysis of OVERFLOW on a workstation (SGI, 175MHz, R10K processor). A fair amount of effort was spent on correcting false dependencies due to lack of necessary knowledge during the analysis. Even so, CAPO provides an easy way for user to interact with the parallelization process. The OpenMP version was generated within a day after the analysis was completed. Due to sequential algorithms involved, code sections in TLNS3D and INS3D need to be restructured by hand to produce more efficient parallel codes. An included figure shows preliminary test results of the generated OVERFLOW with several test cases in single zone. The MPI data points for the small test case were taken from a handcoded MPI version. As we can see, CAPO's version has achieved 18 fold speed up on 32 nodes of the SGI O2K. For the small test case, it outperformed the MPI version. These results are very encouraging, but further work is needed. For example, although CAPO attempts to place directives on the outer- most parallel loops in an interprocedural framework, it does not insert directives based on the best manual strategy. In particular, it lacks the support of parallelization at the multi-zone level. Future work will emphasize on the development of methodology to work in a multi-zone level and with a hybrid approach. Development of tools to perform more complicated code transformation is also needed.
NASA Technical Reports Server (NTRS)
Simmons, Reid; Apfelbaum, David
2005-01-01
Task Description Language (TDL) is an extension of the C++ programming language that enables programmers to quickly and easily write complex, concurrent computer programs for controlling real-time autonomous systems, including robots and spacecraft. TDL is based on earlier work (circa 1984 through 1989) on the Task Control Architecture (TCA). TDL provides syntactic support for hierarchical task-level control functions, including task decomposition, synchronization, execution monitoring, and exception handling. A Java-language-based compiler transforms TDL programs into pure C++ code that includes calls to a platform-independent task-control-management (TCM) library. TDL has been used to control and coordinate multiple heterogeneous robots in projects sponsored by NASA and the Defense Advanced Research Projects Agency (DARPA). It has also been used in Brazil to control an autonomous airship and in Canada to control a robotic manipulator.
Dynamic Weather Routes Architecture Overview
NASA Technical Reports Server (NTRS)
Eslami, Hassan; Eshow, Michelle
2014-01-01
Dynamic Weather Routes Architecture Overview, presents the high level software architecture of DWR, based on the CTAS software framework and the Direct-To automation tool. The document also covers external and internal data flows, required dataset, changes to the Direct-To software for DWR, collection of software statistics, and the code structure.
NASA Astrophysics Data System (ADS)
de Schryver, C.; Weithoffer, S.; Wasenmüller, U.; Wehn, N.
2012-09-01
Channel coding is a standard technique in all wireless communication systems. In addition to the typically employed methods like convolutional coding, turbo coding or low density parity check (LDPC) coding, algebraic codes are used in many cases. For example, outer BCH coding is applied in the DVB-S2 standard for satellite TV broadcasting. A key operation for BCH and the related Reed-Solomon codes are multiplications in finite fields (Galois Fields), where extension fields of prime fields are used. A lot of architectures for multiplications in finite fields have been published over the last decades. This paper examines four different multiplier architectures in detail that offer the potential for very high throughputs. We investigate the implementation performance of these multipliers on FPGA technology in the context of channel coding. We study the efficiency of the multipliers with respect to area, frequency and throughput, as well as configurability and scalability. The implementation data of the fully verified circuits are provided for a Xilinx Virtex-4 device after place and route.
The structure of the clouds distributed operating system
NASA Technical Reports Server (NTRS)
Dasgupta, Partha; Leblanc, Richard J., Jr.
1989-01-01
A novel system architecture, based on the object model, is the central structuring concept used in the Clouds distributed operating system. This architecture makes Clouds attractive over a wide class of machines and environments. Clouds is a native operating system, designed and implemented at Georgia Tech. and runs on a set of generated purpose computers connected via a local area network. The system architecture of Clouds is composed of a system-wide global set of persistent (long-lived) virtual address spaces, called objects that contain persistent data and code. The object concept is implemented at the operating system level, thus presenting a single level storage view to the user. Lightweight treads carry computational activity through the code stored in the objects. The persistent objects and threads gives rise to a programming environment composed of shared permanent memory, dispensing with the need for hardware-derived concepts such as the file systems and message systems. Though the hardware may be distributed and may have disks and networks, the Clouds provides the applications with a logically centralized system, based on a shared, structured, single level store. The current design of Clouds uses a minimalist philosophy with respect to both the kernel and the operating system. That is, the kernel and the operating system support a bare minimum of functionality. Clouds also adheres to the concept of separation of policy and mechanism. Most low-level operating system services are implemented above the kernel and most high level services are implemented at the user level. From the measured performance of using the kernel mechanisms, we are able to demonstrate that efficient implementations are feasible for the object model on commercially available hardware. Clouds provides a rich environment for conducting research in distributed systems. Some of the topics addressed in this paper include distributed programming environments, consistency of persistent data and fault-tolerance.
Analysis of view synthesis prediction architectures in modern coding standards
NASA Astrophysics Data System (ADS)
Tian, Dong; Zou, Feng; Lee, Chris; Vetro, Anthony; Sun, Huifang
2013-09-01
Depth-based 3D formats are currently being developed as extensions to both AVC and HEVC standards. The availability of depth information facilitates the generation of intermediate views for advanced 3D applications and displays, and also enables more efficient coding of the multiview input data through view synthesis prediction techniques. This paper outlines several approaches that have been explored to realize view synthesis prediction in modern video coding standards such as AVC and HEVC. The benefits and drawbacks of various architectures are analyzed in terms of performance, complexity, and other design considerations. It is hence concluded that block-based VSP prediction for multiview video signals provides attractive coding gains with comparable complexity as traditional motion/disparity compensation.
Analysis of Defenses Against Code Reuse Attacks on Modern and New Architectures
2015-09-01
soundness or completeness. An incomplete analysis will produce extra edges in the CFG that might allow an attacker to slip through. An unsound analysis...Analysis of Defenses Against Code Reuse Attacks on Modern and New Architectures by Isaac Noah Evans Submitted to the Department of Electrical...Engineering and Computer Science in partial fulfillment of the requirements for the degree of Master of Engineering in Electrical Engineering and Computer
PIC codes for plasma accelerators on emerging computer architectures (GPUS, Multicore/Manycore CPUS)
NASA Astrophysics Data System (ADS)
Vincenti, Henri
2016-03-01
The advent of exascale computers will enable 3D simulations of a new laser-plasma interaction regimes that were previously out of reach of current Petasale computers. However, the paradigm used to write current PIC codes will have to change in order to fully exploit the potentialities of these new computing architectures. Indeed, achieving Exascale computing facilities in the next decade will be a great challenge in terms of energy consumption and will imply hardware developments directly impacting our way of implementing PIC codes. As data movement (from die to network) is by far the most energy consuming part of an algorithm future computers will tend to increase memory locality at the hardware level and reduce energy consumption related to data movement by using more and more cores on each compute nodes (''fat nodes'') that will have a reduced clock speed to allow for efficient cooling. To compensate for frequency decrease, CPU machine vendors are making use of long SIMD instruction registers that are able to process multiple data with one arithmetic operator in one clock cycle. SIMD register length is expected to double every four years. GPU's also have a reduced clock speed per core and can process Multiple Instructions on Multiple Datas (MIMD). At the software level Particle-In-Cell (PIC) codes will thus have to achieve both good memory locality and vectorization (for Multicore/Manycore CPU) to fully take advantage of these upcoming architectures. In this talk, we present the portable solutions we implemented in our high performance skeleton PIC code PICSAR to both achieve good memory locality and cache reuse as well as good vectorization on SIMD architectures. We also present the portable solutions used to parallelize the Pseudo-sepctral quasi-cylindrical code FBPIC on GPUs using the Numba python compiler.
Software Architecture of the NASA Shuttle Ground Operations Simulator - SGOS
NASA Technical Reports Server (NTRS)
Cook, Robert P.; Lostroscio, Charles T.
2005-01-01
The SGOS executive and its subsystems have been an integral component of the Shuttle Launch Safety Program for almost thirty years. It is usable (via the LAN) by over 2000 NASA employees at the Kennedy Space Center and 11,000 contractors. SGOS supports over 800 models comprised of several hundred thousand lines of code and over 1,000 MCP procedures. Yet neither language has a for loop!! The simulation software described in this paper is used to train ground controllers and to certify launch countdown readiness.
Software Architecture of the NASA Shuttle Ground Operations Simulator--SGOS
NASA Technical Reports Server (NTRS)
Cook Robert P.; Lostroscio, Charles T.
2005-01-01
The SGOS executive and its subsystems have been an integral component of the Shuttle Launch Safety Program for almost thirty years. it is usable (via the LAN) by over 2000 NASA employees at the Kennedy Space Center and 11,000 contractors. SGOS supports over 800 models comprised of several hundred thousand lines of code and over 1,00 MCP procedures. Yet neither language has a for loop!! The simulation software described in this paper is used to train ground controllers and to certify launch countdown readiness.
JETSPIN: A specific-purpose open-source software for simulations of nanofiber electrospinning
NASA Astrophysics Data System (ADS)
Lauricella, Marco; Pontrelli, Giuseppe; Coluzza, Ivan; Pisignano, Dario; Succi, Sauro
2015-12-01
We present the open-source computer program JETSPIN, specifically designed to simulate the electrospinning process of nanofibers. Its capabilities are shown with proper reference to the underlying model, as well as a description of the relevant input variables and associated test-case simulations. The various interactions included in the electrospinning model implemented in JETSPIN are discussed in detail. The code is designed to exploit different computational architectures, from single to parallel processor workstations. This paper provides an overview of JETSPIN, focusing primarily on its structure, parallel implementations, functionality, performance, and availability.
Performance Analysis of Multilevel Parallel Applications on Shared Memory Architectures
NASA Technical Reports Server (NTRS)
Jost, Gabriele; Jin, Haoqiang; Labarta, Jesus; Gimenez, Judit; Caubet, Jordi; Biegel, Bryan A. (Technical Monitor)
2002-01-01
In this paper we describe how to apply powerful performance analysis techniques to understand the behavior of multilevel parallel applications. We use the Paraver/OMPItrace performance analysis system for our study. This system consists of two major components: The OMPItrace dynamic instrumentation mechanism, which allows the tracing of processes and threads and the Paraver graphical user interface for inspection and analyses of the generated traces. We describe how to use the system to conduct a detailed comparative study of a benchmark code implemented in five different programming paradigms applicable for shared memory
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chakraborty, S.; Kroposki, B.; Kramer, W.
Integrating renewable energy and distributed generations into the Smart Grid architecture requires power electronic (PE) for energy conversion. The key to reaching successful Smart Grid implementation is to develop interoperable, intelligent, and advanced PE technology that improves and accelerates the use of distributed energy resource systems. This report describes the simulation, design, and testing of a single-phase DC-to-AC inverter developed to operate in both islanded and utility-connected mode. It provides results on both the simulations and the experiments conducted, demonstrating the ability of the inverter to provide advanced control functions such as power flow and VAR/voltage regulation. This report alsomore » analyzes two different techniques used for digital signal processor (DSP) code generation. Initially, the DSP code was written in C programming language using Texas Instrument's Code Composer Studio. In a later stage of the research, the Simulink DSP toolbox was used to self-generate code for the DSP. The successful tests using Simulink self-generated DSP codes show promise for fast prototyping of PE controls.« less
Federal Register 2010, 2011, 2012, 2013, 2014
2012-09-19
... Buildings Service; Information Collection; Art-in- Architecture Program National Artist Registry (GSA Form... regarding Art-in Architecture Program National Artist Registry (GSA Form 7437). The Art-in-Architecture...-Architecture & Fine Arts Division (PCAC), 1800 F Street NW., Room 3305, Washington, DC 20405, at telephone(202...
MPEG4: coding for content, interactivity, and universal accessibility
NASA Astrophysics Data System (ADS)
Reader, Cliff
1996-01-01
MPEG4 is a natural extension of audiovisual coding, and yet from many perspectives breaks new ground as a standard. New coding techniques are being introduced, of course, but they will work on new data structures. The standard itself has a new architecture, and will use a new operational model when implemented on equipment that is likely to have innovative system architecture. The author introduces the background developments in technology and applications that are driving or enabling the standard, introduces the focus of MPEG4, and enumerates the new functionalities to be supported. Key applications in interactive TV and heterogeneous environments are discussed. The architecture of MPEG4 is described, followed by a discussion of the multiphase MPEG4 communication scenarios, and issues of practical implementation of MPEG4 terminals. The paper concludes with a description of the MPEG4 workplan. In summary, MPEG4 has two fundamental attributes. First, it is the coding of audiovisual objects, which may be natural or synthetic data in two or three dimensions. Second, the heart of MPEG4 is its syntax: the MPEG4 Syntactic Descriptive Language -- MSDL.
Terra Harvest Open Source Environment (THOSE): a universal unattended ground sensor controller
NASA Astrophysics Data System (ADS)
Gold, Joshua; Klawon, Kevin; Humeniuk, David; Landoll, Darren
2011-06-01
Under the Terra Harvest Program, the Defense Intelligence Agency (DIA) has the objective of developing a universal Controller for the Unattended Ground Sensor (UGS) community. The mission is to define, implement, and thoroughly document an open architecture that universally supports UGS missions, integrating disparate systems, peripherals, etc. The Controller's inherent interoperability with numerous systems enables the integration of both legacy and future Unattended Ground Sensor System (UGSS) components, while the design's open architecture supports rapid third-party development to ensure operational readiness. The successful accomplishment of these objectives by the program's Phase 3b contractors is demonstrated via integration of the companies' respective plug-'n-play contributions that include various peripherals, such as sensors, cameras, etc., and their associated software drivers. In order to independently validate the Terra Harvest architecture, L-3 Nova Engineering, along with its partner, the University of Dayton Research Institute (UDRI), is developing the Terra Harvest Open Source Environment (THOSE), a Java based system running on an embedded Linux Operating System (OS). The Use Cases on which the software is developed support the full range of UGS operational scenarios such as remote sensor triggering, image capture, and data exfiltration. The Team is additionally developing an ARM microprocessor evaluation platform that is both energyefficient and operationally flexible. The paper describes the overall THOSE architecture, as well as the implementation strategy for some of the key software components. Preliminary integration/test results and the Team's approach for transitioning the THOSE design and source code to the Government are also presented.
Parallel evolutionary computation in bioinformatics applications.
Pinho, Jorge; Sobral, João Luis; Rocha, Miguel
2013-05-01
A large number of optimization problems within the field of Bioinformatics require methods able to handle its inherent complexity (e.g. NP-hard problems) and also demand increased computational efforts. In this context, the use of parallel architectures is a necessity. In this work, we propose ParJECoLi, a Java based library that offers a large set of metaheuristic methods (such as Evolutionary Algorithms) and also addresses the issue of its efficient execution on a wide range of parallel architectures. The proposed approach focuses on the easiness of use, making the adaptation to distinct parallel environments (multicore, cluster, grid) transparent to the user. Indeed, this work shows how the development of the optimization library can proceed independently of its adaptation for several architectures, making use of Aspect-Oriented Programming. The pluggable nature of parallelism related modules allows the user to easily configure its environment, adding parallelism modules to the base source code when needed. The performance of the platform is validated with two case studies within biological model optimization. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Sanna, N.; Baccarelli, I.; Morelli, G.
2009-12-01
SCELib is a computer program which implements the Single Center Expansion (SCE) method to describe molecular electronic densities and the interaction potentials between a charged projectile (electron or positron) and a target molecular system. The first version (CPC Catalog identifier ADMG_v1_0) was submitted to the CPC Program Library in 2000, and version 2.0 (ADMG_v2_0) was submitted in 2004. We here announce the new release 3.0 which presents additional features with respect to the previous versions aiming at a significative enhance of its capabilities to deal with larger molecular systems. SCELib 3.0 allows for ab initio effective core potential (ECP) calculations of the molecular wavefunctions to be used in the SCE method in addition to the standard all-electron description of the molecule. The list of supported architectures has been updated and the code has been ported to platforms based on accelerating coprocessors, such as the NVIDIA GPGPU and the new parallel model adopted is able to efficiently run on a mixed many-core computing system. Program summaryProgram title: SCELib3.0 Catalogue identifier: ADMG_v3_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/ADMG_v3_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 2 018 862 No. of bytes in distributed program, including test data, etc.: 4 955 014 Distribution format: tar.gz Programming language: C Compilers used: xlc V8.x, Intel C V10.x, Portland Group V7.x, nvcc V2.x Computer: All SMP platforms based on AIX, Linux and SUNOS operating systems over SPARC, POWER, Intel Itanium2, X86, em64t and Opteron processors Operating system: SUNOS, IBM AIX, Linux RedHat (Enterprise), Linux SuSE (SLES) Has the code been vectorized or parallelized?: Yes. 1 to 32 (CPU or GPU) used RAM: Up to 32 GB depending on the molecular system and runtime parameters Classification: 16.5 Catalogue identifier of previous version: ADMG_v2_0 Journal reference of previous version: Comput. Phys. Comm. 162 (2004) 51 External routines: CUDA libraries (SDK V2.x). Does the new version supersede the previous version?: Yes Nature of problem: In this set of codes an efficient procedure is implemented to describe the wavefunction and related molecular properties of a polyatomic molecular system within the Single Center of Expansion (SCE) approximation. The resulting SCE wavefunction, electron density, electrostatic and correlation/polarization potentials can then be used in a wide variety of applications, such as electron-molecule scattering calculations, quantum chemistry studies, biomodelling and drug design. Solution method: The polycentre Hartree-Fock solution for a molecule of arbitrary geometry, based on linear combination of Gaussian-Type Orbital (GTO), is expanded over a single center, typically the Center Of Mass (C.O.M.), by means of a Gauss Legendre/Chebyschev quadrature over the θ,φ angular coordinates. The resulting SCE numerical wavefunction is then used to calculate the one-particle electron density, the electrostatic potential and two different models for the correlation/polarization potentials induced by the impinging electron, which have the correct asymptotic behavior for the leading dipole molecular polarizabilities. Reasons for new version: The present release of SCELib allows the study of larger molecular systems with respect to the previous versions by means of theoretical and technological advances, with the first implementation of the code over a many-core computing system. Summary of revisions: The major features added with respect to SCELib Version 2.0 are molecular wavefunctions obtained via the Los Alamos (Hay and Wadt) LAN ECP plus DZ description of the inner-shell electrons (on Na-La, Hf-Bi elements) [1] can now be single-center-expanded; the addition required modifications of: (i) the filtering code readgau, (ii) the main reading function setinp, (iii) the sphint code (including changes to the CalcMO code), (iv) the densty code, (v) the vst code; the classes of platforms supported now include two more architectures based on accelerated coprocessors (Nvidia GSeries GPGPU and ClearSpeed e720 (ClearSpeed version, experimental; initial preliminary porting of the sphint() function not for production runs - see the code documentation for additional detail). A single-precision representation for real numbers in the SCE mapping of the GTOs ( sphint code), has been implemented into the new code; the I h symmetry point group for the molecular systems has been added to those already allowed in the SCE procedure; the orientation of the molecular axis system for the Cs (planar) symmetry has been changed in accord with the standard orientation adopted by the latest version of the quantum chemistry code (Gaussian C03 [2]), which is used to generate the input multi-centre molecular wavefunctions ( z-axis perpendicular to the symmetry plane); the abelian subgroup for the Cs point group has been changed from C 1 to Cs; atomic basis functions including g-type GTOs can now be single-center-expanded. Restrictions: Depending on the molecular system under study and on the operating conditions the program may or may not fit into available RAM memory. In this case a feature of the program is to memory map a disk file in order to efficiently access the memory data through a disk device. The parallel GP-GPU implementation limits the number of CPU threads to the number of GPU cores present. Running time: The execution time strongly depends on the molecular target description and on the hardware/OS chosen, it is directly proportional to the ( r,θ,φ) grid size and to the number of angular basis functions used. Thus, from the program printout of the main arrays memory occupancy, the user can approximately derive the expected computer time needed for a given calculation executed in serial mode. For parallel executions the overall efficiency must be further taken into account, and this depends on the no. of processors used as well as on the parallel architecture chosen, so a simple general law is at present not determinable. References:[1] P.J. Hay, W.R. Wadt, J. Chem. Phys. 82 (1985) 270; W.R. Wadt, P.J. Hay, J. Chem. Phys. 284 (1985);P.J. Hay, W.R. Wadt, J. Chem. Phys. 299 (1985). [2] M.J. Frisch et al., Gaussian 03, revision C.02, Gaussian, Inc., Wallingford, CT, 2004.
COOL: A code for Dynamic Monte Carlo Simulation of molecular dynamics
NASA Astrophysics Data System (ADS)
Barletta, Paolo
2012-02-01
Cool is a program to simulate evaporative and sympathetic cooling for a mixture of two gases co-trapped in an harmonic potential. The collisions involved are assumed to be exclusively elastic, and losses are due to evaporation from the trap. Each particle is followed individually in its trajectory, consequently properties such as spatial densities or energy distributions can be readily evaluated. The code can be used sequentially, by employing one output as input for another run. The code can be easily generalised to describe more complicated processes, such as the inclusion of inelastic collisions, or the possible presence of more than two species in the trap. New version program summaryProgram title: COOL Catalogue identifier: AEHJ_v2_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEHJ_v2_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 1 097 733 No. of bytes in distributed program, including test data, etc.: 18 425 722 Distribution format: tar.gz Programming language: C++ Computer: Desktop Operating system: Linux RAM: 500 Mbytes Classification: 16.7, 23 Catalogue identifier of previous version: AEHJ_v1_0 Journal reference of previous version: Comput. Phys. Comm. 182 (2011) 388 Does the new version supersede the previous version?: Yes Nature of problem: Simulation of the sympathetic process occurring for two molecular gases co-trapped in a deep optical trap. Solution method: The Direct Simulation Monte Carlo method exploits the decoupling, over a short time period, of the inter-particle interaction from the trapping potential. The particle dynamics is thus exclusively driven by the external optical field. The rare inter-particle collisions are considered with an acceptance/rejection mechanism, that is, by comparing a random number to the collisional probability defined in terms of the inter-particle cross section and centre-of-mass energy. All particles in the trap are individually simulated so that at each time step a number of useful quantities, such as the spatial densities or the energy distributions, can be readily evaluated. Reasons for new version: A number of issues made the old version very difficult to be ported on different architectures, and impossible to compile on Windows. Furthermore, the test runs results could only be replicated poorly, as a consequence of the simulations being very sensitive to the machine background noise. In practise, as the particles are simulated for billions and billions of steps, the consequence of a small difference in the initial conditions due to the finiteness of double precision real can have macroscopic effects in the output. This is not a problem in its own right, but a feature of such simulations. However, for sake of completeness we have introduced a quadruple precision version of the code which yields the same results independently of the software used to compile it, or the hardware architecture where the code is run. Summary of revisions: A number of bugs in the dynamic memory allocation have been detected and removed, mostly in the cool.cpp file. All files have been renamed with a .cpp ending, rather than .c++, to make them compatible with Windows. The Random Number Generator routine, which is the computational core of the algorithm, has been re-written in C++, and there is no need any longer for cross FORTRAN-C++ compilation. A quadruple precision version of the code is provided alongside the original double precision one. The makefile allows the user to choose which one to compile by setting the switch PRECISION to either double or quad. The source code and header files have been organised into directories to make the code file system look neater. Restrictions: The in-trap motion of the particles is treated classically. Running time: The running time is relatively short, 1-2 hours. However it is convenient to replicate each simulation several times with different initialisations of the random sequence.
Architectural Coatings: National Volatile Organic Compounds Emission Standards
Read about the section 183(e) rule for volatile organic compounds for architectural coatings. Read the rule summary and history, find the code of federal regulations test, and additional documents, including compliance information.
OCTGRAV: Sparse Octree Gravitational N-body Code on Graphics Processing Units
NASA Astrophysics Data System (ADS)
Gaburov, Evghenii; Bédorf, Jeroen; Portegies Zwart, Simon
2010-10-01
Octgrav is a very fast tree-code which runs on massively parallel Graphical Processing Units (GPU) with NVIDIA CUDA architecture. The algorithms are based on parallel-scan and sort methods. The tree-construction and calculation of multipole moments is carried out on the host CPU, while the force calculation which consists of tree walks and evaluation of interaction list is carried out on the GPU. In this way, a sustained performance of about 100GFLOP/s and data transfer rates of about 50GB/s is achieved. It takes about a second to compute forces on a million particles with an opening angle of heta approx 0.5. To test the performance and feasibility, we implemented the algorithms in CUDA in the form of a gravitational tree-code which completely runs on the GPU. The tree construction and traverse algorithms are portable to many-core devices which have support for CUDA or OpenCL programming languages. The gravitational tree-code outperforms tuned CPU code during the tree-construction and shows a performance improvement of more than a factor 20 overall, resulting in a processing rate of more than 2.8 million particles per second. The code has a convenient user interface and is freely available for use.
Multiple grid problems on concurrent-processing computers
NASA Technical Reports Server (NTRS)
Eberhardt, D. S.; Baganoff, D.
1986-01-01
Three computer codes were studied which make use of concurrent processing computer architectures in computational fluid dynamics (CFD). The three parallel codes were tested on a two processor multiple-instruction/multiple-data (MIMD) facility at NASA Ames Research Center, and are suggested for efficient parallel computations. The first code is a well-known program which makes use of the Beam and Warming, implicit, approximate factored algorithm. This study demonstrates the parallelism found in a well-known scheme and it achieved speedups exceeding 1.9 on the two processor MIMD test facility. The second code studied made use of an embedded grid scheme which is used to solve problems having complex geometries. The particular application for this study considered an airfoil/flap geometry in an incompressible flow. The scheme eliminates some of the inherent difficulties found in adapting approximate factorization techniques onto MIMD machines and allows the use of chaotic relaxation and asynchronous iteration techniques. The third code studied is an application of overset grids to a supersonic blunt body problem. The code addresses the difficulties encountered when using embedded grids on a compressible, and therefore nonlinear, problem. The complex numerical boundary system associated with overset grids is discussed and several boundary schemes are suggested. A boundary scheme based on the method of characteristics achieved the best results.
ASC Tri-lab Co-design Level 2 Milestone Report 2015
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hornung, Rich; Jones, Holger; Keasler, Jeff
2015-09-23
In 2015, the three Department of Energy (DOE) National Laboratories that make up the Advanced Sci- enti c Computing (ASC) Program (Sandia, Lawrence Livermore, and Los Alamos) collaboratively explored performance portability programming environments in the context of several ASC co-design proxy applica- tions as part of a tri-lab L2 milestone executed by the co-design teams at each laboratory. The programming environments that were studied included Kokkos (developed at Sandia), RAJA (LLNL), and Legion (Stan- ford University). The proxy apps studied included: miniAero, LULESH, CoMD, Kripke, and SNAP. These programming models and proxy-apps are described herein. Each lab focused on amore » particular combination of abstractions and proxy apps, with the goal of assessing performance portability using those. Performance portability was determined by: a) the ability to run a single application source code on multiple advanced architectures, b) comparing runtime performance between \
Low-Level Space Optimization of an AES Implementation for a Bit-Serial Fully Pipelined Architecture
NASA Astrophysics Data System (ADS)
Weber, Raphael; Rettberg, Achim
A previously developed AES (Advanced Encryption Standard) implementation is optimized and described in this paper. The special architecture for which this implementation is targeted comprises synchronous and systematic bit-serial processing without a central controlling instance. In order to shrink the design in terms of logic utilization we deeply analyzed the architecture and the AES implementation to identify the most costly logic elements. We propose to merge certain parts of the logic to achieve better area efficiency. The approach was integrated into an existing synthesis tool which we used to produce synthesizable VHDL code. For testing purposes, we simulated the generated VHDL code and ran tests on an FPGA board.
From Physics Model to Results: An Optimizing Framework for Cross-Architecture Code Generation
Blazewicz, Marek; Hinder, Ian; Koppelman, David M.; ...
2013-01-01
Starting from a high-level problem description in terms of partial differential equations using abstract tensor notation, the Chemora framework discretizes, optimizes, and generates complete high performance codes for a wide range of compute architectures. Chemora extends the capabilities of Cactus, facilitating the usage of large-scale CPU/GPU systems in an efficient manner for complex applications, without low-level code tuning. Chemora achieves parallelism through MPI and multi-threading, combining OpenMP and CUDA. Optimizations include high-level code transformations, efficient loop traversal strategies, dynamically selected data and instruction cache usage strategies, and JIT compilation of GPU code tailored to the problem characteristics. The discretization ismore » based on higher-order finite differences on multi-block domains. Chemora's capabilities are demonstrated by simulations of black hole collisions. This problem provides an acid test of the framework, as the Einstein equations contain hundreds of variables and thousands of terms.« less
Toward performance portability of the Albany finite element analysis code using the Kokkos library
DOE Office of Scientific and Technical Information (OSTI.GOV)
Demeshko, Irina; Watkins, Jerry; Tezaur, Irina K.
Performance portability on heterogeneous high-performance computing (HPC) systems is a major challenge faced today by code developers: parallel code needs to be executed correctly as well as with high performance on machines with different architectures, operating systems, and software libraries. The finite element method (FEM) is a popular and flexible method for discretizing partial differential equations arising in a wide variety of scientific, engineering, and industrial applications that require HPC. This paper presents some preliminary results pertaining to our development of a performance portable implementation of the FEM-based Albany code. Performance portability is achieved using the Kokkos library. We presentmore » performance results for the Aeras global atmosphere dynamical core module in Albany. Finally, numerical experiments show that our single code implementation gives reasonable performance across three multicore/many-core architectures: NVIDIA General Processing Units (GPU’s), Intel Xeon Phis, and multicore CPUs.« less
Toward performance portability of the Albany finite element analysis code using the Kokkos library
Demeshko, Irina; Watkins, Jerry; Tezaur, Irina K.; ...
2018-02-05
Performance portability on heterogeneous high-performance computing (HPC) systems is a major challenge faced today by code developers: parallel code needs to be executed correctly as well as with high performance on machines with different architectures, operating systems, and software libraries. The finite element method (FEM) is a popular and flexible method for discretizing partial differential equations arising in a wide variety of scientific, engineering, and industrial applications that require HPC. This paper presents some preliminary results pertaining to our development of a performance portable implementation of the FEM-based Albany code. Performance portability is achieved using the Kokkos library. We presentmore » performance results for the Aeras global atmosphere dynamical core module in Albany. Finally, numerical experiments show that our single code implementation gives reasonable performance across three multicore/many-core architectures: NVIDIA General Processing Units (GPU’s), Intel Xeon Phis, and multicore CPUs.« less
Three-Dimensional Terahertz Coded-Aperture Imaging Based on Single Input Multiple Output Technology.
Chen, Shuo; Luo, Chenggao; Deng, Bin; Wang, Hongqiang; Cheng, Yongqiang; Zhuang, Zhaowen
2018-01-19
As a promising radar imaging technique, terahertz coded-aperture imaging (TCAI) can achieve high-resolution, forward-looking, and staring imaging by producing spatiotemporal independent signals with coded apertures. In this paper, we propose a three-dimensional (3D) TCAI architecture based on single input multiple output (SIMO) technology, which can reduce the coding and sampling times sharply. The coded aperture applied in the proposed TCAI architecture loads either purposive or random phase modulation factor. In the transmitting process, the purposive phase modulation factor drives the terahertz beam to scan the divided 3D imaging cells. In the receiving process, the random phase modulation factor is adopted to modulate the terahertz wave to be spatiotemporally independent for high resolution. Considering human-scale targets, images of each 3D imaging cell are reconstructed one by one to decompose the global computational complexity, and then are synthesized together to obtain the complete high-resolution image. As for each imaging cell, the multi-resolution imaging method helps to reduce the computational burden on a large-scale reference-signal matrix. The experimental results demonstrate that the proposed architecture can achieve high-resolution imaging with much less time for 3D targets and has great potential in applications such as security screening, nondestructive detection, medical diagnosis, etc.
NASA Technical Reports Server (NTRS)
Walker, Carrie K.
1991-01-01
A technique has been developed for combining features of a systems architecture design and assessment tool and a software development tool. This technique reduces simulation development time and expands simulation detail. The Architecture Design and Assessment System (ADAS), developed at the Research Triangle Institute, is a set of computer-assisted engineering tools for the design and analysis of computer systems. The ADAS system is based on directed graph concepts and supports the synthesis and analysis of software algorithms mapped to candidate hardware implementations. Greater simulation detail is provided by the ADAS functional simulator. With the functional simulator, programs written in either Ada or C can be used to provide a detailed description of graph nodes. A Computer-Aided Software Engineering tool developed at the Charles Stark Draper Laboratory (CSDL CASE) automatically generates Ada or C code from engineering block diagram specifications designed with an interactive graphical interface. A technique to use the tools together has been developed, which further automates the design process.
Automated Vectorization of Decision-Based Algorithms
NASA Technical Reports Server (NTRS)
James, Mark
2006-01-01
Virtually all existing vectorization algorithms are designed to only analyze the numeric properties of an algorithm and distribute those elements across multiple processors. This advances the state of the practice because it is the only known system, at the time of this reporting, that takes high-level statements and analyzes them for their decision properties and converts them to a form that allows them to automatically be executed in parallel. The software takes a high-level source program that describes a complex decision- based condition and rewrites it as a disjunctive set of component Boolean relations that can then be executed in parallel. This is important because parallel architectures are becoming more commonplace in conventional systems and they have always been present in NASA flight systems. This technology allows one to take existing condition-based code and automatically vectorize it so it naturally decomposes across parallel architectures.
Deployment of the OSIRIS EM-PIC code on the Intel Knights Landing architecture
NASA Astrophysics Data System (ADS)
Fonseca, Ricardo
2017-10-01
Electromagnetic particle-in-cell (EM-PIC) codes such as OSIRIS have found widespread use in modelling the highly nonlinear and kinetic processes that occur in several relevant plasma physics scenarios, ranging from astrophysical settings to high-intensity laser plasma interaction. Being computationally intensive, these codes require large scale HPC systems, and a continuous effort in adapting the algorithm to new hardware and computing paradigms. In this work, we report on our efforts on deploying the OSIRIS code on the new Intel Knights Landing (KNL) architecture. Unlike the previous generation (Knights Corner), these boards are standalone systems, and introduce several new features, include the new AVX-512 instructions and on-package MCDRAM. We will focus on the parallelization and vectorization strategies followed, as well as memory management, and present a detailed performance evaluation of code performance in comparison with the CPU code. This work was partially supported by Fundaçã para a Ciência e Tecnologia (FCT), Portugal, through Grant No. PTDC/FIS-PLA/2940/2014.
Hypercube matrix computation task
NASA Technical Reports Server (NTRS)
Calalo, Ruel H.; Imbriale, William A.; Jacobi, Nathan; Liewer, Paulett C.; Lockhart, Thomas G.; Lyzenga, Gregory A.; Lyons, James R.; Manshadi, Farzin; Patterson, Jean E.
1988-01-01
A major objective of the Hypercube Matrix Computation effort at the Jet Propulsion Laboratory (JPL) is to investigate the applicability of a parallel computing architecture to the solution of large-scale electromagnetic scattering problems. Three scattering analysis codes are being implemented and assessed on a JPL/California Institute of Technology (Caltech) Mark 3 Hypercube. The codes, which utilize different underlying algorithms, give a means of evaluating the general applicability of this parallel architecture. The three analysis codes being implemented are a frequency domain method of moments code, a time domain finite difference code, and a frequency domain finite elements code. These analysis capabilities are being integrated into an electromagnetics interactive analysis workstation which can serve as a design tool for the construction of antennas and other radiating or scattering structures. The first two years of work on the Hypercube Matrix Computation effort is summarized. It includes both new developments and results as well as work previously reported in the Hypercube Matrix Computation Task: Final Report for 1986 to 1987 (JPL Publication 87-18).
NASA Astrophysics Data System (ADS)
Qin, Yi; Wang, Hongjuan; Wang, Zhipeng; Gong, Qiong; Wang, Danchen
2016-09-01
In optical interference-based encryption (IBE) scheme, the currently available methods have to employ the iterative algorithms in order to encrypt two images and retrieve cross-talk free decrypted images. In this paper, we shall show that this goal can be achieved via an analytical process if one of the two images is QR code. For decryption, the QR code is decrypted in the conventional architecture and the decryption has a noisy appearance. Nevertheless, the robustness of QR code against noise enables the accurate acquisition of its content from the noisy retrieval, as a result of which the primary QR code can be exactly regenerated. Thereafter, a novel optical architecture is proposed to recover the grayscale image by aid of the QR code. In addition, the proposal has totally eliminated the silhouette problem existing in the previous IBE schemes, and its effectiveness and feasibility have been demonstrated by numerical simulations.
Static analysis techniques for semiautomatic synthesis of message passing software skeletons
Sottile, Matthew; Dagit, Jason; Zhang, Deli; ...
2015-06-29
The design of high-performance computing architectures demands performance analysis of large-scale parallel applications to derive various parameters concerning hardware design and software development. The process of performance analysis and benchmarking an application can be done in several ways with varying degrees of fidelity. One of the most cost-effective ways is to do a coarse-grained study of large-scale parallel applications through the use of program skeletons. The concept of a “program skeleton” that we discuss in this article is an abstracted program that is derived from a larger program where source code that is determined to be irrelevant is removed formore » the purposes of the skeleton. In this work, we develop a semiautomatic approach for extracting program skeletons based on compiler program analysis. Finally, we demonstrate correctness of our skeleton extraction process by comparing details from communication traces, as well as show the performance speedup of using skeletons by running simulations in the SST/macro simulator.« less
Domain Specific Language Support for Exascale
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mellor-Crummey, John
A multi-institutional project known as D-TEC (short for “Domain- specific Technology for Exascale Computing”) set out to explore technologies to support the construction of Domain Specific Languages (DSLs) to map application programs to exascale architectures. DSLs employ automated code transformation to shift the burden of delivering portable performance from application programmers to compilers. Two chief properties contribute: DSLs permit expression at a high level of abstraction so that a programmer’s intent is clear to a compiler and DSL implementations encapsulate human domain-specific optimization knowledge so that a compiler can be smart enough to achieve good results on specific hardware. Domainmore » specificity is what makes these properties possible in a programming language. If leveraging domain specificity is the key to keep exascale software tractable, a corollary is that many different DSLs will be needed to encompass the full range of exascale computing applications; moreover, a single application may well need to use several different DSLs in conjunction. As a result, developing a general toolkit for building domain-specific languages was a key goal for the D-TEC project. Different aspects of the D-TEC research portfolio were the focus of work at each of the partner institutions in the multi-institutional project. D-TEC research and development work at Rice University focused on on three principal topics: understanding how to automate the tuning of code for complex architectures, research and development of the Rosebud DSL engine, and compiler technology to support complex execution platforms. This report provides a summary of the research and development work on the D-TEC project at Rice University.« less
NASA Astrophysics Data System (ADS)
Ragan-Kelley, M.; Perez, F.; Granger, B.; Kluyver, T.; Ivanov, P.; Frederic, J.; Bussonnier, M.
2014-12-01
IPython has provided terminal-based tools for interactive computing in Python since 2001. The notebook document format and multi-process architecture introduced in 2011 have expanded the applicable scope of IPython into teaching, presenting, and sharing computational work, in addition to interactive exploration. The new architecture also allows users to work in any language, with implementations in Python, R, Julia, Haskell, and several other languages. The language agnostic parts of IPython have been renamed to Jupyter, to better capture the notion that a cross-language design can encapsulate commonalities present in computational research regardless of the programming language being used. This architecture offers components like the web-based Notebook interface, that supports rich documents that combine code and computational results with text narratives, mathematics, images, video and any media that a modern browser can display. This interface can be used not only in research, but also for publication and education, as notebooks can be converted to a variety of output formats, including HTML and PDF. Recent developments in the Jupyter project include a multi-user environment for hosting notebooks for a class or research group, a live collaboration notebook via Google Docs, and better support for languages other than Python.
Fully parallel write/read in resistive synaptic array for accelerating on-chip learning
NASA Astrophysics Data System (ADS)
Gao, Ligang; Wang, I.-Ting; Chen, Pai-Yu; Vrudhula, Sarma; Seo, Jae-sun; Cao, Yu; Hou, Tuo-Hung; Yu, Shimeng
2015-11-01
A neuro-inspired computing paradigm beyond the von Neumann architecture is emerging and it generally takes advantage of massive parallelism and is aimed at complex tasks that involve intelligence and learning. The cross-point array architecture with synaptic devices has been proposed for on-chip implementation of the weighted sum and weight update in the learning algorithms. In this work, forming-free, silicon-process-compatible Ta/TaO x /TiO2/Ti synaptic devices are fabricated, in which >200 levels of conductance states could be continuously tuned by identical programming pulses. In order to demonstrate the advantages of parallelism of the cross-point array architecture, a novel fully parallel write scheme is designed and experimentally demonstrated in a small-scale crossbar array to accelerate the weight update in the training process, at a speed that is independent of the array size. Compared to the conventional row-by-row write scheme, it achieves >30× speed-up and >30× improvement in energy efficiency as projected in a large-scale array. If realistic synaptic device characteristics such as device variations are taken into an array-level simulation, the proposed array architecture is able to achieve ∼95% recognition accuracy of MNIST handwritten digits, which is close to the accuracy achieved by software using the ideal sparse coding algorithm.
A new free and open source tool for space plasma modeling.
NASA Astrophysics Data System (ADS)
Honkonen, I. J.
2014-12-01
I will present a new distributed memory parallel, free and open source computational model for studying space plasma. The model is written in C++ with emphasis on good software development practices and code readability without sacrificing serial or parallel performance. As such the model could be especially useful for education, for learning both (magneto)hydrodynamics (MHD) and computational model development. By using latest features of the C++ standard (2011) it has been possible to develop a very modular program which improves not only the readability of code but also the testability of the model and decreases the effort required to make changes to various parts of the program. Major parts of the model, functionality not directly related to (M)HD, have been outsourced to other freely available libraries which has reduced the development time of the model significantly. I will present an overview of the code architecture as well as details of different parts of the model and will show examples of using the model including preparing input files and plotting results. A multitude of 1-, 2- and 3-dimensional test cases are included in the software distribution and the results of, for example, Kelvin-Helmholtz, bow shock, blast wave and reconnection tests, will be presented.
Federal Register 2010, 2011, 2012, 2013, 2014
2012-12-11
... Buildings Service; Submission for OMB Review; Art-in- Architecture Program National Artist Registry (GSA... Architecture Program National Artist Registry (GSA Form 7437). A notice was published in the Federal Register at 77 FR 58141, on September 19, 2012. No comments were received. The Art-in-Architecture Program is...
Quantum error correction in crossbar architectures
NASA Astrophysics Data System (ADS)
Helsen, Jonas; Steudtner, Mark; Veldhorst, Menno; Wehner, Stephanie
2018-07-01
A central challenge for the scaling of quantum computing systems is the need to control all qubits in the system without a large overhead. A solution for this problem in classical computing comes in the form of so-called crossbar architectures. Recently we made a proposal for a large-scale quantum processor (Li et al arXiv:1711.03807 (2017)) to be implemented in silicon quantum dots. This system features a crossbar control architecture which limits parallel single-qubit control, but allows the scheme to overcome control scaling issues that form a major hurdle to large-scale quantum computing systems. In this work, we develop a language that makes it possible to easily map quantum circuits to crossbar systems, taking into account their architecture and control limitations. Using this language we show how to map well known quantum error correction codes such as the planar surface and color codes in this limited control setting with only a small overhead in time. We analyze the logical error behavior of this surface code mapping for estimated experimental parameters of the crossbar system and conclude that logical error suppression to a level useful for real quantum computation is feasible.
Automating tasks in protein structure determination with the clipper python module
McNicholas, Stuart; Croll, Tristan; Burnley, Tom; Palmer, Colin M.; Hoh, Soon Wen; Jenkins, Huw T.; Dodson, Eleanor
2017-01-01
Abstract Scripting programming languages provide the fastest means of prototyping complex functionality. Those with a syntax and grammar resembling human language also greatly enhance the maintainability of the produced source code. Furthermore, the combination of a powerful, machine‐independent scripting language with binary libraries tailored for each computer architecture allows programs to break free from the tight boundaries of efficiency traditionally associated with scripts. In the present work, we describe how an efficient C++ crystallographic library such as Clipper can be wrapped, adapted and generalized for use in both crystallographic and electron cryo‐microscopy applications, scripted with the Python language. We shall also place an emphasis on best practices in automation, illustrating how this can be achieved with this new Python module. PMID:28901669
Kanellopoulou, Chrysi; Muljo, Stefan A
2018-01-01
How a single genome can give rise to many different transcriptomes and thus all the different cell lineages in the human body is a fundamental question in biology. While signaling pathways, transcription factors, and chromatin architecture, to name a few determinants, have been established to play critical roles, recently, there is a growing appreciation of the roles of non-coding RNAs and RNA-binding proteins in controlling cell fates posttranscriptionally. Thus, it is vital that these emerging players are also integrated into models of gene regulatory networks that underlie programs of cellular differentiation. Sometimes, we can leverage knowledge about such posttranscriptional circuits to reprogram patterns of gene expression in meaningful ways. Here, we review three examples from our work.
SKIRT: Hybrid parallelization of radiative transfer simulations
NASA Astrophysics Data System (ADS)
Verstocken, S.; Van De Putte, D.; Camps, P.; Baes, M.
2017-07-01
We describe the design, implementation and performance of the new hybrid parallelization scheme in our Monte Carlo radiative transfer code SKIRT, which has been used extensively for modelling the continuum radiation of dusty astrophysical systems including late-type galaxies and dusty tori. The hybrid scheme combines distributed memory parallelization, using the standard Message Passing Interface (MPI) to communicate between processes, and shared memory parallelization, providing multiple execution threads within each process to avoid duplication of data structures. The synchronization between multiple threads is accomplished through atomic operations without high-level locking (also called lock-free programming). This improves the scaling behaviour of the code and substantially simplifies the implementation of the hybrid scheme. The result is an extremely flexible solution that adjusts to the number of available nodes, processors and memory, and consequently performs well on a wide variety of computing architectures.
Unobtrusive Software and System Health Management with R2U2 on a Parallel MIMD Coprocessor
NASA Technical Reports Server (NTRS)
Schumann, Johann; Moosbrugger, Patrick
2017-01-01
Dynamic monitoring of software and system health of a complex cyber-physical system requires observers that continuously monitor variables of the embedded software in order to detect anomalies and reason about root causes. There exists a variety of techniques for code instrumentation, but instrumentation might change runtime behavior and could require costly software re-certification. In this paper, we present R2U2E, a novel realization of our real-time, Realizable, Responsive, and Unobtrusive Unit (R2U2). The R2U2E observers are executed in parallel on a dedicated 16-core EPIPHANY co-processor, thereby avoiding additional computational overhead to the system under observation. A DMA-based shared memory access architecture allows R2U2E to operate without any code instrumentation or program interference.
Multidisciplinary Analysis and Optimal Design: As Easy as it Sounds?
NASA Technical Reports Server (NTRS)
Moore, Greg; Chainyk, Mike; Schiermeier, John
2004-01-01
The viewgraph presentation examines optimal design for precision, large aperture structures. Discussion focuses on aspects of design optimization, code architecture and current capabilities, and planned activities and collaborative area suggestions. The discussion of design optimization examines design sensitivity analysis; practical considerations; and new analytical environments including finite element-based capability for high-fidelity multidisciplinary analysis, design sensitivity, and optimization. The discussion of code architecture and current capabilities includes basic thermal and structural elements, nonlinear heat transfer solutions and process, and optical modes generation.
Video Coding and the Application Level Framing Protocol Architecture
1992-06-01
missing ADU can be sent to the decoder when and if it arrives. The need for out-of- order processing arises for two reasons. First, ADUs may be reordered...by the network. Second, an ADU which is lost and then successfully retransmitted will arrive out of order. In either case, out-of- order processing makes...the code do not allow at least some out-of- order processing , one of the strong points of the ALF architecture is eliminated. 2.3.4 Header Data
Digital visual communications using a Perceptual Components Architecture
NASA Technical Reports Server (NTRS)
Watson, Andrew B.
1991-01-01
The next era of space exploration will generate extraordinary volumes of image data, and management of this image data is beyond current technical capabilities. We propose a strategy for coding visual information that exploits the known properties of early human vision. This Perceptual Components Architecture codes images and image sequences in terms of discrete samples from limited bands of color, spatial frequency, orientation, and temporal frequency. This spatiotemporal pyramid offers efficiency (low bit rate), variable resolution, device independence, error-tolerance, and extensibility.
User's manual for the time-dependent INERTIA code
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bailey, A.W.; Bennett, R.B.
1985-01-01
The time-dependent INERTIA code is described. This code models the effects of neutral beam momentum input in tokamaks as predicted by the time-dependent formulation of the Stacey-Sigmar formalism. The operation and architecture of the code are described, as are the supplementary plotting and impurity line radiation routines. A short description of the steady-state version of the INERTIA code is also provided.
Functional and Database Architecture Design.
1983-09-26
I AD-At3.N 275 FUNCTIONAL AND D ATABASE ARCHITECTURE DESIGN (U) ALPHA / OMEGA GROUP INC HARVARD MA 26 SEP 83 NODS 4-83-C 0525 UNCLASSIFIED FG52 N EE...0525 REPORT AOO1 FUNCTIONAL AND DATABASE ARCHITECTURE DESIGN Submitted to: Office of Naval Research Department of the Navy 800 N. Quincy Street...ZNTIS GRA& I DTIC TAB Unannounced 0 Justification REPORT ON Distribution/ Availability Codes Avail and/or FUNCTIONAL AND DATABASE ARCHITECTURE DESIGN Dist
Malleable architecture generator for FPGA computing
NASA Astrophysics Data System (ADS)
Gokhale, Maya; Kaba, James; Marks, Aaron; Kim, Jang
1996-10-01
The malleable architecture generator (MARGE) is a tool set that translates high-level parallel C to configuration bit streams for field-programmable logic based computing systems. MARGE creates an application-specific instruction set and generates the custom hardware components required to perform exactly those computations specified by the C program. In contrast to traditional fixed-instruction processors, MARGE's dynamic instruction set creation provides for efficient use of hardware resources. MARGE processes intermediate code in which each operation is annotated by the bit lengths of the operands. Each basic block (sequence of straight line code) is mapped into a single custom instruction which contains all the operations and logic inherent in the block. A synthesis phase maps the operations comprising the instructions into register transfer level structural components and control logic which have been optimized to exploit functional parallelism and function unit reuse. As a final stage, commercial technology-specific tools are used to generate configuration bit streams for the desired target hardware. Technology- specific pre-placed, pre-routed macro blocks are utilized to implement as much of the hardware as possible. MARGE currently supports the Xilinx-based Splash-2 reconfigurable accelerator and National Semiconductor's CLAy-based parallel accelerator, MAPA. The MARGE approach has been demonstrated on systolic applications such as DNA sequence comparison.
7 CFR 1724.50 - Compliance with National Electrical Safety Code (NESC).
Code of Federal Regulations, 2013 CFR
2013-01-01
... 7 Agriculture 11 2013-01-01 2013-01-01 false Compliance with National Electrical Safety Code (NESC... UTILITIES SERVICE, DEPARTMENT OF AGRICULTURE ELECTRIC ENGINEERING, ARCHITECTURAL SERVICES AND DESIGN POLICIES AND PROCEDURES Electric System Design § 1724.50 Compliance with National Electrical Safety Code...
7 CFR 1724.50 - Compliance with National Electrical Safety Code (NESC).
Code of Federal Regulations, 2010 CFR
2010-01-01
... 7 Agriculture 11 2010-01-01 2010-01-01 false Compliance with National Electrical Safety Code (NESC... UTILITIES SERVICE, DEPARTMENT OF AGRICULTURE ELECTRIC ENGINEERING, ARCHITECTURAL SERVICES AND DESIGN POLICIES AND PROCEDURES Electric System Design § 1724.50 Compliance with National Electrical Safety Code...
7 CFR 1724.50 - Compliance with National Electrical Safety Code (NESC).
Code of Federal Regulations, 2011 CFR
2011-01-01
... 7 Agriculture 11 2011-01-01 2011-01-01 false Compliance with National Electrical Safety Code (NESC... UTILITIES SERVICE, DEPARTMENT OF AGRICULTURE ELECTRIC ENGINEERING, ARCHITECTURAL SERVICES AND DESIGN POLICIES AND PROCEDURES Electric System Design § 1724.50 Compliance with National Electrical Safety Code...
7 CFR 1724.50 - Compliance with National Electrical Safety Code (NESC).
Code of Federal Regulations, 2012 CFR
2012-01-01
... 7 Agriculture 11 2012-01-01 2012-01-01 false Compliance with National Electrical Safety Code (NESC... UTILITIES SERVICE, DEPARTMENT OF AGRICULTURE ELECTRIC ENGINEERING, ARCHITECTURAL SERVICES AND DESIGN POLICIES AND PROCEDURES Electric System Design § 1724.50 Compliance with National Electrical Safety Code...
7 CFR 1724.50 - Compliance with National Electrical Safety Code (NESC).
Code of Federal Regulations, 2014 CFR
2014-01-01
... 7 Agriculture 11 2014-01-01 2014-01-01 false Compliance with National Electrical Safety Code (NESC... UTILITIES SERVICE, DEPARTMENT OF AGRICULTURE ELECTRIC ENGINEERING, ARCHITECTURAL SERVICES AND DESIGN POLICIES AND PROCEDURES Electric System Design § 1724.50 Compliance with National Electrical Safety Code...
PARALLELISATION OF THE MODEL-BASED ITERATIVE RECONSTRUCTION ALGORITHM DIRA.
Örtenberg, A; Magnusson, M; Sandborg, M; Alm Carlsson, G; Malusek, A
2016-06-01
New paradigms for parallel programming have been devised to simplify software development on multi-core processors and many-core graphical processing units (GPU). Despite their obvious benefits, the parallelisation of existing computer programs is not an easy task. In this work, the use of the Open Multiprocessing (OpenMP) and Open Computing Language (OpenCL) frameworks is considered for the parallelisation of the model-based iterative reconstruction algorithm DIRA with the aim to significantly shorten the code's execution time. Selected routines were parallelised using OpenMP and OpenCL libraries; some routines were converted from MATLAB to C and optimised. Parallelisation of the code with the OpenMP was easy and resulted in an overall speedup of 15 on a 16-core computer. Parallelisation with OpenCL was more difficult owing to differences between the central processing unit and GPU architectures. The resulting speedup was substantially lower than the theoretical peak performance of the GPU; the cause was explained. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
NASA Astrophysics Data System (ADS)
Olson, Richard F.
2013-05-01
Rendering of point scatterer based radar scenes for millimeter wave (mmW) seeker tests in real-time hardware-in-the-loop (HWIL) scene generation requires efficient algorithms and vector-friendly computer architectures for complex signal synthesis. New processor technology from Intel implements an extended 256-bit vector SIMD instruction set (AVX, AVX2) in a multi-core CPU design providing peak execution rates of hundreds of GigaFLOPS (GFLOPS) on one chip. Real world mmW scene generation code can approach peak SIMD execution rates only after careful algorithm and source code design. An effective software design will maintain high computing intensity emphasizing register-to-register SIMD arithmetic operations over data movement between CPU caches or off-chip memories. Engineers at the U.S. Army Aviation and Missile Research, Development and Engineering Center (AMRDEC) applied two basic parallel coding methods to assess new 256-bit SIMD multi-core architectures for mmW scene generation in HWIL. These include use of POSIX threads built on vector library functions and more portable, highlevel parallel code based on compiler technology (e.g. OpenMP pragmas and SIMD autovectorization). Since CPU technology is rapidly advancing toward high processor core counts and TeraFLOPS peak SIMD execution rates, it is imperative that coding methods be identified which produce efficient and maintainable parallel code. This paper describes the algorithms used in point scatterer target model rendering, the parallelization of those algorithms, and the execution performance achieved on an AVX multi-core machine using the two basic parallel coding methods. The paper concludes with estimates for scale-up performance on upcoming multi-core technology.
NASA Astrophysics Data System (ADS)
Caplan, R. M.
2013-04-01
We present a simple to use, yet powerful code package called NLSEmagic to numerically integrate the nonlinear Schrödinger equation in one, two, and three dimensions. NLSEmagic is a high-order finite-difference code package which utilizes graphic processing unit (GPU) parallel architectures. The codes running on the GPU are many times faster than their serial counterparts, and are much cheaper to run than on standard parallel clusters. The codes are developed with usability and portability in mind, and therefore are written to interface with MATLAB utilizing custom GPU-enabled C codes with the MEX-compiler interface. The packages are freely distributed, including user manuals and set-up files. Catalogue identifier: AEOJ_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEOJ_v1_0.html Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 124453 No. of bytes in distributed program, including test data, etc.: 4728604 Distribution format: tar.gz Programming language: C, CUDA, MATLAB. Computer: PC, MAC. Operating system: Windows, MacOS, Linux. Has the code been vectorized or parallelized?: Yes. Number of processors used: Single CPU, number of GPU processors dependent on chosen GPU card (max is currently 3072 cores on GeForce GTX 690). Supplementary material: Setup guide, Installation guide. RAM: Highly dependent on dimensionality and grid size. For typical medium-large problem size in three dimensions, 4GB is sufficient. Keywords: Nonlinear Schröodinger Equation, GPU, high-order finite difference, Bose-Einstien condensates. Classification: 4.3, 7.7. Nature of problem: Integrate solutions of the time-dependent one-, two-, and three-dimensional cubic nonlinear Schrödinger equation. Solution method: The integrators utilize a fully-explicit fourth-order Runge-Kutta scheme in time and both second- and fourth-order differencing in space. The integrators are written to run on NVIDIA GPUs and are interfaced with MATLAB including built-in visualization and analysis tools. Restrictions: The main restriction for the GPU integrators is the amount of RAM on the GPU as the code is currently only designed for running on a single GPU. Unusual features: Ability to visualize real-time simulations through the interaction of MATLAB and the compiled GPU integrators. Additional comments: Setup guide and Installation guide provided. Program has a dedicated web site at www.nlsemagic.com. Running time: A three-dimensional run with a grid dimension of 87×87×203 for 3360 time steps (100 non-dimensional time units) takes about one and a half minutes on a GeForce GTX 580 GPU card.
38 CFR 39.63 - Architectural design standards.
Code of Federal Regulations, 2011 CFR
2011-07-01
... Association Life Safety Code and Errata (NFPA 101), the 2003 edition of the NFPA 5000, Building Construction... section, all applicable local and State building codes and regulations must be observed. In areas not subject to local or State building codes, the recommendations contained in the 2003 edition of the NFPA...
a Framework for Distributed Mixed Language Scientific Applications
NASA Astrophysics Data System (ADS)
Quarrie, D. R.
The Object Management Group has defined an architecture (CORBA) for distributed object applications based on an Object Request Broker and Interface Definition Language. This project builds upon this architecture to establish a framework for the creation of mixed language scientific applications. A prototype compiler has been written that generates FORTRAN 90 or Eiffel stubs and skeletons and the required C++ glue code from an input IDL file that specifies object interfaces. This generated code can be used directly for non-distributed mixed language applications or in conjunction with the C++ code generated from a commercial IDL compiler for distributed applications. A feasibility study is presently underway to see whether a fully integrated software development environment for distributed, mixed-language applications can be created by modifying the back-end code generator of a commercial CASE tool to emit IDL.
Developing Information Power Grid Based Algorithms and Software
NASA Technical Reports Server (NTRS)
Dongarra, Jack
1998-01-01
This was an exploratory study to enhance our understanding of problems involved in developing large scale applications in a heterogeneous distributed environment. It is likely that the large scale applications of the future will be built by coupling specialized computational modules together. For example, efforts now exist to couple ocean and atmospheric prediction codes to simulate a more complete climate system. These two applications differ in many respects. They have different grids, the data is in different unit systems and the algorithms for inte,-rating in time are different. In addition the code for each application is likely to have been developed on different architectures and tend to have poor performance when run on an architecture for which the code was not designed, if it runs at all. Architectural differences may also induce differences in data representation which effect precision and convergence criteria as well as data transfer issues. In order to couple such dissimilar codes some form of translation must be present. This translation should be able to handle interpolation from one grid to another as well as construction of the correct data field in the correct units from available data. Even if a code is to be developed from scratch, a modular approach will likely be followed in that standard scientific packages will be used to do the more mundane tasks such as linear algebra or Fourier transform operations. This approach allows the developers to concentrate on their science rather than becoming experts in linear algebra or signal processing. Problems associated with this development approach include difficulties associated with data extraction and translation from one module to another, module performance on different nodal architectures, and others. In addition to these data and software issues there exists operational issues such as platform stability and resource management.
Building code challenging the ethics behind adobe architecture in North Cyprus.
Hurol, Yonca; Yüceer, Hülya; Şahali, Öznem
2015-04-01
Adobe masonry is part of the vernacular architecture of Cyprus. Thus, it is possible to use this technology in a meaningful way on the island. On the other hand, although adobe architecture is more sustainable in comparison to other building technologies, the use of it is diminishing in North Cyprus. The application of Turkish building code in the north of the island has created complications in respect of the use of adobe masonry, because this building code demands that reinforced concrete vertical tie-beams are used together with adobe masonry. The use of reinforced concrete elements together with adobe masonry causes problems in relation to the climatic response of the building as well as causing other technical and aesthetic problems. This situation makes the design of adobe masonry complicated and various types of ethical problems also emerge. The objective of this article is to analyse the ethical problems which arise as a consequence of the restrictive character of the building code, by analysing two case studies and conducting an interview with an architect who was involved with the use of adobe masonry in North Cyprus. According to the results of this article there are ethical problems at various levels in the design of both case studies. These problems are connected to the responsibilities of architects in respect of the social benefit, material production, aesthetics and affordability of the architecture as well as presenting distrustful behaviour where the obligations of architects to their clients is concerned.
JASPAR RESTful API: accessing JASPAR data from any programming language.
Khan, Aziz; Mathelier, Anthony
2018-05-01
JASPAR is a widely used open-access database of curated, non-redundant transcription factor binding profiles. Currently, data from JASPAR can be retrieved as flat files or by using programming language-specific interfaces. Here, we present a programming language-independent application programming interface (API) to access JASPAR data using the Representational State Transfer (REST) architecture. The REST API enables programmatic access to JASPAR by most programming languages and returns data in eight widely used formats. Several endpoints are available to access the data and an endpoint is available to infer the TF binding profile(s) likely bound by a given DNA binding domain protein sequence. Additionally, it provides an interactive browsable interface for bioinformatics tool developers. This REST API is implemented in Python using the Django REST Framework. It is accessible at http://jaspar.genereg.net/api/ and the source code is freely available at https://bitbucket.org/CBGR/jaspar under GPL v3 license. aziz.khan@ncmm.uio.no or anthony.mathelier@ncmm.uio.no. Supplementary data are available at Bioinformatics online.
OpenMP 4.5 Validation and Verification Suite
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pophale, Swaroop S; Bernholdt, David E; Hernandez, Oscar R
2017-12-15
OpenMP, a directive-based programming API, introduce directives for accelerator devices that programmers are starting to use more frequently in production codes. To make sure OpenMP directives work correctly across architectures, it is critical to have a mechanism that tests for an implementation's conformance to the OpenMP standard. This testing process can uncover ambiguities in the OpenMP specification, which helps compiler developers and users make a better use of the standard. We fill this gap through our validation and verification test suite that focuses on the offload directives available in OpenMP 4.5.
ERIC Educational Resources Information Center
Luce, Ann Campbell
This resource, containing a teacher's manual, reproducible student workbook, and a color teaching poster, is designed to accompany a 21-minute videotape program, but may be adapted for independent use. Part 1 of the program, "Greek Architecture," looks at elements of architectural construction as applied to Greek structures, and…
NASA Astrophysics Data System (ADS)
Mielikainen, Jarno; Huang, Bormin; Huang, Allen
2015-10-01
The Thompson cloud microphysics scheme is a sophisticated cloud microphysics scheme in the Weather Research and Forecasting (WRF) model. The scheme is very suitable for massively parallel computation as there are no interactions among horizontal grid points. Compared to the earlier microphysics schemes, the Thompson scheme incorporates a large number of improvements. Thus, we have optimized the speed of this important part of WRF. Intel Many Integrated Core (MIC) ushers in a new era of supercomputing speed, performance, and compatibility. It allows the developers to run code at trillions of calculations per second using the familiar programming model. In this paper, we present our results of optimizing the Thompson microphysics scheme on Intel Many Integrated Core Architecture (MIC) hardware. The Intel Xeon Phi coprocessor is the first product based on Intel MIC architecture, and it consists of up to 61 cores connected by a high performance on-die bidirectional interconnect. The coprocessor supports all important Intel development tools. Thus, the development environment is familiar one to a vast number of CPU developers. Although, getting a maximum performance out of MICs will require using some novel optimization techniques. New optimizations for an updated Thompson scheme are discusses in this paper. The optimizations improved the performance of the original Thompson code on Xeon Phi 7120P by a factor of 1.8x. Furthermore, the same optimizations improved the performance of the Thompson on a dual socket configuration of eight core Intel Xeon E5-2670 CPUs by a factor of 1.8x compared to the original Thompson code.
75 FR 34004 - State Cemetery Grants
Federal Register 2010, 2011, 2012, 2013, 2014
2010-06-16
... architectural design codes that apply to grant applicants, we decided to update those references in a separate.... 39.63 Architectural design standards. Subpart C--Operation and Maintenance Projects Grant... acquisition, design and planning, earth moving, landscaping, construction, and provision of initial operating...
2011-02-01
Process Architecture Technology Analysis: Executive .............................................. 15 UIMA as Executive...44 A.4: Flow Code in UIMA ......................................................................................................... 46... UIMA ................................................................................................................................ 57 E.2
NASA Astrophysics Data System (ADS)
Mills, R. T.; Rupp, K.; Smith, B. F.; Brown, J.; Knepley, M.; Zhang, H.; Adams, M.; Hammond, G. E.
2017-12-01
As the high-performance computing community pushes towards the exascale horizon, power and heat considerations have driven the increasing importance and prevalence of fine-grained parallelism in new computer architectures. High-performance computing centers have become increasingly reliant on GPGPU accelerators and "manycore" processors such as the Intel Xeon Phi line, and 512-bit SIMD registers have even been introduced in the latest generation of Intel's mainstream Xeon server processors. The high degree of fine-grained parallelism and more complicated memory hierarchy considerations of such "manycore" processors present several challenges to existing scientific software. Here, we consider how the massively parallel, open-source hydrologic flow and reactive transport code PFLOTRAN - and the underlying Portable, Extensible Toolkit for Scientific Computation (PETSc) library on which it is built - can best take advantage of such architectures. We will discuss some key features of these novel architectures and our code optimizations and algorithmic developments targeted at them, and present experiences drawn from working with a wide range of PFLOTRAN benchmark problems on these architectures.
NASA Astrophysics Data System (ADS)
Litinski, Daniel; Kesselring, Markus S.; Eisert, Jens; von Oppen, Felix
2017-07-01
We present a scalable architecture for fault-tolerant topological quantum computation using networks of voltage-controlled Majorana Cooper pair boxes and topological color codes for error correction. Color codes have a set of transversal gates which coincides with the set of topologically protected gates in Majorana-based systems, namely, the Clifford gates. In this way, we establish color codes as providing a natural setting in which advantages offered by topological hardware can be combined with those arising from topological error-correcting software for full-fledged fault-tolerant quantum computing. We provide a complete description of our architecture, including the underlying physical ingredients. We start by showing that in topological superconductor networks, hexagonal cells can be employed to serve as physical qubits for universal quantum computation, and we present protocols for realizing topologically protected Clifford gates. These hexagonal-cell qubits allow for a direct implementation of open-boundary color codes with ancilla-free syndrome read-out and logical T gates via magic-state distillation. For concreteness, we describe how the necessary operations can be implemented using networks of Majorana Cooper pair boxes, and we give a feasibility estimate for error correction in this architecture. Our approach is motivated by nanowire-based networks of topological superconductors, but it could also be realized in alternative settings such as quantum-Hall-superconductor hybrids.
41 CFR 101-6.500 - Scope of subpart.
Code of Federal Regulations, 2010 CFR
2010-07-01
... Federal Government for the purpose of performing official business, at least one copy of the Code shall be... display the Code. Display shall be consistent with the decor and architecture of the building space...
i-PI: A Python interface for ab initio path integral molecular dynamics simulations
NASA Astrophysics Data System (ADS)
Ceriotti, Michele; More, Joshua; Manolopoulos, David E.
2014-03-01
Recent developments in path integral methodology have significantly reduced the computational expense of including quantum mechanical effects in the nuclear motion in ab initio molecular dynamics simulations. However, the implementation of these developments requires a considerable programming effort, which has hindered their adoption. Here we describe i-PI, an interface written in Python that has been designed to minimise the effort required to bring state-of-the-art path integral techniques to an electronic structure program. While it is best suited to first principles calculations and path integral molecular dynamics, i-PI can also be used to perform classical molecular dynamics simulations, and can just as easily be interfaced with an empirical forcefield code. To give just one example of the many potential applications of the interface, we use it in conjunction with the CP2K electronic structure package to showcase the importance of nuclear quantum effects in high-pressure water. Catalogue identifier: AERN_v1_0 Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AERN_v1_0.html Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland Licensing provisions: GNU General Public License, version 3 No. of lines in distributed program, including test data, etc.: 138626 No. of bytes in distributed program, including test data, etc.: 3128618 Distribution format: tar.gz Programming language: Python. Computer: Multiple architectures. Operating system: Linux, Mac OSX, Windows. RAM: Less than 256 Mb Classification: 7.7. External routines: NumPy Nature of problem: Bringing the latest developments in the modelling of nuclear quantum effects with path integral molecular dynamics to ab initio electronic structure programs with minimal implementational effort. Solution method: State-of-the-art path integral molecular dynamics techniques are implemented in a Python interface. Any electronic structure code can be patched to receive the atomic coordinates from the Python interface, and to return the forces and energy that are used to integrate the equations of motion. Restrictions: This code only deals with distinguishable particles. It does not include fermonic or bosonic exchanges between equivalent nuclei, which can become important at very low temperatures. Running time: Depends dramatically on the nature of the simulation being performed. A few minutes for short tests with empirical force fields, up to several weeks for production calculations with ab initio forces. The examples provided with the code run in less than an hour.
An efficient decoding for low density parity check codes
NASA Astrophysics Data System (ADS)
Zhao, Ling; Zhang, Xiaolin; Zhu, Manjie
2009-12-01
Low density parity check (LDPC) codes are a class of forward-error-correction codes. They are among the best-known codes capable of achieving low bit error rates (BER) approaching Shannon's capacity limit. Recently, LDPC codes have been adopted by the European Digital Video Broadcasting (DVB-S2) standard, and have also been proposed for the emerging IEEE 802.16 fixed and mobile broadband wireless-access standard. The consultative committee for space data system (CCSDS) has also recommended using LDPC codes in the deep space communications and near-earth communications. It is obvious that LDPC codes will be widely used in wired and wireless communication, magnetic recording, optical networking, DVB, and other fields in the near future. Efficient hardware implementation of LDPC codes is of great interest since LDPC codes are being considered for a wide range of applications. This paper presents an efficient partially parallel decoder architecture suited for quasi-cyclic (QC) LDPC codes using Belief propagation algorithm for decoding. Algorithmic transformation and architectural level optimization are incorporated to reduce the critical path. First, analyze the check matrix of LDPC code, to find out the relationship between the row weight and the column weight. And then, the sharing level of the check node updating units (CNU) and the variable node updating units (VNU) are determined according to the relationship. After that, rearrange the CNU and the VNU, and divide them into several smaller parts, with the help of some assistant logic circuit, these smaller parts can be grouped into CNU during the check node update processing and grouped into VNU during the variable node update processing. These smaller parts are called node update kernel units (NKU) and the assistant logic circuit are called node update auxiliary unit (NAU). With NAUs' help, the two steps of iteration operation are completed by NKUs, which brings in great hardware resource reduction. Meanwhile, efficient techniques have been developed to reduce the computation delay of the node processing units and to minimize hardware overhead for parallel processing. This method may be applied not only to regular LDPC codes, but also to the irregular ones. Based on the proposed architectures, a (7493, 6096) irregular QC-LDPC code decoder is described using verilog hardware design language and implemented on Altera field programmable gate array (FPGA) StratixII EP2S130. The implementation results show that over 20% of logic core size can be saved than conventional partially parallel decoder architectures without any performance degradation. If the decoding clock is 100MHz, the proposed decoder can achieve a maximum (source data) decoding throughput of 133 Mb/s at 18 iterations.
Unified transform architecture for AVC, AVS, VC-1 and HEVC high-performance codecs
NASA Astrophysics Data System (ADS)
Dias, Tiago; Roma, Nuno; Sousa, Leonel
2014-12-01
A unified architecture for fast and efficient computation of the set of two-dimensional (2-D) transforms adopted by the most recent state-of-the-art digital video standards is presented in this paper. Contrasting to other designs with similar functionality, the presented architecture is supported on a scalable, modular and completely configurable processing structure. This flexible structure not only allows to easily reconfigure the architecture to support different transform kernels, but it also permits its resizing to efficiently support transforms of different orders (e.g. order-4, order-8, order-16 and order-32). Consequently, not only is it highly suitable to realize high-performance multi-standard transform cores, but it also offers highly efficient implementations of specialized processing structures addressing only a reduced subset of transforms that are used by a specific video standard. The experimental results that were obtained by prototyping several configurations of this processing structure in a Xilinx Virtex-7 FPGA show the superior performance and hardware efficiency levels provided by the proposed unified architecture for the implementation of transform cores for the Advanced Video Coding (AVC), Audio Video coding Standard (AVS), VC-1 and High Efficiency Video Coding (HEVC) standards. In addition, such results also demonstrate the ability of this processing structure to realize multi-standard transform cores supporting all the standards mentioned above and that are capable of processing the 8k Ultra High Definition Television (UHDTV) video format (7,680 × 4,320 at 30 fps) in real time.
Dependency graph for code analysis on emerging architectures
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shashkov, Mikhail Jurievich; Lipnikov, Konstantin
Direct acyclic dependency (DAG) graph is becoming the standard for modern multi-physics codes.The ideal DAG is the true block-scheme of a multi-physics code. Therefore, it is the convenient object for insitu analysis of the cost of computations and algorithmic bottlenecks related to statistical frequent data motion and dymanical machine state.
NASA Astrophysics Data System (ADS)
Gel, Aytekin; Hu, Jonathan; Ould-Ahmed-Vall, ElMoustapha; Kalinkin, Alexander A.
2017-02-01
Legacy codes remain a crucial element of today's simulation-based engineering ecosystem due to the extensive validation process and investment in such software. The rapid evolution of high-performance computing architectures necessitates the modernization of these codes. One approach to modernization is a complete overhaul of the code. However, this could require extensive investments, such as rewriting in modern languages, new data constructs, etc., which will necessitate systematic verification and validation to re-establish the credibility of the computational models. The current study advocates using a more incremental approach and is a culmination of several modernization efforts of the legacy code MFIX, which is an open-source computational fluid dynamics code that has evolved over several decades, widely used in multiphase flows and still being developed by the National Energy Technology Laboratory. Two different modernization approaches,'bottom-up' and 'top-down', are illustrated. Preliminary results show up to 8.5x improvement at the selected kernel level with the first approach, and up to 50% improvement in total simulated time with the latter were achieved for the demonstration cases and target HPC systems employed.
Transforming Aggregate Object-Oriented Formal Specifications to Code
1999-03-01
integration issues associated with a formal-based software transformation system, such as the source specification, the problem space architecture , design architecture ... design transforms, and target software transforms. Software is critical in today’s Air Force, yet its specification, design, and development
DOE Office of Scientific and Technical Information (OSTI.GOV)
Saad, Tony; Sutherland, James C.
To address the coding and software challenges of modern hybrid architectures, we propose an approach to multiphysics code development for high-performance computing. This approach is based on using a Domain Specific Language (DSL) in tandem with a directed acyclic graph (DAG) representation of the problem to be solved that allows runtime algorithm generation. When coupled with a large-scale parallel framework, the result is a portable development framework capable of executing on hybrid platforms and handling the challenges of multiphysics applications. In addition, we share our experience developing a code in such an environment – an effort that spans an interdisciplinarymore » team of engineers and computer scientists.« less
Saad, Tony; Sutherland, James C.
2016-05-04
To address the coding and software challenges of modern hybrid architectures, we propose an approach to multiphysics code development for high-performance computing. This approach is based on using a Domain Specific Language (DSL) in tandem with a directed acyclic graph (DAG) representation of the problem to be solved that allows runtime algorithm generation. When coupled with a large-scale parallel framework, the result is a portable development framework capable of executing on hybrid platforms and handling the challenges of multiphysics applications. In addition, we share our experience developing a code in such an environment – an effort that spans an interdisciplinarymore » team of engineers and computer scientists.« less
Alview: Portable Software for Viewing Sequence Reads in BAM Formatted Files.
Finney, Richard P; Chen, Qing-Rong; Nguyen, Cu V; Hsu, Chih Hao; Yan, Chunhua; Hu, Ying; Abawi, Massih; Bian, Xiaopeng; Meerzaman, Daoud M
2015-01-01
The name Alview is a contraction of the term Alignment Viewer. Alview is a compiled to native architecture software tool for visualizing the alignment of sequencing data. Inputs are files of short-read sequences aligned to a reference genome in the SAM/BAM format and files containing reference genome data. Outputs are visualizations of these aligned short reads. Alview is written in portable C with optional graphical user interface (GUI) code written in C, C++, and Objective-C. The application can run in three different ways: as a web server, as a command line tool, or as a native, GUI program. Alview is compatible with Microsoft Windows, Linux, and Apple OS X. It is available as a web demo at https://cgwb.nci.nih.gov/cgi-bin/alview. The source code and Windows/Mac/Linux executables are available via https://github.com/NCIP/alview.
GPU accelerated implementation of NCI calculations using promolecular density.
Rubez, Gaëtan; Etancelin, Jean-Matthieu; Vigouroux, Xavier; Krajecki, Michael; Boisson, Jean-Charles; Hénon, Eric
2017-05-30
The NCI approach is a modern tool to reveal chemical noncovalent interactions. It is particularly attractive to describe ligand-protein binding. A custom implementation for NCI using promolecular density is presented. It is designed to leverage the computational power of NVIDIA graphics processing unit (GPU) accelerators through the CUDA programming model. The code performances of three versions are examined on a test set of 144 systems. NCI calculations are particularly well suited to the GPU architecture, which reduces drastically the computational time. On a single compute node, the dual-GPU version leads to a 39-fold improvement for the biggest instance compared to the optimal OpenMP parallel run (C code, icc compiler) with 16 CPU cores. Energy consumption measurements carried out on both CPU and GPU NCI tests show that the GPU approach provides substantial energy savings. © 2017 Wiley Periodicals, Inc. © 2017 Wiley Periodicals, Inc.
Deep Learning Methods for Improved Decoding of Linear Codes
NASA Astrophysics Data System (ADS)
Nachmani, Eliya; Marciano, Elad; Lugosch, Loren; Gross, Warren J.; Burshtein, David; Be'ery, Yair
2018-02-01
The problem of low complexity, close to optimal, channel decoding of linear codes with short to moderate block length is considered. It is shown that deep learning methods can be used to improve a standard belief propagation decoder, despite the large example space. Similar improvements are obtained for the min-sum algorithm. It is also shown that tying the parameters of the decoders across iterations, so as to form a recurrent neural network architecture, can be implemented with comparable results. The advantage is that significantly less parameters are required. We also introduce a recurrent neural decoder architecture based on the method of successive relaxation. Improvements over standard belief propagation are also observed on sparser Tanner graph representations of the codes. Furthermore, we demonstrate that the neural belief propagation decoder can be used to improve the performance, or alternatively reduce the computational complexity, of a close to optimal decoder of short BCH codes.
Marsili, Simone; Signorini, Giorgio Federico; Chelli, Riccardo; Marchi, Massimo; Procacci, Piero
2010-04-15
We present the new release of the ORAC engine (Procacci et al., Comput Chem 1997, 18, 1834), a FORTRAN suite to simulate complex biosystems at the atomistic level. The previous release of the ORAC code included multiple time steps integration, smooth particle mesh Ewald method, constant pressure and constant temperature simulations. The present release has been supplemented with the most advanced techniques for enhanced sampling in atomistic systems including replica exchange with solute tempering, metadynamics and steered molecular dynamics. All these computational technologies have been implemented for parallel architectures using the standard MPI communication protocol. ORAC is an open-source program distributed free of charge under the GNU general public license (GPL) at http://www.chim.unifi.it/orac. 2009 Wiley Periodicals, Inc.
NASA Technical Reports Server (NTRS)
Rutishauser, David
2006-01-01
The motivation for this work comes from an observation that amidst the push for Massively Parallel (MP) solutions to high-end computing problems such as numerical physical simulations, large amounts of legacy code exist that are highly optimized for vector supercomputers. Because re-hosting legacy code often requires a complete re-write of the original code, which can be a very long and expensive effort, this work examines the potential to exploit reconfigurable computing machines in place of a vector supercomputer to implement an essentially unmodified legacy source code. Custom and reconfigurable computing resources could be used to emulate an original application's target platform to the extent required to achieve high performance. To arrive at an architecture that delivers the desired performance subject to limited resources involves solving a multi-variable optimization problem with constraints. Prior research in the area of reconfigurable computing has demonstrated that designing an optimum hardware implementation of a given application under hardware resource constraints is an NP-complete problem. The premise of the approach is that the general issue of applying reconfigurable computing resources to the implementation of an application, maximizing the performance of the computation subject to physical resource constraints, can be made a tractable problem by assuming a computational paradigm, such as vector processing. This research contributes a formulation of the problem and a methodology to design a reconfigurable vector processing implementation of a given application that satisfies a performance metric. A generic, parametric, architectural framework for vector processing implemented in reconfigurable logic is developed as a target for a scheduling/mapping algorithm that maps an input computation to a given instance of the architecture. This algorithm is integrated with an optimization framework to arrive at a specification of the architecture parameters that attempts to minimize execution time, while staying within resource constraints. The flexibility of using a custom reconfigurable implementation is exploited in a unique manner to leverage the lessons learned in vector supercomputer development. The vector processing framework is tailored to the application, with variable parameters that are fixed in traditional vector processing. Benchmark data that demonstrates the functionality and utility of the approach is presented. The benchmark data includes an identified bottleneck in a real case study example vector code, the NASA Langley Terminal Area Simulation System (TASS) application.
Interpretive computer simulator for the NASA Standard Spacecraft Computer-2 (NSSC-2)
NASA Technical Reports Server (NTRS)
Smith, R. S.; Noland, M. S.
1979-01-01
An Interpretive Computer Simulator (ICS) for the NASA Standard Spacecraft Computer-II (NSSC-II) was developed as a code verification and testing tool for the Annular Suspension and Pointing System (ASPS) project. The simulator is written in the higher level language PASCAL and implented on the CDC CYBER series computer system. It is supported by a metal assembler, a linkage loader for the NSSC-II, and a utility library to meet the application requirements. The architectural design of the NSSC-II is that of an IBM System/360 (S/360) and supports all but four instructions of the S/360 standard instruction set. The structural design of the ICS is described with emphasis on the design differences between it and the NSSC-II hardware. The program flow is diagrammed, with the function of each procedure being defined; the instruction implementation is discussed in broad terms; and the instruction timings used in the ICS are listed. An example of the steps required to process an assembly level language program on the ICS is included. The example illustrates the control cards necessary to assemble, load, and execute assembly language code; the sample program to to be executed; the executable load module produced by the loader; and the resulting output produced by the ICS.
Building a High Performance Metadata Broker using Clojure, NoSQL and Message Queues
NASA Astrophysics Data System (ADS)
Truslove, I.; Reed, S.
2013-12-01
In practice, Earth and Space Science Informatics often relies on getting more done with less: fewer hardware resources, less IT staff, fewer lines of code. As a capacity-building exercise focused on rapid development of high-performance geoinformatics software, the National Snow and Ice Data Center (NSIDC) built a prototype metadata brokering system using a new JVM language, modern database engines and virtualized or cloud computing resources. The metadata brokering system was developed with the overarching goals of (i) demonstrating a technically viable product with as little development effort as possible, (ii) using very new yet very popular tools and technologies in order to get the most value from the least legacy-encumbered code bases, and (iii) being a high-performance system by using scalable subcomponents, and implementation patterns typically used in web architectures. We implemented the system using the Clojure programming language (an interactive, dynamic, Lisp-like JVM language), Redis (a fast in-memory key-value store) as both the data store for original XML metadata content and as the provider for the message queueing service, and ElasticSearch for its search and indexing capabilities to generate search results. On evaluating the results of the prototyping process, we believe that the technical choices did in fact allow us to do more for less, due to the expressive nature of the Clojure programming language and its easy interoperability with Java libraries, and the successful reuse or re-application of high performance products or designs. This presentation will describe the architecture of the metadata brokering system, cover the tools and techniques used, and describe lessons learned, conclusions, and potential next steps.
Numerical Propulsion System Simulation: A Common Tool for Aerospace Propulsion Being Developed
NASA Technical Reports Server (NTRS)
Follen, Gregory J.; Naiman, Cynthia G.
2001-01-01
The NASA Glenn Research Center is developing an advanced multidisciplinary analysis environment for aerospace propulsion systems called the Numerical Propulsion System Simulation (NPSS). This simulation is initially being used to support aeropropulsion in the analysis and design of aircraft engines. NPSS provides increased flexibility for the user, which reduces the total development time and cost. It is currently being extended to support the Aviation Safety Program and Advanced Space Transportation. NPSS focuses on the integration of multiple disciplines such as aerodynamics, structure, and heat transfer with numerical zooming on component codes. Zooming is the coupling of analyses at various levels of detail. NPSS development includes using the Common Object Request Broker Architecture (CORBA) in the NPSS Developer's Kit to facilitate collaborative engineering. The NPSS Developer's Kit will provide the tools to develop custom components and to use the CORBA capability for zooming to higher fidelity codes, coupling to multidiscipline codes, transmitting secure data, and distributing simulations across different platforms. These powerful capabilities will extend NPSS from a zero-dimensional simulation tool to a multifidelity, multidiscipline system-level simulation tool for the full life cycle of an engine.
Crespo, Alejandro C.; Dominguez, Jose M.; Barreiro, Anxo; Gómez-Gesteira, Moncho; Rogers, Benedict D.
2011-01-01
Smoothed Particle Hydrodynamics (SPH) is a numerical method commonly used in Computational Fluid Dynamics (CFD) to simulate complex free-surface flows. Simulations with this mesh-free particle method far exceed the capacity of a single processor. In this paper, as part of a dual-functioning code for either central processing units (CPUs) or Graphics Processor Units (GPUs), a parallelisation using GPUs is presented. The GPU parallelisation technique uses the Compute Unified Device Architecture (CUDA) of nVidia devices. Simulations with more than one million particles on a single GPU card exhibit speedups of up to two orders of magnitude over using a single-core CPU. It is demonstrated that the code achieves different speedups with different CUDA-enabled GPUs. The numerical behaviour of the SPH code is validated with a standard benchmark test case of dam break flow impacting on an obstacle where good agreement with the experimental results is observed. Both the achieved speed-ups and the quantitative agreement with experiments suggest that CUDA-based GPU programming can be used in SPH methods with efficiency and reliability. PMID:21695185
Karthikeyan, M; Krishnan, S; Pandey, Anil Kumar; Bender, Andreas; Tropsha, Alexander
2008-04-01
We present the application of a Java remote method invocation (RMI) based open source architecture to distributed chemical computing. This architecture was previously employed for distributed data harvesting of chemical information from the Internet via the Google application programming interface (API; ChemXtreme). Due to its open source character and its flexibility, the underlying server/client framework can be quickly adopted to virtually every computational task that can be parallelized. Here, we present the server/client communication framework as well as an application to distributed computing of chemical properties on a large scale (currently the size of PubChem; about 18 million compounds), using both the Marvin toolkit as well as the open source JOELib package. As an application, for this set of compounds, the agreement of log P and TPSA between the packages was compared. Outliers were found to be mostly non-druglike compounds and differences could usually be explained by differences in the underlying algorithms. ChemStar is the first open source distributed chemical computing environment built on Java RMI, which is also easily adaptable to user demands due to its "plug-in architecture". The complete source codes as well as calculated properties along with links to PubChem resources are available on the Internet via a graphical user interface at http://moltable.ncl.res.in/chemstar/.
2006-12-01
Convolutional encoder of rate 1/2 (From [10]). Table 3 shows the puncturing patterns used to derive the different code rates . X precedes Y in the order... convolutional code with puncturing configuration (From [10])......11 Table 4. Mandatory channel coding per modulation (From [10...a concatenation of a Reed– Solomon outer code and a rate -adjustable convolutional inner code . At the transmitter, data shall first be encoded with
Validation of CFD/Heat Transfer Software for Turbine Blade Analysis
NASA Technical Reports Server (NTRS)
Kiefer, Walter D.
2004-01-01
I am an intern in the Turbine Branch of the Turbomachinery and Propulsion Systems Division. The division is primarily concerned with experimental and computational methods of calculating heat transfer effects of turbine blades during operation in jet engines and land-based power systems. These include modeling flow in internal cooling passages and film cooling, as well as calculating heat flux and peak temperatures to ensure safe and efficient operation. The branch is research-oriented, emphasizing the development of tools that may be used by gas turbine designers in industry. The branch has been developing a computational fluid dynamics (CFD) and heat transfer code called GlennHT to achieve the computational end of this analysis. The code was originally written in FORTRAN 77 and run on Silicon Graphics machines. However the code has been rewritten and compiled in FORTRAN 90 to take advantage of more modem computer memory systems. In addition the branch has made a switch in system architectures from SGI's to Linux PC's. The newly modified code therefore needs to be tested and validated. This is the primary goal of my internship. To validate the GlennHT code, it must be run using benchmark fluid mechanics and heat transfer test cases, for which there are either analytical solutions or widely accepted experimental data. From the solutions generated by the code, comparisons can be made to the correct solutions to establish the accuracy of the code. To design and create these test cases, there are many steps and programs that must be used. Before a test case can be run, pre-processing steps must be accomplished. These include generating a grid to describe the geometry, using a software package called GridPro. Also various files required by the GlennHT code must be created including a boundary condition file, a file for multi-processor computing, and a file to describe problem and algorithm parameters. A good deal of this internship will be to become familiar with these programs and the structure of the GlennHT code. Additional information is included in the original extended abstract.
ERIC Educational Resources Information Center
McMinn, William G.
An evaluation and report was done on the status of programs in architecture and related fields in the Florida State University System as a follow-up to a 1983 evaluation. The evaluation involved self-studies prepared by each program and a series of site visits to each of seven campuses and two centers with programs under review. These institutions…
Verifying Architectural Design Rules of the Flight Software Product Line
NASA Technical Reports Server (NTRS)
Ganesan, Dharmalingam; Lindvall, Mikael; Ackermann, Chris; McComas, David; Bartholomew, Maureen
2009-01-01
This paper presents experiences of verifying architectural design rules of the NASA Core Flight Software (CFS) product line implementation. The goal of the verification is to check whether the implementation is consistent with the CFS architectural rules derived from the developer's guide. The results indicate that consistency checking helps a) identifying architecturally significant deviations that were eluded during code reviews, b) clarifying the design rules to the team, and c) assessing the overall implementation quality. Furthermore, it helps connecting business goals to architectural principles, and to the implementation. This paper is the first step in the definition of a method for analyzing and evaluating product line implementations from an architecture-centric perspective.
Design of efficient and simple interface testing equipment for opto-electric tracking system
NASA Astrophysics Data System (ADS)
Liu, Qiong; Deng, Chao; Tian, Jing; Mao, Yao
2016-10-01
Interface testing for opto-electric tracking system is one important work to assure system running performance, aiming to verify the design result of every electronic interface matching the communication protocols or not, by different levels. Opto-electric tracking system nowadays is more complicated, composed of many functional units. Usually, interface testing is executed between units manufactured completely, highly depending on unit design and manufacture progress as well as relative people. As a result, it always takes days or weeks, inefficiently. To solve the problem, this paper promotes an efficient and simple interface testing equipment for opto-electric tracking system, consisting of optional interface circuit card, processor and test program. The hardware cards provide matched hardware interface(s), easily offered from hardware engineer. Automatic code generation technique is imported, providing adaption to new communication protocols. Automatic acquiring items, automatic constructing code architecture and automatic encoding are used to form a new program quickly with adaption. After simple steps, a standard customized new interface testing equipment with matching test program and interface(s) is ready for a waiting-test system in minutes. The efficient and simple interface testing equipment for opto-electric tracking system has worked for many opto-electric tracking system to test entire or part interfaces, reducing test time from days to hours, greatly improving test efficiency, with high software quality and stability, without manual coding. Used as a common tool, the efficient and simple interface testing equipment for opto-electric tracking system promoted by this paper has changed traditional interface testing method and created much higher efficiency.
Sabne, Amit J.; Sakdhnagool, Putt; Lee, Seyong; ...
2015-07-13
Accelerator-based heterogeneous computing is gaining momentum in the high-performance computing arena. However, the increased complexity of heterogeneous architectures demands more generic, high-level programming models. OpenACC is one such attempt to tackle this problem. Although the abstraction provided by OpenACC offers productivity, it raises questions concerning both functional and performance portability. In this article, the authors propose HeteroIR, a high-level, architecture-independent intermediate representation, to map high-level programming models, such as OpenACC, to heterogeneous architectures. They present a compiler approach that translates OpenACC programs into HeteroIR and accelerator kernels to obtain OpenACC functional portability. They then evaluate the performance portability obtained bymore » OpenACC with their approach on 12 OpenACC programs on Nvidia CUDA, AMD GCN, and Intel Xeon Phi architectures. They study the effects of various compiler optimizations and OpenACC program settings on these architectures to provide insights into the achieved performance portability.« less
Analysis of Organizational Architectures for the Air Force Tuition Assistance Program
2003-03-01
FORCE TUITION ASSISTANCE PROGRAM THESIS Krista Zimmerman LaPietra AFIT/GOR/ENS/03-15 DEPARTMENT OF THE AIR FORCE AIR UNIVERSITY AIR...ANALYSIS OF ORGANIZATIONAL ARCHITECTURES FOR THE AIR FORCE TUITION ASSISTANCE PROGRAM THESIS Presented to the Faculty Department...ANALYSIS OF ORGANIZATIONAL ARCHITECTURES FOR THE AIR FORCE TUITION ASSISTANCE PROGRAM Krista Zimmerman LaPietra, BS
NASA Astrophysics Data System (ADS)
Sanna, N.; Morelli, G.
2004-09-01
In this paper we present the new version of the SCELib program (CPC Catalogue identifier ADMG) a full numerical implementation of the Single Center Expansion (SCE) method. The physics involved is that of producing the SCE description of molecular electronic densities, of molecular electrostatic potentials and of molecular perturbed potentials due to a point negative or positive charge. This new revision of the program has been optimized to run in serial as well as in parallel execution mode, to support a larger set of molecular symmetries and to permit the restart of long-lasting calculations. To measure the performance of this new release, a comparative study has been carried out on the most powerful computing architectures in serial and parallel runs. The results of the calculations reported in this paper refer to real cases medium to large molecular systems and they are reported in full details to benchmark at best the parallel architectures the new SCELib code will run on. Program summaryTitle of program: SCELib2 Catalogue identifier: ADGU Program summary URL:http://cpc.cs.qub.ac.uk/summaries/ADGU Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland Reference to previous versions: Comput. Phys. Commun. 128 (2) (2000) 139 (CPC catalogue identifier: ADMG) Does the new version supersede the original program?: Yes Computer for which the program is designed and others on which it has been tested: HP ES45 and rx2600, SUN ES4500, IBM SP and any single CPU workstation based on Alpha, SPARC, POWER, Itanium2 and X86 processors Installations: CASPUR, local Operating systems under which the program has been tested: HP Tru64 V5.X, SUNOS V5.8, IBM AIX V5.X, Linux RedHat V8.0 Programming language used: C Memory required to execute with typical data: 10 Mwords. Up to 2000 Mwords depending on the molecular system and runtime parameters No. of bits in a word: 64 No. of processors used: 1 to 32 Has the code been vectorized or parallelized?: Yes No. of bytes in distributed program, including test data, etc.: 3 798 507 No. of lines in distributed program, including test data, etc.: 187 226 Distribution format: tar.gz Nature of physical problem: In this set of codes an efficient procedure is implemented to describe the wavefunction and related molecular properties of a polyatomic molecular system within the Single Center of Expansion (SCE) approximation. The resulting SCE wavefunction, electron density, electrostatic and exchange/correlation potentials can then be used via a proper Application Programming Interface (API) to describe the target molecular system which can be employed in electron-molecule scattering calculations. The molecular properties expanded over a single center turn out to also be of more general application and some possible uses in quantum chemistry, biomodelling and drug design are also outlined. Method of solution: The polycentre Hartee-Fock solution for a molecule of arbitrary geometry, based on linear combination of Gaussian-Type Orbital (GTO), is expanded over a single center, typically the Center Of Mass (C.O.M.), by means of a Gauss-Legendre/Chebyschev quadrature over the θ, φ angular coordinates. The resulting SCE numerical wavefunction is then used to calculate the one-particle electron density, the electrostatic potential and two different models for the correlation/polarization potentials induced by the impinging electron, which have the correct asymptotic behaviour for the leading dipole molecular polarizabilities. Restrictions on the complexity of the problem: Depending on the molecular system under study and on the operating conditions the program may or may not fit into available RAM memory. In this case a feature of the program is to memory map a disk file in order to efficiently access the memory data through a disk device. Typical running time: The execution time strongly depends on the molecular target description and on the hardware/OS chosen, it is directly proportional to the ( r, θ, φ) grid size and to the number of angular basis functions used. Thus, from the program printout of the main arrays memory occupancy, the user can approximately derive the expected computer time needed for a given calculation executed in serial mode. For parallel executions the overall efficiency must be further taken into account, and this depends on the no. of processors used as well as on the parallel architecture chosen, so a simple general law is at present not determinable. Unusual features of the program: The code has been engineered to use dynamical, runtime determined, global parameters with the aim to have all the data fitted in the RAM memory. Some unusual circumstances, e.g., when using large values of those parameters, may cause the program to run with unexpected performance reductions due to runtime bottlenecks like those caused by memory swap operations which strongly depend on the hardware used. In such cases, a parallel execution of the code is generally sufficient to fix the problem since the data size is partitioned over the available processors. When a suitable parallel system is not available for execution, a mechanism of memory mapped file can be used; with this option on, all the available memory will be used as a buffer for a disk file which contains the whole data set, thus having a better throughput with respect to the traditional swapping/paging of the Unix OS.
NASA Astrophysics Data System (ADS)
Ren, Danping; Wu, Shanshan; Zhang, Lijing
2016-09-01
In view of the characteristics of the global control and flexible monitor of software-defined networks (SDN), we proposes a new optical access network architecture dedicated to Wavelength Division Multiplexing-Passive Optical Network (WDM-PON) systems based on SDN. The network coding (NC) technology is also applied into this architecture to enhance the utilization of wavelength resource and reduce the costs of light source. Simulation results show that this scheme can optimize the throughput of the WDM-PON network, greatly reduce the system time delay and energy consumption.
Highly parallel sparse Cholesky factorization
NASA Technical Reports Server (NTRS)
Gilbert, John R.; Schreiber, Robert
1990-01-01
Several fine grained parallel algorithms were developed and compared to compute the Cholesky factorization of a sparse matrix. The experimental implementations are on the Connection Machine, a distributed memory SIMD machine whose programming model conceptually supplies one processor per data element. In contrast to special purpose algorithms in which the matrix structure conforms to the connection structure of the machine, the focus is on matrices with arbitrary sparsity structure. The most promising algorithm is one whose inner loop performs several dense factorizations simultaneously on a 2-D grid of processors. Virtually any massively parallel dense factorization algorithm can be used as the key subroutine. The sparse code attains execution rates comparable to those of the dense subroutine. Although at present architectural limitations prevent the dense factorization from realizing its potential efficiency, it is concluded that a regular data parallel architecture can be used efficiently to solve arbitrarily structured sparse problems. A performance model is also presented and it is used to analyze the algorithms.
Efficient Parallelization of a Dynamic Unstructured Application on the Tera MTA
NASA Technical Reports Server (NTRS)
Oliker, Leonid; Biswas, Rupak
1999-01-01
The success of parallel computing in solving real-life computationally-intensive problems relies on their efficient mapping and execution on large-scale multiprocessor architectures. Many important applications are both unstructured and dynamic in nature, making their efficient parallel implementation a daunting task. This paper presents the parallelization of a dynamic unstructured mesh adaptation algorithm using three popular programming paradigms on three leading supercomputers. We examine an MPI message-passing implementation on the Cray T3E and the SGI Origin2OOO, a shared-memory implementation using cache coherent nonuniform memory access (CC-NUMA) of the Origin2OOO, and a multi-threaded version on the newly-released Tera Multi-threaded Architecture (MTA). We compare several critical factors of this parallel code development, including runtime, scalability, programmability, and memory overhead. Our overall results demonstrate that multi-threaded systems offer tremendous potential for quickly and efficiently solving some of the most challenging real-life problems on parallel computers.
Component-based integration of chemistry and optimization software.
Kenny, Joseph P; Benson, Steven J; Alexeev, Yuri; Sarich, Jason; Janssen, Curtis L; McInnes, Lois Curfman; Krishnan, Manojkumar; Nieplocha, Jarek; Jurrus, Elizabeth; Fahlstrom, Carl; Windus, Theresa L
2004-11-15
Typical scientific software designs make rigid assumptions regarding programming language and data structures, frustrating software interoperability and scientific collaboration. Component-based software engineering is an emerging approach to managing the increasing complexity of scientific software. Component technology facilitates code interoperability and reuse. Through the adoption of methodology and tools developed by the Common Component Architecture Forum, we have developed a component architecture for molecular structure optimization. Using the NWChem and Massively Parallel Quantum Chemistry packages, we have produced chemistry components that provide capacity for energy and energy derivative evaluation. We have constructed geometry optimization applications by integrating the Toolkit for Advanced Optimization, Portable Extensible Toolkit for Scientific Computation, and Global Arrays packages, which provide optimization and linear algebra capabilities. We present a brief overview of the component development process and a description of abstract interfaces for chemical optimizations. The components conforming to these abstract interfaces allow the construction of applications using different chemistry and mathematics packages interchangeably. Initial numerical results for the component software demonstrate good performance, and highlight potential research enabled by this platform.
NELS 2.0 - A general system for enterprise wide information management
NASA Technical Reports Server (NTRS)
Smith, Stephanie L.
1993-01-01
NELS, the NASA Electronic Library System, is an information management tool for creating distributed repositories of documents, drawings, and code for use and reuse by the aerospace community. The NELS retrieval engine can load metadata and source files of full text objects, perform natural language queries to retrieve ranked objects, and create links to connect user interfaces. For flexibility, the NELS architecture has layered interfaces between the application program and the stored library information. The session manager provides the interface functions for development of NELS applications. The data manager is an interface between session manager and the structured data system. The center of the structured data system is the Wide Area Information Server. This system architecture provides access to information across heterogeneous platforms in a distributed environment. There are presently three user interfaces that connect to the NELS engine; an X-Windows interface, and ASCII interface and the Spatial Data Management System. This paper describes the design and operation of NELS as an information management tool and repository.
Automated Generation of Message-Passing Programs: An Evaluation Using CAPTools
NASA Technical Reports Server (NTRS)
Hribar, Michelle R.; Jin, Haoqiang; Yan, Jerry C.; Saini, Subhash (Technical Monitor)
1998-01-01
Scientists at NASA Ames Research Center have been developing computational aeroscience applications on highly parallel architectures over the past ten years. During that same time period, a steady transition of hardware and system software also occurred, forcing us to expend great efforts into migrating and re-coding our applications. As applications and machine architectures become increasingly complex, the cost and time required for this process will become prohibitive. In this paper, we present the first set of results in our evaluation of interactive parallelization tools. In particular, we evaluate CAPTool's ability to parallelize computational aeroscience applications. CAPTools was tested on serial versions of the NAS Parallel Benchmarks and ARC3D, a computational fluid dynamics application, on two platforms: the SGI Origin 2000 and the Cray T3E. This evaluation includes performance, amount of user interaction required, limitations and portability. Based on these results, a discussion on the feasibility of computer aided parallelization of aerospace applications is presented along with suggestions for future work.
Shrimankar, D D; Sathe, S R
2016-01-01
Sequence alignment is an important tool for describing the relationships between DNA sequences. Many sequence alignment algorithms exist, differing in efficiency, in their models of the sequences, and in the relationship between sequences. The focus of this study is to obtain an optimal alignment between two sequences of biological data, particularly DNA sequences. The algorithm is discussed with particular emphasis on time, speedup, and efficiency optimizations. Parallel programming presents a number of critical challenges to application developers. Today's supercomputer often consists of clusters of SMP nodes. Programming paradigms such as OpenMP and MPI are used to write parallel codes for such architectures. However, the OpenMP programs cannot be scaled for more than a single SMP node. However, programs written in MPI can have more than single SMP nodes. But such a programming paradigm has an overhead of internode communication. In this work, we explore the tradeoffs between using OpenMP and MPI. We demonstrate that the communication overhead incurs significantly even in OpenMP loop execution and increases with the number of cores participating. We also demonstrate a communication model to approximate the overhead from communication in OpenMP loops. Our results are astonishing and interesting to a large variety of input data files. We have developed our own load balancing and cache optimization technique for message passing model. Our experimental results show that our own developed techniques give optimum performance of our parallel algorithm for various sizes of input parameter, such as sequence size and tile size, on a wide variety of multicore architectures.
Shrimankar, D. D.; Sathe, S. R.
2016-01-01
Sequence alignment is an important tool for describing the relationships between DNA sequences. Many sequence alignment algorithms exist, differing in efficiency, in their models of the sequences, and in the relationship between sequences. The focus of this study is to obtain an optimal alignment between two sequences of biological data, particularly DNA sequences. The algorithm is discussed with particular emphasis on time, speedup, and efficiency optimizations. Parallel programming presents a number of critical challenges to application developers. Today’s supercomputer often consists of clusters of SMP nodes. Programming paradigms such as OpenMP and MPI are used to write parallel codes for such architectures. However, the OpenMP programs cannot be scaled for more than a single SMP node. However, programs written in MPI can have more than single SMP nodes. But such a programming paradigm has an overhead of internode communication. In this work, we explore the tradeoffs between using OpenMP and MPI. We demonstrate that the communication overhead incurs significantly even in OpenMP loop execution and increases with the number of cores participating. We also demonstrate a communication model to approximate the overhead from communication in OpenMP loops. Our results are astonishing and interesting to a large variety of input data files. We have developed our own load balancing and cache optimization technique for message passing model. Our experimental results show that our own developed techniques give optimum performance of our parallel algorithm for various sizes of input parameter, such as sequence size and tile size, on a wide variety of multicore architectures. PMID:27932868
2DRMP: A suite of two-dimensional R-matrix propagation codes
NASA Astrophysics Data System (ADS)
Scott, N. S.; Scott, M. P.; Burke, P. G.; Stitt, T.; Faro-Maza, V.; Denis, C.; Maniopoulou, A.
2009-12-01
The R-matrix method has proved to be a remarkably stable, robust and efficient technique for solving the close-coupling equations that arise in electron and photon collisions with atoms, ions and molecules. During the last thirty-four years a series of related R-matrix program packages have been published periodically in CPC. These packages are primarily concerned with low-energy scattering where the incident energy is insufficient to ionise the target. In this paper we describe 2DRMP, a suite of two-dimensional R-matrix propagation programs aimed at creating virtual experiments on high performance and grid architectures to enable the study of electron scattering from H-like atoms and ions at intermediate energies. Program summaryProgram title: 2DRMP Catalogue identifier: AEEA_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEEA_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 196 717 No. of bytes in distributed program, including test data, etc.: 3 819 727 Distribution format: tar.gz Programming language: Fortran 95, MPI Computer: Tested on CRAY XT4 [1]; IBM eServer 575 [2]; Itanium II cluster [3] Operating system: Tested on UNICOS/lc [1]; IBM AIX [2]; Red Hat Linux Enterprise AS [3] Has the code been vectorised or parallelised?: Yes. 16 cores were used for small test run Classification: 2.4 External routines: BLAS, LAPACK, PBLAS, ScaLAPACK Subprograms used: ADAZ_v1_1 Nature of problem: 2DRMP is a suite of programs aimed at creating virtual experiments on high performance architectures to enable the study of electron scattering from H-like atoms and ions at intermediate energies. Solution method: Two-dimensional R-matrix propagation theory. The (r,r) space of the internal region is subdivided into a number of subregions. Local R-matrices are constructed within each subregion and used to propagate a global R-matrix, ℜ, across the internal region. On the boundary of the internal region ℜ is transformed onto the IERM target state basis. Thus, the two-dimensional R-matrix propagation technique transforms an intractable problem into a series of tractable problems enabling the internal region to be extended far beyond that which is possible with the standard one-sector codes. A distinctive feature of the method is that both electrons are treated identically and the R-matrix basis states are constructed to allow for both electrons to be in the continuum. The subregion size is flexible and can be adjusted to accommodate the number of cores available. Restrictions: The implementation is currently restricted to electron scattering from H-like atoms and ions. Additional comments: The programs have been designed to operate on serial computers and to exploit the distributed memory parallelism found on tightly coupled high performance clusters and supercomputers. 2DRMP has been systematically and comprehensively documented using ROBODoc [4] which is an API documentation tool that works by extracting specially formatted headers from the program source code and writing them to documentation files. Running time: The wall clock running time for the small test run using 16 cores and performed on [3] is as follows: bp (7 s); rint2 (34 s); newrd (32 s); diag (21 s); amps (11 s); prop (24 s). References:HECToR, CRAY XT4 running UNICOS/lc, http://www.hector.ac.uk/, accessed 22 July, 2009. HPCx, IBM eServer 575 running IBM AIX, http://www.hpcx.ac.uk/, accessed 22 July, 2009. HP Cluster, Itanium II cluster running Red Hat Linux Enterprise AS, Queen s University Belfast, http://www.qub.ac.uk/directorates/InformationServices/Research/HighPerformanceComputing/Services/Hardware/HPResearch/, accessed 22 July, 2009. Automating Software Documentation with ROBODoc, http://www.xs4all.nl/~rfsber/Robo/, accessed 22 July, 2009.
The emerging complexity of ubiquitin architecture.
Ohtake, Fumiaki; Tsuchiya, Hikaru
2017-02-01
Ubiquitylation is an essential post-translational modification (PTM) of proteins with diverse cellular functions. Polyubiquitin chains with different topologies have different cellular roles, and are referred to as a 'ubiquitin code'. Recent studies have begun to reveal that more complex ubiquitin architectures function as important signals in several biological pathways. These include PTMs of ubiquitin itself, such as acetylated ubiquitin and phospho-ubiquitin. Moreover, important roles for heterogeneous polyubiquitin chains, such as mixed or branched chains, have been reported, which significantly increase the diversity of the ubiquitin code. In this review, we describe mass spectrometry-based methods to characterize the ubiquitin signal. We also describe recent advances in our understanding of complex ubiquitin architectures, including our own findings concerning ubiquitin acetylation and branching within polyubiquitin chains. © The Authors 2016. Published by Oxford University Press on behalf of the Japanese Biochemical Society. All rights reserved.
NASA Technical Reports Server (NTRS)
Hribar, Michelle R.; Frumkin, Michael; Jin, Haoqiang; Waheed, Abdul; Yan, Jerry; Saini, Subhash (Technical Monitor)
1998-01-01
Over the past decade, high performance computing has evolved rapidly; systems based on commodity microprocessors have been introduced in quick succession from at least seven vendors/families. Porting codes to every new architecture is a difficult problem; in particular, here at NASA, there are many large CFD applications that are very costly to port to new machines by hand. The LCM ("Legacy Code Modernization") Project is the development of an integrated parallelization environment (IPE) which performs the automated mapping of legacy CFD (Fortran) applications to state-of-the-art high performance computers. While most projects to port codes focus on the parallelization of the code, we consider porting to be an iterative process consisting of several steps: 1) code cleanup, 2) serial optimization,3) parallelization, 4) performance monitoring and visualization, 5) intelligent tools for automated tuning using performance prediction and 6) machine specific optimization. The approach for building this parallelization environment is to build the components for each of the steps simultaneously and then integrate them together. The demonstration will exhibit our latest research in building this environment: 1. Parallelizing tools and compiler evaluation. 2. Code cleanup and serial optimization using automated scripts 3. Development of a code generator for performance prediction 4. Automated partitioning 5. Automated insertion of directives. These demonstrations will exhibit the effectiveness of an automated approach for all the steps involved with porting and tuning a legacy code application for a new architecture.
A heterogeneous hierarchical architecture for real-time computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Skroch, D.A.; Fornaro, R.J.
The need for high-speed data acquisition and control algorithms has prompted continued research in the area of multiprocessor systems and related programming techniques. The result presented here is a unique hardware and software architecture for high-speed real-time computer systems. The implementation of a prototype of this architecture has required the integration of architecture, operating systems and programming languages into a cohesive unit. This report describes a Heterogeneous Hierarchial Architecture for Real-Time (H{sup 2} ART) and system software for program loading and interprocessor communication.
Federal Register 2010, 2011, 2012, 2013, 2014
2011-05-17
... CFR Parts 1724 and 1726 RIN 0572-AC20 Electric Engineering, Architectural Services, Design Policies... standard forms of contracts promulgated by RUS for construction, procurement, engineering services and... XVII of title 7 of the Code of Federal Regulations as follows: PART 1724--ELECTRIC ENGINEERING...
Fast quantum Monte Carlo on a GPU
NASA Astrophysics Data System (ADS)
Lutsyshyn, Y.
2015-02-01
We present a scheme for the parallelization of quantum Monte Carlo method on graphical processing units, focusing on variational Monte Carlo simulation of bosonic systems. We use asynchronous execution schemes with shared memory persistence, and obtain an excellent utilization of the accelerator. The CUDA code is provided along with a package that simulates liquid helium-4. The program was benchmarked on several models of Nvidia GPU, including Fermi GTX560 and M2090, and the Kepler architecture K20 GPU. Special optimization was developed for the Kepler cards, including placement of data structures in the register space of the Kepler GPUs. Kepler-specific optimization is discussed.
77 FR 74178 - Notice of Intent To Grant Exclusive Patent License: Kismet Management Fund LLC
Federal Register 2010, 2011, 2012, 2013, 2014
2012-12-13
...,248: Program Control for Resource Management Architecture and Corresponding Programs//U.S. Patent No. 7,171,654: System Specification Language for Resource Management Architecture and Corresponding... Architecture and Corresponding Programs//U.S. Patent No. 7,552,438: Resource Management Device. DATES: Anyone...
Architectural/Building Programming: An Annotated Bibliography.
ERIC Educational Resources Information Center
Turner, George E.
This 34-item bibliography brings together the random articles about programing, which have appeared in a variety of publications, to establish a resource for architectural students and practitioners who need a clearer understanding of the nature of architectural programing. The entries are divided into those that serve as an introduction to either…
High Performance Input/Output for Parallel Computer Systems
NASA Technical Reports Server (NTRS)
Ligon, W. B.
1996-01-01
The goal of our project is to study the I/O characteristics of parallel applications used in Earth Science data processing systems such as Regional Data Centers (RDCs) or EOSDIS. Our approach is to study the runtime behavior of typical programs and the effect of key parameters of the I/O subsystem both under simulation and with direct experimentation on parallel systems. Our three year activity has focused on two items: developing a test bed that facilitates experimentation with parallel I/O, and studying representative programs from the Earth science data processing application domain. The Parallel Virtual File System (PVFS) has been developed for use on a number of platforms including the Tiger Parallel Architecture Workbench (TPAW) simulator, The Intel Paragon, a cluster of DEC Alpha workstations, and the Beowulf system (at CESDIS). PVFS provides considerable flexibility in configuring I/O in a UNIX- like environment. Access to key performance parameters facilitates experimentation. We have studied several key applications fiom levels 1,2 and 3 of the typical RDC processing scenario including instrument calibration and navigation, image classification, and numerical modeling codes. We have also considered large-scale scientific database codes used to organize image data.
lpNet: a linear programming approach to reconstruct signal transduction networks.
Matos, Marta R A; Knapp, Bettina; Kaderali, Lars
2015-10-01
With the widespread availability of high-throughput experimental technologies it has become possible to study hundreds to thousands of cellular factors simultaneously, such as coding- or non-coding mRNA or protein concentrations. Still, extracting information about the underlying regulatory or signaling interactions from these data remains a difficult challenge. We present a flexible approach towards network inference based on linear programming. Our method reconstructs the interactions of factors from a combination of perturbation/non-perturbation and steady-state/time-series data. We show both on simulated and real data that our methods are able to reconstruct the underlying networks fast and efficiently, thus shedding new light on biological processes and, in particular, into disease's mechanisms of action. We have implemented the approach as an R package available through bioconductor. This R package is freely available under the Gnu Public License (GPL-3) from bioconductor.org (http://bioconductor.org/packages/release/bioc/html/lpNet.html) and is compatible with most operating systems (Windows, Linux, Mac OS) and hardware architectures. bettina.knapp@helmholtz-muenchen.de Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Huffman coding in advanced audio coding standard
NASA Astrophysics Data System (ADS)
Brzuchalski, Grzegorz
2012-05-01
This article presents several hardware architectures of Advanced Audio Coding (AAC) Huffman noiseless encoder, its optimisations and working implementation. Much attention has been paid to optimise the demand of hardware resources especially memory size. The aim of design was to get as short binary stream as possible in this standard. The Huffman encoder with whole audio-video system has been implemented in FPGA devices.
Hypercube matrix computation task
NASA Technical Reports Server (NTRS)
Calalo, R.; Imbriale, W.; Liewer, P.; Lyons, J.; Manshadi, F.; Patterson, J.
1987-01-01
The Hypercube Matrix Computation (Year 1986-1987) task investigated the applicability of a parallel computing architecture to the solution of large scale electromagnetic scattering problems. Two existing electromagnetic scattering codes were selected for conversion to the Mark III Hypercube concurrent computing environment. They were selected so that the underlying numerical algorithms utilized would be different thereby providing a more thorough evaluation of the appropriateness of the parallel environment for these types of problems. The first code was a frequency domain method of moments solution, NEC-2, developed at Lawrence Livermore National Laboratory. The second code was a time domain finite difference solution of Maxwell's equations to solve for the scattered fields. Once the codes were implemented on the hypercube and verified to obtain correct solutions by comparing the results with those from sequential runs, several measures were used to evaluate the performance of the two codes. First, a comparison was provided of the problem size possible on the hypercube with 128 megabytes of memory for a 32-node configuration with that available in a typical sequential user environment of 4 to 8 megabytes. Then, the performance of the codes was anlyzed for the computational speedup attained by the parallel architecture.
Gel, Aytekin; Hu, Jonathan; Ould-Ahmed-Vall, ElMoustapha; ...
2017-03-20
Legacy codes remain a crucial element of today's simulation-based engineering ecosystem due to the extensive validation process and investment in such software. The rapid evolution of high-performance computing architectures necessitates the modernization of these codes. One approach to modernization is a complete overhaul of the code. However, this could require extensive investments, such as rewriting in modern languages, new data constructs, etc., which will necessitate systematic verification and validation to re-establish the credibility of the computational models. The current study advocates using a more incremental approach and is a culmination of several modernization efforts of the legacy code MFIX, whichmore » is an open-source computational fluid dynamics code that has evolved over several decades, widely used in multiphase flows and still being developed by the National Energy Technology Laboratory. Two different modernization approaches,‘bottom-up’ and ‘top-down’, are illustrated. Here, preliminary results show up to 8.5x improvement at the selected kernel level with the first approach, and up to 50% improvement in total simulated time with the latter were achieved for the demonstration cases and target HPC systems employed.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gel, Aytekin; Hu, Jonathan; Ould-Ahmed-Vall, ElMoustapha
Legacy codes remain a crucial element of today's simulation-based engineering ecosystem due to the extensive validation process and investment in such software. The rapid evolution of high-performance computing architectures necessitates the modernization of these codes. One approach to modernization is a complete overhaul of the code. However, this could require extensive investments, such as rewriting in modern languages, new data constructs, etc., which will necessitate systematic verification and validation to re-establish the credibility of the computational models. The current study advocates using a more incremental approach and is a culmination of several modernization efforts of the legacy code MFIX, whichmore » is an open-source computational fluid dynamics code that has evolved over several decades, widely used in multiphase flows and still being developed by the National Energy Technology Laboratory. Two different modernization approaches,‘bottom-up’ and ‘top-down’, are illustrated. Here, preliminary results show up to 8.5x improvement at the selected kernel level with the first approach, and up to 50% improvement in total simulated time with the latter were achieved for the demonstration cases and target HPC systems employed.« less
Deploying electromagnetic particle-in-cell (EM-PIC) codes on Xeon Phi accelerators boards
NASA Astrophysics Data System (ADS)
Fonseca, Ricardo
2014-10-01
The complexity of the phenomena involved in several relevant plasma physics scenarios, where highly nonlinear and kinetic processes dominate, makes purely theoretical descriptions impossible. Further understanding of these scenarios requires detailed numerical modeling, but fully relativistic particle-in-cell codes such as OSIRIS are computationally intensive. The quest towards Exaflop computer systems has lead to the development of HPC systems based on add-on accelerator cards, such as GPGPUs and more recently the Xeon Phi accelerators that power the current number 1 system in the world. These cards, also referred to as Intel Many Integrated Core Architecture (MIC) offer peak theoretical performances of >1 TFlop/s for general purpose calculations in a single board, and are receiving significant attention as an attractive alternative to CPUs for plasma modeling. In this work we report on our efforts towards the deployment of an EM-PIC code on a Xeon Phi architecture system. We will focus on the parallelization and vectorization strategies followed, and present a detailed performance evaluation of code performance in comparison with the CPU code.
Parallelization of a Monte Carlo particle transport simulation code
NASA Astrophysics Data System (ADS)
Hadjidoukas, P.; Bousis, C.; Emfietzoglou, D.
2010-05-01
We have developed a high performance version of the Monte Carlo particle transport simulation code MC4. The original application code, developed in Visual Basic for Applications (VBA) for Microsoft Excel, was first rewritten in the C programming language for improving code portability. Several pseudo-random number generators have been also integrated and studied. The new MC4 version was then parallelized for shared and distributed-memory multiprocessor systems using the Message Passing Interface. Two parallel pseudo-random number generator libraries (SPRNG and DCMT) have been seamlessly integrated. The performance speedup of parallel MC4 has been studied on a variety of parallel computing architectures including an Intel Xeon server with 4 dual-core processors, a Sun cluster consisting of 16 nodes of 2 dual-core AMD Opteron processors and a 200 dual-processor HP cluster. For large problem size, which is limited only by the physical memory of the multiprocessor server, the speedup results are almost linear on all systems. We have validated the parallel implementation against the serial VBA and C implementations using the same random number generator. Our experimental results on the transport and energy loss of electrons in a water medium show that the serial and parallel codes are equivalent in accuracy. The present improvements allow for studying of higher particle energies with the use of more accurate physical models, and improve statistics as more particles tracks can be simulated in low response time.
Accelerating Monte Carlo simulations with an NVIDIA ® graphics processor
NASA Astrophysics Data System (ADS)
Martinsen, Paul; Blaschke, Johannes; Künnemeyer, Rainer; Jordan, Robert
2009-10-01
Modern graphics cards, commonly used in desktop computers, have evolved beyond a simple interface between processor and display to incorporate sophisticated calculation engines that can be applied to general purpose computing. The Monte Carlo algorithm for modelling photon transport in turbid media has been implemented on an NVIDIA ® 8800 GT graphics card using the CUDA toolkit. The Monte Carlo method relies on following the trajectory of millions of photons through the sample, often taking hours or days to complete. The graphics-processor implementation, processing roughly 110 million scattering events per second, was found to run more than 70 times faster than a similar, single-threaded implementation on a 2.67 GHz desktop computer. Program summaryProgram title: Phoogle-C/Phoogle-G Catalogue identifier: AEEB_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEEB_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 51 264 No. of bytes in distributed program, including test data, etc.: 2 238 805 Distribution format: tar.gz Programming language: C++ Computer: Designed for Intel PCs. Phoogle-G requires a NVIDIA graphics card with support for CUDA 1.1 Operating system: Windows XP Has the code been vectorised or parallelized?: Phoogle-G is written for SIMD architectures RAM: 1 GB Classification: 21.1 External routines: Charles Karney Random number library. Microsoft Foundation Class library. NVIDA CUDA library [1]. Nature of problem: The Monte Carlo technique is an effective algorithm for exploring the propagation of light in turbid media. However, accurate results require tracing the path of many photons within the media. The independence of photons naturally lends the Monte Carlo technique to implementation on parallel architectures. Generally, parallel computing can be expensive, but recent advances in consumer grade graphics cards have opened the possibility of high-performance desktop parallel-computing. Solution method: In this pair of programmes we have implemented the Monte Carlo algorithm described by Prahl et al. [2] for photon transport in infinite scattering media to compare the performance of two readily accessible architectures: a standard desktop PC and a consumer grade graphics card from NVIDIA. Restrictions: The graphics card implementation uses single precision floating point numbers for all calculations. Only photon transport from an isotropic point-source is supported. The graphics-card version has no user interface. The simulation parameters must be set in the source code. The desktop version has a simple user interface; however some properties can only be accessed through an ActiveX client (such as Matlab). Additional comments: The random number library used has a LGPL ( http://www.gnu.org/copyleft/lesser.html) licence. Running time: Runtime can range from minutes to months depending on the number of photons simulated and the optical properties of the medium. References:http://www.nvidia.com/object/cuda_home.html. S. Prahl, M. Keijzer, Sl. Jacques, A. Welch, SPIE Institute Series 5 (1989) 102.
Parallel Semi-Implicit Spectral Element Atmospheric Model
NASA Astrophysics Data System (ADS)
Fournier, A.; Thomas, S.; Loft, R.
2001-05-01
The shallow-water equations (SWE) have long been used to test atmospheric-modeling numerical methods. The SWE contain essential wave-propagation and nonlinear effects of more complete models. We present a semi-implicit (SI) improvement of the Spectral Element Atmospheric Model to solve the SWE (SEAM, Taylor et al. 1997, Fournier et al. 2000, Thomas & Loft 2000). SE methods are h-p finite element methods combining the geometric flexibility of size-h finite elements with the accuracy of degree-p spectral methods. Our work suggests that exceptional parallel-computation performance is achievable by a General-Circulation-Model (GCM) dynamical core, even at modest climate-simulation resolutions (>1o). The code derivation involves weak variational formulation of the SWE, Gauss(-Lobatto) quadrature over the collocation points, and Legendre cardinal interpolators. Appropriate weak variation yields a symmetric positive-definite Helmholtz operator. To meet the Ladyzhenskaya-Babuska-Brezzi inf-sup condition and avoid spurious modes, we use a staggered grid. The SI scheme combines leapfrog and Crank-Nicholson schemes for the nonlinear and linear terms respectively. The localization of operations to elements ideally fits the method to cache-based microprocessor computer architectures --derivatives are computed as collections of small (8x8), naturally cache-blocked matrix-vector products. SEAM also has desirable boundary-exchange communication, like finite-difference models. Timings on on the IBM SP and Compaq ES40 supercomputers indicate that the SI code (20-min timestep) requires 1/3 the CPU time of the explicit code (2-min timestep) for T42 resolutions. Both codes scale nearly linearly out to 400 processors. We achieved single-processor performance up to 30% of peak for both codes on the 375-MHz IBM Power-3 processors. Fast computation and linear scaling lead to a useful climate-simulation dycore only if enough model time is computed per unit wall-clock time. An efficient SI solver is essential to substantially increase this rate. Parallel preconditioning for an iterative conjugate-gradient elliptic solver is described. We are building a GCM dycore capable of 200 GF% lOPS sustained performance on clustered RISC/cache architectures using hybrid MPI/OpenMP programming.
Bumper 3 Update for IADC Protection Manual
NASA Technical Reports Server (NTRS)
Christiansen, Eric L.; Nagy, Kornel; Hyde, Jim
2016-01-01
The Bumper code has been the standard in use by NASA and contractors to perform meteoroid/debris risk assessments since 1990. It has undergone extensive revisions and updates [NASA JSC HITF website; Christiansen et al., 1992, 1997]. NASA Johnson Space Center (JSC) has applied BUMPER to risk assessments for Space Station, Shuttle, Mir, Extravehicular Mobility Units (EMU) space suits, and other spacecraft (e.g., LDEF, Iridium, TDRS, and Hubble Space Telescope). Bumper continues to be updated with changes in the ballistic limit equations describing failure threshold of various spacecraft components, as well as changes in the meteoroid and debris environment models. Significant efforts are expended to validate Bumper and benchmark it to other meteoroid/debris risk assessment codes. Bumper 3 is a refactored version of Bumper II. The structure of the code was extensively modified to improve maintenance, performance and flexibility. The architecture was changed to separate the frequently updated ballistic limit equations from the relatively stable common core functions of the program. These updates allow NASA to produce specific editions of the Bumper 3 that are tailored for specific customer requirements. The core consists of common code necessary to process the Micrometeoroid and Orbital Debris (MMOD) environment models, assess shadowing and calculate MMOD risk. The library of target response subroutines includes a board range of different types of MMOD shield ballistic limit equations as well as equations describing damage to various spacecraft subsystems or hardware (thermal protection materials, windows, radiators, solar arrays, cables, etc.). The core and library of ballistic response subroutines are maintained under configuration control. A change in the core will affect all editions of the code, whereas a change in one or more of the response subroutines will affect all editions of the code that contain the particular response subroutines which are modified. Note that the Bumper II program is no longer maintained or distributed by NASA.
Examining the architecture of cellular computing through a comparative study with a computer
Wang, Degeng; Gribskov, Michael
2005-01-01
The computer and the cell both use information embedded in simple coding, the binary software code and the quadruple genomic code, respectively, to support system operations. A comparative examination of their system architecture as well as their information storage and utilization schemes is performed. On top of the code, both systems display a modular, multi-layered architecture, which, in the case of a computer, arises from human engineering efforts through a combination of hardware implementation and software abstraction. Using the computer as a reference system, a simplistic mapping of the architectural components between the two is easily detected. This comparison also reveals that a cell abolishes the software–hardware barrier through genomic encoding for the constituents of the biochemical network, a cell's ‘hardware’ equivalent to the computer central processing unit (CPU). The information loading (gene expression) process acts as a major determinant of the encoded constituent's abundance, which, in turn, often determines the ‘bandwidth’ of a biochemical pathway. Cellular processes are implemented in biochemical pathways in parallel manners. In a computer, on the other hand, the software provides only instructions and data for the CPU. A process represents just sequentially ordered actions by the CPU and only virtual parallelism can be implemented through CPU time-sharing. Whereas process management in a computer may simply mean job scheduling, coordinating pathway bandwidth through the gene expression machinery represents a major process management scheme in a cell. In summary, a cell can be viewed as a super-parallel computer, which computes through controlled hardware composition. While we have, at best, a very fragmented understanding of cellular operation, we have a thorough understanding of the computer throughout the engineering process. The potential utilization of this knowledge to the benefit of systems biology is discussed. PMID:16849179
Examining the architecture of cellular computing through a comparative study with a computer.
Wang, Degeng; Gribskov, Michael
2005-06-22
The computer and the cell both use information embedded in simple coding, the binary software code and the quadruple genomic code, respectively, to support system operations. A comparative examination of their system architecture as well as their information storage and utilization schemes is performed. On top of the code, both systems display a modular, multi-layered architecture, which, in the case of a computer, arises from human engineering efforts through a combination of hardware implementation and software abstraction. Using the computer as a reference system, a simplistic mapping of the architectural components between the two is easily detected. This comparison also reveals that a cell abolishes the software-hardware barrier through genomic encoding for the constituents of the biochemical network, a cell's "hardware" equivalent to the computer central processing unit (CPU). The information loading (gene expression) process acts as a major determinant of the encoded constituent's abundance, which, in turn, often determines the "bandwidth" of a biochemical pathway. Cellular processes are implemented in biochemical pathways in parallel manners. In a computer, on the other hand, the software provides only instructions and data for the CPU. A process represents just sequentially ordered actions by the CPU and only virtual parallelism can be implemented through CPU time-sharing. Whereas process management in a computer may simply mean job scheduling, coordinating pathway bandwidth through the gene expression machinery represents a major process management scheme in a cell. In summary, a cell can be viewed as a super-parallel computer, which computes through controlled hardware composition. While we have, at best, a very fragmented understanding of cellular operation, we have a thorough understanding of the computer throughout the engineering process. The potential utilization of this knowledge to the benefit of systems biology is discussed.
NASA Astrophysics Data System (ADS)
Tanikawa, Ataru; Yoshikawa, Kohji; Okamoto, Takashi; Nitadori, Keigo
2012-02-01
We present a high-performance N-body code for self-gravitating collisional systems accelerated with the aid of a new SIMD instruction set extension of the x86 architecture: Advanced Vector eXtensions (AVX), an enhanced version of the Streaming SIMD Extensions (SSE). With one processor core of Intel Core i7-2600 processor (8 MB cache and 3.40 GHz) based on Sandy Bridge micro-architecture, we implemented a fourth-order Hermite scheme with individual timestep scheme ( Makino and Aarseth, 1992), and achieved the performance of ˜20 giga floating point number operations per second (GFLOPS) for double-precision accuracy, which is two times and five times higher than that of the previously developed code implemented with the SSE instructions ( Nitadori et al., 2006b), and that of a code implemented without any explicit use of SIMD instructions with the same processor core, respectively. We have parallelized the code by using so-called NINJA scheme ( Nitadori et al., 2006a), and achieved ˜90 GFLOPS for a system containing more than N = 8192 particles with 8 MPI processes on four cores. We expect to achieve about 10 tera FLOPS (TFLOPS) for a self-gravitating collisional system with N ˜ 10 5 on massively parallel systems with at most 800 cores with Sandy Bridge micro-architecture. This performance will be comparable to that of Graphic Processing Unit (GPU) cluster systems, such as the one with about 200 Tesla C1070 GPUs ( Spurzem et al., 2010). This paper offers an alternative to collisional N-body simulations with GRAPEs and GPUs.
Testolin, Alberto; De Filippo De Grazia, Michele; Zorzi, Marco
2017-01-01
The recent "deep learning revolution" in artificial neural networks had strong impact and widespread deployment for engineering applications, but the use of deep learning for neurocomputational modeling has been so far limited. In this article we argue that unsupervised deep learning represents an important step forward for improving neurocomputational models of perception and cognition, because it emphasizes the role of generative learning as opposed to discriminative (supervised) learning. As a case study, we present a series of simulations investigating the emergence of neural coding of visual space for sensorimotor transformations. We compare different network architectures commonly used as building blocks for unsupervised deep learning by systematically testing the type of receptive fields and gain modulation developed by the hidden neurons. In particular, we compare Restricted Boltzmann Machines (RBMs), which are stochastic, generative networks with bidirectional connections trained using contrastive divergence, with autoencoders, which are deterministic networks trained using error backpropagation. For both learning architectures we also explore the role of sparse coding, which has been identified as a fundamental principle of neural computation. The unsupervised models are then compared with supervised, feed-forward networks that learn an explicit mapping between different spatial reference frames. Our simulations show that both architectural and learning constraints strongly influenced the emergent coding of visual space in terms of distribution of tuning functions at the level of single neurons. Unsupervised models, and particularly RBMs, were found to more closely adhere to neurophysiological data from single-cell recordings in the primate parietal cortex. These results provide new insights into how basic properties of artificial neural networks might be relevant for modeling neural information processing in biological systems.
Testolin, Alberto; De Filippo De Grazia, Michele; Zorzi, Marco
2017-01-01
The recent “deep learning revolution” in artificial neural networks had strong impact and widespread deployment for engineering applications, but the use of deep learning for neurocomputational modeling has been so far limited. In this article we argue that unsupervised deep learning represents an important step forward for improving neurocomputational models of perception and cognition, because it emphasizes the role of generative learning as opposed to discriminative (supervised) learning. As a case study, we present a series of simulations investigating the emergence of neural coding of visual space for sensorimotor transformations. We compare different network architectures commonly used as building blocks for unsupervised deep learning by systematically testing the type of receptive fields and gain modulation developed by the hidden neurons. In particular, we compare Restricted Boltzmann Machines (RBMs), which are stochastic, generative networks with bidirectional connections trained using contrastive divergence, with autoencoders, which are deterministic networks trained using error backpropagation. For both learning architectures we also explore the role of sparse coding, which has been identified as a fundamental principle of neural computation. The unsupervised models are then compared with supervised, feed-forward networks that learn an explicit mapping between different spatial reference frames. Our simulations show that both architectural and learning constraints strongly influenced the emergent coding of visual space in terms of distribution of tuning functions at the level of single neurons. Unsupervised models, and particularly RBMs, were found to more closely adhere to neurophysiological data from single-cell recordings in the primate parietal cortex. These results provide new insights into how basic properties of artificial neural networks might be relevant for modeling neural information processing in biological systems. PMID:28377709
Automating tasks in protein structure determination with the clipper python module.
McNicholas, Stuart; Croll, Tristan; Burnley, Tom; Palmer, Colin M; Hoh, Soon Wen; Jenkins, Huw T; Dodson, Eleanor; Cowtan, Kevin; Agirre, Jon
2018-01-01
Scripting programming languages provide the fastest means of prototyping complex functionality. Those with a syntax and grammar resembling human language also greatly enhance the maintainability of the produced source code. Furthermore, the combination of a powerful, machine-independent scripting language with binary libraries tailored for each computer architecture allows programs to break free from the tight boundaries of efficiency traditionally associated with scripts. In the present work, we describe how an efficient C++ crystallographic library such as Clipper can be wrapped, adapted and generalized for use in both crystallographic and electron cryo-microscopy applications, scripted with the Python language. We shall also place an emphasis on best practices in automation, illustrating how this can be achieved with this new Python module. © 2017 The Authors Protein Science published by Wiley Periodicals, Inc. on behalf of The Protein Society.
Hierarchical algorithms for modeling the ocean on hierarchical architectures
NASA Astrophysics Data System (ADS)
Hill, C. N.
2012-12-01
This presentation will describe an approach to using accelerator/co-processor technology that maps hierarchical, multi-scale modeling techniques to an underlying hierarchical hardware architecture. The focus of this work is on making effective use of both CPU and accelerator/co-processor parts of a system, for large scale ocean modeling. In the work, a lower resolution basin scale ocean model is locally coupled to multiple, "embedded", limited area higher resolution sub-models. The higher resolution models execute on co-processor/accelerator hardware and do not interact directly with other sub-models. The lower resolution basin scale model executes on the system CPU(s). The result is a multi-scale algorithm that aligns with hardware designs in the co-processor/accelerator space. We demonstrate this approach being used to substitute explicit process models for standard parameterizations. Code for our sub-models is implemented through a generic abstraction layer, so that we can target multiple accelerator architectures with different programming environments. We will present two application and implementation examples. One uses the CUDA programming environment and targets GPU hardware. This example employs a simple non-hydrostatic two dimensional sub-model to represent vertical motion more accurately. The second example uses a highly threaded three-dimensional model at high resolution. This targets a MIC/Xeon Phi like environment and uses sub-models as a way to explicitly compute sub-mesoscale terms. In both cases the accelerator/co-processor capability provides extra compute cycles that allow improved model fidelity for little or no extra wall-clock time cost.
NASA Astrophysics Data System (ADS)
Rossi, Francesco; Londrillo, Pasquale; Sgattoni, Andrea; Sinigardi, Stefano; Turchetti, Giorgio
2012-12-01
We present `jasmine', an implementation of a fully relativistic, 3D, electromagnetic Particle-In-Cell (PIC) code, capable of running simulations in various laser plasma acceleration regimes on Graphics-Processing-Units (GPUs) HPC clusters. Standard energy/charge preserving FDTD-based algorithms have been implemented using double precision and quadratic (or arbitrary sized) shape functions for the particle weighting. When porting a PIC scheme to the GPU architecture (or, in general, a shared memory environment), the particle-to-grid operations (e.g. the evaluation of the current density) require special care to avoid memory inconsistencies and conflicts. Here we present a robust implementation of this operation that is efficient for any number of particles per cell and particle shape function order. Our algorithm exploits the exposed GPU memory hierarchy and avoids the use of atomic operations, which can hurt performance especially when many particles lay on the same cell. We show the code multi-GPU scalability results and present a dynamic load-balancing algorithm. The code is written using a python-based C++ meta-programming technique which translates in a high level of modularity and allows for easy performance tuning and simple extension of the core algorithms to various simulation schemes.
ORNL Cray X1 evaluation status report
DOE Office of Scientific and Technical Information (OSTI.GOV)
Agarwal, P.K.; Alexander, R.A.; Apra, E.
2004-05-01
On August 15, 2002 the Department of Energy (DOE) selected the Center for Computational Sciences (CCS) at Oak Ridge National Laboratory (ORNL) to deploy a new scalable vector supercomputer architecture for solving important scientific problems in climate, fusion, biology, nanoscale materials and astrophysics. ''This program is one of the first steps in an initiative designed to provide U.S. scientists with the computational power that is essential to 21st century scientific leadership,'' said Dr. Raymond L. Orbach, director of the department's Office of Science. In FY03, CCS procured a 256-processor Cray X1 to evaluate the processors, memory subsystem, scalability of themore » architecture, software environment and to predict the expected sustained performance on key DOE applications codes. The results of the micro-benchmarks and kernel bench marks show the architecture of the Cray X1 to be exceptionally fast for most operations. The best results are shown on large problems, where it is not possible to fit the entire problem into the cache of the processors. These large problems are exactly the types of problems that are important for the DOE and ultra-scale simulation. Application performance is found to be markedly improved by this architecture: - Large-scale simulations of high-temperature superconductors run 25 times faster than on an IBM Power4 cluster using the same number of processors. - Best performance of the parallel ocean program (POP v1.4.3) is 50 percent higher than on Japan s Earth Simulator and 5 times higher than on an IBM Power4 cluster. - A fusion application, global GYRO transport, was found to be 16 times faster on the X1 than on an IBM Power3. The increased performance allowed simulations to fully resolve questions raised by a prior study. - The transport kernel in the AGILE-BOLTZTRAN astrophysics code runs 15 times faster than on an IBM Power4 cluster using the same number of processors. - Molecular dynamics simulations related to the phenomenon of photon echo run 8 times faster than previously achieved. Even at 256 processors, the Cray X1 system is already outperforming other supercomputers with thousands of processors for a certain class of applications such as climate modeling and some fusion applications. This evaluation is the outcome of a number of meetings with both high-performance computing (HPC) system vendors and application experts over the past 9 months and has received broad-based support from the scientific community and other agencies.« less
Measurement-free implementations of small-scale surface codes for quantum-dot qubits
NASA Astrophysics Data System (ADS)
Ercan, H. Ekmel; Ghosh, Joydip; Crow, Daniel; Premakumar, Vickram N.; Joynt, Robert; Friesen, Mark; Coppersmith, S. N.
2018-01-01
The performance of quantum-error-correction schemes depends sensitively on the physical realizations of the qubits and the implementations of various operations. For example, in quantum-dot spin qubits, readout is typically much slower than gate operations, and conventional surface-code implementations that rely heavily on syndrome measurements could therefore be challenging. However, fast and accurate reset of quantum-dot qubits, without readout, can be achieved via tunneling to a reservoir. Here we propose small-scale surface-code implementations for which syndrome measurements are replaced by a combination of Toffoli gates and qubit reset. For quantum-dot qubits, this enables much faster error correction than measurement-based schemes, but requires additional ancilla qubits and non-nearest-neighbor interactions. We have performed numerical simulations of two different coding schemes, obtaining error thresholds on the orders of 10-2 for a one-dimensional architecture that only corrects bit-flip errors and 10-4 for a two-dimensional architecture that corrects bit- and phase-flip errors.
Federal Register 2010, 2011, 2012, 2013, 2014
2010-01-14
... art from living American artists. One-half of one percent of the estimated construction cost of new or... for OMB Review; Art-in- Architecture Program National Artist Registry AGENCY: Public Buildings Service... extension of a previously approved information collection requirement regarding Art-in Architecture Program...
Using Multimedia for Teaching Analysis in History of Modern Architecture.
ERIC Educational Resources Information Center
Perryman, Garry
This paper presents a case for the development and support of a computer-based interactive multimedia program for teaching analysis in community college architecture design programs. Analysis in architecture design is an extremely important strategy for the teaching of higher-order thinking skills, which senior schools of architecture look for in…
Verified OS Interface Code Synthesis
2016-12-01
in this case we are using the ARMv7 processor architecture ). The application accomplishes this task by issuing the swi (“software interrupt...manual version 4.0.0) on the ARM architecture . To alleviate this problem,we developed an XML-based domain specific language (DSL) in which each...Untyped Retype Table 2.1: seL4 Architecture Independent System Calls. of r2, r3, r4 and r5 into the message registers of the thread’s IPC buffer and
Evaluation of hardware costs of implementing PSK signal detection circuit based on "system on chip"
NASA Astrophysics Data System (ADS)
Sokolovskiy, A. V.; Dmitriev, D. D.; Veisov, E. A.; Gladyshev, A. B.
2018-05-01
The article deals with the choice of the architecture of digital signal processing units for implementing the PSK signal detection scheme. As an assessment of the effectiveness of architectures, the required number of shift registers and computational processes are used when implementing the "system on a chip" on the chip. A statistical estimation of the normalized code sequence offset in the signal synchronization scheme for various hardware block architectures is used.
System for Automated Geoscientific Analyses (SAGA) v. 2.1.4
NASA Astrophysics Data System (ADS)
Conrad, O.; Bechtel, B.; Bock, M.; Dietrich, H.; Fischer, E.; Gerlitz, L.; Wehberg, J.; Wichmann, V.; Böhner, J.
2015-02-01
The System for Automated Geoscientific Analyses (SAGA) is an open-source Geographic Information System (GIS), mainly licensed under the GNU General Public License. Since its first release in 2004, SAGA has rapidly developed from a specialized tool for digital terrain analysis to a comprehensive and globally established GIS platform for scientific analysis and modeling. SAGA is coded in C++ in an object oriented design and runs under several operating systems including Windows and Linux. Key functional features of the modular organized software architecture comprise an application programming interface for the development and implementation of new geoscientific methods, an easily approachable graphical user interface with many visualization options, a command line interpreter, and interfaces to scripting and low level programming languages like R and Python. The current version 2.1.4 offers more than 700 tools, which are implemented in dynamically loadable libraries or shared objects and represent the broad scopes of SAGA in numerous fields of geoscientific endeavor and beyond. In this paper, we inform about the system's architecture, functionality, and its current state of development and implementation. Further, we highlight the wide spectrum of scientific applications of SAGA in a review of published studies with special emphasis on the core application areas digital terrain analysis, geomorphology, soil science, climatology and meteorology, as well as remote sensing.
caGrid 1.0 : an enterprise Grid infrastructure for biomedical research.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Oster, S.; Langella, S.; Hastings, S.
To develop software infrastructure that will provide support for discovery, characterization, integrated access, and management of diverse and disparate collections of information sources, analysis methods, and applications in biomedical research. Design: An enterprise Grid software infrastructure, called caGrid version 1.0 (caGrid 1.0), has been developed as the core Grid architecture of the NCI-sponsored cancer Biomedical Informatics Grid (caBIG{trademark}) program. It is designed to support a wide range of use cases in basic, translational, and clinical research, including (1) discovery, (2) integrated and large-scale data analysis, and (3) coordinated study. Measurements: The caGrid is built as a Grid software infrastructure andmore » leverages Grid computing technologies and the Web Services Resource Framework standards. It provides a set of core services, toolkits for the development and deployment of new community provided services, and application programming interfaces for building client applications. Results: The caGrid 1.0 was released to the caBIG community in December 2006. It is built on open source components and caGrid source code is publicly and freely available under a liberal open source license. The core software, associated tools, and documentation can be downloaded from the following URL:
A smooth particle hydrodynamics code to model collisions between solid, self-gravitating objects
NASA Astrophysics Data System (ADS)
Schäfer, C.; Riecker, S.; Maindl, T. I.; Speith, R.; Scherrer, S.; Kley, W.
2016-05-01
Context. Modern graphics processing units (GPUs) lead to a major increase in the performance of the computation of astrophysical simulations. Owing to the different nature of GPU architecture compared to traditional central processing units (CPUs) such as x86 architecture, existing numerical codes cannot be easily migrated to run on GPU. Here, we present a new implementation of the numerical method smooth particle hydrodynamics (SPH) using CUDA and the first astrophysical application of the new code: the collision between Ceres-sized objects. Aims: The new code allows for a tremendous increase in speed of astrophysical simulations with SPH and self-gravity at low costs for new hardware. Methods: We have implemented the SPH equations to model gas, liquids and elastic, and plastic solid bodies and added a fragmentation model for brittle materials. Self-gravity may be optionally included in the simulations and is treated by the use of a Barnes-Hut tree. Results: We find an impressive performance gain using NVIDIA consumer devices compared to our existing OpenMP code. The new code is freely available to the community upon request. If you are interested in our CUDA SPH code miluphCUDA, please write an email to Christoph Schäfer. miluphCUDA is the CUDA port of miluph. miluph is pronounced [maßl2v]. We do not support the use of the code for military purposes.
Methodology, status and plans for development and assessment of the code ATHLET
DOE Office of Scientific and Technical Information (OSTI.GOV)
Teschendorff, V.; Austregesilo, H.; Lerchl, G.
1997-07-01
The thermal-hydraulic computer code ATHLET (Analysis of THermal-hydraulics of LEaks and Transients) is being developed by the Gesellschaft fuer Anlagen- und Reaktorsicherheit (GRS) for the analysis of anticipated and abnormal plant transients, small and intermediate leaks as well as large breaks in light water reactors. The aim of the code development is to cover the whole spectrum of design basis and beyond design basis accidents (without core degradation) for PWRs and BWRs with only one code. The main code features are: advanced thermal-hydraulics; modular code architecture; separation between physical models and numerical methods; pre- and post-processing tools; portability. The codemore » has features that are of special interest for applications to small leaks and transients with accident management, e.g. initialization by a steady-state calculation, full-range drift-flux model, dynamic mixture level tracking. The General Control Simulation Module of ATHLET is a flexible tool for the simulation of the balance-of-plant and control systems including the various operator actions in the course of accident sequences with AM measures. The code development is accompained by a systematic and comprehensive validation program. A large number of integral experiments and separate effect tests, including the major International Standard Problems, have been calculated by GRS and by independent organizations. The ATHLET validation matrix is a well balanced set of integral and separate effects tests derived from the CSNI proposal emphasizing, however, the German combined ECC injection system which was investigated in the UPTF, PKL and LOBI test facilities.« less
NASA Astrophysics Data System (ADS)
Payne, Joshua; Taitano, William; Knoll, Dana; Liebs, Chris; Murthy, Karthik; Feltman, Nicolas; Wang, Yijie; McCarthy, Colleen; Cieren, Emanuel
2012-10-01
In order to solve problems such as the ion coalescence and slow MHD shocks fully kinetically we developed a fully implicit 2D energy and charge conserving electromagnetic PIC code, PlasmaApp2D. PlasmaApp2D differs from previous implicit PIC implementations in that it will utilize advanced architectures such as GPUs and shared memory CPU systems, with problems too large to fit into cache. PlasmaApp2D will be a hybrid CPU-GPU code developed primarily to run on the DARWIN cluster at LANL utilizing four 12-core AMD Opteron CPUs and two NVIDIA Tesla GPUs per node. MPI will be used for cross-node communication, OpenMP will be used for on-node parallelism, and CUDA will be used for the GPUs. Development progress and initial results will be presented.
Programming parallel architectures: The BLAZE family of languages
NASA Technical Reports Server (NTRS)
Mehrotra, Piyush
1988-01-01
Programming multiprocessor architectures is a critical research issue. An overview is given of the various approaches to programming these architectures that are currently being explored. It is argued that two of these approaches, interactive programming environments and functional parallel languages, are particularly attractive since they remove much of the burden of exploiting parallel architectures from the user. Also described is recent work by the author in the design of parallel languages. Research on languages for both shared and nonshared memory multiprocessors is described, as well as the relations of this work to other current language research projects.
DOE Office of Scientific and Technical Information (OSTI.GOV)
McGhee, J.M.; Roberts, R.M.; Morel, J.E.
1997-06-01
A spherical harmonics research code (DANTE) has been developed which is compatible with parallel computer architectures. DANTE provides 3-D, multi-material, deterministic, transport capabilities using an arbitrary finite element mesh. The linearized Boltzmann transport equation is solved in a second order self-adjoint form utilizing a Galerkin finite element spatial differencing scheme. The core solver utilizes a preconditioned conjugate gradient algorithm. Other distinguishing features of the code include options for discrete-ordinates and simplified spherical harmonics angular differencing, an exact Marshak boundary treatment for arbitrarily oriented boundary faces, in-line matrix construction techniques to minimize memory consumption, and an effective diffusion based preconditioner formore » scattering dominated problems. Algorithm efficiency is demonstrated for a massively parallel SIMD architecture (CM-5), and compatibility with MPP multiprocessor platforms or workstation clusters is anticipated.« less
Vectorization, threading, and cache-blocking considerations for hydrocodes on emerging architectures
Fung, J.; Aulwes, R. T.; Bement, M. T.; ...
2015-07-14
This work reports on considerations for improving computational performance in preparation for current and expected changes to computer architecture. The algorithms studied will include increasingly complex prototypes for radiation hydrodynamics codes, such as gradient routines and diffusion matrix assembly (e.g., in [1-6]). The meshes considered for the algorithms are structured or unstructured meshes. The considerations applied for performance improvements are meant to be general in terms of architecture (not specifically graphical processing unit (GPUs) or multi-core machines, for example) and include techniques for vectorization, threading, tiling, and cache blocking. Out of a survey of optimization techniques on applications such asmore » diffusion and hydrodynamics, we make general recommendations with a view toward making these techniques conceptually accessible to the applications code developer. Published 2015. This article is a U.S. Government work and is in the public domain in the USA.« less
Perspectives in numerical astrophysics:
NASA Astrophysics Data System (ADS)
Reverdy, V.
2016-12-01
In this discussion paper, we investigate the current and future status of numerical astrophysics and highlight key questions concerning the transition to the exascale era. We first discuss the fact that one of the main motivation behind high performance simulations should not be the reproduction of observational or experimental data, but the understanding of the emergence of complexity from fundamental laws. This motivation is put into perspective regarding the quest for more computational power and we argue that extra computational resources can be used to gain in abstraction. Then, the readiness level of present-day simulation codes in regard to upcoming exascale architecture is examined and two major challenges are raised concerning both the central role of data movement for performances and the growing complexity of codes. Software architecture is finally presented as a key component to make the most of upcoming architectures while solving original physics problems.
Las Palmeras Molecular Dynamics: A flexible and modular molecular dynamics code
NASA Astrophysics Data System (ADS)
Davis, Sergio; Loyola, Claudia; González, Felipe; Peralta, Joaquín
2010-12-01
Las Palmeras Molecular Dynamics (LPMD) is a highly modular and extensible molecular dynamics (MD) code using interatomic potential functions. LPMD is able to perform equilibrium MD simulations of bulk crystalline solids, amorphous solids and liquids, as well as non-equilibrium MD (NEMD) simulations such as shock wave propagation, projectile impacts, cluster collisions, shearing, deformation under load, heat conduction, heterogeneous melting, among others, which involve unusual MD features like non-moving atoms and walls, unstoppable atoms with constant-velocity, and external forces like electric fields. LPMD is written in C++ as a compromise between efficiency and clarity of design, and its architecture is based on separate components or plug-ins, implemented as modules which are loaded on demand at runtime. The advantage of this architecture is the ability to completely link together the desired components involved in the simulation in different ways at runtime, using a user-friendly control file language which describes the simulation work-flow. As an added bonus, the plug-in API (Application Programming Interface) makes it possible to use the LPMD components to analyze data coming from other simulation packages, convert between input file formats, apply different transformations to saved MD atomic trajectories, and visualize dynamical processes either in real-time or as a post-processing step. Individual components, such as a new potential function, a new integrator, a new file format, new properties to calculate, new real-time visualizers, and even a new algorithm for handling neighbor lists can be easily coded, compiled and tested within LPMD by virtue of its object-oriented API, without the need to modify the rest of the code. LPMD includes already several pair potential functions such as Lennard-Jones, Morse, Buckingham, MCY and the harmonic potential, as well as embedded-atom model (EAM) functions such as the Sutton-Chen and Gupta potentials. Integrators to choose include Euler (if only for demonstration purposes), Verlet and Velocity Verlet, Leapfrog and Beeman, among others. Electrostatic forces are treated as another potential function, by default using the plug-in implementing the Ewald summation method. Program summaryProgram title: LPMD Catalogue identifier: AEHG_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEHG_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: GNU General Public License version 3 No. of lines in distributed program, including test data, etc.: 509 490 No. of bytes in distributed program, including test data, etc.: 6 814 754 Distribution format: tar.gz Programming language: C++ Computer: 32-bit and 64-bit workstation Operating system: UNIX RAM: Minimum 1024 bytes Classification: 7.7 External routines: zlib, OpenGL Nature of problem: Study of Statistical Mechanics and Thermodynamics of condensed matter systems, as well as kinetics of non-equilibrium processes in the same systems. Solution method: Equilibrium and non-equilibrium molecular dynamics method, Monte Carlo methods. Restrictions: Rigid molecules are not supported. Polarizable atoms and chemical bonds (proteins) either. Unusual features: The program is able to change the temperature of the simulation cell, the pressure, cut regions of the cell, color the atoms by properties, even during the simulation. It is also possible to fix the positions and/or velocity of groups of atoms. Visualization of atoms and some physical properties during the simulation. Additional comments: The program does not only perform molecular dynamics and Monte Carlo simulations, it is also able to filter and manipulate atomic configurations, read and write different file formats, convert between them, evaluate different structural and dynamical properties. Running time: 50 seconds on a 1000-step simulation of 4000 argon atoms, running on a single 2.67 GHz Intel processor.
ABINIT: Plane-Wave-Based Density-Functional Theory on High Performance Computers
NASA Astrophysics Data System (ADS)
Torrent, Marc
2014-03-01
For several years, a continuous effort has been produced to adapt electronic structure codes based on Density-Functional Theory to the future computing architectures. Among these codes, ABINIT is based on a plane-wave description of the wave functions which allows to treat systems of any kind. Porting such a code on petascale architectures pose difficulties related to the many-body nature of the DFT equations. To improve the performances of ABINIT - especially for what concerns standard LDA/GGA ground-state and response-function calculations - several strategies have been followed: A full multi-level parallelisation MPI scheme has been implemented, exploiting all possible levels and distributing both computation and memory. It allows to increase the number of distributed processes and could not be achieved without a strong restructuring of the code. The core algorithm used to solve the eigen problem (``Locally Optimal Blocked Congugate Gradient''), a Blocked-Davidson-like algorithm, is based on a distribution of processes combining plane-waves and bands. In addition to the distributed memory parallelization, a full hybrid scheme has been implemented, using standard shared-memory directives (openMP/openACC) or porting some comsuming code sections to Graphics Processing Units (GPU). As no simple performance model exists, the complexity of use has been increased; the code efficiency strongly depends on the distribution of processes among the numerous levels. ABINIT is able to predict the performances of several process distributions and automatically choose the most favourable one. On the other hand, a big effort has been carried out to analyse the performances of the code on petascale architectures, showing which sections of codes have to be improved; they all are related to Matrix Algebra (diagonalisation, orthogonalisation). The different strategies employed to improve the code scalability will be described. They are based on an exploration of new diagonalization algorithm, as well as the use of external optimized librairies. Part of this work has been supported by the european Prace project (PaRtnership for Advanced Computing in Europe) in the framework of its workpackage 8.
Three Program Architecture for Design Optimization
NASA Technical Reports Server (NTRS)
Miura, Hirokazu; Olson, Lawrence E. (Technical Monitor)
1998-01-01
In this presentation, I would like to review historical perspective on the program architecture used to build design optimization capabilities based on mathematical programming and other numerical search techniques. It is rather straightforward to classify the program architecture in three categories as shown above. However, the relative importance of each of the three approaches has not been static, instead dynamically changing as the capabilities of available computational resource increases. For example, we considered that the direct coupling architecture would never be used for practical problems, but availability of such computer systems as multi-processor. In this presentation, I would like to review the roles of three architecture from historical as well as current and future perspective. There may also be some possibility for emergence of hybrid architecture. I hope to provide some seeds for active discussion where we are heading to in the very dynamic environment for high speed computing and communication.
The Role of Ontologies in Schema-based Program Synthesis
NASA Technical Reports Server (NTRS)
Bures, Tomas; Denney, Ewen; Fischer, Bernd; Nistor, Eugen C.
2004-01-01
Program synthesis is the process of automatically deriving executable code from (non-executable) high-level specifications. It is more flexible and powerful than conventional code generation techniques that simply translate algorithmic specifications into lower-level code or only create code skeletons from structural specifications (such as UML class diagrams). Key to building a successful synthesis system is specializing to an appropriate application domain. The AUTOBAYES and AUTOFILTER systems, under development at NASA Ames, operate in the two domains of data analysis and state estimation, respectively. The central concept of both systems is the schema, a representation of reusable computational knowledge. This can take various forms, including high-level algorithm templates, code optimizations, datatype refinements, or architectural information. A schema also contains applicability conditions that are used to determine when it can be applied safely. These conditions can refer to the initial specification, to intermediate results, or to elements of the partially-instantiated code. Schema-based synthesis uses AI technology to recursively apply schemas to gradually refine a specification into executable code. This process proceeds in two main phases. A front-end gradually transforms the problem specification into a program represented in an abstract intermediate code. A backend then compiles this further down into a concrete target programming language of choice. A core engine applies schemas on the initial problem specification, then uses the output of those schemas as the input for other schemas, until the full implementation is generated. Since there might be different schemas that implement different solutions to the same problem this process can generate an entire solution tree. AUTOBAYES and AUTOFILTER have reached the level of maturity where they enable users to solve interesting application problems, e.g., the analysis of Hubble Space Telescope images. They are large (in total around 100kLoC Prolog), knowledge intensive systems that employ complex symbolic reasoning to generate a wide range of non-trivial programs for complex application do- mains. Their schemas can have complex interactions, which make it hard to change them in isolation or even understand what an existing schema actually does. Adding more capabilities by increasing the number of schemas will only worsen this situation, ultimately leading to the entropy death of the synthesis system. The root came of this problem is that the domain knowledge is scattered throughout the entire system and only represented implicitly in the schema implementations. In our current work, we are addressing this problem by making explicit the knowledge from Merent parts of the synthesis system. Here; we discuss how Gruber's definition of an ontology as an explicit specification of a conceptualization matches our efforts in identifying and explicating the domain-specific concepts. We outline the dual role ontologies play in schema-based synthesis and argue that they address different audiences and serve different purposes. Their first role is descriptive: they serve as explicit documentation, and help to understand the internal structure of the system. Their second role is prescriptive: they provide the formal basis against which the other parts of the system (e.g., schemas) can be checked. Their final role is referential: ontologies also provide semantically meaningful "hooks" which allow schemas and tools to access the internal state of the program derivation process (e.g., fragments of the generated code) in domain-specific rather than language-specific terms, and thus to modify it in a controlled fashion. For discussion purposes we use AUTOLINEAR, a small synthesis system we are currently experimenting with, which can generate code for solving a system of linear equations, Az = b.
Fast underdetermined BSS architecture design methodology for real time applications.
Mopuri, Suresh; Reddy, P Sreenivasa; Acharyya, Amit; Naik, Ganesh R
2015-01-01
In this paper, we propose a high speed architecture design methodology for the Under-determined Blind Source Separation (UBSS) algorithm using our recently proposed high speed Discrete Hilbert Transform (DHT) targeting real time applications. In UBSS algorithm, unlike the typical BSS, the number of sensors are less than the number of the sources, which is of more interest in the real time applications. The DHT architecture has been implemented based on sub matrix multiplication method to compute M point DHT, which uses N point architecture recursively and where M is an integer multiples of N. The DHT architecture and state of the art architecture are coded in VHDL for 16 bit word length and ASIC implementation is carried out using UMC 90 - nm technology @V DD = 1V and @ 1MHZ clock frequency. The proposed architecture implementation and experimental comparison results show that the DHT design is two times faster than state of the art architecture.
Architecture Adaptive Computing Environment
NASA Technical Reports Server (NTRS)
Dorband, John E.
2006-01-01
Architecture Adaptive Computing Environment (aCe) is a software system that includes a language, compiler, and run-time library for parallel computing. aCe was developed to enable programmers to write programs, more easily than was previously possible, for a variety of parallel computing architectures. Heretofore, it has been perceived to be difficult to write parallel programs for parallel computers and more difficult to port the programs to different parallel computing architectures. In contrast, aCe is supportable on all high-performance computing architectures. Currently, it is supported on LINUX clusters. aCe uses parallel programming constructs that facilitate writing of parallel programs. Such constructs were used in single-instruction/multiple-data (SIMD) programming languages of the 1980s, including Parallel Pascal, Parallel Forth, C*, *LISP, and MasPar MPL. In aCe, these constructs are extended and implemented for both SIMD and multiple- instruction/multiple-data (MIMD) architectures. Two new constructs incorporated in aCe are those of (1) scalar and virtual variables and (2) pre-computed paths. The scalar-and-virtual-variables construct increases flexibility in optimizing memory utilization in various architectures. The pre-computed-paths construct enables the compiler to pre-compute part of a communication operation once, rather than computing it every time the communication operation is performed.
The Plasma Simulation Code: A modern particle-in-cell code with patch-based load-balancing
NASA Astrophysics Data System (ADS)
Germaschewski, Kai; Fox, William; Abbott, Stephen; Ahmadi, Narges; Maynard, Kristofor; Wang, Liang; Ruhl, Hartmut; Bhattacharjee, Amitava
2016-08-01
This work describes the Plasma Simulation Code (PSC), an explicit, electromagnetic particle-in-cell code with support for different order particle shape functions. We review the basic components of the particle-in-cell method as well as the computational architecture of the PSC code that allows support for modular algorithms and data structure in the code. We then describe and analyze in detail a distinguishing feature of PSC: patch-based load balancing using space-filling curves which is shown to lead to major efficiency gains over unbalanced methods and a previously used simpler balancing method.
Demonstration of Weight-Four Parity Measurements in the Surface Code Architecture.
Takita, Maika; Córcoles, A D; Magesan, Easwar; Abdo, Baleegh; Brink, Markus; Cross, Andrew; Chow, Jerry M; Gambetta, Jay M
2016-11-18
We present parity measurements on a five-qubit lattice with connectivity amenable to the surface code quantum error correction architecture. Using all-microwave controls of superconducting qubits coupled via resonators, we encode the parities of four data qubit states in either the X or the Z basis. Given the connectivity of the lattice, we perform a full characterization of the static Z interactions within the set of five qubits, as well as dynamical Z interactions brought along by single- and two-qubit microwave drives. The parity measurements are significantly improved by modifying the microwave two-qubit gates to dynamically remove nonideal Z errors.
Roads towards fault-tolerant universal quantum computation
NASA Astrophysics Data System (ADS)
Campbell, Earl T.; Terhal, Barbara M.; Vuillot, Christophe
2017-09-01
A practical quantum computer must not merely store information, but also process it. To prevent errors introduced by noise from multiplying and spreading, a fault-tolerant computational architecture is required. Current experiments are taking the first steps toward noise-resilient logical qubits. But to convert these quantum devices from memories to processors, it is necessary to specify how a universal set of gates is performed on them. The leading proposals for doing so, such as magic-state distillation and colour-code techniques, have high resource demands. Alternative schemes, such as those that use high-dimensional quantum codes in a modular architecture, have potential benefits, but need to be explored further.
Roads towards fault-tolerant universal quantum computation.
Campbell, Earl T; Terhal, Barbara M; Vuillot, Christophe
2017-09-13
A practical quantum computer must not merely store information, but also process it. To prevent errors introduced by noise from multiplying and spreading, a fault-tolerant computational architecture is required. Current experiments are taking the first steps toward noise-resilient logical qubits. But to convert these quantum devices from memories to processors, it is necessary to specify how a universal set of gates is performed on them. The leading proposals for doing so, such as magic-state distillation and colour-code techniques, have high resource demands. Alternative schemes, such as those that use high-dimensional quantum codes in a modular architecture, have potential benefits, but need to be explored further.
Parallel-Processing CMOS Circuitry for M-QAM and 8PSK TCM
NASA Technical Reports Server (NTRS)
Gray, Andrew; Lee, Dennis; Hoy, Scott; Fisher, Dave; Fong, Wai; Ghuman, Parminder
2009-01-01
There has been some additional development of parts reported in "Multi-Modulator for Bandwidth-Efficient Communication" (NPO-40807), NASA Tech Briefs, Vol. 32, No. 6 (June 2009), page 34. The focus was on 1) The generation of M-order quadrature amplitude modulation (M-QAM) and octonary-phase-shift-keying, trellis-coded modulation (8PSK TCM), 2) The use of square-root raised-cosine pulse-shaping filters, 3) A parallel-processing architecture that enables low-speed [complementary metal oxide/semiconductor (CMOS)] circuitry to perform the coding, modulation, and pulse-shaping computations at a high rate; and 4) Implementation of the architecture in a CMOS field-programmable gate array.
A CPU benchmark for protein crystallographic refinement.
Bourne, P E; Hendrickson, W A
1990-01-01
The CPU time required to complete a cycle of restrained least-squares refinement of a protein structure from X-ray crystallographic data using the FORTRAN codes PROTIN and PROLSQ are reported for 48 different processors, ranging from single-user workstations to supercomputers. Sequential, vector, VLIW, multiprocessor, and RISC hardware architectures are compared using both a small and a large protein structure. Representative compile times for each hardware type are also given, and the improvement in run-time when coding for a specific hardware architecture considered. The benchmarks involve scalar integer and vector floating point arithmetic and are representative of the calculations performed in many scientific disciplines.
Characterization of Proxy Application Performance on Advanced Architectures. UMT2013, MCB, AMG2013
DOE Office of Scientific and Technical Information (OSTI.GOV)
Howell, Louis H.; Gunney, Brian T.; Bhatele, Abhinav
2015-10-09
Three codes were tested at LLNL as part of a Tri-Lab effort to make detailed assessments of several proxy applications on various advanced architectures, with the eventual goal of extending these assessments to codes of programmatic interest running more realistic simulations. Teams from Sandia and Los Alamos tested proxy apps of their own. The focus in this report is on the LLNL codes UMT2013, MCB, and AMG2013. We present weak and strong MPI scaling results and studies of OpenMP efficiency on a large BG/Q system at LLNL, with comparison against similar tests on an Intel Sandy Bridge TLCC2 system. Themore » hardware counters on BG/Q provide detailed information on many aspects of on-node performance, while information from the mpiP tool gives insight into the reasons for the differing scaling behavior on these two different architectures. Results from three more speculative tests are also included: one that exploits NVRAM as extended memory, one that studies performance under a power bound, and one that illustrates the effects of changing the torus network mapping on BG/Q.« less
Web Services Provide Access to SCEC Scientific Research Application Software
NASA Astrophysics Data System (ADS)
Gupta, N.; Gupta, V.; Okaya, D.; Kamb, L.; Maechling, P.
2003-12-01
Web services offer scientific communities a new paradigm for sharing research codes and communicating results. While there are formal technical definitions of what constitutes a web service, for a user community such as the Southern California Earthquake Center (SCEC), we may conceptually consider a web service to be functionality provided on-demand by an application which is run on a remote computer located elsewhere on the Internet. The value of a web service is that it can (1) run a scientific code without the user needing to install and learn the intricacies of running the code; (2) provide the technical framework which allows a user's computer to talk to the remote computer which performs the service; (3) provide the computational resources to run the code; and (4) bundle several analysis steps and provide the end results in digital or (post-processed) graphical form. Within an NSF-sponsored ITR project coordinated by SCEC, we are constructing web services using architectural protocols and programming languages (e.g., Java). However, because the SCEC community has a rich pool of scientific research software (written in traditional languages such as C and FORTRAN), we also emphasize making existing scientific codes available by constructing web service frameworks which wrap around and directly run these codes. In doing so we attempt to broaden community usage of these codes. Web service wrapping of a scientific code can be done using a "web servlet" construction or by using a SOAP/WSDL-based framework. This latter approach is widely adopted in IT circles although it is subject to rapid evolution. Our wrapping framework attempts to "honor" the original codes with as little modification as is possible. For versatility we identify three methods of user access: (A) a web-based GUI (written in HTML and/or Java applets); (B) a Linux/OSX/UNIX command line "initiator" utility (shell-scriptable); and (C) direct access from within any Java application (and with the correct API interface from within C++ and/or C/Fortran). This poster presentation will provide descriptions of the following selected web services and their origin as scientific application codes: 3D community velocity models for Southern California, geocoordinate conversions (latitude/longitude to UTM), execution of GMT graphical scripts, data format conversions (Gocad to Matlab format), and implementation of Seismic Hazard Analysis application programs that calculate hazard curve and hazard map data sets.
Understanding GPU Power. A Survey of Profiling, Modeling, and Simulation Methods
Bridges, Robert A.; Imam, Neena; Mintz, Tiffany M.
2016-09-01
Modern graphics processing units (GPUs) have complex architectures that admit exceptional performance and energy efficiency for high throughput applications.Though GPUs consume large amounts of power, their use for high throughput applications facilitate state-of-the-art energy efficiency and performance. Consequently, continued development relies on understanding their power consumption. Our work is a survey of GPU power modeling and profiling methods with increased detail on noteworthy efforts. Moreover, as direct measurement of GPU power is necessary for model evaluation and parameter initiation, internal and external power sensors are discussed. Hardware counters, which are low-level tallies of hardware events, share strong correlation to powermore » use and performance. Statistical correlation between power and performance counters has yielded worthwhile GPU power models, yet the complexity inherent to GPU architectures presents new hurdles for power modeling. Developments and challenges of counter-based GPU power modeling is discussed. Often building on the counter-based models, research efforts for GPU power simulation, which make power predictions from input code and hardware knowledge, provide opportunities for optimization in programming or architectural design. Noteworthy strides in power simulations for GPUs are included along with their performance or functional simulator counterparts when appropriate. Lastly, possible directions for future research are discussed.« less
JAtlasView: a Java atlas-viewer for browsing biomedical 3D images and atlases.
Feng, Guangjie; Burton, Nick; Hill, Bill; Davidson, Duncan; Kerwin, Janet; Scott, Mark; Lindsay, Susan; Baldock, Richard
2005-03-09
Many three-dimensional (3D) images are routinely collected in biomedical research and a number of digital atlases with associated anatomical and other information have been published. A number of tools are available for viewing this data ranging from commercial visualization packages to freely available, typically system architecture dependent, solutions. Here we discuss an atlas viewer implemented to run on any workstation using the architecture neutral Java programming language. We report the development of a freely available Java based viewer for 3D image data, descibe the structure and functionality of the viewer and how automated tools can be developed to manage the Java Native Interface code. The viewer allows arbitrary re-sectioning of the data and interactive browsing through the volume. With appropriately formatted data, for example as provided for the Electronic Atlas of the Developing Human Brain, a 3D surface view and anatomical browsing is available. The interface is developed in Java with Java3D providing the 3D rendering. For efficiency the image data is manipulated using the Woolz image-processing library provided as a dynamically linked module for each machine architecture. We conclude that Java provides an appropriate environment for efficient development of these tools and techniques exist to allow computationally efficient image-processing libraries to be integrated relatively easily.
Understanding GPU Power. A Survey of Profiling, Modeling, and Simulation Methods
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bridges, Robert A.; Imam, Neena; Mintz, Tiffany M.
Modern graphics processing units (GPUs) have complex architectures that admit exceptional performance and energy efficiency for high throughput applications.Though GPUs consume large amounts of power, their use for high throughput applications facilitate state-of-the-art energy efficiency and performance. Consequently, continued development relies on understanding their power consumption. Our work is a survey of GPU power modeling and profiling methods with increased detail on noteworthy efforts. Moreover, as direct measurement of GPU power is necessary for model evaluation and parameter initiation, internal and external power sensors are discussed. Hardware counters, which are low-level tallies of hardware events, share strong correlation to powermore » use and performance. Statistical correlation between power and performance counters has yielded worthwhile GPU power models, yet the complexity inherent to GPU architectures presents new hurdles for power modeling. Developments and challenges of counter-based GPU power modeling is discussed. Often building on the counter-based models, research efforts for GPU power simulation, which make power predictions from input code and hardware knowledge, provide opportunities for optimization in programming or architectural design. Noteworthy strides in power simulations for GPUs are included along with their performance or functional simulator counterparts when appropriate. Lastly, possible directions for future research are discussed.« less
Architecture-Adaptive Computing Environment: A Tool for Teaching Parallel Programming
NASA Technical Reports Server (NTRS)
Dorband, John E.; Aburdene, Maurice F.
2002-01-01
Recently, networked and cluster computation have become very popular. This paper is an introduction to a new C based parallel language for architecture-adaptive programming, aCe C. The primary purpose of aCe (Architecture-adaptive Computing Environment) is to encourage programmers to implement applications on parallel architectures by providing them the assurance that future architectures will be able to run their applications with a minimum of modification. A secondary purpose is to encourage computer architects to develop new types of architectures by providing an easily implemented software development environment and a library of test applications. This new language should be an ideal tool to teach parallel programming. In this paper, we will focus on some fundamental features of aCe C.
Cross-language Babel structs—making scientific interfaces more efficient
NASA Astrophysics Data System (ADS)
Prantl, Adrian; Ebner, Dietmar; Epperly, Thomas G. W.
2013-01-01
Babel is an open-source language interoperability framework tailored to the needs of high-performance scientific computing. As an integral element of the Common Component Architecture, it is employed in a wide range of scientific applications where it is used to connect components written in different programming languages. In this paper we describe how we extended Babel to support interoperable tuple data types (structs). Structs are a common idiom in (mono-lingual) scientific application programming interfaces (APIs); they are an efficient way to pass tuples of nonuniform data between functions, and are supported natively by most programming languages. Using our extended version of Babel, developers of scientific codes can now pass structs as arguments between functions implemented in any of the supported languages. In C, C++, Fortran 2003/2008 and Chapel, structs can be passed without the overhead of data marshaling or copying, providing language interoperability at minimal cost. Other supported languages are Fortran 77, Fortran 90/95, Java and Python. We will show how we designed a struct implementation that is interoperable with all of the supported languages and present benchmark data to compare the performance of all language bindings, highlighting the differences between languages that offer native struct support and an object-oriented interface with getter/setter methods. A case study shows how structs can help simplify the interfaces of scientific codes significantly.
Targeting multiple heterogeneous hardware platforms with OpenCL
NASA Astrophysics Data System (ADS)
Fox, Paul A.; Kozacik, Stephen T.; Humphrey, John R.; Paolini, Aaron; Kuller, Aryeh; Kelmelis, Eric J.
2014-06-01
The OpenCL API allows for the abstract expression of parallel, heterogeneous computing, but hardware implementations have substantial implementation differences. The abstractions provided by the OpenCL API are often insufficiently high-level to conceal differences in hardware architecture. Additionally, implementations often do not take advantage of potential performance gains from certain features due to hardware limitations and other factors. These factors make it challenging to produce code that is portable in practice, resulting in much OpenCL code being duplicated for each hardware platform being targeted. This duplication of effort offsets the principal advantage of OpenCL: portability. The use of certain coding practices can mitigate this problem, allowing a common code base to be adapted to perform well across a wide range of hardware platforms. To this end, we explore some general practices for producing performant code that are effective across platforms. Additionally, we explore some ways of modularizing code to enable optional optimizations that take advantage of hardware-specific characteristics. The minimum requirement for portability implies avoiding the use of OpenCL features that are optional, not widely implemented, poorly implemented, or missing in major implementations. Exposing multiple levels of parallelism allows hardware to take advantage of the types of parallelism it supports, from the task level down to explicit vector operations. Static optimizations and branch elimination in device code help the platform compiler to effectively optimize programs. Modularization of some code is important to allow operations to be chosen for performance on target hardware. Optional subroutines exploiting explicit memory locality allow for different memory hierarchies to be exploited for maximum performance. The C preprocessor and JIT compilation using the OpenCL runtime can be used to enable some of these techniques, as well as to factor in hardware-specific optimizations as necessary.
Practical Application of Model-based Programming and State-based Architecture to Space Missions
NASA Technical Reports Server (NTRS)
Horvath, Gregory; Ingham, Michel; Chung, Seung; Martin, Oliver; Williams, Brian
2006-01-01
A viewgraph presentation to develop models from systems engineers that accomplish mission objectives and manage the health of the system is shown. The topics include: 1) Overview; 2) Motivation; 3) Objective/Vision; 4) Approach; 5) Background: The Mission Data System; 6) Background: State-based Control Architecture System; 7) Background: State Analysis; 8) Overview of State Analysis; 9) Background: MDS Software Frameworks; 10) Background: Model-based Programming; 10) Background: Titan Model-based Executive; 11) Model-based Execution Architecture; 12) Compatibility Analysis of MDS and Titan Architectures; 13) Integrating Model-based Programming and Execution into the Architecture; 14) State Analysis and Modeling; 15) IMU Subsystem State Effects Diagram; 16) Titan Subsystem Model: IMU Health; 17) Integrating Model-based Programming and Execution into the Software IMU; 18) Testing Program; 19) Computationally Tractable State Estimation & Fault Diagnosis; 20) Diagnostic Algorithm Performance; 21) Integration and Test Issues; 22) Demonstrated Benefits; and 23) Next Steps
Modulation and Coding for NASA's New Space Communications Architecture
NASA Technical Reports Server (NTRS)
Deutsch, Leslie J.; Stocklin, Frank J.; Rush, John J.
2008-01-01
With the release in 2006 of NASA's Space Communications and Navigation Architecture, the agency defined its vision for the future in these areas. The results reported in this paper help define the myriad communications links included in this architecture through the year 2030. While these results represent the work of multiple NASA Centers and some of the best experts in the Agency, this is only a first step toward developing international telecommunication link standards that will take the world into the next era of space exploration.
Multiprocessor architecture: Synthesis and evaluation
NASA Technical Reports Server (NTRS)
Standley, Hilda M.
1990-01-01
Multiprocessor computed architecture evaluation for structural computations is the focus of the research effort described. Results obtained are expected to lead to more efficient use of existing architectures and to suggest designs for new, application specific, architectures. The brief descriptions given outline a number of related efforts directed toward this purpose. The difficulty is analyzing an existing architecture or in designing a new computer architecture lies in the fact that the performance of a particular architecture, within the context of a given application, is determined by a number of factors. These include, but are not limited to, the efficiency of the computation algorithm, the programming language and support environment, the quality of the program written in the programming language, the multiplicity of the processing elements, the characteristics of the individual processing elements, the interconnection network connecting processors and non-local memories, and the shared memory organization covering the spectrum from no shared memory (all local memory) to one global access memory. These performance determiners may be loosely classified as being software or hardware related. This distinction is not clear or even appropriate in many cases. The effect of the choice of algorithm is ignored by assuming that the algorithm is specified as given. Effort directed toward the removal of the effect of the programming language and program resulted in the design of a high-level parallel programming language. Two characteristics of the fundamental structure of the architecture (memory organization and interconnection network) are examined.
Binary translation using peephole translation rules
Bansal, Sorav; Aiken, Alex
2010-05-04
An efficient binary translator uses peephole translation rules to directly translate executable code from one instruction set to another. In a preferred embodiment, the translation rules are generated using superoptimization techniques that enable the translator to automatically learn translation rules for translating code from the source to target instruction set architecture.
Clustalnet: the joining of Clustal and CORBA.
Campagne, F
2000-07-01
Performing sequence alignment operations from a different program than the original sequence alignment code, and/or through a network connection, is often required. Interactive alignment editors and large-scale biological data analysis are common examples where such a flexibility is important. Interoperability between the alignment engine and the client should be obtained regardless of the architectures and programming languages of the server and client. Clustalnet, a Clustal alignment CORBA server is described, which was developed on the basis of Clustalw. This server brings the robustness of the algorithms and implementations of Clustal to a new level of reuse. A Clustalnet server object can be accessed from a program, transparently through the network. We present interfaces to perform the alignment operations and to control these operations via immutable contexts. The interfaces that select the contexts do not depend on the nature of the operation to be performed, making the design modular. The IDL interfaces presented here are not specific to Clustal and can be implemented on top of different sequence alignment algorithm implementations.
A Tensor Product Formulation of Strassen's Matrix Multiplication Algorithm with Memory Reduction
Kumar, B.; Huang, C. -H.; Sadayappan, P.; ...
1995-01-01
In this article, we present a program generation strategy of Strassen's matrix multiplication algorithm using a programming methodology based on tensor product formulas. In this methodology, block recursive programs such as the fast Fourier Transforms and Strassen's matrix multiplication algorithm are expressed as algebraic formulas involving tensor products and other matrix operations. Such formulas can be systematically translated to high-performance parallel/vector codes for various architectures. In this article, we present a nonrecursive implementation of Strassen's algorithm for shared memory vector processors such as the Cray Y-MP. A previous implementation of Strassen's algorithm synthesized from tensor product formulas required working storagemore » of size O(7 n ) for multiplying 2 n × 2 n matrices. We present a modified formulation in which the working storage requirement is reduced to O(4 n ). The modified formulation exhibits sufficient parallelism for efficient implementation on a shared memory multiprocessor. Performance results on a Cray Y-MP8/64 are presented.« less
Open System Architecture design for planet surface systems
NASA Technical Reports Server (NTRS)
Petri, D. A.; Pieniazek, L. A.; Toups, L. D.
1992-01-01
The Open System Architecture is an approach to meeting the needs for flexibility and evolution of the U.S. Space Exploration Initiative program of the manned exploration of the solar system and its permanent settlement. This paper investigates the issues that future activities of the planet exploration program must confront, defines the basic concepts that provide the basis for establishing an Open System Architecture, identifies the appropriate features of such an architecture, and discusses examples of Open System Architectures.
NASA Astrophysics Data System (ADS)
van Dyk, Danny; Geveler, Markus; Mallach, Sven; Ribbrock, Dirk; Göddeke, Dominik; Gutwenger, Carsten
2009-12-01
We present HONEI, an open-source collection of libraries offering a hardware oriented approach to numerical calculations. HONEI abstracts the hardware, and applications written on top of HONEI can be executed on a wide range of computer architectures such as CPUs, GPUs and the Cell processor. We demonstrate the flexibility and performance of our approach with two test applications, a Finite Element multigrid solver for the Poisson problem and a robust and fast simulation of shallow water waves. By linking against HONEI's libraries, we achieve a two-fold speedup over straight forward C++ code using HONEI's SSE backend, and additional 3-4 and 4-16 times faster execution on the Cell and a GPU. A second important aspect of our approach is that the full performance capabilities of the hardware under consideration can be exploited by adding optimised application-specific operations to the HONEI libraries. HONEI provides all necessary infrastructure for development and evaluation of such kernels, significantly simplifying their development. Program summaryProgram title: HONEI Catalogue identifier: AEDW_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEDW_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: GPLv2 No. of lines in distributed program, including test data, etc.: 216 180 No. of bytes in distributed program, including test data, etc.: 1 270 140 Distribution format: tar.gz Programming language: C++ Computer: x86, x86_64, NVIDIA CUDA GPUs, Cell blades and PlayStation 3 Operating system: Linux RAM: at least 500 MB free Classification: 4.8, 4.3, 6.1 External routines: SSE: none; [1] for GPU, [2] for Cell backend Nature of problem: Computational science in general and numerical simulation in particular have reached a turning point. The revolution developers are facing is not primarily driven by a change in (problem-specific) methodology, but rather by the fundamental paradigm shift of the underlying hardware towards heterogeneity and parallelism. This is particularly relevant for data-intensive problems stemming from discretisations with local support, such as finite differences, volumes and elements. Solution method: To address these issues, we present a hardware aware collection of libraries combining the advantages of modern software techniques and hardware oriented programming. Applications built on top of these libraries can be configured trivially to execute on CPUs, GPUs or the Cell processor. In order to evaluate the performance and accuracy of our approach, we provide two domain specific applications; a multigrid solver for the Poisson problem and a fully explicit solver for 2D shallow water equations. Restrictions: HONEI is actively being developed, and its feature list is continuously expanded. Not all combinations of operations and architectures might be supported in earlier versions of the code. Obtaining snapshots from http://www.honei.org is recommended. Unusual features: The considered applications as well as all library operations can be run on NVIDIA GPUs and the Cell BE. Running time: Depending on the application, and the input sizes. The Poisson solver executes in few seconds, while the SWE solver requires up to 5 minutes for large spatial discretisations or small timesteps. References:http://www.nvidia.com/cuda. http://www.ibm.com/developerworks/power/cell.
Jiansen Li; Jianqi Sun; Ying Song; Yanran Xu; Jun Zhao
2014-01-01
An effective way to improve the data acquisition speed of magnetic resonance imaging (MRI) is using under-sampled k-space data, and dictionary learning method can be used to maintain the reconstruction quality. Three-dimensional dictionary trains the atoms in dictionary in the form of blocks, which can utilize the spatial correlation among slices. Dual-dictionary learning method includes a low-resolution dictionary and a high-resolution dictionary, for sparse coding and image updating respectively. However, the amount of data is huge for three-dimensional reconstruction, especially when the number of slices is large. Thus, the procedure is time-consuming. In this paper, we first utilize the NVIDIA Corporation's compute unified device architecture (CUDA) programming model to design the parallel algorithms on graphics processing unit (GPU) to accelerate the reconstruction procedure. The main optimizations operate in the dictionary learning algorithm and the image updating part, such as the orthogonal matching pursuit (OMP) algorithm and the k-singular value decomposition (K-SVD) algorithm. Then we develop another version of CUDA code with algorithmic optimization. Experimental results show that more than 324 times of speedup is achieved compared with the CPU-only codes when the number of MRI slices is 24.
Parallel design of JPEG-LS encoder on graphics processing units
NASA Astrophysics Data System (ADS)
Duan, Hao; Fang, Yong; Huang, Bormin
2012-01-01
With recent technical advances in graphic processing units (GPUs), GPUs have outperformed CPUs in terms of compute capability and memory bandwidth. Many successful GPU applications to high performance computing have been reported. JPEG-LS is an ISO/IEC standard for lossless image compression which utilizes adaptive context modeling and run-length coding to improve compression ratio. However, adaptive context modeling causes data dependency among adjacent pixels and the run-length coding has to be performed in a sequential way. Hence, using JPEG-LS to compress large-volume hyperspectral image data is quite time-consuming. We implement an efficient parallel JPEG-LS encoder for lossless hyperspectral compression on a NVIDIA GPU using the computer unified device architecture (CUDA) programming technology. We use the block parallel strategy, as well as such CUDA techniques as coalesced global memory access, parallel prefix sum, and asynchronous data transfer. We also show the relation between GPU speedup and AVIRIS block size, as well as the relation between compression ratio and AVIRIS block size. When AVIRIS images are divided into blocks, each with 64×64 pixels, we gain the best GPU performance with 26.3x speedup over its original CPU code.
An algebraic hypothesis about the primeval genetic code architecture.
Sánchez, Robersy; Grau, Ricardo
2009-09-01
A plausible architecture of an ancient genetic code is derived from an extended base triplet vector space over the Galois field of the extended base alphabet {D,A,C,G,U}, where symbol D represents one or more hypothetical bases with unspecific pairings. We hypothesized that the high degeneration of a primeval genetic code with five bases and the gradual origin and improvement of a primeval DNA repair system could make possible the transition from ancient to modern genetic codes. Our results suggest that the Watson-Crick base pairing G identical with C and A=U and the non-specific base pairing of the hypothetical ancestral base D used to define the sum and product operations are enough features to determine the coding constraints of the primeval and the modern genetic code, as well as, the transition from the former to the latter. Geometrical and algebraic properties of this vector space reveal that the present codon assignment of the standard genetic code could be induced from a primeval codon assignment. Besides, the Fourier spectrum of the extended DNA genome sequences derived from the multiple sequence alignment suggests that the called period-3 property of the present coding DNA sequences could also exist in the ancient coding DNA sequences. The phylogenetic analyses achieved with metrics defined in the N-dimensional vector space (B(3))(N) of DNA sequences and with the new evolutionary model presented here also suggest that an ancient DNA coding sequence with five or more bases does not contradict the expected evolutionary history.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Berra, P.B.; Chung, S.M.; Hachem, N.I.
This article presents techniques for managing a very large data/knowledge base to support multiple inference-mechanisms for logic programming. Because evaluation of goals can require accessing data from the extensional database, or EDB, in very general ways, one must often resort to indexing on all fields of the extensional database facts. This presents a formidable management problem in that the index data may be larger than the EDB itself. This problem becomes even more serious in this case of very large data/knowledge bases (hundreds of gigabytes), since considerably more hardware will be required to process and store the index data. Inmore » order to reduce the amount of index data considerably without losing generality, the authors form a surrogate file, which is a hashing transformation of the facts. Superimposed code words (SCW), concatenated code words (CCW), and transformed inverted lists (TIL) are possible structures for the surrogate file. since these transformations are quite regular and compact, the authors consider possible computer architecture for the processing of the surrogate file.« less
van Dyck, Peter C; Rinaldo, Piero; McDonald, Clement; Howell, R Rodrey; Zuckerman, Alan; Downing, Gregory
2010-01-01
Capture, coding and communication of newborn screening (NBS) information represent a challenge for public health laboratories, health departments, hospitals, and ambulatory care practices. An increasing number of conditions targeted for screening and the complexity of interpretation contribute to a growing need for integrated information-management strategies. This makes NBS an important test of tools and architecture for electronic health information exchange (HIE) in this convergence of individual patient care and population health activities. For this reason, the American Health Information Community undertook three tasks described in this paper. First, a newborn screening use case was established to facilitate standards harmonization for common terminology and interoperability specifications guiding HIE. Second, newborn screening coding and terminology were developed for integration into electronic HIE activities. Finally, clarification of privacy, security, and clinical laboratory regulatory requirements governing information exchange was provided, serving as a framework to establish pathways for improving screening program timeliness, effectiveness, and efficiency of quality patient care services. PMID:20064796
Enhancer-Derived lncRNAs Regulate Genome Architecture: Fact or Fiction?
Fanucchi, Stephanie; Mhlanga, Musa M
2017-06-01
How does the non-coding portion of the genome contribute to the regulation of genome architecture? A recent paper by Tan et al. focuses on the relationship between cis-acting complex-trait-associated lincRNAs and the formation of chromosomal contacts in topologically associating domains (TADs). Copyright © 2017 Elsevier Ltd. All rights reserved.
Lexical architecture based on a hierarchy of codes for high-speed string correction
NASA Astrophysics Data System (ADS)
de Bertrand de Beuvron, Francois; Trigano, Philippe
1992-03-01
AI systems for the general public have to be really tolerant to errors. These errors could be of several kinds: typographic, phonetic, grammatical, or semantic. A special lexical dictionary architecture has been designed to deal with the first two. It extends the hierarchical file method of E. Tanaka and Y. Kojima.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Deslippe, Jack; da Jornada, Felipe H.; Vigil-Fowler, Derek
2016-10-06
We profile and optimize calculations performed with the BerkeleyGW code on the Xeon-Phi architecture. BerkeleyGW depends both on hand-tuned critical kernels as well as on BLAS and FFT libraries. We describe the optimization process and performance improvements achieved. We discuss a layered parallelization strategy to take advantage of vector, thread and node-level parallelism. We discuss locality changes (including the consequence of the lack of L3 cache) and effective use of the on-package high-bandwidth memory. We show preliminary results on Knights-Landing including a roofline study of code performance before and after a number of optimizations. We find that the GW methodmore » is particularly well-suited for many-core architectures due to the ability to exploit a large amount of parallelism over plane-wave components, band-pairs, and frequencies.« less
Hybrid petacomputing meets cosmology: The Roadrunner Universe project
NASA Astrophysics Data System (ADS)
Habib, Salman; Pope, Adrian; Lukić, Zarija; Daniel, David; Fasel, Patricia; Desai, Nehal; Heitmann, Katrin; Hsu, Chung-Hsing; Ankeny, Lee; Mark, Graham; Bhattacharya, Suman; Ahrens, James
2009-07-01
The target of the Roadrunner Universe project at Los Alamos National Laboratory is a set of very large cosmological N-body simulation runs on the hybrid supercomputer Roadrunner, the world's first petaflop platform. Roadrunner's architecture presents opportunities and difficulties characteristic of next-generation supercomputing. We describe a new code designed to optimize performance and scalability by explicitly matching the underlying algorithms to the machine architecture, and by using the physics of the problem as an essential aid in this process. While applications will differ in specific exploits, we believe that such a design process will become increasingly important in the future. The Roadrunner Universe project code, MC3 (Mesh-based Cosmology Code on the Cell), uses grid and direct particle methods to balance the capabilities of Roadrunner's conventional (Opteron) and accelerator (Cell BE) layers. Mirrored particle caches and spectral techniques are used to overcome communication bandwidth limitations and possible difficulties with complicated particle-grid interaction templates.
Network Coded Cooperative Communication in a Real-Time Wireless Hospital Sensor Network.
Prakash, R; Balaji Ganesh, A; Sivabalan, Somu
2017-05-01
The paper presents a network coded cooperative communication (NC-CC) enabled wireless hospital sensor network architecture for monitoring health as well as postural activities of a patient. A wearable device, referred as a smartband is interfaced with pulse rate, body temperature sensors and an accelerometer along with wireless protocol services, such as Bluetooth and Radio-Frequency transceiver and Wi-Fi. The energy efficiency of wearable device is improved by embedding a linear acceleration based transmission duty cycling algorithm (NC-DRDC). The real-time demonstration is carried-out in a hospital environment to evaluate the performance characteristics, such as power spectral density, energy consumption, signal to noise ratio, packet delivery ratio and transmission offset. The resource sharing and energy efficiency features of network coding technique are improved by proposing an algorithm referred as network coding based dynamic retransmit/rebroadcast decision control (LA-TDC). From the experimental results, it is observed that the proposed LA-TDC algorithm reduces network traffic and end-to-end delay by an average of 27.8% and 21.6%, respectively than traditional network coded wireless transmission. The wireless architecture is deployed in a hospital environment and results are then successfully validated.
A software bus for thread objects
NASA Technical Reports Server (NTRS)
Callahan, John R.; Li, Dehuai
1995-01-01
The authors have implemented a software bus for lightweight threads in an object-oriented programming environment that allows for rapid reconfiguration and reuse of thread objects in discrete-event simulation experiments. While previous research in object-oriented, parallel programming environments has focused on direct communication between threads, our lightweight software bus, called the MiniBus, provides a means to isolate threads from their contexts of execution by restricting communications between threads to message-passing via their local ports only. The software bus maintains a topology of connections between these ports. It routes, queues, and delivers messages according to this topology. This approach allows for rapid reconfiguration and reuse of thread objects in other systems without making changes to the specifications or source code. A layered approach that provides the needed transparency to developers is presented. Examples of using the MiniBus are given, and the value of bus architectures in building and conducting simulations of discrete-event systems is discussed.
Expert systems for space power supply - Design, analysis, and evaluation
NASA Technical Reports Server (NTRS)
Cooper, Ralph S.; Thomson, M. Kemer; Hoshor, Alan
1987-01-01
The feasibility of applying expert systems to the conceptual design, analysis, and evaluation of space power supplies in particular, and complex systems in general is evaluated. To do this, the space power supply design process and its associated knowledge base were analyzed and characterized in a form suitable for computer emulation of a human expert. The existing expert system tools and the results achieved with them were evaluated to assess their applicability to power system design. Some new concepts for combining program architectures (modular expert systems and algorithms) with information about the domain were applied to create a 'deep' system for handling the complex design problem. NOVICE, a code to solve a simplified version of a scoping study of a wide variety of power supply types for a broad range of missions, has been developed, programmed, and tested as a concrete feasibility demonstration.
The Computer Aided Aircraft-design Package (CAAP)
NASA Technical Reports Server (NTRS)
Yalif, Guy U.
1994-01-01
The preliminary design of an aircraft is a complex, labor-intensive, and creative process. Since the 1970's, many computer programs have been written to help automate preliminary airplane design. Time and resource analyses have identified, 'a substantial decrease in project duration with the introduction of an automated design capability'. Proof-of-concept studies have been completed which establish 'a foundation for a computer-based airframe design capability', Unfortunately, today's design codes exist in many different languages on many, often expensive, hardware platforms. Through the use of a module-based system architecture, the Computer aided Aircraft-design Package (CAAP) will eventually bring together many of the most useful features of existing programs. Through the use of an expert system, it will add an additional feature that could be described as indispensable to entry level engineers and students: the incorporation of 'expert' knowledge into the automated design process.
DOE Office of Scientific and Technical Information (OSTI.GOV)
G.A. Pope; K. Sephernoori; D.C. McKinney
1996-03-15
This report describes the application of distributed-memory parallel programming techniques to a compositional simulator called UTCHEM. The University of Texas Chemical Flooding reservoir simulator (UTCHEM) is a general-purpose vectorized chemical flooding simulator that models the transport of chemical species in three-dimensional, multiphase flow through permeable media. The parallel version of UTCHEM addresses solving large-scale problems by reducing the amount of time that is required to obtain the solution as well as providing a flexible and portable programming environment. In this work, the original parallel version of UTCHEM was modified and ported to CRAY T3D and CRAY T3E, distributed-memory, multiprocessor computersmore » using CRAY-PVM as the interprocessor communication library. Also, the data communication routines were modified such that the portability of the original code across different computer architectures was mad possible.« less
Extreme Programming: Maestro Style
NASA Technical Reports Server (NTRS)
Norris, Jeffrey; Fox, Jason; Rabe, Kenneth; Shu, I-Hsiang; Powell, Mark
2009-01-01
"Extreme Programming: Maestro Style" is the name of a computer programming methodology that has evolved as a custom version of a methodology, called extreme programming that has been practiced in the software industry since the late 1990s. The name of this version reflects its origin in the work of the Maestro team at NASA's Jet Propulsion Laboratory that develops software for Mars exploration missions. Extreme programming is oriented toward agile development of software resting on values of simplicity, communication, testing, and aggressiveness. Extreme programming involves use of methods of rapidly building and disseminating institutional knowledge among members of a computer-programming team to give all the members a shared view that matches the view of the customers for whom the software system is to be developed. Extreme programming includes frequent planning by programmers in collaboration with customers, continually examining and rewriting code in striving for the simplest workable software designs, a system metaphor (basically, an abstraction of the system that provides easy-to-remember software-naming conventions and insight into the architecture of the system), programmers working in pairs, adherence to a set of coding standards, collaboration of customers and programmers, frequent verbal communication, frequent releases of software in small increments of development, repeated testing of the developmental software by both programmers and customers, and continuous interaction between the team and the customers. The environment in which the Maestro team works requires the team to quickly adapt to changing needs of its customers. In addition, the team cannot afford to accept unnecessary development risk. Extreme programming enables the Maestro team to remain agile and provide high-quality software and service to its customers. However, several factors in the Maestro environment have made it necessary to modify some of the conventional extreme-programming practices. The single most influential of these factors is that continuous interaction between customers and programmers is not feasible.
System architectures for telerobotic research
NASA Technical Reports Server (NTRS)
Harrison, F. Wallace
1989-01-01
Several activities are performed related to the definition and creation of telerobotic systems. The effort and investment required to create architectures for these complex systems can be enormous; however, the magnitude of process can be reduced if structured design techniques are applied. A number of informal methodologies supporting certain aspects of the design process are available. More recently, prototypes of integrated tools supporting all phases of system design from requirements analysis to code generation and hardware layout have begun to appear. Activities related to system architecture of telerobots are described, including current activities which are designed to provide a methodology for the comparison and quantitative analysis of alternative system architectures.
ng: What next-generation languages can teach us about HENP frameworks in the manycore era
NASA Astrophysics Data System (ADS)
Binet, Sébastien
2011-12-01
Current High Energy and Nuclear Physics (HENP) frameworks were written before multicore systems became widely deployed. A 'single-thread' execution model naturally emerged from that environment, however, this no longer fits into the processing model on the dawn of the manycore era. Although previous work focused on minimizing the changes to be applied to the LHC frameworks (because of the data taking phase) while still trying to reap the benefits of the parallel-enhanced CPU architectures, this paper explores what new languages could bring to the design of the next-generation frameworks. Parallel programming is still in an intensive phase of R&D and no silver bullet exists despite the 30+ years of literature on the subject. Yet, several parallel programming styles have emerged: actors, message passing, communicating sequential processes, task-based programming, data flow programming, ... to name a few. We present the work of the prototyping of a next-generation framework in new and expressive languages (python and Go) to investigate how code clarity and robustness are affected and what are the downsides of using languages younger than FORTRAN/C/C++.
Update on Integrated Optical Design Analyzer
NASA Technical Reports Server (NTRS)
Moore, James D., Jr.; Troy, Ed
2003-01-01
Updated information on the Integrated Optical Design Analyzer (IODA) computer program has become available. IODA was described in Software for Multidisciplinary Concurrent Optical Design (MFS-31452), NASA Tech Briefs, Vol. 25, No. 10 (October 2001), page 8a. To recapitulate: IODA facilitates multidisciplinary concurrent engineering of highly precise optical instruments. The architecture of IODA was developed by reviewing design processes and software in an effort to automate design procedures. IODA significantly reduces design iteration cycle time and eliminates many potential sources of error. IODA integrates the modeling efforts of a team of experts in different disciplines (e.g., optics, structural analysis, and heat transfer) working at different locations and provides seamless fusion of data among thermal, structural, and optical models used to design an instrument. IODA is compatible with data files generated by the NASTRAN structural-analysis program and the Code V (Registered Trademark) optical-analysis program, and can be used to couple analyses performed by these two programs. IODA supports multiple-load-case analysis for quickly accomplishing trade studies. IODA can also model the transient response of an instrument under the influence of dynamic loads and disturbances.
Programming model for distributed intelligent systems
NASA Technical Reports Server (NTRS)
Sztipanovits, J.; Biegl, C.; Karsai, G.; Bogunovic, N.; Purves, B.; Williams, R.; Christiansen, T.
1988-01-01
A programming model and architecture which was developed for the design and implementation of complex, heterogeneous measurement and control systems is described. The Multigraph Architecture integrates artificial intelligence techniques with conventional software technologies, offers a unified framework for distributed and shared memory based parallel computational models and supports multiple programming paradigms. The system can be implemented on different hardware architectures and can be adapted to strongly different applications.
Developing CORBA-Based Distributed Scientific Applications from Legacy Fortran Programs
NASA Technical Reports Server (NTRS)
Sang, Janche; Kim, Chan; Lopez, Isaac
2000-01-01
Recent progress in distributed object technology has enabled software applications to be developed and deployed easily such that objects or components can work together across the boundaries of the network, different operating systems, and different languages. A distributed object is not necessarily a complete application but rather a reusable, self-contained piece of software that co-operates with other objects in a plug-and-play fashion via a well-defined interface. The Common Object Request Broker Architecture (CORBA), a middleware standard defined by the Object Management Group (OMG), uses the Interface Definition Language (IDL) to specify such an interface for transparent communication between distributed objects. Since IDL can be mapped to any programming language, such as C++, Java, Smalltalk, etc., existing applications can be integrated into a new application and hence the tasks of code re-writing and software maintenance can be reduced. Many scientific applications in aerodynamics and solid mechanics are written in Fortran. Refitting these legacy Fortran codes with CORBA objects can increase the codes reusability. For example, scientists could link their scientific applications to vintage Fortran programs such as Partial Differential Equation(PDE) solvers in a plug-and-play fashion. Unfortunately, CORBA IDL to Fortran mapping has not been proposed and there seems to be no direct method of generating CORBA objects from Fortran without having to resort to manually writing C/C++ wrappers. In this paper, we present an efficient methodology to integrate Fortran legacy programs into a distributed object framework. Issues and strategies regarding the conversion and decomposition of Fortran codes into CORBA objects are discussed. The following diagram shows the conversion and decomposition mechanism we proposed. Our goal is to keep the Fortran codes unmodified. The conversion- aided tool takes the Fortran application program as input and helps programmers generate C/C++ header file and IDL file for wrapping the Fortran code. Programmers need to determine by themselves how to decompose the legacy application into several reusable components based on the cohesion and coupling factors among the functions and subroutines. However, programming effort still can be greatly reduced because function headings and types have been converted to C++ and IDL styles. Most Fortran applications use the COMMON block to facilitate the transfer of large amount of variables among several functions. The COMMON block plays the similar role of global variables used in C. In the CORBA-compliant programming environment, global variables can not be used to pass values between objects. One approach to dealing with this problem is to put the COMMON variables into the parameter list. We do not adopt this approach because it requires modification of the Fortran source code which violates our design consideration. Our approach is to extract the COMMON blocks and convert them into a structure-typed attribute in C++. Through attributes, each component can initialize the variables and return the computation result back to the client. We have tested successfully the proposed conversion methodology based on the f2c converter. Since f2c only translates Fortran to C, we still needed to edit the converted code to meet the C++ and IDL syntax. For example, C++/IDL requires a tag in the structure type, while C does not. In this paper, we identify the necessary changes to the f2c converter in order to directly generate the C++ header and the IDL file. Our future work is to add GUI interface to ease the decomposition task by simply dragging and dropping icons.
Architectures of small satellite programs in developing countries
NASA Astrophysics Data System (ADS)
Wood, Danielle; Weigel, Annalisa
2014-04-01
Global participation in space activity is growing as satellite technology matures and spreads. Countries in Africa, Asia and Latin America are creating or reinvigorating national satellite programs. These countries are building local capability in space through technological learning. This paper analyzes implementation approaches in small satellite programs within developing countries. The study addresses diverse examples of approaches used to master, adapt, diffuse and apply satellite technology in emerging countries. The work focuses on government programs that represent the nation and deliver services that provide public goods such as environmental monitoring. An original framework developed by the authors examines implementation approaches and contextual factors using the concept of Systems Architecture. The Systems Architecture analysis defines the satellite programs as systems within a context which execute functions via forms in order to achieve stakeholder objectives. These Systems Architecture definitions are applied to case studies of six satellite projects executed by countries in Africa and Asia. The architectural models used by these countries in various projects reveal patterns in the areas of training, technical specifications and partnership style. Based on these patterns, three Archetypal Project Architectures are defined which link the contextual factors to the implementation approaches. The three Archetypal Project Architectures lead to distinct opportunities for training, capability building and end user services.
NASA Technical Reports Server (NTRS)
Benbenek, Daniel; Soloff, Jason; Lieb, Erica
2010-01-01
Selecting a communications and network architecture for future manned space flight requires an evaluation of the varying goals and objectives of the program, development of communications and network architecture evaluation criteria, and assessment of critical architecture trades. This paper uses Cx Program proposed exploration activities as a guideline; lunar sortie, outpost, Mars, and flexible path options are described. A set of proposed communications network architecture criteria are proposed and described. They include: interoperability, security, reliability, and ease of automating topology changes. Finally a key set of architecture options are traded including (1) multiplexing data at a common network layer vs. at the data link layer, (2) implementing multiple network layers vs. a single network layer, and (3) the use of a particular network layer protocol, primarily IPv6 vs. Delay Tolerant Networking (DTN). In summary, the protocol options are evaluated against the proposed exploration activities and their relative performance with respect to the criteria are assessed. An architectural approach which includes (a) the capability of multiplexing at both the network layer and the data link layer and (b) a single network layer for operations at each program phase, as these solutions are best suited to respond to the widest array of program needs and meet each of the evaluation criteria.
NASA Astrophysics Data System (ADS)
Zou, Ding; Djordjevic, Ivan B.
2016-02-01
Forward error correction (FEC) is as one of the key technologies enabling the next-generation high-speed fiber optical communications. In this paper, we propose a rate-adaptive scheme using a class of generalized low-density parity-check (GLDPC) codes with a Hamming code as local code. We show that with the proposed unified GLDPC decoder architecture, a variable net coding gains (NCGs) can be achieved with no error floor at BER down to 10-15, making it a viable solution in the next-generation high-speed fiber optical communications.
Processor-in-memory-and-storage architecture
DOE Office of Scientific and Technical Information (OSTI.GOV)
DeBenedictis, Erik
A method and apparatus for performing reliable general-purpose computing. Each sub-core of a plurality of sub-cores of a processor core processes a same instruction at a same time. A code analyzer receives a plurality of residues that represents a code word corresponding to the same instruction and an indication of whether the code word is a memory address code or a data code from the plurality of sub-cores. The code analyzer determines whether the plurality of residues are consistent or inconsistent. The code analyzer and the plurality of sub-cores perform a set of operations based on whether the code wordmore » is a memory address code or a data code and a determination of whether the plurality of residues are consistent or inconsistent.« less
ERIC Educational Resources Information Center
Campbell, Ann
This document consists of two teaching manuals designed to accompany a commercially-available "multicultural, interdisciplinary video program," consisting of four still videotape programs (72 minutes, 226 frames), one teaching poster, and these two manuals. "Teacher's Manual: Ancient Greek Architecture" covers: "Ancient…
A high-performance Fortran code to calculate spin- and parity-dependent nuclear level densities
NASA Astrophysics Data System (ADS)
Sen'kov, R. A.; Horoi, M.; Zelevinsky, V. G.
2013-01-01
A high-performance Fortran code is developed to calculate the spin- and parity-dependent shell model nuclear level densities. The algorithm is based on the extension of methods of statistical spectroscopy and implies exact calculation of the first and second Hamiltonian moments for different configurations at fixed spin and parity. The proton-neutron formalism is used. We have applied the method for calculating the level densities for a set of nuclei in the sd-, pf-, and pf+g- model spaces. Examples of the calculations for 28Si (in the sd-model space) and 64Ge (in the pf+g-model space) are presented. To illustrate the power of the method we estimate the ground state energy of 64Ge in the larger model space pf+g, which is not accessible to direct shell model diagonalization due to the prohibitively large dimension, by comparing with the nuclear level densities at low excitation energy calculated in the smaller model space pf. Program summaryProgram title: MM Catalogue identifier: AENM_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AENM_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 193181 No. of bytes in distributed program, including test data, etc.: 1298585 Distribution format: tar.gz Programming language: Fortran 90, MPI. Computer: Any architecture with a Fortran 90 compiler and MPI. Operating system: Linux. RAM: Proportional to the system size, in our examples, up to 75Mb Classification: 17.15. External routines: MPICH2 (http://www.mcs.anl.gov/research/projects/mpich2/) Nature of problem: Calculating of the spin- and parity-dependent nuclear level density. Solution method: The algorithm implies exact calculation of the first and second Hamiltonian moments for different configurations at fixed spin and parity. The code is parallelized using the Message Passing Interface and a master-slaves dynamical load-balancing approach. Restrictions: The program uses two-body interaction in a restricted single-level basis. For example, GXPF1A in the pf-valence space. Running time: Depends on the system size and the number of processors used (from 1 min to several hours).
Spatial correlation-based side information refinement for distributed video coding
NASA Astrophysics Data System (ADS)
Taieb, Mohamed Haj; Chouinard, Jean-Yves; Wang, Demin
2013-12-01
Distributed video coding (DVC) architecture designs, based on distributed source coding principles, have benefitted from significant progresses lately, notably in terms of achievable rate-distortion performances. However, a significant performance gap still remains when compared to prediction-based video coding schemes such as H.264/AVC. This is mainly due to the non-ideal exploitation of the video sequence temporal correlation properties during the generation of side information (SI). In fact, the decoder side motion estimation provides only an approximation of the true motion. In this paper, a progressive DVC architecture is proposed, which exploits the spatial correlation of the video frames to improve the motion-compensated temporal interpolation (MCTI). Specifically, Wyner-Ziv (WZ) frames are divided into several spatially correlated groups that are then sent progressively to the receiver. SI refinement (SIR) is performed as long as these groups are being decoded, thus providing more accurate SI for the next groups. It is shown that the proposed progressive SIR method leads to significant improvements over the Discover DVC codec as well as other SIR schemes recently introduced in the literature.
Benchmarking and tuning the MILC code on clusters and supercomputers
NASA Astrophysics Data System (ADS)
Gottlieb, Steven
2002-03-01
Recently, we have benchmarked and tuned the MILC code on a number of architectures including Intel Itanium and Pentium IV (PIV), dual-CPU Athlon, and the latest Compaq Alpha nodes. Results will be presented for many of these, and we shall discuss some simple code changes that can result in a very dramatic speedup of the KS conjugate gradient on processors with more advanced memory systems such as PIV, IBM SP and Alpha.
Benchmarking and tuning the MILC code on clusters and supercomputers
NASA Astrophysics Data System (ADS)
Gottlieb, Steven
Recently, we have benchmarked and tuned the MILC code on a number of architectures including Intel Itanium and Pentium IV (PIV), dual-CPU Athlon, and the latest Compaq Alpha nodes. Results will be presented for many of these, and we shall discuss some simple code changes that can result in a very dramatic speedup of the KS conjugate gradient on processors with more advanced memory systems such as PIV, IBM SP and Alpha.
OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems.
Stone, John E; Gohara, David; Shi, Guochun
2010-05-01
We provide an overview of the key architectural features of recent microprocessor designs and describe the programming model and abstractions provided by OpenCL, a new parallel programming standard targeting these architectures.
Vaccarino, Anthony L; Dharsee, Moyez; Strother, Stephen; Aldridge, Don; Arnott, Stephen R; Behan, Brendan; Dafnas, Costas; Dong, Fan; Edgecombe, Kenneth; El-Badrawi, Rachad; El-Emam, Khaled; Gee, Tom; Evans, Susan G; Javadi, Mojib; Jeanson, Francis; Lefaivre, Shannon; Lutz, Kristen; MacPhee, F Chris; Mikkelsen, Jordan; Mikkelsen, Tom; Mirotchnick, Nicholas; Schmah, Tanya; Studzinski, Christa M; Stuss, Donald T; Theriault, Elizabeth; Evans, Kenneth R
2018-01-01
Historically, research databases have existed in isolation with no practical avenue for sharing or pooling medical data into high dimensional datasets that can be efficiently compared across databases. To address this challenge, the Ontario Brain Institute's "Brain-CODE" is a large-scale neuroinformatics platform designed to support the collection, storage, federation, sharing and analysis of different data types across several brain disorders, as a means to understand common underlying causes of brain dysfunction and develop novel approaches to treatment. By providing researchers access to aggregated datasets that they otherwise could not obtain independently, Brain-CODE incentivizes data sharing and collaboration and facilitates analyses both within and across disorders and across a wide array of data types, including clinical, neuroimaging and molecular. The Brain-CODE system architecture provides the technical capabilities to support (1) consolidated data management to securely capture, monitor and curate data, (2) privacy and security best-practices, and (3) interoperable and extensible systems that support harmonization, integration, and query across diverse data modalities and linkages to external data sources. Brain-CODE currently supports collaborative research networks focused on various brain conditions, including neurodevelopmental disorders, cerebral palsy, neurodegenerative diseases, epilepsy and mood disorders. These programs are generating large volumes of data that are integrated within Brain-CODE to support scientific inquiry and analytics across multiple brain disorders and modalities. By providing access to very large datasets on patients with different brain disorders and enabling linkages to provincial, national and international databases, Brain-CODE will help to generate new hypotheses about the biological bases of brain disorders, and ultimately promote new discoveries to improve patient care.
Unaligned instruction relocation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bertolli, Carlo; O'Brien, John K.; Sallenave, Olivier H.
In one embodiment, a computer-implemented method includes receiving source code to be compiled into an executable file for an unaligned instruction set architecture (ISA). Aligned assembled code is generated, by a computer processor. The aligned assembled code complies with an aligned ISA and includes aligned processor code for a processor and aligned accelerator code for an accelerator. A first linking pass is performed on the aligned assembled code, including relocating a first relocation target in the aligned accelerator code that refers to a first object outside the aligned accelerator code. Unaligned assembled code is generated in accordance with the unalignedmore » ISA and includes unaligned accelerator code for the accelerator and unaligned processor code for the processor. A second linking pass is performed on the unaligned assembled code, including relocating a second relocation target outside the unaligned accelerator code that refers to an object in the unaligned accelerator code.« less
Unaligned instruction relocation
Bertolli, Carlo; O'Brien, John K.; Sallenave, Olivier H.; Sura, Zehra N.
2018-01-23
In one embodiment, a computer-implemented method includes receiving source code to be compiled into an executable file for an unaligned instruction set architecture (ISA). Aligned assembled code is generated, by a computer processor. The aligned assembled code complies with an aligned ISA and includes aligned processor code for a processor and aligned accelerator code for an accelerator. A first linking pass is performed on the aligned assembled code, including relocating a first relocation target in the aligned accelerator code that refers to a first object outside the aligned accelerator code. Unaligned assembled code is generated in accordance with the unaligned ISA and includes unaligned accelerator code for the accelerator and unaligned processor code for the processor. A second linking pass is performed on the unaligned assembled code, including relocating a second relocation target outside the unaligned accelerator code that refers to an object in the unaligned accelerator code.
A Reference Architecture for Space Information Management
NASA Technical Reports Server (NTRS)
Mattmann, Chris A.; Crichton, Daniel J.; Hughes, J. Steven; Ramirez, Paul M.; Berrios, Daniel C.
2006-01-01
We describe a reference architecture for space information management systems that elegantly overcomes the rigid design of common information systems in many domains. The reference architecture consists of a set of flexible, reusable, independent models and software components that function in unison, but remain separately managed entities. The main guiding principle of the reference architecture is to separate the various models of information (e.g., data, metadata, etc.) from implemented system code, allowing each to evolve independently. System modularity, systems interoperability, and dynamic evolution of information system components are the primary benefits of the design of the architecture. The architecture requires the use of information models that are substantially more advanced than those used by the vast majority of information systems. These models are more expressive and can be more easily modularized, distributed and maintained than simpler models e.g., configuration files and data dictionaries. Our current work focuses on formalizing the architecture within a CCSDS Green Book and evaluating the architecture within the context of the C3I initiative.
Fault-Tolerant Software-Defined Radio on Manycore
NASA Technical Reports Server (NTRS)
Ricketts, Scott
2015-01-01
Software-defined radio (SDR) platforms generally rely on field-programmable gate arrays (FPGAs) and digital signal processors (DSPs), but such architectures require significant software development. In addition, application demands for radiation mitigation and fault tolerance exacerbate programming challenges. MaXentric Technologies, LLC, has developed a manycore-based SDR technology that provides 100 times the throughput of conventional radiationhardened general purpose processors. Manycore systems (30-100 cores and beyond) have the potential to provide high processing performance at error rates that are equivalent to current space-deployed uniprocessor systems. MaXentric's innovation is a highly flexible radio, providing over-the-air reconfiguration; adaptability; and uninterrupted, real-time, multimode operation. The technology is also compliant with NASA's Space Telecommunications Radio System (STRS) architecture. In addition to its many uses within NASA communications, the SDR can also serve as a highly programmable research-stage prototyping device for new waveforms and other communications technologies. It can also support noncommunication codes on its multicore processor, collocated with the communications workload-reducing the size, weight, and power of the overall system by aggregating processing jobs to a single board computer.
NASA Astrophysics Data System (ADS)
Ames, D. P.
2013-12-01
As has been seen in other informatics fields, well-documented and appropriately licensed open source software tools have the potential to significantly increase both opportunities and motivation for inter-institutional science and technology collaboration. The CUAHSI HIS (and related HydroShare) projects have aimed to foster such activities in hydrology resulting in the development of many useful community software components including the HydroDesktop software application. HydroDesktop is an open source, GIS-based, scriptable software application for discovering data on the CUAHSI Hydrologic Information System and related resources. It includes a well-defined plugin architecture and interface to allow 3rd party developers to create extensions and add new functionality without requiring recompiling of the full source code. HydroDesktop is built in the C# programming language and uses the open source DotSpatial GIS engine for spatial data management. Capabilities include data search, discovery, download, visualization, and export. An extension that integrates the R programming language with HydroDesktop provides scripting and data automation capabilities and an OpenMI plugin provides the ability to link models. Current revision and updates to HydroDesktop include migration of core business logic to cross platform, scriptable Python code modules that can be executed in any operating system or linked into other software front-end applications.
Incremental Parallelization of Non-Data-Parallel Programs Using the Charon Message-Passing Library
NASA Technical Reports Server (NTRS)
VanderWijngaart, Rob F.
2000-01-01
Message passing is among the most popular techniques for parallelizing scientific programs on distributed-memory architectures. The reasons for its success are wide availability (MPI), efficiency, and full tuning control provided to the programmer. A major drawback, however, is that incremental parallelization, as offered by compiler directives, is not generally possible, because all data structures have to be changed throughout the program simultaneously. Charon remedies this situation through mappings between distributed and non-distributed data. It allows breaking up the parallelization into small steps, guaranteeing correctness at every stage. Several tools are available to help convert legacy codes into high-performance message-passing programs. They usually target data-parallel applications, whose loops carrying most of the work can be distributed among all processors without much dependency analysis. Others do a full dependency analysis and then convert the code virtually automatically. Even more toolkits are available that aid construction from scratch of message passing programs. None, however, allows piecemeal translation of codes with complex data dependencies (i.e. non-data-parallel programs) into message passing codes. The Charon library (available in both C and Fortran) provides incremental parallelization capabilities by linking legacy code arrays with distributed arrays. During the conversion process, non-distributed and distributed arrays exist side by side, and simple mapping functions allow the programmer to switch between the two in any location in the program. Charon also provides wrapper functions that leave the structure of the legacy code intact, but that allow execution on truly distributed data. Finally, the library provides a rich set of communication functions that support virtually all patterns of remote data demands in realistic structured grid scientific programs, including transposition, nearest-neighbor communication, pipelining, gather/scatter, and redistribution. At the end of the conversion process most intermediate Charon function calls will have been removed, the non-distributed arrays will have been deleted, and virtually the only remaining Charon functions calls are the high-level, highly optimized communications. Distribution of the data is under complete control of the programmer, although a wide range of useful distributions is easily available through predefined functions. A crucial aspect of the library is that it does not allocate space for distributed arrays, but accepts programmer-specified memory. This has two major consequences. First, codes parallelized using Charon do not suffer from encapsulation; user data is always directly accessible. This provides high efficiency, and also retains the possibility of using message passing directly for highly irregular communications. Second, non-distributed arrays can be interpreted as (trivial) distributions in the Charon sense, which allows them to be mapped to truly distributed arrays, and vice versa. This is the mechanism that enables incremental parallelization. In this paper we provide a brief introduction of the library and then focus on the actual steps in the parallelization process, using some representative examples from, among others, the NAS Parallel Benchmarks. We show how a complicated two-dimensional pipeline-the prototypical non-data-parallel algorithm- can be constructed with ease. To demonstrate the flexibility of the library, we give examples of the stepwise, efficient parallel implementation of nonlocal boundary conditions common in aircraft simulations, as well as the construction of the sequence of grids required for multigrid.
A PROGRAM FOR ARCHITECTURAL TECHNICIAN'S TRAINING.
ERIC Educational Resources Information Center
TERNSTROM, CLINTON C.; AND OTHERS
ARCHITECTURAL TECHNICIANS TRANSLATE DESIGN AND SYSTEMS SOLUTIONS INTO GRAPHIC AND WRITTEN FORM AND ASSIST IN RENDERING ARCHITECTURAL SERVICES. IN 1966, A STUDY GROUP FROM THE AMERICAN INSTITUTE OF ARCHITECTS FOUND THAT EXISTING 2-YEAR PROGRAMS WERE INADEQUATE, FALLING INTO ONE OF TWO CATEGORIES--(1) DRAFTING COURSES WHICH LACKED BREADTH AND FAILED…
Hands-On Educational Programs and Projects at SICSA
NASA Astrophysics Data System (ADS)
Bell, L.
2002-01-01
The Sasakawa International Center for Space Architecture (SICSA) has a long history of projects that involve the design of space structures, including habitats for low-Earth orbit (LEO) and planetary applications. Some of these projects are supported by corporate sponsors, such as a space tourism research, planning and design study conducted for the owner of national U.S. hotel chain. Some have been undertaken in support of programs sponsored by the State Government of Texas, including current commercial spaceport development planning for the Texas Aerospace Commission and three counties that represent candidate spaceport sites. Other projects have been supported by NASA and the Texas Aerospace Consortium, including the design and development of SICSA's "Space Habitation Laboratory", a space station module sized environmental simulator facility which has been featured in the "NASA Select" television broadcast series. This presentation will highlight representative projects. SICSA is internationally recognized for its leadership in the field of space architecture. Many program graduates have embarked upon productive and rewarding careers with aerospace organizations throughout the world. NASA has awarded certificates of appreciation to SICSA for significant achievements contributing to its advanced design initiatives. SICSA and its work have been featured in numerous popular magazines, professional publications, and public media broadcasts in many countries. SICSA applies a very comprehensive scope of activities to the practice of space architecture. Important roles include mission planning conceptualization of orbital and planetary structures and assembly processes, and design of habitats to optimize human safety, adaptation and productivity. SICSA sponsors educational programs for upper division undergraduate students and graduate students with interests in space and experimental architecture. Many fourth year participants continue in the SICSA program throughout their remaining undergraduate studies, and are joined by other new fifth year students. Selected graduate applicants holding a professional degree in architecture from accredited colleges and universities can earn a Master of Architecture degree with a specialization in space and experimental architecture upon completion of 32 credit hours of study which includes two six-hour design studios. Accepted graduate students seeking a Master of Architecture degree who do not hold a professional architecture degree may enter SICSA studios during the final year of their minimum 72 credit hours of study. Subject to necessary University of Houston and Texas Higher Education Coordinating Board approvals, SICSA and the College of Architecture propose to expand their graduate education role to add a Master of Science in Space Architecture degree program. This new program is primarily being planned in response to known interests of non-architect professionals from NASA and aerospace corporations who wish to pursue advanced space architecture research and design studies. The program will be also available to working professionals holding an undergraduate architectural degree.
NASA Technical Reports Server (NTRS)
Koppenhoefer, Kyle C.; Gullerud, Arne S.; Ruggieri, Claudio; Dodds, Robert H., Jr.; Healy, Brian E.
1998-01-01
This report describes theoretical background material and commands necessary to use the WARP3D finite element code. WARP3D is under continuing development as a research code for the solution of very large-scale, 3-D solid models subjected to static and dynamic loads. Specific features in the code oriented toward the investigation of ductile fracture in metals include a robust finite strain formulation, a general J-integral computation facility (with inertia, face loading), an element extinction facility to model crack growth, nonlinear material models including viscoplastic effects, and the Gurson-Tver-gaard dilatant plasticity model for void growth. The nonlinear, dynamic equilibrium equations are solved using an incremental-iterative, implicit formulation with full Newton iterations to eliminate residual nodal forces. The history integration of the nonlinear equations of motion is accomplished with Newmarks Beta method. A central feature of WARP3D involves the use of a linear-preconditioned conjugate gradient (LPCG) solver implemented in an element-by-element format to replace a conventional direct linear equation solver. This software architecture dramatically reduces both the memory requirements and CPU time for very large, nonlinear solid models since formation of the assembled (dynamic) stiffness matrix is avoided. Analyses thus exhibit the numerical stability for large time (load) steps provided by the implicit formulation coupled with the low memory requirements characteristic of an explicit code. In addition to the much lower memory requirements of the LPCG solver, the CPU time required for solution of the linear equations during each Newton iteration is generally one-half or less of the CPU time required for a traditional direct solver. All other computational aspects of the code (element stiffnesses, element strains, stress updating, element internal forces) are implemented in the element-by- element, blocked architecture. This greatly improves vectorization of the code on uni-processor hardware and enables straightforward parallel-vector processing of element blocks on multi-processor hardware.
Federal Register 2010, 2011, 2012, 2013, 2014
2013-01-18
...) Program using the Quality Reporting Document Architecture (QRDA) Category I. The comment period for the... Reporting (IQR) Program using the Quality Reporting Document Architecture (QRDA) Category I beginning with...
OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems
Stone, John E.; Gohara, David; Shi, Guochun
2010-01-01
We provide an overview of the key architectural features of recent microprocessor designs and describe the programming model and abstractions provided by OpenCL, a new parallel programming standard targeting these architectures. PMID:21037981
Orthorectification by Using Gpgpu Method
NASA Astrophysics Data System (ADS)
Sahin, H.; Kulur, S.
2012-07-01
Thanks to the nature of the graphics processing, the newly released products offer highly parallel processing units with high-memory bandwidth and computational power of more than teraflops per second. The modern GPUs are not only powerful graphic engines but also they are high level parallel programmable processors with very fast computing capabilities and high-memory bandwidth speed compared to central processing units (CPU). Data-parallel computations can be shortly described as mapping data elements to parallel processing threads. The rapid development of GPUs programmability and capabilities attracted the attentions of researchers dealing with complex problems which need high level calculations. This interest has revealed the concepts of "General Purpose Computation on Graphics Processing Units (GPGPU)" and "stream processing". The graphic processors are powerful hardware which is really cheap and affordable. So the graphic processors became an alternative to computer processors. The graphic chips which were standard application hardware have been transformed into modern, powerful and programmable processors to meet the overall needs. Especially in recent years, the phenomenon of the usage of graphics processing units in general purpose computation has led the researchers and developers to this point. The biggest problem is that the graphics processing units use different programming models unlike current programming methods. Therefore, an efficient GPU programming requires re-coding of the current program algorithm by considering the limitations and the structure of the graphics hardware. Currently, multi-core processors can not be programmed by using traditional programming methods. Event procedure programming method can not be used for programming the multi-core processors. GPUs are especially effective in finding solution for repetition of the computing steps for many data elements when high accuracy is needed. Thus, it provides the computing process more quickly and accurately. Compared to the GPUs, CPUs which perform just one computing in a time according to the flow control are slower in performance. This structure can be evaluated for various applications of computer technology. In this study covers how general purpose parallel programming and computational power of the GPUs can be used in photogrammetric applications especially direct georeferencing. The direct georeferencing algorithm is coded by using GPGPU method and CUDA (Compute Unified Device Architecture) programming language. Results provided by this method were compared with the traditional CPU programming. In the other application the projective rectification is coded by using GPGPU method and CUDA programming language. Sample images of various sizes, as compared to the results of the program were evaluated. GPGPU method can be used especially in repetition of same computations on highly dense data, thus finding the solution quickly.
Architectural Analysis of Systems Based on the Publisher-Subscriber Style
NASA Technical Reports Server (NTRS)
Ganesun, Dharmalingam; Lindvall, Mikael; Ruley, Lamont; Wiegand, Robert; Ly, Vuong; Tsui, Tina
2010-01-01
Architectural styles impose constraints on both the topology and the interaction behavior of involved parties. In this paper, we propose an approach for analyzing implemented systems based on the publisher-subscriber architectural style. From the style definition, we derive a set of reusable questions and show that some of them can be answered statically whereas others are best answered using dynamic analysis. The paper explains how the results of static analysis can be used to orchestrate dynamic analysis. The proposed method was successfully applied on the NASA's Goddard Mission Services Evolution Center (GMSEC) software product line. The results show that the GMSEC has a) a novel reusable vendor-independent middleware abstraction layer that allows the NASA's missions to configure the middleware of interest without changing the publishers' or subscribers' source code, and b) some high priority bugs due to behavioral discrepancies, which were eluded during testing and code reviews, among different implementations of the same APIs for different vendors.
Portable implementation model for CFD simulations. Application to hybrid CPU/GPU supercomputers
NASA Astrophysics Data System (ADS)
Oyarzun, Guillermo; Borrell, Ricard; Gorobets, Andrey; Oliva, Assensi
2017-10-01
Nowadays, high performance computing (HPC) systems experience a disruptive moment with a variety of novel architectures and frameworks, without any clarity of which one is going to prevail. In this context, the portability of codes across different architectures is of major importance. This paper presents a portable implementation model based on an algebraic operational approach for direct numerical simulation (DNS) and large eddy simulation (LES) of incompressible turbulent flows using unstructured hybrid meshes. The strategy proposed consists in representing the whole time-integration algorithm using only three basic algebraic operations: sparse matrix-vector product, a linear combination of vectors and dot product. The main idea is based on decomposing the nonlinear operators into a concatenation of two SpMV operations. This provides high modularity and portability. An exhaustive analysis of the proposed implementation for hybrid CPU/GPU supercomputers has been conducted with tests using up to 128 GPUs. The main objective consists in understanding the challenges of implementing CFD codes on new architectures.
NASA Astrophysics Data System (ADS)
Yan, Hui; Wang, K. G.; Jones, Jim E.
2016-06-01
A parallel algorithm for large-scale three-dimensional phase-field simulations of phase coarsening is developed and implemented on high-performance architectures. From the large-scale simulations, a new kinetics in phase coarsening in the region of ultrahigh volume fraction is found. The parallel implementation is capable of harnessing the greater computer power available from high-performance architectures. The parallelized code enables increase in three-dimensional simulation system size up to a 5123 grid cube. Through the parallelized code, practical runtime can be achieved for three-dimensional large-scale simulations, and the statistical significance of the results from these high resolution parallel simulations are greatly improved over those obtainable from serial simulations. A detailed performance analysis on speed-up and scalability is presented, showing good scalability which improves with increasing problem size. In addition, a model for prediction of runtime is developed, which shows a good agreement with actual run time from numerical tests.
Energy Programming for Buildings.
ERIC Educational Resources Information Center
Parshall, Steven; Diserens, Steven
1982-01-01
Programing is described as a process leading to an explicit statement of an architectural problem. The programing phase is seen as the most critical period in the delivery process in which energy analysis can have an impact on design. A programing method appropriate for standard architectural practice is provided. (MLW)
Evaluation of computational endomicroscopy architectures for minimally-invasive optical biopsy
NASA Astrophysics Data System (ADS)
Dumas, John P.; Lodhi, Muhammad A.; Bajwa, Waheed U.; Pierce, Mark C.
2017-02-01
We are investigating compressive sensing architectures for applications in endomicroscopy, where the narrow diameter probes required for tissue access can limit the achievable spatial resolution. We hypothesize that the compressive sensing framework can be used to overcome the fundamental pixel number limitation in fiber-bundle based endomicroscopy by reconstructing images with more resolvable points than fibers in the bundle. An experimental test platform was assembled to evaluate and compare two candidate architectures, based on introducing a coded amplitude mask at either a conjugate image or Fourier plane within the optical system. The benchtop platform consists of a common illumination and object path followed by separate imaging arms for each compressive architecture. The imaging arms contain a digital micromirror device (DMD) as a reprogrammable mask, with a CCD camera for image acquisition. One arm has the DMD positioned at a conjugate image plane ("IP arm"), while the other arm has the DMD positioned at a Fourier plane ("FP arm"). Lenses were selected and positioned within each arm to achieve an element-to-pixel ratio of 16 (230,400 mask elements mapped onto 14,400 camera pixels). We discuss our mathematical model for each system arm and outline the importance of accounting for system non-idealities. Reconstruction of a 1951 USAF resolution target using optimization-based compressive sensing algorithms produced images with higher spatial resolution than bicubic interpolation for both system arms when system non-idealities are included in the model. Furthermore, images generated with image plane coding appear to exhibit higher spatial resolution, but more noise, than images acquired through Fourier plane coding.
Reuse: A knowledge-based approach
NASA Technical Reports Server (NTRS)
Iscoe, Neil; Liu, Zheng-Yang; Feng, Guohui
1992-01-01
This paper describes our research in automating the reuse process through the use of application domain models. Application domain models are explicit formal representations of the application knowledge necessary to understand, specify, and generate application programs. Furthermore, they provide a unified repository for the operational structure, rules, policies, and constraints of a specific application area. In our approach, domain models are expressed in terms of a transaction-based meta-modeling language. This paper has described in detail the creation and maintenance of hierarchical structures. These structures are created through a process that includes reverse engineering of data models with supplementary enhancement from application experts. Source code is also reverse engineered but is not a major source of domain model instantiation at this time. In the second phase of the software synthesis process, program specifications are interactively synthesized from an instantiated domain model. These specifications are currently integrated into a manual programming process but will eventually be used to derive executable code with mechanically assisted transformations. This research is performed within the context of programming-in-the-large types of systems. Although our goals are ambitious, we are implementing the synthesis system in an incremental manner through which we can realize tangible results. The client/server architecture is capable of supporting 16 simultaneous X/Motif users and tens of thousands of attributes and classes. Domain models have been partially synthesized from five different application areas. As additional domain models are synthesized and additional knowledge is gathered, we will inevitably add to and modify our representation. However, our current experience indicates that it will scale and expand to meet our modeling needs.
A high throughput architecture for a low complexity soft-output demapping algorithm
NASA Astrophysics Data System (ADS)
Ali, I.; Wasenmüller, U.; Wehn, N.
2015-11-01
Iterative channel decoders such as Turbo-Code and LDPC decoders show exceptional performance and therefore they are a part of many wireless communication receivers nowadays. These decoders require a soft input, i.e., the logarithmic likelihood ratio (LLR) of the received bits with a typical quantization of 4 to 6 bits. For computing the LLR values from a received complex symbol, a soft demapper is employed in the receiver. The implementation cost of traditional soft-output demapping methods is relatively large in high order modulation systems, and therefore low complexity demapping algorithms are indispensable in low power receivers. In the presence of multiple wireless communication standards where each standard defines multiple modulation schemes, there is a need to have an efficient demapper architecture covering all the flexibility requirements of these standards. Another challenge associated with hardware implementation of the demapper is to achieve a very high throughput in double iterative systems, for instance, MIMO and Code-Aided Synchronization. In this paper, we present a comprehensive communication and hardware performance evaluation of low complexity soft-output demapping algorithms to select the best algorithm for implementation. The main goal of this work is to design a high throughput, flexible, and area efficient architecture. We describe architectures to execute the investigated algorithms. We implement these architectures on a FPGA device to evaluate their hardware performance. The work has resulted in a hardware architecture based on the figured out best low complexity algorithm delivering a high throughput of 166 Msymbols/second for Gray mapped 16-QAM modulation on Virtex-5. This efficient architecture occupies only 127 slice registers, 248 slice LUTs and 2 DSP48Es.
Richmond, Paul; Buesing, Lars; Giugliano, Michele; Vasilaki, Eleni
2011-05-04
High performance computing on the Graphics Processing Unit (GPU) is an emerging field driven by the promise of high computational power at a low cost. However, GPU programming is a non-trivial task and moreover architectural limitations raise the question of whether investing effort in this direction may be worthwhile. In this work, we use GPU programming to simulate a two-layer network of Integrate-and-Fire neurons with varying degrees of recurrent connectivity and investigate its ability to learn a simplified navigation task using a policy-gradient learning rule stemming from Reinforcement Learning. The purpose of this paper is twofold. First, we want to support the use of GPUs in the field of Computational Neuroscience. Second, using GPU computing power, we investigate the conditions under which the said architecture and learning rule demonstrate best performance. Our work indicates that networks featuring strong Mexican-Hat-shaped recurrent connections in the top layer, where decision making is governed by the formation of a stable activity bump in the neural population (a "non-democratic" mechanism), achieve mediocre learning results at best. In absence of recurrent connections, where all neurons "vote" independently ("democratic") for a decision via population vector readout, the task is generally learned better and more robustly. Our study would have been extremely difficult on a desktop computer without the use of GPU programming. We present the routines developed for this purpose and show that a speed improvement of 5x up to 42x is provided versus optimised Python code. The higher speed is achieved when we exploit the parallelism of the GPU in the search of learning parameters. This suggests that efficient GPU programming can significantly reduce the time needed for simulating networks of spiking neurons, particularly when multiple parameter configurations are investigated.
MULTI2D - a computer code for two-dimensional radiation hydrodynamics
NASA Astrophysics Data System (ADS)
Ramis, R.; Meyer-ter-Vehn, J.; Ramírez, J.
2009-06-01
Simulation of radiation hydrodynamics in two spatial dimensions is developed, having in mind, in particular, target design for indirectly driven inertial confinement energy (IFE) and the interpretation of related experiments. Intense radiation pulses by laser or particle beams heat high-Z target configurations of different geometries and lead to a regime which is optically thick in some regions and optically thin in others. A diffusion description is inadequate in this situation. A new numerical code has been developed which describes hydrodynamics in two spatial dimensions (cylindrical R-Z geometry) and radiation transport along rays in three dimensions with the 4 π solid angle discretized in direction. Matter moves on a non-structured mesh composed of trilateral and quadrilateral elements. Radiation flux of a given direction enters on two (one) sides of a triangle and leaves on the opposite side(s) in proportion to the viewing angles depending on the geometry. This scheme allows to propagate sharply edged beams without ray tracing, though at the price of some lateral diffusion. The algorithm treats correctly both the optically thin and optically thick regimes. A symmetric semi-implicit (SSI) method is used to guarantee numerical stability. Program summaryProgram title: MULTI2D Catalogue identifier: AECV_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AECV_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 151 098 No. of bytes in distributed program, including test data, etc.: 889 622 Distribution format: tar.gz Programming language: C Computer: PC (32 bits architecture) Operating system: Linux/Unix RAM: 2 Mbytes Word size: 32 bits Classification: 19.7 External routines: X-window standard library (libX11.so) and corresponding heading files (X11/*.h) are required. Nature of problem: In inertial confinement fusion and related experiments with lasers and particle beams, energy transport by thermal radiation becomes important. Under these conditions, the radiation field strongly interacts with the hydrodynamic motion through emission and absorption processes. Solution method: The equations of radiation transfer coupled with Lagrangian hydrodynamics, heat diffusion and beam tracing (laser or ions) are solved, in two-dimensional axial-symmetric geometry ( R-Z coordinates) using a fractional step scheme. Radiation transfer is solved with angular resolution. Matter properties are either interpolated from tables (equations-of-state and opacities) or computed by user routines (conductivities and beam attenuation). Restrictions: The code has been designed for typical conditions prevailing in inertial confinement fusion (ns time scale, matter states close to local thermodynamical equilibrium, negligible radiation pressure, …). Although a wider range of situations can be treated, extrapolations to regions beyond this design range need special care. Unusual features: A special computer language, called r94, is used at top levels of the code. These parts have to be converted to standard C by a translation program (supplied as part of the package). Due to the complexity of code (hydro-code, grid generation, user interface, graphic post-processor, translator program, installation scripts) extensive manuals are supplied as part of the package. Running time: 567 seconds for the example supplied.
ERIC Educational Resources Information Center
Luce, Ann Campbell
This resource contains a teaching manual, reproducible student workbook, and color teaching poster, which were designed to accompany a 2-part, 34-minute videotape, but may be adapted for independent use. Part 1 of the program, "The Old Kingdom," explains Egyptian beliefs concerning life after death as evidenced in art, architecture and…
MAP3D: a media processor approach for high-end 3D graphics
NASA Astrophysics Data System (ADS)
Darsa, Lucia; Stadnicki, Steven; Basoglu, Chris
1999-12-01
Equator Technologies, Inc. has used a software-first approach to produce several programmable and advanced VLIW processor architectures that have the flexibility to run both traditional systems tasks and an array of media-rich applications. For example, Equator's MAP1000A is the world's fastest single-chip programmable signal and image processor targeted for digital consumer and office automation markets. The Equator MAP3D is a proposal for the architecture of the next generation of the Equator MAP family. The MAP3D is designed to achieve high-end 3D performance and a variety of customizable special effects by combining special graphics features with high performance floating-point and media processor architecture. As a programmable media processor, it offers the advantages of a completely configurable 3D pipeline--allowing developers to experiment with different algorithms and to tailor their pipeline to achieve the highest performance for a particular application. With the support of Equator's advanced C compiler and toolkit, MAP3D programs can be written in a high-level language. This allows the compiler to successfully find and exploit any parallelism in a programmer's code, thus decreasing the time to market of a given applications. The ability to run an operating system makes it possible to run concurrent applications in the MAP3D chip, such as video decoding while executing the 3D pipelines, so that integration of applications is easily achieved--using real-time decoded imagery for texturing 3D objects, for instance. This novel architecture enables an affordable, integrated solution for high performance 3D graphics.
Networked Workstations and Parallel Processing Utilizing Functional Languages
1993-03-01
program . This frees the programmer to concentrate on what the program is to do, not how the program is...traditional ’von Neumann’ architecture uses a timer based (e.g., the program counter), sequentially pro- grammed, single processor approach to problem...traditional ’von Neumann’ architecture uses a timer based (e.g., the program counter), sequentially programmed , single processor approach to
Decoding the function of nuclear long non-coding RNAs.
Chen, Ling-Ling; Carmichael, Gordon G
2010-06-01
Long non-coding RNAs (lncRNAs) are mRNA-like, non-protein-coding RNAs that are pervasively transcribed throughout eukaryotic genomes. Rather than silently accumulating in the nucleus, many of these are now known or suspected to play important roles in nuclear architecture or in the regulation of gene expression. In this review, we highlight some recent progress in how lncRNAs regulate these important nuclear processes at the molecular level. Copyright 2010 Elsevier Ltd. All rights reserved.
THE ACTIVITY/SPACE, A LEAST COMMON DENOMINATOR FOR ARCHITECTURAL PROGRAMMING.
ERIC Educational Resources Information Center
HAVILAND, DAVID S.
TWO INTERRELATED PROBLEM AREAS OF ARCHITECTURAL PROGRAMING ARE DISCUSSED--(1) "NEEDS DEFINITION," AND (2) "NEEDS DOCUMENTATION AND COMMUNICATION". FUNDAMENTAL ISSUES AND WORK OF THE CENTER FOR ARCHITECTURAL RESEARCH ARE PRESENTED. ISSUES ARE THE FAILURE TO RECOGNIZE HOW, WHEN, AND IN WHAT FORM THE NEED WILL BE USED. CRITERIA FORMULATION MUST BE…
Proceedings of the Mobile Satellite System Architectures and Multiple Access Techniques Workshop
NASA Technical Reports Server (NTRS)
Dessouky, Khaled
1989-01-01
The Mobile Satellite System Architectures and Multiple Access Techniques Workshop served as a forum for the debate of system and network architecture issues. Particular emphasis was on those issues relating to the choice of multiple access technique(s) for the Mobile Satellite Service (MSS). These proceedings contain articles that expand upon the 12 presentations given in the workshop. Contrasting views on Frequency Division Multiple Access (FDMA), Code Division Multiple Access (CDMA), and Time Division Multiple Access (TDMA)-based architectures are presented, and system issues relating to signaling, spacecraft design, and network management constraints are addressed. An overview article that summarizes the issues raised in the numerous discussion periods of the workshop is also included.
NASA Astrophysics Data System (ADS)
Souami, M. A.
2013-07-01
In a other work, we have highlighted a theoretical point of view that there is an relation between the earthquake-resistant architectural design codes and, the urban and stylistic characteristics of buildings and urban forms of the Algiers architectural heritage dating between 1830 and 1930. Following this, we hypothesized that its various stylistic and urban characteristics have a direct impact on the resilience of buildings to earthquakes. The purpose of this article is to try through the computer simulation examples of some stylistic and urban characteristics to prove the validity or not of our hypothesis.
Scalable and portable visualization of large atomistic datasets
NASA Astrophysics Data System (ADS)
Sharma, Ashish; Kalia, Rajiv K.; Nakano, Aiichiro; Vashishta, Priya
2004-10-01
A scalable and portable code named Atomsviewer has been developed to interactively visualize a large atomistic dataset consisting of up to a billion atoms. The code uses a hierarchical view frustum-culling algorithm based on the octree data structure to efficiently remove atoms outside of the user's field-of-view. Probabilistic and depth-based occlusion-culling algorithms then select atoms, which have a high probability of being visible. Finally a multiresolution algorithm is used to render the selected subset of visible atoms at varying levels of detail. Atomsviewer is written in C++ and OpenGL, and it has been tested on a number of architectures including Windows, Macintosh, and SGI. Atomsviewer has been used to visualize tens of millions of atoms on a standard desktop computer and, in its parallel version, up to a billion atoms. Program summaryTitle of program: Atomsviewer Catalogue identifier: ADUM Program summary URL:http://cpc.cs.qub.ac.uk/summaries/ADUM Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland Computer for which the program is designed and others on which it has been tested: 2.4 GHz Pentium 4/Xeon processor, professional graphics card; Apple G4 (867 MHz)/G5, professional graphics card Operating systems under which the program has been tested: Windows 2000/XP, Mac OS 10.2/10.3, SGI IRIX 6.5 Programming languages used: C++, C and OpenGL Memory required to execute with typical data: 1 gigabyte of RAM High speed storage required: 60 gigabytes No. of lines in the distributed program including test data, etc.: 550 241 No. of bytes in the distributed program including test data, etc.: 6 258 245 Number of bits in a word: Arbitrary Number of processors used: 1 Has the code been vectorized or parallelized: No Distribution format: tar gzip file Nature of physical problem: Scientific visualization of atomic systems Method of solution: Rendering of atoms using computer graphic techniques, culling algorithms for data minimization, and levels-of-detail for minimal rendering Restrictions on the complexity of the problem: None Typical running time: The program is interactive in its execution Unusual features of the program: None References: The conceptual foundation and subsequent implementation of the algorithms are found in [A. Sharma, A. Nakano, R.K. Kalia, P. Vashishta, S. Kodiyalam, P. Miller, W. Zhao, X.L. Liu, T.J. Campbell, A. Haas, Presence—Teleoperators and Virtual Environments 12 (1) (2003)].
Taniguchi, Masahiko; Du, Hai; Lindsey, Jonathan S
2013-09-23
A wide variety of cyclic molecular architectures are built of modular subunits and can be formed combinatorially. The mathematics for enumeration of such objects is well-developed yet lacks key features of importance in chemistry, such as specifying (i) the structures of individual members among a set of isomers, (ii) the distribution (i.e., relative amounts) of products, and (iii) the effect of nonequal ratios of reacting monomers on the product distribution. Here, a software program (Cyclaplex) has been developed to determine the number, identity (including isomers), and relative amounts of linear and cyclic architectures from a given number and ratio of reacting monomers. The program includes both mathematical formulas and generative algorithms for enumeration; the latter go beyond the former to provide desired molecular-relevant information and data-mining features. The program is equipped to enumerate four types of architectures: (i) linear architectures with directionality (macroscopic equivalent = electrical extension cords), (ii) linear architectures without directionality (batons), (iii) cyclic architectures with directionality (necklaces), and (iv) cyclic architectures without directionality (bracelets). The program can be applied to cyclic peptides, cycloveratrylenes, cyclens, calixarenes, cyclodextrins, crown ethers, cucurbiturils, annulenes, expanded meso-substituted porphyrin(ogen)s, and diverse supramolecular (e.g., protein) assemblies. The size of accessible architectures encompasses up to 12 modular subunits derived from 12 reacting monomers or larger architectures (e.g. 13-17 subunits) from fewer types of monomers (e.g. 2-4). A particular application concerns understanding the possible heterogeneity of (natural or biohybrid) photosynthetic light-harvesting oligomers (cyclic, linear) formed from distinct peptide subunits.
NASA Astrophysics Data System (ADS)
Daniluk, Andrzej
2011-06-01
A computational model is a computer program, which attempts to simulate an abstract model of a particular system. Computational models use enormous calculations and often require supercomputer speed. As personal computers are becoming more and more powerful, more laboratory experiments can be converted into computer models that can be interactively examined by scientists and students without the risk and cost of the actual experiments. The future of programming is concurrent programming. The threaded programming model provides application programmers with a useful abstraction of concurrent execution of multiple tasks. The objective of this release is to address the design of architecture for scientific application, which may execute as multiple threads execution, as well as implementations of the related shared data structures. New version program summaryProgram title: GrowthCP Catalogue identifier: ADVL_v4_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/ADVL_v4_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 32 269 No. of bytes in distributed program, including test data, etc.: 8 234 229 Distribution format: tar.gz Programming language: Free Object Pascal Computer: multi-core x64-based PC Operating system: Windows XP, Vista, 7 Has the code been vectorised or parallelized?: No RAM: More than 1 GB. The program requires a 32-bit or 64-bit processor to run the generated code. Memory is addressed using 32-bit (on 32-bit processors) or 64-bit (on 64-bit processors with 64-bit addressing) pointers. The amount of addressed memory is limited only by the available amount of virtual memory. Supplementary material: The figures mentioned in the "Summary of revisions" section can be obtained here. Classification: 4.3, 7.2, 6.2, 8, 14 External routines: Lazarus [1] Catalogue identifier of previous version: ADVL_v3_0 Journal reference of previous version: Comput. Phys. Comm. 181 (2010) 709 Does the new version supersede the previous version?: Yes Nature of problem: Reflection high-energy electron diffraction (RHEED) is an important in-situ analysis technique, which is capable of giving quantitative information about the growth process of thin layers and its control. It can be used to calibrate growth rate, analyze surface morphology, calibrate surface temperature, monitor the arrangement of the surface atoms, and provide information about growth kinetics. Such control allows the development of structures where the electrons can be confined in space, giving quantum wells or even quantum dots. In order to determine the atomic positions of atoms in the first few layers, the RHEED intensity must be measured as a function of the scattering angles and then compared with dynamic calculations. The objective of this release is to address the design of architecture for application that simulates the rocking curves RHEED intensities during hetero-epitaxial growth process of thin films. Solution method: The GrowthCP is a complex numerical model that uses multiple threads for simulation of epitaxial growth of thin layers. This model consists of two transactional parts. The first part is a mathematical model being based on the Runge-Kutta method with adaptive step-size control. The second part represents first-principles of the one-dimensional RHEED computational model. This model is based on solving a one-dimensional Schrödinger equation. Several problems can arise when applications contain a mixture of data access code, numerical code, and presentation code. Such applications are difficult to maintain, because interdependencies between all the components cause strong ripple effects whenever a change is made anywhere. Adding new data views often requires reimplementing a numerical code, which then requires maintenance in multiple places. In order to solve problems of this type, the computational and threading layers of the project have been implemented in the form of one design pattern as a part of Model-View-Controller architecture. Reasons for new version: Responding to the users' feedback the Growth09 project has been upgraded to a standard that allows the carrying out of sample computations of the RHEED intensities for a disordered surface for a wide range of single- and epitaxial hetero-structures. The design pattern on which the project is based has also been improved. It is shown that this model can be effectively used for multithreaded growth simulations of thin epitaxial layers and corresponding RHEED intensities for a wide range of single- and hetero-structures. Responding to the users' feedback the present release has been implemented using a well-documented free compiler [1] not requiring the special configuration and installation additional libraries. Summary of revisions: The logical structure of the Growth09 program has been modified according to the scheme showed in Fig. 1. The class diagram in Fig. 1 is a static view of the main platform-specific elements of the GrowthCP architecture. Fig. 2 provides a dynamic view by showing the creation and destruction simplistic sequence diagram for the process. The program requires the user to provide the appropriate parameters in the form of a knowledge base for the crystal structures under investigation. These parameters are loaded from the parameters. ini files at run-time. Instructions to prepare the .ini files can be found in the new distribution. The program enables carrying out different growth models and one-dimensional dynamical RHEED calculations for the fcc lattice with basis of three-atoms, fcc lattice with basis of two-atoms, fcc lattice with single atom basis, Zinc-Blende, Sodium Chloride, and Wurtzite crystalline structures and hetero-structures, but yet the Fourier component of the scattering potential in the TRHEEDCalculations.crystPotUgXXX() procedure can be modified and implemented according to users' specific application requirements. The Fourier component of the scattering potential of the whole crystalline hetero-structures can be determined as a sum of contributions coming from all thin slices of individual atomic layers. To carry out one-dimensional calculations of the scattering potentials, the program uses properly constructed self-consistent procedures. Each component of the system shown in Figs. 1 and 2 is fully extendable and can easily be adapted to new changeable requirements. Two essential logical elements of the system, i.e. TGrowthTransaction and TRHEEDCalculations classes, were designed and implemented in this way for them to pass the information to themselves without the need to use the data-exchange files given. In consequence each of them can be independently modified and/or extended. Implementing other types of differential equations and the different algorithm for solving them in the TGrowthTransaction class does not require another implementation of the TRHEEDCalculations class. Similarly, implementing other forms of scattering potential and different algorithm for RHEED calculation stays without the influence on the TGrowthTransaction class construction. Unusual features: The program is distributed in the form of main project GrowthCP.lpr, with associated files, and should be compiled using Lazarus IDE. The program should be compiled with English/USA regional and language options. Running time: The typical running time is machine and user-parameters dependent. References: http://sourceforge.net/projects/lazarus/files/.
NASA Astrophysics Data System (ADS)
Hegde, Ganapathi; Vaya, Pukhraj
2013-10-01
This article presents a parallel architecture for 3-D discrete wavelet transform (3-DDWT). The proposed design is based on the 1-D pipelined lifting scheme. The architecture is fully scalable beyond the present coherent Daubechies filter bank (9, 7). This 3-DDWT architecture has advantages such as no group of pictures restriction and reduced memory referencing. It offers low power consumption, low latency and high throughput. The computing technique is based on the concept that lifting scheme minimises the storage requirement. The application specific integrated circuit implementation of the proposed architecture is done by synthesising it using 65 nm Taiwan Semiconductor Manufacturing Company standard cell library. It offers a speed of 486 MHz with a power consumption of 2.56 mW. This architecture is suitable for real-time video compression even with large frame dimensions.
Engineering High Assurance Distributed Cyber Physical Systems
2015-01-15
decisions: number of interacting agents and co-dependent decisions made in real-time without causing interference . To engineer a high assurance DART...environment specification, architecture definition, domain-specific languages, design patterns, code - generation, analysis, test-generation, and simulation...include synchronization between the models and source code , debugging at the model level, expression of the design intent, and quality of service
SLS-SPEC-159 Cross-Program Design Specification for Natural Environments (DSNE) Revision E
NASA Technical Reports Server (NTRS)
Roberts, Barry C.
2017-01-01
The DSNE completes environment-related specifications for architecture, system-level, and lower-tier documents by specifying the ranges of environmental conditions that must be accounted for by NASA ESD Programs. To assure clarity and consistency, and to prevent requirements documents from becoming cluttered with extensive amounts of technical material, natural environment specifications have been compiled into this document. The intent is to keep a unified specification for natural environments that each Program calls out for appropriate application. This document defines the natural environments parameter limits (maximum and minimum values, energy spectra, or precise model inputs, assumptions, model options, etc.), for all ESD Programs. These environments are developed by the NASA Marshall Space Flight Center (MSFC) Natural Environments Branch (MSFC organization code: EV44). Many of the parameter limits are based on experience with previous programs, such as the Space Shuttle Program. The parameter limits contain no margin and are meant to be evaluated individually to ensure they are reasonable (i.e., do not apply unrealistic extreme-on-extreme conditions). The natural environments specifications in this document should be accounted for by robust design of the flight vehicle and support systems. However, it is understood that in some cases the Programs will find it more effective to account for portions of the environment ranges by operational mitigation or acceptance of risk in accordance with an appropriate program risk management plan and/or hazard analysis process. The DSNE is not intended as a definition of operational models or operational constraints, nor is it adequate, alone, for ground facilities which may have additional requirements (for example, building codes and local environmental constraints). "Natural environments," as the term is used here, refers to the environments that are not the result of intended human activity or intervention. It consists of a variety of external environmental factors (most of natural origin and a few of human origin) which impose restrictions or otherwise impact the development or operation of flight vehicles and destination surface systems.
AutoBayes Program Synthesis System System Internals
NASA Technical Reports Server (NTRS)
Schumann, Johann Martin
2011-01-01
This lecture combines the theoretical background of schema based program synthesis with the hands-on study of a powerful, open-source program synthesis system (Auto-Bayes). Schema-based program synthesis is a popular approach toward program synthesis. The lecture will provide an introduction into this topic and discuss how this technology can be used to generate customized algorithms. The synthesis of advanced numerical algorithms requires the availability of a powerful symbolic (algebra) system. Its task is to symbolically solve equations, simplify expressions, or to symbolically calculate derivatives (among others) such that the synthesized algorithms become as efficient as possible. We will discuss the use and importance of the symbolic system for synthesis. Any synthesis system is a large and complex piece of code. In this lecture, we will study Autobayes in detail. AutoBayes has been developed at NASA Ames and has been made open source. It takes a compact statistical specification and generates a customized data analysis algorithm (in C/C++) from it. AutoBayes is written in SWI Prolog and many concepts from rewriting, logic, functional, and symbolic programming. We will discuss the system architecture, the schema libary and the extensive support infra-structure. Practical hands-on experiments and exercises will enable the student to get insight into a realistic program synthesis system and provides knowledge to use, modify, and extend Autobayes.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chin, George; Marquez, Andres; Choudhury, Sutanay
2012-09-01
Triadic analysis encompasses a useful set of graph mining methods that is centered on the concept of a triad, which is a subgraph of three nodes and the configuration of directed edges across the nodes. Such methods are often applied in the social sciences as well as many other diverse fields. Triadic methods commonly operate on a triad census that counts the number of triads of every possible edge configuration in a graph. Like other graph algorithms, triadic census algorithms do not scale well when graphs reach tens of millions to billions of nodes. To enable the triadic analysis ofmore » large-scale graphs, we developed and optimized a triad census algorithm to efficiently execute on shared memory architectures. We will retrace the development and evolution of a parallel triad census algorithm. Over the course of several versions, we continually adapted the code’s data structures and program logic to expose more opportunities to exploit parallelism on shared memory that would translate into improved computational performance. We will recall the critical steps and modifications that occurred during code development and optimization. Furthermore, we will compare the performances of triad census algorithm versions on three specific systems: Cray XMT, HP Superdome, and AMD multi-core NUMA machine. These three systems have shared memory architectures but with markedly different hardware capabilities to manage parallelism.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Madduri, Kamesh; Im, Eun-Jin; Ibrahim, Khaled Z.
The next decade of high-performance computing (HPC) systems will see a rapid evolution and divergence of multi- and manycore architectures as power and cooling constraints limit increases in microprocessor clock speeds. Understanding efficient optimization methodologies on diverse multicore designs in the context of demanding numerical methods is one of the greatest challenges faced today by the HPC community. In this paper, we examine the efficient multicore optimization of GTC, a petascale gyrokinetic toroidal fusion code for studying plasma microturbulence in tokamak devices. For GTC’s key computational components (charge deposition and particle push), we explore efficient parallelization strategies across a broadmore » range of emerging multicore designs, including the recently-released Intel Nehalem-EX, the AMD Opteron Istanbul, and the highly multithreaded Sun UltraSparc T2+. We also present the first study on tuning gyrokinetic particle-in-cell (PIC) algorithms for graphics processors, using the NVIDIA C2050 (Fermi). Our work discusses several novel optimization approaches for gyrokinetic PIC, including mixed-precision computation, particle binning and decomposition strategies, grid replication, SIMDized atomic floating-point operations, and effective GPU texture memory utilization. Overall, we achieve significant performance improvements of 1.3–4.7× on these complex PIC kernels, despite the inherent challenges of data dependency and locality. Finally, our work also points to several architectural and programming features that could significantly enhance PIC performance and productivity on next-generation architectures.« less
Design of an integrated airframe/propulsion control system architecture
NASA Technical Reports Server (NTRS)
Cohen, Gerald C.; Lee, C. William; Strickland, Michael J.
1990-01-01
The design of an integrated airframe/propulsion control system architecture is described. The design is based on a prevalidation methodology that used both reliability and performance tools. An account is given of the motivation for the final design and problems associated with both reliability and performance modeling. The appendices contain a listing of the code for both the reliability and performance model used in the design.
Current and anticipated uses of thermal-hydraulic codes in Germany
DOE Office of Scientific and Technical Information (OSTI.GOV)
Teschendorff, V.; Sommer, F.; Depisch, F.
1997-07-01
In Germany, one third of the electrical power is generated by nuclear plants. ATHLET and S-RELAP5 are successfully applied for safety analyses of the existing PWR and BWR reactors and possible future reactors, e.g. EPR. Continuous development and assessment of thermal-hydraulic codes are necessary in order to meet present and future needs of licensing organizations, utilities, and vendors. Desired improvements include thermal-hydraulic models, multi-dimensional simulation, computational speed, interfaces to coupled codes, and code architecture. Real-time capability will be essential for application in full-scope simulators. Comprehensive code validation and quantification of uncertainties are prerequisites for future best-estimate analyses.
FPGA implementation of low complexity LDPC iterative decoder
NASA Astrophysics Data System (ADS)
Verma, Shivani; Sharma, Sanjay
2016-07-01
Low-density parity-check (LDPC) codes, proposed by Gallager, emerged as a class of codes which can yield very good performance on the additive white Gaussian noise channel as well as on the binary symmetric channel. LDPC codes have gained lots of importance due to their capacity achieving property and excellent performance in the noisy channel. Belief propagation (BP) algorithm and its approximations, most notably min-sum, are popular iterative decoding algorithms used for LDPC and turbo codes. The trade-off between the hardware complexity and the decoding throughput is a critical factor in the implementation of the practical decoder. This article presents introduction to LDPC codes and its various decoding algorithms followed by realisation of LDPC decoder by using simplified message passing algorithm and partially parallel decoder architecture. Simplified message passing algorithm has been proposed for trade-off between low decoding complexity and decoder performance. It greatly reduces the routing and check node complexity of the decoder. Partially parallel decoder architecture possesses high speed and reduced complexity. The improved design of the decoder possesses a maximum symbol throughput of 92.95 Mbps and a maximum of 18 decoding iterations. The article presents implementation of 9216 bits, rate-1/2, (3, 6) LDPC decoder on Xilinx XC3D3400A device from Spartan-3A DSP family.
NASA Technical Reports Server (NTRS)
Lin, Shu (Principal Investigator); Uehara, Gregory T.; Nakamura, Eric; Chu, Cecilia W. P.
1996-01-01
The (64, 40, 8) subcode of the third-order Reed-Muller (RM) code for high-speed satellite communications is proposed. The RM subcode can be used either alone or as an inner code of a concatenated coding system with the NASA standard (255, 233, 33) Reed-Solomon (RS) code as the outer code to achieve high performance (or low bit-error rate) with reduced decoding complexity. It can also be used as a component code in a multilevel bandwidth efficient coded modulation system to achieve reliable bandwidth efficient data transmission. The progress made toward achieving the goal of implementing a decoder system based upon this code is summarized. The development of the integrated circuit prototype sub-trellis IC, particularly focusing on the design methodology, is addressed.
NASA Technical Reports Server (NTRS)
2000-01-01
This paper presents, in viewgraph form, the 2005 Earth-Mars Round Trip. The contents include: 1) Lander; 2) Mars Sample Return Project; 3) Rover; 4) Rover Size Comparison; 5) Mars Ascent Vehicle; 6) Return Orbiter; 7) A New Mars Surveyor Program Architecture; 8) Definition Study Summary Result; 9) Mars Surveyor Proposed Architecture 2003, 2005 Opportunities; 10) Mars Micromissions Using Ariane 5; 11) Potential International Partnerships; 12) Proposed Integrated Architecture; and 13) Mars Exploration Program Report of the Architecture Team.
ATDM LANL FleCSI: Topology and Execution Framework
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bergen, Benjamin Karl
FleCSI is a compile-time configurable C++ framework designed to support multi-physics application development. As such, FleCSI attempts to provide a very general set of infrastructure design patterns that can be specialized and extended to suit the needs of a broad variety of solver and data requirements. This means that FleCSI is potentially useful to many different ECP projects. Current support includes multidimensional mesh topology, mesh geometry, and mesh adjacency information, n-dimensional hashed-tree data structures, graph partitioning interfaces, and dependency closures (to identify data dependencies between distributed-memory address spaces). FleCSI introduces a functional programming model with control, execution, and data abstractionsmore » that are consistent with state-of-the-art task-based runtimes such as Legion and Charm++. The model also provides support for fine-grained, data-parallel execution with backend support for runtimes such as OpenMP and C++17. The FleCSI abstraction layer provides the developer with insulation from the underlying runtimes, while allowing support for multiple runtime systems, including conventional models like asynchronous MPI. The intent is to give developers a concrete set of user-friendly programming tools that can be used now, while allowing flexibility in choosing runtime implementations and optimizations that can be applied to architectures and runtimes that arise in the future. This project is essential to the ECP Ristra Next-Generation Code project, part of ASC ATDM, because it provides a hierarchically parallel programming model that is consistent with the design of modern system architectures, but which allows for the straightforward expression of algorithmic parallelism in a portably performant manner.« less
An extensible circuit QED architecture for quantum computation
NASA Astrophysics Data System (ADS)
Dicarlo, Leo
Realizing a logical qubit robust to single errors in its constituent physical elements is an immediate challenge for quantum information processing platforms. A longer-term challenge will be achieving quantum fault tolerance, i.e., improving logical qubit resilience by increasing redundancy in the underlying quantum error correction code (QEC). In QuTech, we target these challenges in collaboration with industrial and academic partners. I will present the circuit QED quantum hardware, room-temperature control electronics, and software components of the complete architecture. I will show the extensibility of each component to the Surface-17 and -49 circuits needed to reach the objectives with surface-code QEC, and provide an overview of latest developments. Research funded by IARPA and Intel Corporation.
Standardizing the information architecture for spacecraft operations
NASA Technical Reports Server (NTRS)
Easton, C. R.
1994-01-01
This paper presents an information architecture developed for the Space Station Freedom as a model from which to derive an information architecture standard for advanced spacecraft. The information architecture provides a way of making information available across a program, and among programs, assuming that the information will be in a variety of local formats, structures and representations. It provides a format that can be expanded to define all of the physical and logical elements that make up a program, add definitions as required, and import definitions from prior programs to a new program. It allows a spacecraft and its control center to work in different representations and formats, with the potential for supporting existing spacecraft from new control centers. It supports a common view of data and control of all spacecraft, regardless of their own internal view of their data and control characteristics, and of their communications standards, protocols and formats. This information architecture is central to standardizing spacecraft operations, in that it provides a basis for information transfer and translation, such that diverse spacecraft can be monitored and controlled in a common way.
Wavelet subband coding of computer simulation output using the A++ array class library
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bradley, J.N.; Brislawn, C.M.; Quinlan, D.J.
1995-07-01
The goal of the project is to produce utility software for off-line compression of existing data and library code that can be called from a simulation program for on-line compression of data dumps as the simulation proceeds. Naturally, we would like the amount of CPU time required by the compression algorithm to be small in comparison to the requirements of typical simulation codes. We also want the algorithm to accomodate a wide variety of smooth, multidimensional data types. For these reasons, the subband vector quantization (VQ) approach employed in has been replaced by a scalar quantization (SQ) strategy using amore » bank of almost-uniform scalar subband quantizers in a scheme similar to that used in the FBI fingerprint image compression standard. This eliminates the considerable computational burdens of training VQ codebooks for each new type of data and performing nearest-vector searches to encode the data. The comparison of subband VQ and SQ algorithms in indicated that, in practice, there is relatively little additional gain from using vector as opposed to scalar quantization on DWT subbands, even when the source imagery is from a very homogeneous population, and our subjective experience with synthetic computer-generated data supports this stance. It appears that a careful study is needed of the tradeoffs involved in selecting scalar vs. vector subband quantization, but such an analysis is beyond the scope of this paper. Our present work is focused on the problem of generating wavelet transform/scalar quantization (WSQ) implementations that can be ported easily between different hardware environments. This is an extremely important consideration given the great profusion of different high-performance computing architectures available, the high cost associated with learning how to map algorithms effectively onto a new architecture, and the rapid rate of evolution in the world of high-performance computing.« less
High Performance Radiation Transport Simulations on TITAN
DOE Office of Scientific and Technical Information (OSTI.GOV)
Baker, Christopher G; Davidson, Gregory G; Evans, Thomas M
2012-01-01
In this paper we describe the Denovo code system. Denovo solves the six-dimensional, steady-state, linear Boltzmann transport equation, of central importance to nuclear technology applications such as reactor core analysis (neutronics), radiation shielding, nuclear forensics and radiation detection. The code features multiple spatial differencing schemes, state-of-the-art linear solvers, the Koch-Baker-Alcouffe (KBA) parallel-wavefront sweep algorithm for inverting the transport operator, a new multilevel energy decomposition method scaling to hundreds of thousands of processing cores, and a modern, novel code architecture that supports straightforward integration of new features. In this paper we discuss the performance of Denovo on the 10--20 petaflop ORNLmore » GPU-based system, Titan. We describe algorithms and techniques used to exploit the capabilities of Titan's heterogeneous compute node architecture and the challenges of obtaining good parallel performance for this sparse hyperbolic PDE solver containing inherently sequential computations. Numerical results demonstrating Denovo performance on early Titan hardware are presented.« less
The design of an adaptive predictive coder using a single-chip digital signal processor
NASA Astrophysics Data System (ADS)
Randolph, M. A.
1985-01-01
A speech coding processor architecture design study has been performed in which Texas Instruments TMS32010 has been selected from among three commercially available digital signal processing integrated circuits and evaluated in an implementation study of real-time Adaptive Predictive Coding (APC). The TMS32010 has been compared with AR&T Bell Laboratories DSP I and Nippon Electric Co. PD7720 and was found to be most suitable for a single chip implementation of APC. A preliminary design system based on TMS32010 has been performed, and several of the hardware and software design issues are discussed. Particular attention was paid to the design of an external memory controller which permits rapid sequential access of external RAM. As a result, it has been determined that a compact hardware implementation of the APC algorithm is feasible based of the TSM32010. Originator-supplied keywords include: vocoders, speech compression, adaptive predictive coding, digital signal processing microcomputers, speech processor architectures, and special purpose processor.
ERIC Educational Resources Information Center
Mississippi Research and Curriculum Unit for Vocational and Technical Education, State College.
This document, which is intended for use by community and junior colleges throughout Mississippi, contains curriculum frameworks for the two course sequences of the state's postsecondary-level drafting and design technology program: architectural drafting technology and drafting and design technology. Presented first are a program description and…
High-Speed Soft-Decision Decoding of Two Reed-Muller Codes
NASA Technical Reports Server (NTRS)
Lin, Shu; Uehara, Gregory T.
1996-01-01
In his research, we have proposed the (64, 40, 8) subcode of the third-order Reed-Muller (RM) code to NASA for high-speed satellite communications. This RM subcode can be used either alone or as an inner code of a concatenated coding system with the NASA standard (255, 233, 33) Reed-Solomon (RS) code as the outer code to achieve high performance (or low bit-error rate) with reduced decoding complexity. It can also be used as a component code in a multilevel bandwidth efficient coded modulation system to achieve reliable bandwidth efficient data transmission. This report will summarize the key progress we have made toward achieving our eventual goal of implementing a decoder system based upon this code. In the first phase of study, we investigated the complexities of various sectionalized trellis diagrams for the proposed (64, 40, 8) RNI subcode. We found a specific 8-trellis diagram for this code which requires the least decoding complexity with a high possibility of achieving a decoding speed of 600 M bits per second (Mbps). The combination of a large number of states and a hi ch data rate will be made possible due to the utilization of a high degree of parallelism throughout the architecture. This trellis diagram will be presented and briefly described. In the second phase of study which was carried out through the past year, we investigated circuit architectures to determine the feasibility of VLSI implementation of a high-speed Viterbi decoder based on this 8-section trellis diagram. We began to examine specific design and implementation approaches to implement a fully custom integrated circuit (IC) which will be a key building block for a decoder system implementation. The key results will be presented in this report. This report will be divided into three primary sections. First, we will briefly describe the system block diagram in which the proposed decoder is assumed to be operating and present some of the key architectural approaches being used to implement the system at high speed. Second, we will describe details of the 8-trellis diagram we found to best meet the trade-offs between chip and overall system complexity. The chosen approach implements the trellis for the (64, 40, 8) RM subcode with 32 independent sub-trellises. And third, we will describe results of our feasibility study on the implementation of such an IC chip in CMOS technology to implement one of these sub-trellises.
High-Speed Soft-Decision Decoding of Two Reed-Muller Codes
NASA Technical Reports Server (NTRS)
Lin, Shu; Uehara, Gregory T.
1996-01-01
In this research, we have proposed the (64, 40, 8) subcode of the third-order Reed-Muller (RM) code to NASA for high-speed satellite communications. This RM subcode can be used either alone or as an inner code of a concatenated coding system with the NASA standard (255, 233, 33) Reed-Solomon (RS) code as the outer code to achieve high performance (or low bit-error rate) with reduced decoding complexity. It can also be used as a component code in a multilevel bandwidth efficient coded modulation system to achieve reliable bandwidth efficient data transmission. This report will summarize the key progress we have made toward achieving our eventual goal of implementing, a decoder system based upon this code. In the first phase of study, we investigated the complexities of various sectionalized trellis diagrams for the proposed (64, 40, 8) RM subcode. We found a specific 8-trellis diagram for this code which requires the least decoding complexity with a high possibility of achieving a decoding speed of 600 M bits per second (Mbps). The combination of a large number of states and a high data rate will be made possible due to the utilization of a high degree of parallelism throughout the architecture. This trellis diagram will be presented and briefly described. In the second phase of study, which was carried out through the past year, we investigated circuit architectures to determine the feasibility of VLSI implementation of a high-speed Viterbi decoder based on this 8-section trellis diagram. We began to examine specific design and implementation approaches to implement a fully custom integrated circuit (IC) which will be a key building block for a decoder system implementation. The key results will be presented in this report. This report will be divided into three primary sections. First, we will briefly describe the system block diagram in which the proposed decoder is assumed to be operating, and present some of the key architectural approaches being used to implement the system at high speed. Second, we will describe details of the 8-trellis diagram we found to best meet the trade-offs between chip and overall system complexity. The chosen approach implements the trellis for the (64, 40, 8) RM subcode with 32 independent sub-trellises. And third, we will describe results of our feasibility study on the implementation of such an IC chip in CMOS technology to implement one of these sub-trellises.
Introduction to the Security Engineering Risk Analysis (SERA) Framework
2014-11-01
military aircraft has increased from 8% to 80%. At the same time, the size of software in military aircraft has grown from 1,000 lines of code in the F...4A to 1.7 million lines of code in the F-22. This growth trend is expected to con- tinue over time [NASA 2009]. As software exerts more control of...their root causes can be traced to the software’s requirements, architecture, design, or code . Studies have shown that the cost of addressing a software
ERIC Educational Resources Information Center
McMinn, William G.
This report is an update of a report on the development and status of various programs in architecture and related fields in the State University System of Florida, a report that was submitted to the Board of Regents in May 1983. The objectives of this updated report, like those of the earlier one, are to review the anticipated needs of the…
Neural network decoder for quantum error correcting codes
NASA Astrophysics Data System (ADS)
Krastanov, Stefan; Jiang, Liang
Artificial neural networks form a family of extremely powerful - albeit still poorly understood - tools used in anything from image and sound recognition through text generation to, in our case, decoding. We present a straightforward Recurrent Neural Network architecture capable of deducing the correcting procedure for a quantum error-correcting code from a set of repeated stabilizer measurements. We discuss the fault-tolerance of our scheme and the cost of training the neural network for a system of a realistic size. Such decoders are especially interesting when applied to codes, like the quantum LDPC codes, that lack known efficient decoding schemes.
Linearized Programming of Memristors for Artificial Neuro-Sensor Signal Processing
Yang, Changju; Kim, Hyongsuk
2016-01-01
A linearized programming method of memristor-based neural weights is proposed. Memristor is known as an ideal element to implement a neural synapse due to its embedded functions of analog memory and analog multiplication. Its resistance variation with a voltage input is generally a nonlinear function of time. Linearization of memristance variation about time is very important for the easiness of memristor programming. In this paper, a method utilizing an anti-serial architecture for linear programming is proposed. The anti-serial architecture is composed of two memristors with opposite polarities. It linearizes the variation of memristance due to complimentary actions of two memristors. For programming a memristor, additional memristor with opposite polarity is employed. The linearization effect of weight programming of an anti-serial architecture is investigated and memristor bridge synapse which is built with two sets of anti-serial memristor architecture is taken as an application example of the proposed method. Simulations are performed with memristors of both linear drift model and nonlinear model. PMID:27548186
Linearized Programming of Memristors for Artificial Neuro-Sensor Signal Processing.
Yang, Changju; Kim, Hyongsuk
2016-08-19
A linearized programming method of memristor-based neural weights is proposed. Memristor is known as an ideal element to implement a neural synapse due to its embedded functions of analog memory and analog multiplication. Its resistance variation with a voltage input is generally a nonlinear function of time. Linearization of memristance variation about time is very important for the easiness of memristor programming. In this paper, a method utilizing an anti-serial architecture for linear programming is proposed. The anti-serial architecture is composed of two memristors with opposite polarities. It linearizes the variation of memristance due to complimentary actions of two memristors. For programming a memristor, additional memristor with opposite polarity is employed. The linearization effect of weight programming of an anti-serial architecture is investigated and memristor bridge synapse which is built with two sets of anti-serial memristor architecture is taken as an application example of the proposed method. Simulations are performed with memristors of both linear drift model and nonlinear model.
Vaccarino, Anthony L.; Dharsee, Moyez; Strother, Stephen; Aldridge, Don; Arnott, Stephen R.; Behan, Brendan; Dafnas, Costas; Dong, Fan; Edgecombe, Kenneth; El-Badrawi, Rachad; El-Emam, Khaled; Gee, Tom; Evans, Susan G.; Javadi, Mojib; Jeanson, Francis; Lefaivre, Shannon; Lutz, Kristen; MacPhee, F. Chris; Mikkelsen, Jordan; Mikkelsen, Tom; Mirotchnick, Nicholas; Schmah, Tanya; Studzinski, Christa M.; Stuss, Donald T.; Theriault, Elizabeth; Evans, Kenneth R.
2018-01-01
Historically, research databases have existed in isolation with no practical avenue for sharing or pooling medical data into high dimensional datasets that can be efficiently compared across databases. To address this challenge, the Ontario Brain Institute’s “Brain-CODE” is a large-scale neuroinformatics platform designed to support the collection, storage, federation, sharing and analysis of different data types across several brain disorders, as a means to understand common underlying causes of brain dysfunction and develop novel approaches to treatment. By providing researchers access to aggregated datasets that they otherwise could not obtain independently, Brain-CODE incentivizes data sharing and collaboration and facilitates analyses both within and across disorders and across a wide array of data types, including clinical, neuroimaging and molecular. The Brain-CODE system architecture provides the technical capabilities to support (1) consolidated data management to securely capture, monitor and curate data, (2) privacy and security best-practices, and (3) interoperable and extensible systems that support harmonization, integration, and query across diverse data modalities and linkages to external data sources. Brain-CODE currently supports collaborative research networks focused on various brain conditions, including neurodevelopmental disorders, cerebral palsy, neurodegenerative diseases, epilepsy and mood disorders. These programs are generating large volumes of data that are integrated within Brain-CODE to support scientific inquiry and analytics across multiple brain disorders and modalities. By providing access to very large datasets on patients with different brain disorders and enabling linkages to provincial, national and international databases, Brain-CODE will help to generate new hypotheses about the biological bases of brain disorders, and ultimately promote new discoveries to improve patient care. PMID:29875648
Model-Drive Architecture for Agent-Based Systems
NASA Technical Reports Server (NTRS)
Gradanin, Denis; Singh, H. Lally; Bohner, Shawn A.; Hinchey, Michael G.
2004-01-01
The Model Driven Architecture (MDA) approach uses a platform-independent model to define system functionality, or requirements, using some specification language. The requirements are then translated to a platform-specific model for implementation. An agent architecture based on the human cognitive model of planning, the Cognitive Agent Architecture (Cougaar) is selected for the implementation platform. The resulting Cougaar MDA prescribes certain kinds of models to be used, how those models may be prepared and the relationships of the different kinds of models. Using the existing Cougaar architecture, the level of application composition is elevated from individual components to domain level model specifications in order to generate software artifacts. The software artifacts generation is based on a metamodel. Each component maps to a UML structured component which is then converted into multiple artifacts: Cougaar/Java code, documentation, and test cases.
Automated Instrumentation, Monitoring and Visualization of PVM Programs Using AIMS
NASA Technical Reports Server (NTRS)
Mehra, Pankaj; VanVoorst, Brian; Yan, Jerry; Lum, Henry, Jr. (Technical Monitor)
1994-01-01
We present views and analysis of the execution of several PVM (Parallel Virtual Machine) codes for Computational Fluid Dynamics on a networks of Sparcstations, including: (1) NAS Parallel Benchmarks CG and MG; (2) a multi-partitioning algorithm for NAS Parallel Benchmark SP; and (3) an overset grid flowsolver. These views and analysis were obtained using our Automated Instrumentation and Monitoring System (AIMS) version 3.0, a toolkit for debugging the performance of PVM programs. We will describe the architecture, operation and application of AIMS. The AIMS toolkit contains: (1) Xinstrument, which can automatically instrument various computational and communication constructs in message-passing parallel programs; (2) Monitor, a library of runtime trace-collection routines; (3) VK (Visual Kernel), an execution-animation tool with source-code clickback; and (4) Tally, a tool for statistical analysis of execution profiles. Currently, Xinstrument can handle C and Fortran 77 programs using PVM 3.2.x; Monitor has been implemented and tested on Sun 4 systems running SunOS 4.1.2; and VK uses XIIR5 and Motif 1.2. Data and views obtained using AIMS clearly illustrate several characteristic features of executing parallel programs on networked workstations: (1) the impact of long message latencies; (2) the impact of multiprogramming overheads and associated load imbalance; (3) cache and virtual-memory effects; and (4) significant skews between workstation clocks. Interestingly, AIMS can compensate for constant skew (zero drift) by calibrating the skew between a parent and its spawned children. In addition, AIMS' skew-compensation algorithm can adjust timestamps in a way that eliminates physically impossible communications (e.g., messages going backwards in time). Our current efforts are directed toward creating new views to explain the observed performance of PVM programs. Some of the features planned for the near future include: (1) ConfigView, showing the physical topology of the virtual machine, inferred using specially formatted IP (Internet Protocol) packets: and (2) LoadView, synchronous animation of PVM-program execution and resource-utilization patterns.
The VLSI design of an error-trellis syndrome decoder for certain convolutional codes
NASA Technical Reports Server (NTRS)
Reed, I. S.; Jensen, J. M.; Hsu, I.-S.; Truong, T. K.
1986-01-01
A recursive algorithm using the error-trellis decoding technique is developed to decode convolutional codes (CCs). An example, illustrating the very large scale integration (VLSI) architecture of such a decode, is given for a dual-K CC. It is demonstrated that such a decoder can be realized readily on a single chip with metal-nitride-oxide-semiconductor technology.
The VLSI design of error-trellis syndrome decoding for convolutional codes
NASA Technical Reports Server (NTRS)
Reed, I. S.; Jensen, J. M.; Truong, T. K.; Hsu, I. S.
1985-01-01
A recursive algorithm using the error-trellis decoding technique is developed to decode convolutional codes (CCs). An example, illustrating the very large scale integration (VLSI) architecture of such a decode, is given for a dual-K CC. It is demonstrated that such a decoder can be realized readily on a single chip with metal-nitride-oxide-semiconductor technology.
2006-11-01
engines will involve a family of common components. It will consist of a real - time operating system and partitioned application software (AS...system will employ a standard hardware and software architecture. It will consist of a real time operating system and partitioned application...Inputs - Enables Large Cost Reduction 3. Software - FAA Certified Auto Code - Real Time Operating System - Commercial
Analysis of Disaster Preparedness Planning Measures in DoD Computer Facilities
1993-09-01
city, stae, aod ZP code) 10 Source of Funding Numbers SProgram Element No lProject No ITask No lWork Unit Accesion I 11 Title include security...Computer Disaster Recovery .... 13 a. PC and LAN Lessons Learned . . ..... 13 2. Distributed Architectures . . . .. . 14 3. Backups...amount of expense, but no client problems." (Leeke, 1993, p. 8) 2. Distributed Architectures The majority of operations that were disrupted by the
Hardware, Languages, and Architectures for Defense Against Hostile Operating Systems
2015-05-14
Executive Service Directorate (0704-0188). Respondents should be aware that notwithstanding any other provision of law, no person shall be subject to...in privileged system services . ExpressOS requires only a modest annotation burden (annotations were about 3% of code of the kernel), modest performance...INT benchmarks . In addition to enabling the development of an architecture- neutral instrumentation framework, our approach can take advantage of the
Optimizing zonal advection of the Advanced Research WRF (ARW) dynamics for Intel MIC
NASA Astrophysics Data System (ADS)
Mielikainen, Jarno; Huang, Bormin; Huang, Allen H.
2014-10-01
The Weather Research and Forecast (WRF) model is the most widely used community weather forecast and research model in the world. There are two distinct varieties of WRF. The Advanced Research WRF (ARW) is an experimental, advanced research version featuring very high resolution. The WRF Nonhydrostatic Mesoscale Model (WRF-NMM) has been designed for forecasting operations. WRF consists of dynamics code and several physics modules. The WRF-ARW core is based on an Eulerian solver for the fully compressible nonhydrostatic equations. In the paper, we will use Intel Intel Many Integrated Core (MIC) architecture to substantially increase the performance of a zonal advection subroutine for optimization. It is of the most time consuming routines in the ARW dynamics core. Advection advances the explicit perturbation horizontal momentum equations by adding in the large-timestep tendency along with the small timestep pressure gradient tendency. We will describe the challenges we met during the development of a high-speed dynamics code subroutine for MIC architecture. Furthermore, lessons learned from the code optimization process will be discussed. The results show that the optimizations improved performance of the original code on Xeon Phi 5110P by a factor of 2.4x.
Self-Supervised Video Hashing With Hierarchical Binary Auto-Encoder.
Song, Jingkuan; Zhang, Hanwang; Li, Xiangpeng; Gao, Lianli; Wang, Meng; Hong, Richang
2018-07-01
Existing video hash functions are built on three isolated stages: frame pooling, relaxed learning, and binarization, which have not adequately explored the temporal order of video frames in a joint binary optimization model, resulting in severe information loss. In this paper, we propose a novel unsupervised video hashing framework dubbed self-supervised video hashing (SSVH), which is able to capture the temporal nature of videos in an end-to-end learning to hash fashion. We specifically address two central problems: 1) how to design an encoder-decoder architecture to generate binary codes for videos and 2) how to equip the binary codes with the ability of accurate video retrieval. We design a hierarchical binary auto-encoder to model the temporal dependencies in videos with multiple granularities, and embed the videos into binary codes with less computations than the stacked architecture. Then, we encourage the binary codes to simultaneously reconstruct the visual content and neighborhood structure of the videos. Experiments on two real-world data sets show that our SSVH method can significantly outperform the state-of-the-art methods and achieve the current best performance on the task of unsupervised video retrieval.
Self-Supervised Video Hashing With Hierarchical Binary Auto-Encoder
NASA Astrophysics Data System (ADS)
Song, Jingkuan; Zhang, Hanwang; Li, Xiangpeng; Gao, Lianli; Wang, Meng; Hong, Richang
2018-07-01
Existing video hash functions are built on three isolated stages: frame pooling, relaxed learning, and binarization, which have not adequately explored the temporal order of video frames in a joint binary optimization model, resulting in severe information loss. In this paper, we propose a novel unsupervised video hashing framework dubbed Self-Supervised Video Hashing (SSVH), that is able to capture the temporal nature of videos in an end-to-end learning-to-hash fashion. We specifically address two central problems: 1) how to design an encoder-decoder architecture to generate binary codes for videos; and 2) how to equip the binary codes with the ability of accurate video retrieval. We design a hierarchical binary autoencoder to model the temporal dependencies in videos with multiple granularities, and embed the videos into binary codes with less computations than the stacked architecture. Then, we encourage the binary codes to simultaneously reconstruct the visual content and neighborhood structure of the videos. Experiments on two real-world datasets (FCVID and YFCC) show that our SSVH method can significantly outperform the state-of-the-art methods and achieve the currently best performance on the task of unsupervised video retrieval.
Thorndike, Anne N; Sonnenberg, Lillian; Riis, Jason; Barraclough, Susan; Levy, Douglas E
2012-03-01
We assessed whether a 2-phase labeling and choice architecture intervention would increase sales of healthy food and beverages in a large hospital cafeteria. Phase 1 was a 3-month color-coded labeling intervention (red = unhealthy, yellow = less healthy, green = healthy). Phase 2 added a 3-month choice architecture intervention that increased the visibility and convenience of some green items. We compared relative changes in 3-month sales from baseline to phase 1 and from phase 1 to phase 2. At baseline (977,793 items, including 199,513 beverages), 24.9% of sales were red and 42.2% were green. Sales of red items decreased in both phases (P < .001), and green items increased in phase 1 (P < .001). The largest changes occurred among beverages. Red beverages decreased 16.5% during phase 1 (P < .001) and further decreased 11.4% in phase 2 (P < .001). Green beverages increased 9.6% in phase 1 (P < .001) and further increased 4.0% in phase 2 (P < .001). Bottled water increased 25.8% during phase 2 (P < .001) but did not increase at 2 on-site comparison cafeterias (P < .001). A color-coded labeling intervention improved sales of healthy items and was enhanced by a choice architecture intervention.
NASA Technical Reports Server (NTRS)
Fouts, Douglas J.; Butner, Steven E.
1991-01-01
The design of the processing element of GASP, a GaAs supercomputer with a 500-MHz instruction issue rate and 1-GHz subsystem clocks, is presented. The novel, functionally modular, block data flow architecture of GASP is described. The architecture and design of a GASP processing element is then presented. The processing element (PE) is implemented in a hybrid semiconductor module with 152 custom GaAs ICs of eight different types. The effects of the implementation technology on both the system-level architecture and the PE design are discussed. SPICE simulations indicate that parts of the PE are capable of being clocked at 1 GHz, while the rest of the PE uses a 500-MHz clock. The architecture utilizes data flow techniques at a program block level, which allows efficient execution of parallel programs while maintaining reasonably good performance on sequential programs. A simulation study of the architecture indicates that an instruction execution rate of over 30,000 MIPS can be attained with 65 PEs.