Sample records for implementation compilation optimization

  1. A Language for Specifying Compiler Optimizations for Generic Software

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Willcock, Jeremiah J.

    2007-01-01

    Compiler optimization is important to software performance, and modern processor architectures make optimization even more critical. However, many modern software applications use libraries providing high levels of abstraction. Such libraries often hinder effective optimization: the libraries are difficult to analyze using current compiler technology. For example, high-level libraries often use dynamic memory allocation and indirectly expressed control structures, such as iterator-based loops. Programs using these libraries often cannot achieve an optimal level of performance. On the other hand, software libraries have also been recognized as potentially aiding in program optimization. One proposed implementation of library-based optimization is to allow the library author, or a library user, to define custom analyses and optimizations. Only limited systems have been created to take advantage of this potential, however. One problem in creating a framework for defining new optimizations and analyses is how users are to specify them: implementing them by hand inside a compiler is difficult and prone to errors. Thus, a domain-specific language for library-based compiler optimizations would be beneficial. Many optimization specification languages have appeared in the literature, but they tend to be either limited in power or unnecessarily difficult to use. Therefore, I have designed, implemented, and evaluated the Pavilion language for specifying program analyses and optimizations, designed for library authors and users. These analyses and optimizations can be based on the implementation of a particular library, its use in a specific program, or on the properties of a broad range of types, expressed through concepts. The new system is intended to provide a high level of expressiveness, even though the intended users are unlikely to be compiler experts.

  2. On Fusing Recursive Traversals of K-d Trees

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rajbhandari, Samyam; Kim, Jinsung; Krishnamoorthy, Sriram

    Loop fusion is a key program transformation for data locality optimization that is implemented in production compilers. But optimizing compilers currently cannot exploit fusion opportunities across a set of recursive tree traversal computations with producer-consumer relationships. In this paper, we develop a compile-time approach to dependence characterization and program transformation to enable fusion across recursively specified traversals over k-ary trees. We present the FuseT source-to-source code transformation framework to automatically generate fused composite recursive operators from an input program containing a sequence of primitive recursive operators. We use our framework to implement fused operators for MADNESS, Multiresolution Adaptive Numerical Environment for Scientific Simulation. We show that locality optimization through fusion can offer more than an order of magnitude performance improvement.
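
    As a minimal sketch of the idea (not the FuseT framework itself), the Python below fuses two recursive traversals with a producer-consumer relationship over a small binary tree. The `Node` class and the functions `scale`, `total`, and `scale_and_total` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    value: float
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def scale(node, factor):
    """Producer traversal: multiply every value in place."""
    if node is None:
        return
    node.value *= factor
    scale(node.left, factor)
    scale(node.right, factor)


def total(node):
    """Consumer traversal: sum the values written by the producer."""
    if node is None:
        return 0.0
    return node.value + total(node.left) + total(node.right)


def scale_and_total(node, factor):
    """Fused traversal: each node is visited once instead of twice."""
    if node is None:
        return 0.0
    node.value *= factor
    return (node.value
            + scale_and_total(node.left, factor)
            + scale_and_total(node.right, factor))


def build():
    """Build the same four-node example tree for both variants."""
    return Node(1.0, Node(2.0, Node(4.0)), Node(3.0))


# Unfused: two full passes over the tree.
unfused_tree = build()
scale(unfused_tree, 2.0)
unfused_result = total(unfused_tree)

# Fused: one pass producing the same tree and the same sum.
fused_tree = build()
fused_result = scale_and_total(fused_tree, 2.0)
```

    Fusion is legal in this sketch because the consumer at a node reads only what the producer wrote at that node and in its subtree, so the two operations can share one visit. Real dependence analysis across arbitrary recursive operators is considerably harder.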

  3. A software methodology for compiling quantum programs

    NASA Astrophysics Data System (ADS)

    Häner, Thomas; Steiger, Damian S.; Svore, Krysta; Troyer, Matthias

    2018-04-01

    Quantum computers promise to transform our notions of computation by offering a completely new paradigm. To achieve scalable quantum computation, optimizing compilers and a corresponding software design flow will be essential. We present a software architecture for compiling quantum programs from a high-level language program to hardware-specific instructions. We describe the necessary layers of abstraction and their differences and similarities to classical layers of a computer-aided design flow. For each layer of the stack, we discuss the underlying methods for compilation and optimization. Our software methodology facilitates more rapid innovation among quantum algorithm designers, quantum hardware engineers, and experimentalists. It enables scalable compilation of complex quantum algorithms and can be targeted to any specific quantum hardware implementation.

  4. A domain-specific compiler for a parallel multiresolution adaptive numerical simulation environment

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rajbhandari, Samyam; Kim, Jinsung; Krishnamoorthy, Sriram

    This paper describes the design and implementation of a layered domain-specific compiler to support MADNESS (Multiresolution ADaptive Numerical Environment for Scientific Simulation). MADNESS is a high-level software environment for the solution of integral and differential equations in many dimensions, using adaptive and fast harmonic analysis methods with guaranteed precision. MADNESS uses k-d trees to represent spatial functions and implements operators like addition, multiplication, differentiation, and integration on the numerical representation of functions. The MADNESS runtime system provides global namespace support and a task-based execution model including futures. MADNESS is currently deployed on massively parallel supercomputers and has enabled many science advances. Due to the highly irregular and statically unpredictable structure of the k-d trees representing the spatial functions encountered in MADNESS applications, only purely runtime approaches to optimization have previously been implemented in the MADNESS framework. This paper describes a layered domain-specific compiler developed to address some performance bottlenecks in MADNESS. The newly developed static compile-time optimizations, in conjunction with the MADNESS runtime support, enable significant performance improvement for the MADNESS framework.

  5. HOPE: Just-in-time Python compiler for astrophysical computations

    NASA Astrophysics Data System (ADS)

    Akeret, Joel; Gamper, Lukas; Amara, Adam; Refregier, Alexandre

    2014-11-01

    HOPE is a specialized Python just-in-time (JIT) compiler designed for numerical astrophysical applications. HOPE focuses on a subset of the language and is able to translate Python code into C++ while performing numerical optimization on mathematical expressions at runtime. To enable the JIT compilation, the user only needs to add a decorator to the function definition. By using HOPE, the user benefits from being able to write common numerical code in Python while getting the performance of a compiled implementation.
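
    The decorator-based interface can be illustrated with a minimal sketch. This is not HOPE's API: `jit_sketch` is a hypothetical stand-in that only shows where a JIT would hook in, caching one "compiled" version per argument-type signature and reusing the plain Python function where a real compiler would generate native code.

```python
import functools


def jit_sketch(func):
    """Hypothetical stand-in for a JIT decorator (assumption, not a real library)."""
    cache = {}

    @functools.wraps(func)
    def wrapper(*args):
        # Specialize on the types of the arguments, as a typed back end would need.
        signature = tuple(type(a) for a in args)
        if signature not in cache:
            # A real JIT would translate and compile `func` here;
            # this sketch simply stores the original Python function.
            cache[signature] = func
        return cache[signature](*args)

    wrapper.cache = cache  # exposed so the specializations can be inspected
    return wrapper


@jit_sketch
def poly(x, y):
    return x * x + 2 * y


first = poly(3, 4)        # triggers the (int, int) specialization
second = poly(1.5, 2.0)   # triggers the (float, float) specialization
third = poly(5, 6)        # reuses the cached (int, int) entry
```

    The only change the user makes to `poly` is the added decorator line, which is the usability point the abstract emphasizes.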

  6. Context-sensitive trace inlining for Java.

    PubMed

    Häubl, Christian; Wimmer, Christian; Mössenböck, Hanspeter

    2013-12-01

    Method inlining is one of the most important optimizations in method-based just-in-time (JIT) compilers. It widens the compilation scope and therefore allows optimizing multiple methods as a whole, which increases the performance. However, if method inlining is used too frequently, the compilation time increases and too much machine code is generated. This has negative effects on the performance. Trace-based JIT compilers only compile frequently executed paths, so-called traces, instead of whole methods. This may result in faster compilation, less generated machine code, and better optimized machine code. In the previous work, we implemented a trace recording infrastructure and a trace-based compiler for Java, by modifying the Java HotSpot VM. Based on this work, we evaluate the effect of trace inlining on the performance and the amount of generated machine code. Trace inlining has several major advantages when compared to method inlining. First, trace inlining is more selective than method inlining, because only frequently executed paths are inlined. Second, the recorded traces may capture information about virtual calls, which simplifies inlining. A third advantage is that trace information is context sensitive so that different method parts can be inlined depending on the specific call site. These advantages allow more aggressive inlining while the amount of generated machine code is still reasonable. We evaluate several inlining heuristics on the benchmark suites DaCapo 9.12 Bach, SPECjbb2005, and SPECjvm2008 and show that our trace-based compiler achieves an up to 51% higher peak performance than the method-based Java HotSpot client compiler. Furthermore, we show that the large compilation scope of our trace-based compiler has a positive effect on other compiler optimizations such as constant folding or null check elimination.

  7. Parallelization of NAS Benchmarks for Shared Memory Multiprocessors

    NASA Technical Reports Server (NTRS)

    Waheed, Abdul; Yan, Jerry C.; Saini, Subhash (Technical Monitor)

    1998-01-01

    This paper presents our experiences of parallelizing the sequential implementation of NAS benchmarks using compiler directives on the SGI Origin2000 distributed shared memory (DSM) system. Porting existing applications to new high performance parallel and distributed computing platforms is a challenging task. Ideally, a user develops a sequential version of the application, leaving the task of porting to new generations of high performance computing systems to parallelization tools and compilers. Due to the simplicity of programming shared-memory multiprocessors, compiler developers have provided various facilities to allow the users to exploit parallelism. Native compilers on the SGI Origin2000 support multiprocessing directives to allow users to exploit loop-level parallelism in their programs. Additionally, supporting tools can accomplish this process automatically and present the results of parallelization to the users. We experimented with these compiler directives and supporting tools by parallelizing sequential implementations of NAS benchmarks. Results reported in this paper indicate that with minimal effort, the performance gain is comparable with the hand-parallelized, carefully optimized, message-passing implementations of the same benchmarks.

  8. SOL - SIZING AND OPTIMIZATION LANGUAGE COMPILER

    NASA Technical Reports Server (NTRS)

    Scotti, S. J.

    1994-01-01

    SOL is a computer language which is geared to solving design problems. SOL includes the mathematical modeling and logical capabilities of a computer language like FORTRAN but also includes the additional power of non-linear mathematical programming methods (i.e. numerical optimization) at the language level (as opposed to the subroutine level). The language-level use of optimization has several advantages over the traditional, subroutine-calling method of using an optimizer: first, the optimization problem is described in a concise and clear manner which closely parallels the mathematical description of optimization; second, a seamless interface is automatically established between the optimizer subroutines and the mathematical model of the system being optimized; third, the results of an optimization (objective, design variables, constraints, termination criteria, and some or all of the optimization history) are output in a form directly related to the optimization description; and finally, automatic error checking and recovery from an ill-defined system model or optimization description is facilitated by the language-level specification of the optimization problem. Thus, SOL enables rapid generation of models and solutions for optimum design problems with greater confidence that the problem is posed correctly. The SOL compiler takes SOL-language statements and generates the equivalent FORTRAN code and system calls. Because of this approach, the modeling capabilities of SOL are extended by the ability to incorporate existing FORTRAN code into a SOL program. In addition, SOL has a powerful MACRO capability. The MACRO capability of the SOL compiler effectively gives the user the ability to extend the SOL language and can be used to develop easy-to-use shorthand methods of generating complex models and solution strategies. 
The SOL compiler provides syntactic and semantic error-checking, error recovery, and detailed reports containing cross-references to show where each variable was used. The listings summarize all optimizations, listing the objective functions, design variables, and constraints. The compiler offers error-checking specific to optimization problems, so that simple mistakes will not cost hours of debugging time. The optimization engine used by and included with the SOL compiler is a version of Vanderplaats' ADS system (Version 1.1) modified specifically to work with the SOL compiler. SOL allows the use of over 100 ADS optimization choices such as Sequential Quadratic Programming, Modified Feasible Directions, interior and exterior penalty function and variable metric methods. Default choices of the many control parameters of ADS are made for the user; however, the user can override any of the ADS control parameters desired for each individual optimization. The SOL language and compiler were developed with an advanced compiler-generation system to ensure correctness and simplify program maintenance. Thus, SOL's syntax was defined precisely by a LALR(1) grammar and the SOL compiler's parser was generated automatically from the LALR(1) grammar with a parser-generator. Hence, unlike ad hoc, manually coded interfaces, the SOL compiler's lexical analysis ensures that the SOL compiler recognizes all legal SOL programs, can recover from and correct for many errors, and can report the location of errors to the user. This version of the SOL compiler has been implemented on VAX/VMS computer systems and requires 204 KB of virtual memory to execute. Since the SOL compiler produces FORTRAN code, it requires the VAX FORTRAN compiler to produce an executable program. The SOL compiler consists of 13,000 lines of Pascal code. It was developed in 1986 and last updated in 1988. The ADS and other utility subroutines amount to 14,000 lines of FORTRAN code and were also updated in 1988.

  9. Towards Implementation of a Generalized Architecture for High-Level Quantum Programming Language

    NASA Astrophysics Data System (ADS)

    Ameen, El-Mahdy M.; Ali, Hesham A.; Salem, Mofreh M.; Badawy, Mahmoud

    2017-08-01

    This paper investigates a novel architecture for the problem of quantum computer programming. A generalized architecture for a high-level quantum programming language is proposed, moving programming from complicated quantum-based programming toward high-level, quantum-independent programming. The proposed architecture receives the high-level source code and automatically transforms it into the equivalent quantum representation. This architecture involves two layers: the programmer layer and the compilation layer. These layers have been implemented in three main stages: pre-classification, classification, and post-classification. The basic building block of each stage has been divided into subsequent phases. Each phase has been implemented to perform the required transformations from one representation to another. A verification process using a case study investigated the ability of the compiler to perform all transformation processes. Experimental results showed that the proposed compiler achieves a correspondence correlation coefficient of about R ≈ 1 between outputs and targets. An improvement was also observed in the time consumed by the optimization process compared to other techniques. In the online optimization process, the consumed time increased exponentially with the amount of accuracy needed, whereas in the proposed offline optimization process it increased gradually.

  10. Kokkos GPU Compiler

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Moss, Nicholas

    The Kokkos Clang compiler is a version of the Clang C++ compiler that has been modified to perform targeted code generation for Kokkos constructs, with the goal of generating highly optimized code and providing semantic (domain) awareness of these constructs, such as parallel for and parallel reduce, throughout the compilation toolchain. This approach is taken to explore the possibilities of exposing the developer’s intentions to the underlying compiler infrastructure (e.g. optimization and analysis passes within the middle stages of the compiler) instead of relying solely on the restricted capabilities of C++ template metaprogramming. To date our activities have focused on correct GPU code generation, and thus we have not yet focused on improving overall performance. The compiler is implemented by recognizing specific (syntactic) Kokkos constructs in order to bypass normal template expansion mechanisms and instead use the semantic knowledge of Kokkos to directly generate code in the compiler’s intermediate representation (IR), which is then translated into an NVIDIA-centric GPU program and supporting runtime calls. In addition, capturing and maintaining the higher-level semantics of Kokkos directly within the lower levels of the compiler has the potential to significantly improve the ability of the compiler to communicate with the developer in the terms of their original programming model/semantics.

  11. HAL/S-FC and HAL/S-360 compiler system program description

    NASA Technical Reports Server (NTRS)

    1976-01-01

    The compiler is a large multi-phase design and can be broken into four phases: Phase 1 inputs the source language and does a syntactic and semantic analysis generating the source listing, a file of instructions in an internal format (HALMAT) and a collection of tables to be used in subsequent phases. Phase 1.5 massages the code produced by Phase 1, performing machine independent optimization. Phase 2 inputs the HALMAT produced by Phase 1 and outputs machine language object modules in a form suitable for the OS-360 or FCOS linkage editor. Phase 3 produces the SDF tables. The four phases described are written in XPL, a language specifically designed for compiler implementation. In addition to the compiler, there is a large library containing all the routines that can be explicitly called by the source language programmer plus a large collection of routines for implementing various facilities of the language.
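
    The division into a front end, a machine-independent optimizer, and a code generator can be sketched in a few lines of Python. This is an illustrative toy only: the nested-tuple intermediate form, the stack-machine instructions, and the function names `phase1`, `phase1_5`, and `phase2` are assumptions made for the example and do not represent HALMAT or the actual HAL/S compiler.

```python
import ast
import operator

# Supported binary operators: AST node type -> (mnemonic, folding function).
OPS = {ast.Add: ("ADD", operator.add), ast.Mult: ("MUL", operator.mul)}
FOLD = {name: fn for name, fn in OPS.values()}


def phase1(source):
    """Front end: parse source text into a nested-tuple intermediate form."""
    def lower(node):
        if isinstance(node, ast.Constant):
            return ("const", node.value)
        if isinstance(node, ast.Name):
            return ("var", node.id)
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            mnemonic = OPS[type(node.op)][0]
            return ("bin", mnemonic, lower(node.left), lower(node.right))
        raise ValueError("unsupported construct")
    return lower(ast.parse(source, mode="eval").body)


def phase1_5(ir):
    """Machine-independent optimization: fold operations on two constants."""
    if ir[0] != "bin":
        return ir
    _, op, left, right = ir
    left, right = phase1_5(left), phase1_5(right)
    if left[0] == "const" and right[0] == "const":
        return ("const", FOLD[op](left[1], right[1]))
    return ("bin", op, left, right)


def phase2(ir):
    """Back end: emit instructions for a hypothetical stack machine."""
    if ir[0] == "const":
        return [f"PUSH {ir[1]}"]
    if ir[0] == "var":
        return [f"LOAD {ir[1]}"]
    _, op, left, right = ir
    return phase2(left) + phase2(right) + [op]


unoptimized = phase2(phase1("x * (2 + 3)"))
optimized = phase2(phase1_5(phase1("x * (2 + 3)")))
```

    Because the optimizer reads and writes the same intermediate form, it can be inserted or skipped without changing the other phases.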

  12. ProjectQ Software Framework

    NASA Astrophysics Data System (ADS)

    Steiger, Damian S.; Haener, Thomas; Troyer, Matthias

    Quantum computers promise to transform our notions of computation by offering a completely new paradigm. A high level quantum programming language and optimizing compilers are essential components to achieve scalable quantum computation. In order to address this, we introduce the ProjectQ software framework - an open source effort to support both theorists and experimentalists by providing intuitive tools to implement and run quantum algorithms. Here, we present our ProjectQ quantum compiler, which compiles a quantum algorithm from our high-level Python-embedded language down to low-level quantum gates available on the target system. We demonstrate how this compiler can be used to control actual hardware and to run high-performance simulations.

  13. High-Speed, Low-Cost Workstation for Computation-Intensive Statistics. Phase 1

    DTIC Science & Technology

    1990-06-20

    routine implementation and performance. 5 The two compiled versions given in the table were coded in an attempt to obtain an optimized compiled version...level statistics and linear algebra routines (BSAS and BLAS) that have been prototyped in this study. For each routine, both the C code (Turbo C...DISTRIBUTION/AVAILABILITY STATEMENT 12b. DISTRIBUTION CODE Unlimited distribution 13. ABSTRACT (Maximum 200 words) High-performance and low-cost

  14. Snowflake: A Lightweight Portable Stencil DSL

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, Nathan; Driscoll, Michael; Markley, Charles

    Stencil computations are not well optimized by general-purpose production compilers and the increased use of multicore, manycore, and accelerator-based systems makes the optimization problem even more challenging. In this paper we present Snowflake, a Domain Specific Language (DSL) for stencils that uses a 'micro-compiler' approach, i.e., small, focused, domain-specific code generators. The approach is similar to that used in image processing stencils, but Snowflake handles the much more complex stencils that arise in scientific computing, including complex boundary conditions, higher-order operators (larger stencils), higher dimensions, variable coefficients, non-unit-stride iteration spaces, and multiple input or output meshes. Snowflake is embedded in the Python language, allowing it to interoperate with popular scientific tools like SciPy and iPython; it also takes advantage of built-in Python libraries for powerful dependence analysis as part of a just-in-time compiler. We demonstrate the power of the Snowflake language and the micro-compiler approach with a complex scientific benchmark, HPGMG, that exercises the generality of stencil support in Snowflake. By generating OpenMP comparable to, and OpenCL within a factor of 2x of hand-optimized HPGMG, Snowflake demonstrates that a micro-compiler can support diverse processor architectures and is performance-competitive whilst preserving a high-level Python implementation.
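
    For readers unfamiliar with the term, a stencil computation updates each grid point from a fixed pattern of neighbors. The sketch below is plain Python and does not use Snowflake's interface; `apply_stencil` and the dictionary-of-offsets representation are hypothetical choices made for illustration, with the simplest possible boundary rule (boundary points are copied unchanged).

```python
def apply_stencil(weights, grid):
    """Apply a 1D stencil given as {offset: weight} to the interior of a grid."""
    reach = max(abs(offset) for offset in weights)
    out = list(grid)  # boundary points keep their input values
    for i in range(reach, len(grid) - reach):
        out[i] = sum(w * grid[i + offset] for offset, w in weights.items())
    return out


# Three-point second-difference stencil applied to samples of x**2.
second_difference = {-1: 1.0, 0: -2.0, 1: 1.0}
grid = [0.0, 1.0, 4.0, 9.0, 16.0]
result = apply_stencil(second_difference, grid)
```

    Describing the stencil as data (offsets and weights) rather than as a hand-written loop is what lets a code generator choose the loop structure for each target.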

  15. Snowflake: A Lightweight Portable Stencil DSL

    DOE PAGES

    Zhang, Nathan; Driscoll, Michael; Markley, Charles; ...

    2017-05-01

    Stencil computations are not well optimized by general-purpose production compilers and the increased use of multicore, manycore, and accelerator-based systems makes the optimization problem even more challenging. In this paper we present Snowflake, a Domain Specific Language (DSL) for stencils that uses a 'micro-compiler' approach, i.e., small, focused, domain-specific code generators. The approach is similar to that used in image processing stencils, but Snowflake handles the much more complex stencils that arise in scientific computing, including complex boundary conditions, higher-order operators (larger stencils), higher dimensions, variable coefficients, non-unit-stride iteration spaces, and multiple input or output meshes. Snowflake is embedded in the Python language, allowing it to interoperate with popular scientific tools like SciPy and iPython; it also takes advantage of built-in Python libraries for powerful dependence analysis as part of a just-in-time compiler. We demonstrate the power of the Snowflake language and the micro-compiler approach with a complex scientific benchmark, HPGMG, that exercises the generality of stencil support in Snowflake. By generating OpenMP comparable to, and OpenCL within a factor of 2x of hand-optimized HPGMG, Snowflake demonstrates that a micro-compiler can support diverse processor architectures and is performance-competitive whilst preserving a high-level Python implementation.

  16. System, apparatus and methods to implement high-speed network analyzers

    DOEpatents

    Ezick, James; Lethin, Richard; Ros-Giralt, Jordi; Szilagyi, Peter; Wohlford, David E

    2015-11-10

    Systems, apparatus and methods for the implementation of high-speed network analyzers are provided. A set of high-level specifications is used to define the behavior of the network analyzer emitted by a compiler. An optimized inline workflow to process regular expressions is presented without sacrificing the semantic capabilities of the processing engine. An optimized packet dispatcher implements a subset of the functions implemented by the network analyzer, providing a fast and slow path workflow used to accelerate specific processing units. Such dispatcher facility can also be used as a cache of policies, wherein if a policy is found, then packet manipulations associated with the policy can be quickly performed. An optimized method of generating DFA specifications for network signatures is also presented. The method accepts several optimization criteria, such as min-max allocations or optimal allocations based on the probability of occurrence of each signature input bit.

  17. Ada Compiler Validation Summary Report. Certificate Number: 920918S1. 11274 U.S. Navy Ada/M, Version 4.5 (/NO OPTIMIZE) VAX 8550/8600/8650 (Cluster) = Enhanced Processor (EP) AN/UYK-44 (Bare Board)

    DTIC Science & Technology

    1992-10-27

    Institute of Standards and Technology Gaithersburg, MD USA 1 ELECTE I= 7 . PERFORMING ORGANIZATION NAME(S) AND ADDRESS(E JUN 3 1993U . , PERFORMING...Standard [Ada83] using the current Ada Compiler Validation Capability (ACVC). This Validation Summary Report (VSR) gives an account of the testing of... 7 - Control Part (Redirection) Options F.14 Compiler Options F-59 LINKER OPTIONS The linker options of this Ada implementation, as described in this

  18. Graphics Processing Unit Acceleration of Gyrokinetic Turbulence Simulations

    NASA Astrophysics Data System (ADS)

    Hause, Benjamin; Parker, Scott

    2012-10-01

    We find a substantial increase in on-node performance using Graphics Processing Unit (GPU) acceleration in gyrokinetic delta-f particle-in-cell simulation. Optimization is performed on a two-dimensional slab gyrokinetic particle simulation using the Portland Group Fortran compiler with the GPU accelerator compiler directives. We have implemented the GPU acceleration on a Core i7 gaming PC with an NVIDIA GTX 580 GPU. We find comparable, or better, acceleration relative to the NERSC DIRAC cluster with the NVIDIA Tesla C2050 computing processor. The Tesla C2050 is about 2.6 times more expensive than the GTX 580 gaming GPU. Optimization strategies and comparisons between DIRAC and the gaming PC will be presented. We will also discuss progress on optimizing the comprehensive three dimensional general geometry GEM code.

  19. Targeting multiple heterogeneous hardware platforms with OpenCL

    NASA Astrophysics Data System (ADS)

    Fox, Paul A.; Kozacik, Stephen T.; Humphrey, John R.; Paolini, Aaron; Kuller, Aryeh; Kelmelis, Eric J.

    2014-06-01

    The OpenCL API allows for the abstract expression of parallel, heterogeneous computing, but hardware implementations have substantial implementation differences. The abstractions provided by the OpenCL API are often insufficiently high-level to conceal differences in hardware architecture. Additionally, implementations often do not take advantage of potential performance gains from certain features due to hardware limitations and other factors. These factors make it challenging to produce code that is portable in practice, resulting in much OpenCL code being duplicated for each hardware platform being targeted. This duplication of effort offsets the principal advantage of OpenCL: portability. The use of certain coding practices can mitigate this problem, allowing a common code base to be adapted to perform well across a wide range of hardware platforms. To this end, we explore some general practices for producing performant code that are effective across platforms. Additionally, we explore some ways of modularizing code to enable optional optimizations that take advantage of hardware-specific characteristics. The minimum requirement for portability implies avoiding the use of OpenCL features that are optional, not widely implemented, poorly implemented, or missing in major implementations. Exposing multiple levels of parallelism allows hardware to take advantage of the types of parallelism it supports, from the task level down to explicit vector operations. Static optimizations and branch elimination in device code help the platform compiler to effectively optimize programs. Modularization of some code is important to allow operations to be chosen for performance on target hardware. Optional subroutines exploiting explicit memory locality allow for different memory hierarchies to be exploited for maximum performance. 
The C preprocessor and JIT compilation using the OpenCL runtime can be used to enable some of these techniques, as well as to factor in hardware-specific optimizations as necessary.

  20. An Optimizing Compiler for Petascale I/O on Leadership-Class Architectures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kandemir, Mahmut Taylan; Choudary, Alok; Thakur, Rajeev

    In high-performance computing (HPC), parallel I/O architectures usually have very complex hierarchies with multiple layers that collectively constitute an I/O stack, including high-level I/O libraries such as PnetCDF and HDF5, I/O middleware such as MPI-IO, and parallel file systems such as PVFS and Lustre. Our DOE project explored automated instrumentation and compiler support for I/O intensive applications. Our project made significant progress towards understanding the complex I/O hierarchies of high-performance storage systems (including storage caches, HDDs, and SSDs), and designing and implementing state-of-the-art compiler/runtime system technology for I/O intensive HPC applications that target leadership-class machines. This final report summarizes the major achievements of the project and also points out promising future directions. Two new sections in this report compared to the previous report are IOGenie and SSD/NVM-specific optimizations.

  1. Graphics Processing Unit Acceleration of Gyrokinetic Turbulence Simulations

    NASA Astrophysics Data System (ADS)

    Hause, Benjamin; Parker, Scott; Chen, Yang

    2013-10-01

    We find a substantial increase in on-node performance using Graphics Processing Unit (GPU) acceleration in gyrokinetic delta-f particle-in-cell simulation. Optimization is performed on a two-dimensional slab gyrokinetic particle simulation using the Portland Group Fortran compiler with the OpenACC compiler directives and CUDA Fortran. Mixed implementation of both OpenACC and CUDA is demonstrated. CUDA is required for optimizing the particle deposition algorithm. We have implemented the GPU acceleration on a third generation Core i7 gaming PC with two NVIDIA GTX 680 GPUs. We find comparable, or better, acceleration relative to the NERSC DIRAC cluster with the NVIDIA Tesla C2050 computing processor. The Tesla C2050 is about 2.6 times more expensive than the GTX 580 gaming GPU. We also see enormous speedups (10 or more) on the Titan supercomputer at Oak Ridge with Kepler K20 GPUs. Results show speed-ups comparable to or better than those of OpenMP models utilizing multiple cores. The use of hybrid OpenACC, CUDA Fortran, and MPI models across many nodes will also be discussed. Optimization strategies will be presented. We will discuss progress on optimizing the comprehensive three dimensional general geometry GEM code.

  2. Optimizing python-based ROOT I/O with PyPy's tracing just-in-time compiler

    NASA Astrophysics Data System (ADS)

    Tlp Lavrijsen, Wim

    2012-12-01

    The Python programming language allows objects and classes to respond dynamically to the execution environment. Most of this, however, is made possible through language hooks which by definition cannot be optimized and thus tend to be slow. The PyPy implementation of Python includes a tracing just-in-time (JIT) compiler, which allows similar dynamic responses but at the interpreter level rather than the application level. Therefore, it is possible to fully remove the hooks, leaving only the dynamic response, in the optimization stage for hot loops, if the types of interest are opened up to the JIT. A general opening up of types to the JIT, based on reflection information, has already been developed (cppyy). The work described in this paper takes it one step further by customizing access to ROOT I/O to the JIT, allowing for fully automatic optimizations.
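
    A language hook of the kind the abstract refers to can be shown with Python's `__getattr__`, which runs application-level code whenever normal attribute lookup fails. The sketch below is illustrative only: `LazyRecord` and its field names are hypothetical and are not part of ROOT, cppyy, or PyPy.

```python
class LazyRecord:
    """Resolve attributes on demand through the __getattr__ language hook."""

    def __init__(self, fields):
        self._fields = fields
        self.hook_calls = 0  # counts how often the hook had to run

    def __getattr__(self, name):
        # Python calls this only after normal attribute lookup has failed.
        fields = self.__dict__.get("_fields", {})
        if name in fields:
            self.hook_calls += 1
            return fields[name]
        raise AttributeError(name)


record = LazyRecord({"pt": 41.5, "eta": 0.3})

# Every access in the loop goes through the Python-level hook again.
values = [record.pt for _ in range(3)]
```

    Each access to `record.pt` re-enters `__getattr__`, which is the per-access overhead that the abstract says a tracing JIT can remove from hot loops.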

  3. Computer Language For Optimization Of Design

    NASA Technical Reports Server (NTRS)

    Scotti, Stephen J.; Lucas, Stephen H.

    1991-01-01

    SOL is computer language geared to solution of design problems. Includes mathematical modeling and logical capabilities of computer language like FORTRAN; also includes additional power of nonlinear mathematical programming methods at language level. SOL compiler takes SOL-language statements and generates equivalent FORTRAN code and system calls. Provides syntactic and semantic checking for recovery from errors and provides detailed reports containing cross-references to show where each variable used. Implemented on VAX/VMS computer systems. Requires VAX FORTRAN compiler to produce executable program.

  4. Domain Specific Language Support for Exascale

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sadayappan, Ponnuswamy

    Domain-Specific Languages (DSLs) offer an attractive path to Exascale software since they provide expressive power through appropriate abstractions and enable domain-specific optimizations. But the advantages of a DSL compete with the difficulties of implementing a DSL, even for a narrowly defined domain. The DTEC project addresses how a variety of DSLs can be easily implemented to leverage existing compiler analysis and transformation capabilities within the ROSE open source compiler as part of a research program focusing on Exascale challenges. The OSU contributions to the DTEC project are in the area of code generation from high-level DSL descriptions, as well as verification of the automatically-generated code.

  5. Explicit time integration of finite element models on a vectorized, concurrent computer with shared memory

    NASA Technical Reports Server (NTRS)

    Gilbertsen, Noreen D.; Belytschko, Ted

    1990-01-01

    The implementation of a nonlinear explicit program on a vectorized, concurrent computer with shared memory is described and studied. The conflict between vectorization and concurrency is described and some guidelines are given for optimal block sizes. Several example problems are summarized to illustrate the types of speed-ups which can be achieved by reprogramming as compared to compiler optimization.

  6. Fringe pattern demodulation using the one-dimensional continuous wavelet transform: field-programmable gate array implementation.

    PubMed

    Abid, Abdulbasit

    2013-03-01

    This paper presents a thorough discussion of the proposed field-programmable gate array (FPGA) implementation for fringe pattern demodulation using the one-dimensional continuous wavelet transform (1D-CWT) algorithm. This algorithm is also known as wavelet transform profilometry. Initially, the 1D-CWT is programmed using the C programming language and compiled into VHDL using the ImpulseC tool. This VHDL code is implemented on the Altera Cyclone IV GX EP4CGX150DF31C7 FPGA. A fringe pattern image with a size of 512×512 pixels is presented to the FPGA, which processes the image using the 1D-CWT algorithm. The FPGA requires approximately 100 ms to process the image and produce a wrapped phase map. For performance comparison purposes, the 1D-CWT algorithm is programmed using the C language. The C code is then compiled using the Intel compiler version 13.0. The compiled code is run on a Dell Precision state-of-the-art workstation. The time required to process the fringe pattern image is approximately 1 s. In order to further reduce the execution time, the 1D-CWT is reprogrammed using the Intel Integrated Performance Primitives (IPP) Library Version 7.1. The execution time was reduced to approximately 650 ms. This confirms that at least a sixfold speedup was gained using the FPGA implementation over a state-of-the-art workstation that executes a heavily optimized implementation of the 1D-CWT algorithm.

  7. Performing aggressive code optimization with an ability to rollback changes made by the aggressive optimizations

    DOEpatents

    Gschwind, Michael K

    2013-07-23

    Mechanisms for aggressively optimizing computer code are provided. With these mechanisms, a compiler determines an optimization to apply to a portion of source code and determines if the optimization as applied to the portion of source code will result in unsafe optimized code that introduces a new source of exceptions being generated by the optimized code. In response to a determination that the optimization is an unsafe optimization, the compiler generates an aggressively compiled code version, in which the unsafe optimization is applied, and a conservatively compiled code version in which the unsafe optimization is not applied. The compiler stores both versions and provides them for execution. Mechanisms are provided for switching between these versions during execution in the event of a failure of the aggressively compiled code version. Moreover, predictive mechanisms are provided for predicting whether such a failure is likely.
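
    The idea of keeping an aggressively optimized version alongside a conservative one can be sketched as follows. This is a minimal illustration under assumed names (`scale_aggressive`, `scale_conservative`, `run_with_rollback` are hypothetical), not the patented mechanism: the "unsafe" optimization here hoists a division out of a loop, which can raise an exception that the original code would never raise for empty input.

```python
def scale_conservative(values, divisor):
    """Reference version: divides only when a value is actually processed."""
    return [v / divisor for v in values]


def scale_aggressive(values, divisor):
    """Hoists the reciprocal out of the loop. The hoisted division can raise
    ZeroDivisionError even when `values` is empty, which is a new source of
    exceptions. Results may also differ from the reference in rounding."""
    reciprocal = 1.0 / divisor
    return [v * reciprocal for v in values]


def run_with_rollback(values, divisor):
    """Try the aggressive version; on failure, discard its work and
    re-run the conservative version from the same inputs."""
    try:
        return scale_aggressive(values, divisor), "aggressive"
    except ZeroDivisionError:
        return scale_conservative(values, divisor), "conservative"


fast = run_with_rollback([2.0, 4.0], 2)
fallback = run_with_rollback([], 0)
```

    If the conservative version also fails (for example, a non-empty list with a zero divisor), the exception is genuine and propagates to the caller.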

  8. HERCULES: A Pattern Driven Code Transformation System

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kartsaklis, Christos; Hernandez, Oscar R; Hsu, Chung-Hsing

    2012-01-01

    New parallel computers are emerging, but developing efficient scientific code for them remains difficult. A scientist must manage not only the science-domain complexity but also the performance-optimization complexity. HERCULES is a code transformation system designed to help the scientist to separate the two concerns, which improves code maintenance, and facilitates performance optimization. The system combines three technologies, code patterns, transformation scripts and compiler plugins, to provide the scientist with an environment to quickly implement code transformations that suit his needs. Unlike existing code optimization tools, HERCULES is unique in its focus on user-level accessibility. In this paper we discuss the design, implementation and an initial evaluation of HERCULES.

  9. Effective Vectorization with OpenMP 4.5

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Huber, Joseph N.; Hernandez, Oscar R.; Lopez, Matthew Graham

    This paper describes how the Single Instruction Multiple Data (SIMD) model and its extensions in OpenMP work, and how these are implemented in different compilers. Modern processors are highly parallel computational machines which often include multiple processors capable of executing several instructions in parallel. Understanding SIMD and executing instructions in parallel allows the processor to achieve higher performance without increasing the power required to run it. SIMD instructions can significantly reduce the runtime of code by executing a single operation on large groups of data. The SIMD model is so integral to the processor's potential performance that, if SIMD is not utilized, less than half of the processor is ever actually used. Unfortunately, using SIMD instructions is a challenge in higher level languages because most programming languages do not have a way to describe them. Most compilers are capable of vectorizing code by using the SIMD instructions, but there are many code features important for SIMD vectorization that the compiler cannot determine at compile time. OpenMP attempts to solve this by extending the C++/C and Fortran programming languages with compiler directives that express SIMD parallelism. OpenMP is used to pass hints to the compiler about the code to be executed in SIMD. This is a key resource for making optimized code, but it does not change whether or not the code can use SIMD operations. However, in many cases critical functions are limited by a poor understanding of how SIMD instructions are actually implemented, as SIMD can be implemented through vector instructions or simultaneous multi-threading (SMT). We have found that it is often the case that code cannot be vectorized, or is vectorized poorly, because the programmer does not have sufficient knowledge of how SIMD instructions work.

  10. Making extreme computations possible with virtual machines

    NASA Astrophysics Data System (ADS)

    Reuter, J.; Chokoufe Nejad, B.; Ohl, T.

    2016-10-01

    State-of-the-art algorithms generate scattering amplitudes for high-energy physics at leading order for high-multiplicity processes as compiled code (in Fortran, C or C++). For complicated processes the size of these libraries can become tremendous (many GiB). We show that amplitudes can be translated to byte-code instructions, which even reduce the size by one order of magnitude. The byte-code is interpreted by a Virtual Machine with runtimes comparable to compiled code and a better scaling with additional legs. We study the properties of this algorithm, as an extension of the Optimizing Matrix Element Generator (O'Mega). The bytecode matrix elements are available as alternative input for the event generator WHIZARD. The bytecode interpreter can be implemented very compactly, which will help with a future implementation on massively parallel GPUs.
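
    The general idea of evaluating an expression from byte-code on a virtual machine, rather than from compiled code, can be sketched with a tiny stack machine. The opcodes and instruction layout below are hypothetical and are not the O'Mega or WHIZARD byte-code format.

```python
# Hypothetical opcodes for a tiny stack machine.
PUSH, LOAD, ADD, MUL = range(4)


def run(bytecode, inputs):
    """Interpret a flat list of (opcode, operand) pairs."""
    stack = []
    for opcode, operand in bytecode:
        if opcode == PUSH:
            stack.append(operand)          # push a constant
        elif opcode == LOAD:
            stack.append(inputs[operand])  # push the input at index `operand`
        elif opcode == ADD:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif opcode == MUL:
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown opcode {opcode}")
    return stack.pop()


# Encodes (x0 + x1) * 2 with complex-valued inputs.
program = [(LOAD, 0), (LOAD, 1), (ADD, None), (PUSH, 2), (MUL, None)]
result = run(program, [1 + 2j, 3 - 1j])
```

    The interpreter loop stays the same size no matter how long `program` grows, which is one way to see why byte-code can be more compact than generated source for very large expressions.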

  11. Using Agent Base Models to Optimize Large Scale Network for Large System Inventories

    NASA Technical Reports Server (NTRS)

    Shameldin, Ramez Ahmed; Bowling, Shannon R.

    2010-01-01

    The aim of this paper is to use Agent Base Models (ABM) to optimize large scale network handling capabilities for large system inventories and to implement strategies for the purpose of reducing capital expenses. The models used in this paper use either computational algorithms or procedure implementations developed in Matlab to simulate agent based models in a principal programming language and mathematical theory. The simulations run on clusters, which serve as a high performance computing platform for executing the program in parallel. In both cases, a model is defined as a compilation of a set of structures and processes assumed to underlie the behavior of a network system.

  12. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lee, Seyong; Kim, Jungwon; Vetter, Jeffrey S

    This paper presents a directive-based, high-level programming framework for high-performance reconfigurable computing. It takes a standard, portable OpenACC C program as input and generates a hardware configuration file for execution on FPGAs. We implemented this prototype system using our open-source OpenARC compiler; it performs source-to-source translation and optimization of the input OpenACC program into an OpenCL code, which is further compiled into an FPGA program by the backend Altera Offline OpenCL compiler. Internally, the design of OpenARC uses a high-level intermediate representation that separates concerns of program representation from underlying architectures, which facilitates portability of OpenARC. In fact, this design allowed us to create the OpenACC-to-FPGA translation framework with minimal extensions to our existing system. In addition, we show that our proposed FPGA-specific compiler optimizations and novel OpenACC pragma extensions assist the compiler in generating more efficient FPGA hardware configuration files. Our empirical evaluation on an Altera Stratix V FPGA with eight OpenACC benchmarks demonstrates the benefits of our strategy. To demonstrate the portability of OpenARC, we show results for the same benchmarks executing on other heterogeneous platforms, including NVIDIA GPUs, AMD GPUs, and Intel Xeon Phis. This initial evidence helps support the goal of using a directive-based, high-level programming strategy for performance portability across heterogeneous HPC architectures.

  13. The design and implementation of a parallel unstructured Euler solver using software primitives

    NASA Technical Reports Server (NTRS)

    Das, R.; Mavriplis, D. J.; Saltz, J.; Gupta, S.; Ponnusamy, R.

    1992-01-01

    This paper is concerned with the implementation of a three-dimensional unstructured grid Euler-solver on massively parallel distributed-memory computer architectures. The goal is to minimize solution time by achieving high computational rates with a numerically efficient algorithm. An unstructured multigrid algorithm with an edge-based data structure has been adopted, and a number of optimizations have been devised and implemented in order to accelerate the parallel communication rates. The implementation is carried out by creating a set of software tools, which provide an interface between the parallelization issues and the sequential code, while providing a basis for future automatic run-time compilation support. Large practical unstructured grid problems are solved on the Intel iPSC/860 hypercube and Intel Touchstone Delta machine. The quantitative effects of the various optimizations are demonstrated, and we show that the combined effect of these optimizations leads to roughly a factor of three performance improvement. The overall solution efficiency is compared with that obtained on the CRAY-YMP vector supercomputer.

  14. JANUS: A Compilation System for Balancing Parallelism and Performance in OpenVX

    NASA Astrophysics Data System (ADS)

    Omidian, Hossein; Lemieux, Guy G. F.

    2018-04-01

    Embedded systems typically do not have enough on-chip memory for an entire image buffer. Programming systems like OpenCV operate on entire image frames at each step, making them use excessive memory bandwidth and power. In contrast, the paradigm used by OpenVX is much more efficient; it uses image tiling, and the compilation system is allowed to analyze and optimize the operation sequence, specified as a compute graph, before doing any pixel processing. In this work, we are building a compilation system for OpenVX that can analyze and optimize the compute graph to take advantage of parallel resources in many-core systems or FPGAs. Using a database of prewritten OpenVX kernels, it automatically adjusts the image tile size as well as using kernel duplication and coalescing to meet a defined area (resource) target, or to meet a specified throughput target. This allows a single compute graph to target implementations with a wide range of performance needs or capabilities, e.g. from handheld to datacenter, that use minimal resources and power to reach the performance target.
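
    The contrast between frame-at-a-time and tile-at-a-time processing can be sketched as below. The function names are hypothetical and this is not the OpenVX API; the sketch also assumes purely pointwise operations, so tiles need no overlapping border pixels.

```python
def tiles(height, width, tile):
    """Yield (row, col, row_end, col_end) bounds that cover the image."""
    for r in range(0, height, tile):
        for c in range(0, width, tile):
            yield r, c, min(r + tile, height), min(c + tile, width)


def process_tiled(image, ops, tile):
    """Apply a chain of pointwise ops tile by tile, so intermediate
    results are only ever tile-sized rather than frame-sized."""
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for r0, c0, r1, c1 in tiles(height, width, tile):
        block = [row[c0:c1] for row in image[r0:r1]]
        for op in ops:
            block = [[op(p) for p in row] for row in block]
        for i, row in enumerate(block):
            out[r0 + i][c0:c1] = row
    return out


image = [[x + 10 * y for x in range(5)] for y in range(3)]
ops = [lambda p: p * 2, lambda p: p + 1]
tiled = process_tiled(image, ops, tile=2)
# Frame-at-a-time reference for comparison.
whole = [[p * 2 + 1 for p in row] for row in image]
```

    Both paths produce the same pixels; the tiled path simply bounds the size of the working set, which is the property a tile-size tuner can trade against throughput.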

  15. CIL: Compiler Implementation Language

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gries, David

    1969-03-01

    This report is a manual for the proposed Compiler Implementation Language, CIL. It is not an expository paper on the subject of compiler writing or compiler-compilers. The language definition may change as work progresses on the project. It is designed for writing compilers for the IBM 360 computers.

  16. ProjectQ: Compiling quantum programs for various backends

    NASA Astrophysics Data System (ADS)

    Haener, Thomas; Steiger, Damian S.; Troyer, Matthias

    In order to control quantum computers beyond the current generation, a high level quantum programming language and optimizing compilers will be essential. Therefore, we have developed ProjectQ - an open source software framework to facilitate implementing and running quantum algorithms both in software and on actual quantum hardware. Here, we introduce the backends available in ProjectQ. This includes a high-performance simulator and emulator to test and debug quantum algorithms, tools for resource estimation, and interfaces to several small-scale quantum devices. We demonstrate the workings of the framework and show how easily it can be further extended to control upcoming quantum hardware.

  17. Ada Compiler Validation Summary Report. Certificate Number: 920918S1. 11272, U.S. Navy Ada/M, Version 4.5 (/OPTIMIZE) VAX 8550/8600/8650 (Cluster) Enhanced Processor (EP) AN/UYK-44 (Bare Board)

    DTIC Science & Technology

    1992-09-01

    ...and Technology, Gaithersburg, MD, USA ... current Ada Compiler Validation Capability (ACVC). This Validation Summary Report (VSR) gives an account of the testing of this Ada implementation. For ... $MAXLENREALBASEDLITERAL ... $MAXSTRINGLITERAL ... The following table contains the values for the remaining macro

  18. Ada Compiler Validation Summary Report: Certificate Number: 910626S1. 11174 U.S. Navy, Ada/M, Version 4.0 (/Optimize), VAX 8550, Running VAX/VMS version 5.3 (Host) to AN/UYK-44 (EMR) (Bare Board) (Target).

    DTIC Science & Technology

    1991-07-30

    National Institute of Standards and Technology, Gaithersburg, MD, USA ... Ada Compiler Validation Capability (ACVC). This Validation Summary Report (VSR) gives an account of the testing of this Ada implementation. For any ... $MAXSTRINGLITERAL ... The following table contains the values for the remaining macro parameters

  19. An Optimizing Compiler for Petascale I/O on Leadership Class Architectures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Choudhary, Alok; Kandemir, Mahmut

    In high-performance computing systems, parallel I/O architectures usually have very complex hierarchies with multiple layers that collectively constitute an I/O stack, including high-level I/O libraries such as PnetCDF and HDF5, I/O middleware such as MPI-IO, and parallel file systems such as PVFS and Lustre. Our project explored automated instrumentation and compiler support for I/O intensive applications. Our project made significant progress towards understanding the complex I/O hierarchies of high-performance storage systems (including storage caches, HDDs, and SSDs), and designing and implementing state-of-the-art compiler/runtime system technology that targets I/O intensive HPC applications that target leadership class machines. This final report summarizes the major achievements of the project and also points out promising future directions.

  20. An end-to-end workflow for engineering of biological networks from high-level specifications.

    PubMed

    Beal, Jacob; Weiss, Ron; Densmore, Douglas; Adler, Aaron; Appleton, Evan; Babb, Jonathan; Bhatia, Swapnil; Davidsohn, Noah; Haddock, Traci; Loyall, Joseph; Schantz, Richard; Vasilev, Viktor; Yaman, Fusun

    2012-08-17

    We present a workflow for the design and production of biological networks from high-level program specifications. The workflow is based on a sequence of intermediate models that incrementally translate high-level specifications into DNA samples that implement them. We identify algorithms for translating between adjacent models and implement them as a set of software tools, organized into a four-stage toolchain: Specification, Compilation, Part Assignment, and Assembly. The specification stage begins with a Boolean logic computation specified in the Proto programming language. The compilation stage uses a library of network motifs and cellular platforms, also specified in Proto, to transform the program into an optimized Abstract Genetic Regulatory Network (AGRN) that implements the programmed behavior. The part assignment stage assigns DNA parts to the AGRN, drawing the parts from a database for the target cellular platform, to create a DNA sequence implementing the AGRN. Finally, the assembly stage computes an optimized assembly plan to create the DNA sequence from available part samples, yielding a protocol for producing a sample of engineered plasmids with robotics assistance. Our workflow is the first to automate the production of biological networks from a high-level program specification. Furthermore, the workflow's modular design allows the same program to be realized on different cellular platforms simply by swapping workflow configurations. We validated our workflow by specifying a small-molecule sensor-reporter program and verifying the resulting plasmids in both HEK 293 mammalian cells and in E. coli bacterial cells.

  1. An Atmospheric General Circulation Model with Chemistry for the CRAY T3E: Design, Performance Optimization and Coupling to an Ocean Model

    NASA Technical Reports Server (NTRS)

    Farrara, John D.; Drummond, Leroy A.; Mechoso, Carlos R.; Spahr, Joseph A.

    1998-01-01

    The design, implementation and performance optimization on the CRAY T3E of an atmospheric general circulation model (AGCM) which includes the transport of, and chemical reactions among, an arbitrary number of constituents is reviewed. The parallel implementation is based on a two-dimensional (longitude and latitude) data domain decomposition. Initial optimization efforts centered on minimizing the impact of substantial static and weakly-dynamic load imbalances among processors through load redistribution schemes. Recent optimization efforts have centered on single-node optimization. Strategies employed include loop unrolling, both manually and through the compiler, the use of an optimized assembler-code library for special function calls, and restructuring of parts of the code to improve data locality. Data exchanges and synchronizations involved in coupling different data-distributed models can account for a significant fraction of the running time. Therefore, the required scattering and gathering of data must be optimized. In systems such as the T3E, there is much more aggregate bandwidth in the total system than in any particular processor. This suggests a distributed design. The design and implementation of such a distributed 'Data Broker' as a means to efficiently couple the components of our climate system model is described.
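
    Manual loop unrolling, one of the single-node strategies listed above, can be sketched as follows. The function name and the unroll factor of 4 are hypothetical choices, and Python is used only to show the shape of the transformation; the benefit of unrolling arises in compiled code, not in an interpreted sketch like this one.

```python
def sum_unrolled(values):
    """Sum a sequence with the loop body unrolled by a factor of 4."""
    total = 0
    n = len(values)
    limit = n - n % 4  # largest multiple of 4 not exceeding n
    # Main loop: four elements per iteration, so a quarter of the
    # loop-control overhead of the straightforward loop.
    for i in range(0, limit, 4):
        total += values[i] + values[i + 1] + values[i + 2] + values[i + 3]
    # Remainder loop: handles the last n % 4 elements.
    for i in range(limit, n):
        total += values[i]
    return total


checked = sum_unrolled(list(range(10)))
```

    For floating-point data the regrouped additions can round differently from a simple left-to-right sum, which is one reason compilers may unroll such loops only when told that reassociation is acceptable.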

  2. Obtaining correct compile results by absorbing mismatches between data types representations

    DOEpatents

    Horie, Michihiro; Horii, Hiroshi H.; Kawachiya, Kiyokuni; Takeuchi, Mikio

    2017-03-21

    Methods and a system are provided. A method includes implementing a function, which a compiler for a first language does not have, using a compiler for a second language. The implementing step includes generating, by the compiler for the first language, a first abstract syntax tree. The implementing step further includes converting, by a converter, the first abstract syntax tree to a second abstract syntax tree of the compiler for the second language using a conversion table from data representation types in the first language to data representation types in the second language. When a compilation error occurs, the implementing step also includes generating a special node for error processing in the second abstract syntax tree and storing an error token in the special node. When unparsing, the implementing step additionally includes outputting the error token, in the form of source code written in the first language.
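
    The conversion step described above can be sketched with dictionaries standing in for syntax-tree nodes. The type names, node layout, and the functions `convert` and `unparse` are hypothetical illustrations, not the patented implementation; the sketch treats a type missing from the conversion table as the error case.

```python
# Hypothetical conversion table from first-language to second-language types.
TYPE_TABLE = {"Int": "int", "Double": "double", "Boolean": "bool"}


def convert(node):
    """Convert a first-language AST node to a second-language node.
    Anything that cannot be converted becomes a special error node
    that stores the original first-language token."""
    kind = node["kind"]
    if kind == "var":
        target = TYPE_TABLE.get(node["type"])
        if target is None:
            return {"kind": "error", "token": node["source"]}
        return {"kind": "var", "name": node["name"], "type": target}
    if kind == "block":
        return {"kind": "block", "body": [convert(c) for c in node["body"]]}
    return {"kind": "error", "token": node.get("source", "")}


def unparse(node):
    """Emit source text; error nodes reproduce their stored token."""
    if node["kind"] == "error":
        return node["token"]
    if node["kind"] == "var":
        return f'{node["type"]} {node["name"]};'
    return "\n".join(unparse(child) for child in node["body"])


tree = {"kind": "block", "body": [
    {"kind": "var", "name": "n", "type": "Int", "source": "val n: Int"},
    {"kind": "var", "name": "p", "type": "Point", "source": "val p: Point"},
]}
output = unparse(convert(tree))
```

    Here the declaration with the unknown type `Point` survives as its original first-language text, while the convertible declaration is emitted in the second language.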

  3. Obtaining correct compile results by absorbing mismatches between data types representations

    DOEpatents

    Horie, Michihiro; Horii, Hiroshi H.; Kawachiya, Kiyokuni; Takeuchi, Mikio

    2017-11-21

    Methods and a system are provided. A method includes implementing a function, which a compiler for a first language does not have, using a compiler for a second language. The implementing step includes generating, by the compiler for the first language, a first abstract syntax tree. The implementing step further includes converting, by a converter, the first abstract syntax tree to a second abstract syntax tree of the compiler for the second language using a conversion table from data representation types in the first language to data representation types in the second language. When a compilation error occurs, the implementing step also includes generating a special node for error processing in the second abstract syntax tree and storing an error token in the special node. When unparsing, the implementing step additionally includes outputting the error token, in the form of source code written in the first language.

  4. Proceedings: Sisal `93

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Feo, J.T.

    1993-10-01

    This report contains papers on: Programmability and performance issues; The case of an iterative partial differential equation solver; Implementing the kernel of the Australian Region Weather Prediction Model in Sisal; Even and quarter-even prime length symmetric FFTs and their Sisal Implementations; Top-down thread generation for Sisal; Overlapping communications and computations on NUMA architectures; Compiling technique based on dataflow analysis for functional programming language Valid; Copy elimination for true multidimensional arrays in Sisal 2.0; Increasing parallelism for an optimization that reduces copying in IF2 graphs; Caching in on Sisal; Cache performance of Sisal Vs. FORTRAN; FFT algorithms on a shared-memory multiprocessor; A parallel implementation of nonnumeric search problems in Sisal; Computer vision algorithms in Sisal; Compilation of Sisal for a high-performance data driven vector processor; Sisal on distributed memory machines; A virtual shared addressing system for distributed memory Sisal; Developing a high-performance FFT algorithm in Sisal for a vector supercomputer; Implementation issues for IF2 on a static data-flow architecture; and Systematic control of parallelism in array-based data-flow computation. Selected papers have been indexed separately for inclusion in the Energy Science and Technology Database.

  5. Performance Modeling and Measurement of Parallelized Code for Distributed Shared Memory Multiprocessors

    NASA Technical Reports Server (NTRS)

    Waheed, Abdul; Yan, Jerry

    1998-01-01

    This paper presents a model to evaluate the performance and overhead of parallelizing sequential code using compiler directives for multiprocessing on distributed shared memory (DSM) systems. With increasing popularity of shared address space architectures, it is essential to understand their performance impact on programs that benefit from shared memory multiprocessing. We present a simple model to characterize the performance of programs that are parallelized using compiler directives for shared memory multiprocessing. We parallelized the sequential implementation of NAS benchmarks using native Fortran77 compiler directives for an Origin2000, which is a DSM system based on a cache-coherent Non Uniform Memory Access (ccNUMA) architecture. We report measurement based performance of these parallelized benchmarks from four perspectives: efficacy of parallelization process; scalability; parallelization overhead; and comparison with hand-parallelized and -optimized version of the same benchmarks. Our results indicate that sequential programs can conveniently be parallelized for DSM systems using compiler directives but realizing performance gains as predicted by the performance model depends primarily on minimizing architecture-specific data locality overhead.

  6. Compiler-assisted multiple instruction rollback recovery using a read buffer

    NASA Technical Reports Server (NTRS)

    Alewine, N. J.; Chen, S.-K.; Fuchs, W. K.; Hwu, W.-M.

    1993-01-01

    Multiple instruction rollback (MIR) is a technique that has been implemented in mainframe computers to provide rapid recovery from transient processor failures. Hardware-based MIR designs eliminate rollback data hazards by providing data redundancy implemented in hardware. Compiler-based MIR designs have also been developed which remove rollback data hazards directly with data-flow transformations. This paper focuses on compiler-assisted techniques to achieve multiple instruction rollback recovery. We observe that some data hazards resulting from instruction rollback can be resolved efficiently by providing an operand read buffer while others are resolved more efficiently with compiler transformations. A compiler-assisted multiple instruction rollback scheme is developed which combines hardware-implemented data redundancy with compiler-driven hazard removal transformations. Experimental performance evaluations indicate improved efficiency over previous hardware-based and compiler-based schemes.

  7. Algorithmic synthesis using Python compiler

    NASA Astrophysics Data System (ADS)

    Cieszewski, Radoslaw; Romaniuk, Ryszard; Pozniak, Krzysztof; Linczuk, Maciej

    2015-09-01

    This paper presents a Python-to-VHDL compiler. The compiler interprets an algorithmic description of a desired behavior written in Python and translates it to VHDL. FPGA combines many benefits of both software and ASIC implementations. Like software, the programmed circuit is flexible, and can be reconfigured over the lifetime of the system. FPGAs have the potential to achieve far greater performance than software as a result of bypassing the fetch-decode-execute operations of traditional processors, and possibly exploiting a greater level of parallelism. This can be achieved by using many computational resources at the same time. Creating parallel programs implemented in FPGAs in pure HDL is difficult and time consuming. By using a higher level of abstraction and a High-Level Synthesis compiler, implementation time can be reduced. The compiler has been implemented using the Python language. This article describes the design, implementation and results of the created tools.

  8. Obtaining correct compile results by absorbing mismatches between data types representations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Horie, Michihiro; Horii, Hiroshi H.; Kawachiya, Kiyokuni

    Methods and a system are provided. A method includes implementing a function, which a compiler for a first language does not have, using a compiler for a second language. The implementing step includes generating, by the compiler for the first language, a first abstract syntax tree. The implementing step further includes converting, by a converter, the first abstract syntax tree to a second abstract syntax tree of the compiler for the second language using a conversion table from data representation types in the first language to data representation types in the second language. When a compilation error occurs, the implementing step also includes generating a special node for error processing in the second abstract syntax tree and storing an error token in the special node. When unparsing, the implementing step additionally includes outputting the error token, in the form of source code written in the first language.

  9. Extending R packages to support 64-bit compiled code: An illustration with spam64 and GIMMS NDVI3g data

    NASA Astrophysics Data System (ADS)

    Gerber, Florian; Mösinger, Kaspar; Furrer, Reinhard

    2017-07-01

    Software packages for spatial data often implement a hybrid approach of interpreted and compiled programming languages. The compiled parts are usually written in C, C++, or Fortran, and are efficient in terms of computational speed and memory usage. Conversely, the interpreted part serves as a convenient user-interface and calls the compiled code for computationally demanding operations. The price paid for the user friendliness of the interpreted component is, besides performance, the limited access to low level and optimized code. An example of such a restriction is the 64-bit vector support of the widely used statistical language R. On the R side, users do not need to change existing code and may not even notice the extension. On the other hand, interfacing 64-bit compiled code efficiently is challenging. Since many R packages for spatial data could benefit from 64-bit vectors, we investigate strategies to efficiently pass 64-bit vectors to compiled languages. More precisely, we show how to simply extend existing R packages using the foreign function interface to seamlessly support 64-bit vectors. This extension is shown with the sparse matrix algebra R package spam. The new capabilities are illustrated with an example of GIMMS NDVI3g data featuring a parametric modeling approach for a non-stationary covariance matrix.

  10. Computer-Aided Parallelizer and Optimizer

    NASA Technical Reports Server (NTRS)

    Jin, Haoqiang

    2011-01-01

    The Computer-Aided Parallelizer and Optimizer (CAPO) automates the insertion of compiler directives (see figure) to facilitate parallel processing on Shared Memory Parallel (SMP) machines. While CAPO currently is integrated seamlessly into CAPTools (developed at the University of Greenwich, now marketed as ParaWise), CAPO was independently developed at Ames Research Center as one of the components for the Legacy Code Modernization (LCM) project. The current version takes serial FORTRAN programs, performs interprocedural data dependence analysis, and generates OpenMP directives. Due to the widely supported OpenMP standard, the generated OpenMP codes have the potential to run on a wide range of SMP machines. CAPO relies on accurate interprocedural data dependence information currently provided by CAPTools. Compiler directives are generated through identification of parallel loops in the outermost level, construction of parallel regions around parallel loops and optimization of parallel regions, and insertion of directives with automatic identification of private, reduction, induction, and shared variables. Attempts also have been made to identify potential pipeline parallelism (implemented with point-to-point synchronization). Although directives are generated automatically, user interaction with the tool is still important for producing good parallel codes. A comprehensive graphical user interface is included for users to interact with the parallelization process.

  11. Temporal Planning for Compilation of Quantum Approximate Optimization Algorithm Circuits

    NASA Technical Reports Server (NTRS)

    Venturelli, Davide; Do, Minh Binh; Rieffel, Eleanor Gilbert; Frank, Jeremy David

    2017-01-01

    We investigate the application of temporal planners to the problem of compiling quantum circuits to newly emerging quantum hardware. While our approach is general, we focus our initial experiments on Quantum Approximate Optimization Algorithm (QAOA) circuits that have few ordering constraints and allow highly parallel plans. We report on experiments using several temporal planners to compile circuits of various sizes to realistic hardware. This early empirical evaluation suggests that temporal planning is a viable approach to quantum circuit compilation.

  12. Floating-Point Units and Algorithms for field-programmable gate arrays

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Underwood, Keith D.; Hemmert, K. Scott

    2005-11-01

    The software that we are attempting to copyright is a package of floating-point unit descriptions and example algorithm implementations using those units for use in FPGAs. The floating-point units are best-in-class implementations of add, multiply, divide, and square root floating-point operations. The algorithm implementations are sample (not highly flexible) implementations of FFT, matrix multiply, matrix vector multiply, and dot product. Together, one could think of the collection as an implementation of parts of the BLAS library or something similar to the FFTW packages (without the flexibility) for FPGAs. Results from this work have been published multiple times and we are working on a publication to discuss the techniques we use to implement the floating-point units. For some more background, FPGAs are programmable hardware. "Programs" for this hardware are typically created using a hardware description language (examples include Verilog, VHDL, and JHDL). Our floating-point unit descriptions are written in JHDL, which allows them to include placement constraints that make them highly optimized relative to some other implementations of floating-point units. Many vendors (Nallatech from the UK, SRC Computers in the US) have similar implementations, but our implementations seem to be somewhat higher performance. Our algorithm implementations are written in VHDL and models of the floating-point units are provided in VHDL as well. FPGA "programs" make multiple "calls" (hardware instantiations) to libraries of intellectual property (IP), such as the floating-point unit library described here. These programs are then compiled using a tool called a synthesizer (such as a tool from Synplicity, Inc.). The compiled file is a netlist of gates and flip-flops. This netlist is then mapped to a particular type of FPGA by a mapper and then a place-and-route tool. These tools assign the gates in the netlist to specific locations on the specific type of FPGA chip used and construct the required routes between them. The result is a "bitstream" that is analogous to a compiled binary. The bitstream is loaded into the FPGA to create a specific hardware configuration.

  13. HAL/S-FC compiler system functional specification

    NASA Technical Reports Server (NTRS)

    1974-01-01

    Compiler organization is discussed, including overall compiler structure, internal data transfer, compiler development, and code optimization. The user, system, and SDL interfaces are described, along with compiler system requirements. The run-time software support package, restrictions, and dependencies of the HAL/S-FC system are also considered.

  14. Compiler-Assisted Multiple Instruction Rollback Recovery Using a Read Buffer. Ph.D. Thesis

    NASA Technical Reports Server (NTRS)

    Alewine, Neal Jon

    1993-01-01

    Multiple instruction rollback (MIR) is a technique to provide rapid recovery from transient processor failures and was implemented in hardware by researchers and also in mainframe computers. Hardware-based MIR designs eliminate rollback data hazards by providing data redundancy implemented in hardware. Compiler-based MIR designs were also developed which remove rollback data hazards directly with data flow manipulations, thus eliminating the need for most data redundancy hardware. Compiler-assisted techniques to achieve multiple instruction rollback recovery are addressed. It is observed that some data hazards resulting from instruction rollback can be resolved more efficiently by providing hardware redundancy while others are resolved more efficiently with compiler transformations. A compiler-assisted multiple instruction rollback scheme is developed which combines hardware-implemented data redundancy with compiler-driven hazard removal transformations. Experimental performance evaluations were conducted which indicate improved efficiency over previous hardware-based and compiler-based schemes. Various enhancements to the compiler transformations and to the data redundancy hardware developed for the compiler-assisted MIR scheme are described and evaluated. The final topic deals with the application of compiler-assisted MIR techniques to aid in exception repair and branch repair in a speculative execution architecture.

  15. Cache Locality Optimization for Recursive Programs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lifflander, Jonathan; Krishnamoorthy, Sriram

    We present an approach to optimize the cache locality for recursive programs by dynamically splicing--recursively interleaving--the execution of distinct function invocations. By utilizing data effect annotations, we identify concurrency and data reuse opportunities across function invocations and interleave them to reduce reuse distance. We present algorithms that efficiently track effects in recursive programs, detect interference and dependencies, and interleave execution of function invocations using user-level (non-kernel) lightweight threads. To enable multi-core execution, a program is parallelized using a nested fork/join programming model. Our cache optimization strategy is designed to work in the context of a random work stealing scheduler. We present an implementation using the MIT Cilk framework that demonstrates significant improvements in sequential and parallel performance, competitive with a state-of-the-art compile-time optimizer for loop programs and a domain-specific optimizer for stencil programs.

  16. A Selective Encryption Algorithm Based on AES for Medical Information.

    PubMed

    Oh, Ju-Young; Yang, Dong-Il; Chon, Ki-Hwan

    2010-03-01

    The transmission of medical information is currently a daily routine. Medical information needs efficient, robust and secure encryption modes, but cryptography is primarily a computationally intensive process. Towards this direction, we design a selective encryption scheme for critical data transmission. We expand the Advanced Encryption Standard (AES)-Rijndael with five criteria: the first is the compression of plain data, the second is the variable size of the block, the third is the selectable round, the fourth is the optimization of software implementation and the fifth is the selective function of the whole routine. We have tested our selective encryption scheme in C++ and it was compiled with Code::Blocks using a MinGW GCC compiler. The experimental results showed that our selective encryption scheme achieves a faster execution speed of encryption/decryption. In future work, we intend to use resource optimization to enhance the round operations, such as SubByte/InvSubByte, by exploiting similarities between encryption and decryption. As encryption schemes become more widely used, the concept of hardware and software co-design is also a growing new area of interest.

  17. A Selective Encryption Algorithm Based on AES for Medical Information

    PubMed Central

    Oh, Ju-Young; Chon, Ki-Hwan

    2010-01-01

    Objectives: The transmission of medical information is currently a daily routine. Medical information needs efficient, robust and secure encryption modes, but cryptography is primarily a computationally intensive process. Towards this direction, we design a selective encryption scheme for critical data transmission. Methods: We expand the Advanced Encryption Standard (AES)-Rijndael with five criteria: the first is the compression of plain data, the second is the variable size of the block, the third is the selectable round, the fourth is the optimization of software implementation and the fifth is the selective function of the whole routine. We have tested our selective encryption scheme in C++ and it was compiled with Code::Blocks using a MinGW GCC compiler. Results: The experimental results showed that our selective encryption scheme achieves a faster execution speed of encryption/decryption. In future work, we intend to use resource optimization to enhance the round operations, such as SubByte/InvSubByte, by exploiting similarities between encryption and decryption. Conclusions: As encryption schemes become more widely used, the concept of hardware and software co-design is also a growing new area of interest. PMID:21818420

  18. A survey of compiler development aids. [concerning lexical, syntax, and semantic analysis]

    NASA Technical Reports Server (NTRS)

    Buckles, B. P.; Hodges, B. C.; Hsia, P.

    1977-01-01

    A theoretical background was established for the compilation process by dividing it into five phases and explaining the concepts and algorithms that underpin each. The five selected phases were lexical analysis, syntax analysis, semantic analysis, optimization, and code generation. Graph theoretical optimization techniques were presented, and approaches to code generation were described for both one-pass and multipass compilation environments. Following the initial tutorial sections, more than 20 tools that were developed to aid in the process of writing compilers were surveyed. Eight of the more recent compiler development aids were selected for special attention - SIMCMP/STAGE2, LANG-PAK, COGENT, XPL, AED, CWIC, LIS, and JOCIT. The impact of compiler development aids was assessed, and some of their shortcomings and some of the areas of research currently in progress were inspected.

  19. Compiling quantum circuits to realistic hardware architectures using temporal planners

    NASA Astrophysics Data System (ADS)

    Venturelli, Davide; Do, Minh; Rieffel, Eleanor; Frank, Jeremy

    2018-04-01

    To run quantum algorithms on emerging gate-model quantum hardware, quantum circuits must be compiled to take into account constraints on the hardware. For near-term hardware, with only limited means to mitigate decoherence, it is critical to minimize the duration of the circuit. We investigate the application of temporal planners to the problem of compiling quantum circuits to newly emerging quantum hardware. While our approach is general, we focus on compiling to superconducting hardware architectures with nearest neighbor constraints. Our initial experiments focus on compiling Quantum Alternating Operator Ansatz (QAOA) circuits whose high number of commuting gates allows great flexibility in the order in which the gates can be applied. That freedom makes it more challenging to find optimal compilations but also means there is a greater potential win from more optimized compilation than for less flexible circuits. We map this quantum circuit compilation problem to a temporal planning problem, and generate a test suite of compilation problems for QAOA circuits of various sizes to a realistic hardware architecture. We report compilation results from several state-of-the-art temporal planners on this test set. This early empirical evaluation demonstrates that temporal planning is a viable approach to quantum circuit compilation.

  20. Python based high-level synthesis compiler

    NASA Astrophysics Data System (ADS)

    Cieszewski, Radosław; Pozniak, Krzysztof; Romaniuk, Ryszard

    2014-11-01

    This paper presents a Python-based high-level synthesis (HLS) compiler. The compiler interprets an algorithmic description of a desired behavior written in Python and maps it to VHDL. FPGAs combine many benefits of both software and ASIC implementations. Like software, the mapped circuit is flexible, and can be reconfigured over the lifetime of the system. FPGAs therefore have the potential to achieve far greater performance than software as a result of bypassing the fetch-decode-execute operations of traditional processors, and possibly exploiting a greater level of parallelism. Creating parallel programs implemented in FPGAs is not trivial. This article describes the design, implementation and first results of the created Python-based compiler.

  1. Advancing HAL to an operational status

    NASA Technical Reports Server (NTRS)

    1974-01-01

    The development of the HAL language and the compiler implementation of the mathematical subset of the language have been completed. On-site support, training, and maintenance of this compiler were enlarged to broaden the implementation of HAL to include all features of the language specification for NASA manned space usage. A summary of activities associated with the HAL compiler for the UNIVAC 1108 is given.

  2. ZettaBricks: A Language Compiler and Runtime System for Anyscale Computing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Amarasinghe, Saman

    This grant supported the ZettaBricks and OpenTuner projects. ZettaBricks is a new implicitly parallel language and compiler where defining multiple implementations of multiple algorithms to solve a problem is the natural way of programming. ZettaBricks makes algorithmic choice a first class construct of the language. Choices are provided in a way that also allows our compiler to tune at a finer granularity. The ZettaBricks compiler autotunes programs by making both fine-grained as well as algorithmic choices. Choices also include different automatic parallelization techniques, data distributions, algorithmic parameters, transformations, and blocking. Additionally, ZettaBricks introduces novel techniques to autotune algorithms for different convergence criteria. When choosing between various direct and iterative methods, the ZettaBricks compiler is able to tune a program in such a way that delivers near-optimal efficiency for any desired level of accuracy. The compiler has the flexibility of utilizing different convergence criteria for the various components within a single algorithm, providing the user with accuracy choice alongside algorithmic choice. OpenTuner is a generalization of the experience gained in building an autotuner for ZettaBricks. OpenTuner is a new open source framework for building domain-specific multi-objective program autotuners. OpenTuner supports fully-customizable configuration representations, an extensible technique representation to allow for domain-specific techniques, and an easy to use interface for communicating with the program to be autotuned. A key capability inside OpenTuner is the use of ensembles of disparate search techniques simultaneously; techniques that perform well will dynamically be allocated a larger proportion of tests.

  3. Ada Integrated Environment III Computer Program Development Specification. Volume III. Ada Optimizing Compiler.

    DTIC Science & Technology

    1981-12-01

    file.library-unit{.subunit).SYMAP Statement Map: library-file.library-unit.subunit).SMAP Type Map: library-file.library-unit{.subunit).TMAP The library...generator SYMAP Symbol Map code generator SMAP Updated Statement Map code generator TMAP Type Map code generator A.3.5 The PUNIT Command The PUNIT...Core.Stmtmap) NAME Tmap (Core.Typemap) END Example A-3 Compiler Command Stream for the Code Generator Texas Instruments A-5 Ada Optimizing Compiler

  4. Optimization guide for programs compiled under IBM FORTRAN H (OPT=2)

    NASA Technical Reports Server (NTRS)

    Smith, D. M.; Dobyns, A. H.; Marsh, H. M.

    1977-01-01

    Guidelines are given to provide the programmer with various techniques for optimizing programs when the FORTRAN IV H compiler is used with OPT=2. Subroutines and programs are described in the appendices along with a timing summary of all the examples given in the manual.

  5. Compiler-assisted multiple instruction rollback recovery using a read buffer

    NASA Technical Reports Server (NTRS)

    Alewine, Neal J.; Chen, Shyh-Kwei; Fuchs, W. Kent; Hwu, Wen-Mei W.

    1995-01-01

    Multiple instruction rollback (MIR) is a technique that has been implemented in mainframe computers to provide rapid recovery from transient processor failures. Hardware-based MIR designs eliminate rollback data hazards by providing data redundancy implemented in hardware. Compiler-based MIR designs have also been developed which remove rollback data hazards directly with data-flow transformations. This paper describes compiler-assisted techniques to achieve multiple instruction rollback recovery. We observe that some data hazards resulting from instruction rollback can be resolved efficiently by providing an operand read buffer while others are resolved more efficiently with compiler transformations. The compiler-assisted scheme presented consists of hardware that is less complex than shadow files, history files, history buffers, or delayed write buffers, while experimental evaluation indicates performance improvement over compiler-based schemes.

  6. Final Report A Multi-Language Environment For Programmable Code Optimization and Empirical Tuning

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yi, Qing; Whaley, Richard Clint; Qasem, Apan

    This report summarizes our effort and results of building an integrated optimization environment to effectively combine the programmable control and the empirical tuning of source-to-source compiler optimizations within the framework of multiple existing languages, specifically C, C++, and Fortran. The environment contains two main components: the ROSE analysis engine, which is based on the ROSE C/C++/Fortran2003 source-to-source compiler developed by Co-PI Dr. Quinlan et al. at DOE/LLNL, and the POET transformation engine, which is based on an interpreted program transformation language developed by Dr. Yi at University of Texas at San Antonio (UTSA). The ROSE analysis engine performs advanced compiler analysis, identifies profitable code transformations, and then produces output in POET, a language designed to provide programmable control of compiler optimizations to application developers and to support the parameterization of architecture-sensitive optimizations so that their configurations can be empirically tuned later. This POET output can then be ported to different machines together with the user application, where a POET-based search engine empirically reconfigures the parameterized optimizations until satisfactory performance is found. Computational specialists can write POET scripts to directly control the optimization of their code. Application developers can interact with ROSE to obtain optimization feedback as well as provide domain-specific knowledge and high-level optimization strategies. The optimization environment is expected to support different levels of automation and programmer intervention, from fully-automated tuning to semi-automated development and to manual programmable control.

  7. A survey of compiler optimization techniques

    NASA Technical Reports Server (NTRS)

    Schneck, P. B.

    1972-01-01

    Major optimization techniques of compilers are described and grouped into three categories: machine dependent, architecture dependent, and architecture independent. Machine-dependent optimizations tend to be local and are performed upon short spans of generated code by using particular properties of an instruction set to reduce the time or space required by a program. Architecture-dependent optimizations are global and are performed while generating code. These optimizations consider the structure of a computer, but not its detailed instruction set. Architecture-independent optimizations are also global but are based on analysis of the program flow graph and the dependencies among statements of the source program. A conceptual review of a universal optimizer that performs architecture-independent optimizations at source-code level is also presented.

  8. A Mathematical Approach for Compiling and Optimizing Hardware Implementations of DSP Transforms

    DTIC Science & Technology

    2010-08-01

    [Figure residue; recoverable axis labels: throughput [billion samples per second], performance [Gflop/s], area [slices]; recoverable panel title: DFT 64 (floating point) on Xilinx Virtex-6 FPGA.]

  9. Testing-Based Compiler Validation for Synchronous Languages

    NASA Technical Reports Server (NTRS)

    Garoche, Pierre-Loic; Howar, Falk; Kahsai, Temesghen; Thirioux, Xavier

    2014-01-01

    In this paper we present a novel lightweight approach to validate compilers for synchronous languages. Instead of verifying a compiler for all input programs or providing a fixed suite of regression tests, we extend the compiler to generate a test-suite with high behavioral coverage and geared towards discovery of faults for every compiled artifact. We have implemented and evaluated our approach using a compiler from Lustre to C.

  10. Numerical performance and throughput benchmark for electronic structure calculations in PC-Linux systems with new architectures, updated compilers, and libraries.

    PubMed

    Yu, Jen-Shiang K; Hwang, Jenn-Kang; Tang, Chuan Yi; Yu, Chin-Hui

    2004-01-01

    A number of recently released numerical libraries, including the Automatically Tuned Linear Algebra Subroutines (ATLAS) library, the Intel Math Kernel Library (MKL), the GOTO numerical library, and the AMD Core Math Library (ACML) for AMD Opteron processors, are linked against the executables of the Gaussian 98 electronic structure calculation package, which is compiled by updated versions of Fortran compilers such as Intel Fortran compiler (ifc/efc) 7.1 and PGI Fortran compiler (pgf77/pgf90) 5.0. The ifc 7.1 delivers an improvement of about 3% on 32-bit machines compared to the former version 6.0. The performance improvement from pgf77 3.3 to 5.0 is also around 3% when utilizing the original unmodified optimization options of the compiler enclosed in the software. Nevertheless, if extensive compiler tuning options are used, the speedup can reach about 25%. The performances of these fully optimized numerical libraries are similar. The double-precision floating-point (FP) instruction sets (SSE2) are also functional on AMD Opteron processors operated in 32-bit compilation, and the Intel Fortran compiler has performed better optimization. Hardware-level tuning is able to improve memory bandwidth by adjusting the DRAM timing, and the efficiency in the CL2 mode is further accelerated by 2.6% compared to that of the CL2.5 mode. The FP throughput is measured by simultaneous execution of two identical copies of each of the test jobs. The resulting performance impact suggests that IA64 and AMD64 architectures are able to deliver significantly higher throughput than the IA32, which is consistent with the SpecFPrate2000 benchmarks.

  11. OSCAR API for Real-Time Low-Power Multicores and Its Performance on Multicores and SMP Servers

    NASA Astrophysics Data System (ADS)

    Kimura, Keiji; Mase, Masayoshi; Mikami, Hiroki; Miyamoto, Takamichi; Shirako, Jun; Kasahara, Hironori

    OSCAR (Optimally Scheduled Advanced Multiprocessor) API has been designed for real-time embedded low-power multicores to generate parallel programs for various multicores from different vendors by using the OSCAR parallelizing compiler. The OSCAR API has been developed by Waseda University in collaboration with Fujitsu Laboratory, Hitachi, NEC, Panasonic, Renesas Technology, and Toshiba in a METI/NEDO project entitled "Multicore Technology for Realtime Consumer Electronics." By using the OSCAR API as an interface between the OSCAR compiler and backend compilers, the OSCAR compiler enables hierarchical multigrain parallel processing with memory optimization under capacity restriction for cache memory, local memory, distributed shared memory, and on-chip/off-chip shared memory; data transfer using a DMA controller; and power reduction control using DVFS (Dynamic Voltage and Frequency Scaling), clock gating, and power gating for various embedded multicores. In addition, a parallelized program automatically generated by the OSCAR compiler with OSCAR API can be compiled by the ordinary OpenMP compilers since the OSCAR API is designed on a subset of OpenMP. This paper describes the OSCAR API and its compatibility with the OSCAR compiler by showing code examples. Performance evaluations of the OSCAR compiler and the OSCAR API are carried out using an IBM Power5+ workstation, an IBM Power6 high-end SMP server, and a newly developed consumer electronics multicore chip RP2 by Renesas, Hitachi and Waseda. From the results of scalability evaluation, it is found that on average, the OSCAR compiler with the OSCAR API can achieve a 5.8 times speedup over the sequential execution on the Power5+ workstation with eight cores and a 2.9 times speedup on RP2 with four cores. In addition, the OSCAR compiler can accelerate an IBM XL Fortran compiler up to 3.3 times on the Power6 SMP server. Due to low-power optimization on RP2, the OSCAR compiler with the OSCAR API achieves a maximum power reduction of 84% in the real-time execution mode.

  12. Proceedings of the second SISAL users' conference

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Feo, J T; Frerking, C; Miller, P J

    1992-12-01

    This report contains papers on the following topics: A sisal code for computing the fourier transform on S_N; five ways to fill your knapsack; simulating material dislocation motion in sisal; candis as an interface for sisal; parallelisation and performance of the burg algorithm on a shared-memory multiprocessor; use of genetic algorithm in sisal to solve the file design problem; implementing FFTs in sisal; programming and evaluating the performance of signal processing applications in the sisal programming environment; sisal and Von Neumann-based languages: translation and intercommunication; an IF2 code generator for ADAM architecture; program partitioning for NUMA multiprocessor computer systems; mapping functional parallelism on distributed memory machines; implicit array copying: prevention is better than cure; mathematical syntax for sisal; an approach for optimizing recursive functions; implementing arrays in sisal 2.0; Fol: an object oriented extension to the sisal language; twine: a portable, extensible sisal execution kernel; and investigating the memory performance of the optimizing sisal compiler.

  13. HAL/S-FC compiler system specifications

    NASA Technical Reports Server (NTRS)

    1976-01-01

    This document specifies the informational interfaces within the HAL/S-FC compiler, and between the compiler and the external environment. This Compiler System Specification is for the HAL/S-FC compiler and its associated run time facilities which implement the full HAL/S language. The HAL/S-FC compiler is designed to operate stand-alone on any compatible IBM 360/370 computer and within the Software Development Laboratory (SDL) at NASA/JSC, Houston, Texas.

  14. Final report: Compiled MPI. Cost-Effective Exascale Application Development

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gropp, William Douglas

    2015-12-21

    This is the final report on Compiled MPI: Cost-Effective Exascale Application Development, and summarizes the results under this project. The project investigated runtime environments that improve the performance of MPI (Message-Passing Interface) programs; work at Illinois in the last period of this project looked at optimizing data access expressed with MPI datatypes.

  15. DFT Performance Prediction in FFTW

    NASA Astrophysics Data System (ADS)

    Gu, Liang; Li, Xiaoming

    Fastest Fourier Transform in the West (FFTW) is an adaptive FFT library that generates highly efficient Discrete Fourier Transform (DFT) implementations. It is one of the fastest FFT libraries available and it outperforms many adaptive or hand-tuned DFT libraries. Its success largely relies on the huge search space spanned by several FFT algorithms and a set of compiler-generated C code (called codelets) for small size DFTs. FFTW empirically finds the best algorithm by measuring the performance of different algorithm combinations. Although the empirical search works very well for FFTW, the search process does not explain why the best plan found performs best, and the search overhead grows polynomially as the DFT size increases. The opposite of empirical search is model-driven optimization. However, it is widely believed that model-driven optimization is inferior to empirical search and is particularly powerless to solve problems as complex as the optimization of DFT.

  16. Declarative language design for interactive visualization.

    PubMed

    Heer, Jeffrey; Bostock, Michael

    2010-01-01

    We investigate the design of declarative, domain-specific languages for constructing interactive visualizations. By separating specification from execution, declarative languages can simplify development, enable unobtrusive optimization, and support retargeting across platforms. We describe the design of the Protovis specification language and its implementation within an object-oriented, statically-typed programming language (Java). We demonstrate how to support rich visualizations without requiring a toolkit-specific data model and extend Protovis to enable declarative specification of animated transitions. To support cross-platform deployment, we introduce rendering and event-handling infrastructures decoupled from the runtime platform, letting designers retarget visualization specifications (e.g., from desktop to mobile phone) with reduced effort. We also explore optimizations such as runtime compilation of visualization specifications, parallelized execution, and hardware-accelerated rendering. We present benchmark studies measuring the performance gains provided by these optimizations and compare performance to existing Java-based visualization tools, demonstrating scalability improvements exceeding an order of magnitude.

  17. User involvement in the implementation of clinical guidelines for common mental health disorders: a review and compilation of strategies and resources.

    PubMed

    Moreno, Eliana M; Moriana, Juan Antonio

    2016-08-09

    There is now broad consensus regarding the importance of involving users in the process of implementing guidelines. Few studies, however, have addressed this issue, let alone the implementation of guidelines for common mental health disorders. The aim of this study is to compile and describe implementation strategies and resources related to common clinical mental health disorders targeted at service users. The literature was reviewed and resources for the implementation of clinical guidelines were compiled using the PRISMA model. A mixed qualitative and quantitative analysis was performed based on a series of categories developed ad hoc. A total of 263 items were included in the preliminary analysis and 64 implementation resources aimed at users were analysed in depth. A wide variety of types, sources and formats were identified, including guides (40%), websites (29%), videos and leaflets, as well as instruments for the implementation of strategies regarding information and education (64%), self-care, or users' assessment of service quality. The results reveal the need to establish clear criteria for assessing the quality of implementation materials in general and standardising systems to classify user-targeted strategies. The compilation and description of key elements of strategies and resources for users can be of interest in designing materials and specific actions for this target audience, as well as improving the implementation of clinical guidelines.

  18. High Level Rule Modeling Language for Airline Crew Pairing

    NASA Astrophysics Data System (ADS)

    Mutlu, Erdal; Birbil, Ş. Ilker; Bülbül, Kerem; Yenigün, Hüsnü

    2011-09-01

    The crew pairing problem is an airline optimization problem where a set of least costly pairings (consecutive flights to be flown by a single crew) that covers every flight in a given flight network is sought. A pairing is defined by using a very complex set of feasibility rules imposed by international and national regulatory agencies, and also by the airline itself. The cost of a pairing is also defined by using complicated rules. When an optimization engine generates a sequence of flights from a given flight network, it has to check all these feasibility rules to determine whether the sequence forms a valid pairing. Likewise, the engine needs to calculate the cost of the pairing by using certain rules. However, the rules used for checking the feasibility and calculating the costs are usually not static. Furthermore, the airline companies carry out what-if-type analyses through testing several alternate scenarios in each planning period. Therefore, embedding the implementation of feasibility checking and cost calculation rules into the source code of the optimization engine is not a practical approach. In this work, a high level language called ARUS is introduced for describing the feasibility and cost calculation rules. A compiler for ARUS is also implemented in this work to generate a dynamic link library to be used by crew pairing optimization engines.

  19. Development of the Tensoral Computer Language

    NASA Technical Reports Server (NTRS)

    Ferziger, Joel; Dresselhaus, Eliot

    1996-01-01

    The research scientist or engineer wishing to perform large scale simulations or to extract useful information from existing databases is required to have expertise in the details of the particular database, the numerical methods and the computer architecture to be used. This poses a significant practical barrier to the use of simulation data. The goal of this research was to develop a high-level computer language called Tensoral, designed to remove this barrier. The Tensoral language provides a framework in which efficient generic data manipulations can be easily coded and implemented. First of all, Tensoral is general. The fundamental objects in Tensoral represent tensor fields and the operators that act on them. The numerical implementation of these tensors and operators is completely and flexibly programmable. New mathematical constructs and operators can be easily added to the Tensoral system. Tensoral is compatible with existing languages. Tensoral tensor operations co-exist in a natural way with a host language, which may be any sufficiently powerful computer language such as Fortran, C, or Vectoral. Tensoral is very-high-level. Tensor operations in Tensoral typically act on entire databases (i.e., arrays) at one time and may, therefore, correspond to many lines of code in a conventional language. Tensoral is efficient. Tensoral is a compiled language. Database manipulations are simplified, optimized, and scheduled by the compiler, eventually resulting in efficient machine code to implement them.

  20. Empirical Performance Model-Driven Data Layout Optimization and Library Call Selection for Tensor Contraction Expressions

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lu, Qingda; Gao, Xiaoyang; Krishnamoorthy, Sriram

    Empirical optimizers like ATLAS have been very effective in optimizing computational kernels in libraries. The best choice of parameters such as tile size and degree of loop unrolling is determined by executing different versions of the computation. In contrast, optimizing compilers use a model-driven approach to program transformation. While the model-driven approach of optimizing compilers is generally orders of magnitude faster than ATLAS-like library generators, its effectiveness can be limited by the accuracy of the performance models used. In this paper, we describe an approach where a class of computations is modeled in terms of constituent operations that are empirically measured, thereby allowing modeling of the overall execution time. The performance model with empirically determined cost components is used to perform data layout optimization together with the selection of library calls and layout transformations in the context of the Tensor Contraction Engine, a compiler for a high-level domain-specific language for expressing computational models in quantum chemistry. The effectiveness of the approach is demonstrated through experimental measurements on representative computations from quantum chemistry.
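
    As a rough illustration of a model built from empirically measured components, the sketch below sums per-operation costs to choose input layouts and a matrix-multiply variant together. The cost numbers, the `GEMM_COST` table, and the function names are invented for illustration and are not taken from the Tensor Contraction Engine.

```python
from itertools import product

# Hypothetical measured costs (seconds) of a matrix-multiply library call for
# each (layout of A, layout of B). In practice these would come from timing
# runs on the target machine; the values here are made up.
GEMM_COST = {
    ("row", "row"): 1.00,
    ("row", "col"): 0.80,
    ("col", "row"): 1.30,
    ("col", "col"): 1.10,
}

# Hypothetical measured cost (seconds) of converting one operand's layout.
TRANSPOSE_COST = 0.25


def modeled_time(stored, used):
    """Modeled total time: layout conversions plus the selected GEMM variant."""
    relayout = sum(TRANSPOSE_COST for s, u in zip(stored, used) if s != u)
    return relayout + GEMM_COST[used]


def best_choice(stored):
    """Pick the operand layouts that minimize the modeled total time."""
    options = product(("row", "col"), repeat=2)
    return min(options, key=lambda used: modeled_time(stored, used))
```

    With these made-up numbers, operands stored as ("col", "col") are best served by converting A to row-major (modeled 1.05 s versus 1.10 s without conversion), whereas operands stored as ("row", "row") are best left alone. No candidate is executed at decision time; only the component costs are measured beforehand.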

  1. Optimizing Irregular Applications for Energy and Performance on the Tilera Many-core Architecture

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chavarría-Miranda, Daniel; Panyala, Ajay R.; Halappanavar, Mahantesh

    Optimizing applications simultaneously for energy and performance is a complex problem. High performance, parallel, irregular applications are notoriously hard to optimize due to their data-dependent memory accesses, lack of structured locality and complex data structures and code patterns. Irregular kernels are growing in importance in applications such as machine learning, graph analytics and combinatorial scientific computing. Performance- and energy-efficient implementation of these kernels on modern, energy efficient, multicore and many-core platforms is therefore an important and challenging problem. We present results from optimizing two irregular applications, the Louvain method for community detection (Grappolo) and high-performance conjugate gradient (HPCCG), on the Tilera many-core system. We have significantly extended MIT's OpenTuner auto-tuning framework to conduct a detailed study of platform-independent and platform-specific optimizations to improve performance as well as reduce total energy consumption. We explore the optimization design space along three dimensions: memory layout schemes, compiler-based code transformations, and optimization of parallel loop schedules. Using auto-tuning, we demonstrate whole node energy savings of up to 41% relative to a baseline instantiation, and up to 31% relative to manually optimized variants.

  2. Evaluation of HAL/S language compilability using SAMSO's Compiler Writing System (CWS)

    NASA Technical Reports Server (NTRS)

    Feliciano, M.; Anderson, H. D.; Bond, J. W., III

    1976-01-01

    NASA/Langley is engaged in a program to develop an adaptable guidance and control software concept for spacecraft such as shuttle-launched payloads. It is envisioned that this flight software be written in a higher-order language, such as HAL/S, to facilitate changes or additions. To make this adaptable software transferable to various onboard computers, a compiler writing system capability is necessary. A joint program with the Air Force Space and Missile Systems Organization was initiated to determine if the Compiler Writing System (CWS) owned by the Air Force could be utilized for this purpose. The present study explores the feasibility of including the HAL/S language constructs in CWS and the effort required to implement these constructs. This will determine the compilability of HAL/S using CWS and permit NASA/Langley to identify the HAL/S constructs desired for their applications. The study consisted of comparing the implementation of the Space Programming Language using CWS with the requirements for the implementation of HAL/S. It is the conclusion of the study that CWS already contains many of the language features of HAL/S and that it can be expanded for compiling part or all of HAL/S. It is assumed that persons reading and evaluating this report have a basic familiarity with (1) the principles of compiler construction and operation, and (2) the logical structure and applications characteristics of HAL/S and SPL.

  3. Ada Compiler Validation Summary Report: Certificate Number 910626S1. 11173 U.S. Navy Ada/L, Version 4.0 (/Optimize) VAX 855 = AN/UYK-43 (EMR) (Bare Board).

    DTIC Science & Technology

    1991-07-30

    Target), 910626S1.11173 ... AUTHOR(S): National Institute of Standards and Technology, Gaithersburg, MD, USA ... PERFORMING ORGANIZATION NAME(S) AND ADDRESS ... Capability (ACVC). This Validation Summary Report (VSR) gives an account of the testing of this Ada implementation. For any technical terms used in ... $BLANKS ... $MAXLENINTBASEDLITERAL ... $MAXLENREALBASEDLITERAL ... $MAXSTRINGLITERAL

  4. Ada Compiler Validation Summary Report. Certificate Number: 910626S1. 11178, U.S. Navy Ada/M, Version 4.0 (/OPTIMIZE) VAX 11/785 = AN/UYK-44 (EMR) (Bare Board).

    DTIC Science & Technology

    1991-07-30

    Technology, Gaithersburg, MD, USA ... PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES) ... PERFORMING ORGANIZATION: National Institute of Standards and ... ACVC). This Validation Summary Report (VSR) gives an account of the testing of this Ada implementation. For any technical terms used in this report ... $BLANKS ... $MAXLENINTBASEDLITERAL ... $MAXLENREALBASEDLITERAL ...

  5. Arity Raising in Manticore

    NASA Astrophysics Data System (ADS)

    Bergstrom, Lars; Reppy, John

    Compilers for polymorphic languages are required to treat values in programs in an abstract and generic way at the source level. The challenges of optimizing the boxing of raw values, flattening of argument tuples, and raising the arity of functions that handle complex structures to reduce memory usage are old ones, but take on newfound import with processors that have twice as many registers. We present a novel strategy that uses both control-flow and type information to provide an arity raising implementation addressing these problems. This strategy is conservative: no matter the execution path, the transformed program will not perform extra operations.

  6. On the design of a radix-10 online floating-point multiplier

    NASA Astrophysics Data System (ADS)

    McIlhenny, Robert D.; Ercegovac, Milos D.

    2009-08-01

    This paper describes an approach to design and implement a radix-10 online floating-point multiplier. An online approach is considered because it offers computational flexibility not available with conventional arithmetic. The design was coded in VHDL and compiled, synthesized, and mapped onto a Virtex 5 FPGA to measure cost in terms of LUTs (look-up tables) as well as the cycle time and total latency. The routing delay, which was not optimized, is the major component of the cycle time. For a rough estimate of the cost/latency characteristics, our design was compared to a standard radix-2 floating-point multiplier of equivalent precision. The results demonstrate that even an unoptimized radix-10 online design is an attractive implementation alternative for FPGA floating-point multiplication.

  7. Lattice surgery on the Raussendorf lattice

    NASA Astrophysics Data System (ADS)

    Herr, Daniel; Paler, Alexandru; Devitt, Simon J.; Nori, Franco

    2018-07-01

    Lattice surgery is a method to perform quantum computation fault-tolerantly by using operations on boundary qubits between different patches of the planar code. This technique allows for universal planar code computation without eliminating the intrinsic two-dimensional nearest-neighbor properties of the surface code that ease physical hardware implementations. Lattice surgery approaches to algorithmic compilation and optimization have been demonstrated to be more resource efficient for resource-intensive components of a fault-tolerant algorithm, and consequently may be preferable over braid-based logic. Lattice surgery can be extended to the Raussendorf lattice, providing a measurement-based approach to the surface code. In this paper we describe how lattice surgery can be performed on the Raussendorf lattice and therefore give a viable alternative to computation using braiding in measurement-based implementations of topological codes.

  8. Compiler-Directed File Layout Optimization for Hierarchical Storage Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ding, Wei; Zhang, Yuanrui; Kandemir, Mahmut

    File layout of array data is a critical factor that affects the behavior of storage caches, and has so far received little attention in the context of hierarchical storage systems. The main contribution of this paper is a compiler-driven file layout optimization scheme for hierarchical storage caches. This approach, fully automated within an optimizing compiler, analyzes a multi-threaded application code and determines a file layout for each disk-resident array referenced by the code, such that the performance of the target storage cache hierarchy is maximized. We tested our approach using 16 I/O intensive application programs and compared its performance against two previously proposed approaches under different cache space management schemes. Our experimental results show that the proposed approach improves the execution time of these parallel applications by 23.7% on average.

  9. Compiler-Directed File Layout Optimization for Hierarchical Storage Systems

    DOE PAGES

    Ding, Wei; Zhang, Yuanrui; Kandemir, Mahmut; ...

    2013-01-01

    File layout of array data is a critical factor that affects the behavior of storage caches, and has so far received little attention in the context of hierarchical storage systems. The main contribution of this paper is a compiler-driven file layout optimization scheme for hierarchical storage caches. This approach, fully automated within an optimizing compiler, analyzes a multi-threaded application code and determines a file layout for each disk-resident array referenced by the code, such that the performance of the target storage cache hierarchy is maximized. We tested our approach using 16 I/O intensive application programs and compared its performance against two previously proposed approaches under different cache space management schemes. Our experimental results show that the proposed approach improves the execution time of these parallel applications by 23.7% on average.

  10. RPython high-level synthesis

    NASA Astrophysics Data System (ADS)

    Cieszewski, Radoslaw; Linczuk, Maciej

    2016-09-01

    The development of FPGA technology and the increasing complexity of applications in recent decades have forced compilers to move to higher abstraction levels. Compilers interpret an algorithmic description of a desired behavior written in High-Level Languages (HLLs) and translate it to Hardware Description Languages (HDLs). This paper presents an RPython-based High-Level Synthesis (HLS) compiler. The compiler gets the configuration parameters and maps an RPython program to VHDL. The VHDL code can then be used to program FPGA chips. Compared with other technologies, FPGAs have the potential to achieve far greater performance than software as a result of omitting the fetch-decode-execute operations of General Purpose Processors (GPPs) and introducing more parallel computation. This can be exploited by utilizing many resources at the same time. Creating parallel algorithms computed with FPGAs in pure HDL is difficult and time consuming. Implementation time can be greatly reduced with a High-Level Synthesis compiler. This article describes design methodologies and tools, implementation, and first results of the VHDL backend created for the RPython compiler.

  11. A Multiprocessor SoC Architecture with Efficient Communication Infrastructure and Advanced Compiler Support for Easy Application Development

    NASA Astrophysics Data System (ADS)

    Urfianto, Mohammad Zalfany; Isshiki, Tsuyoshi; Khan, Arif Ullah; Li, Dongju; Kunieda, Hiroaki

    This paper presents a Multiprocessor System-on-Chip (MPSoC) architecture used as an execution platform for the new C-language based MPSoC design framework we are currently developing. The MPSoC architecture is based on an existing SoC platform with a commercial RISC core acting as the host CPU. We extend the existing SoC with a multiprocessor-array block that is used as the main engine to run parallel applications modeled in our design framework. Utilizing several optimizations provided by our compiler, an efficient inter-communication between processing elements with minimum overhead is implemented. A host-interface is designed to integrate the existing RISC core to the multiprocessor-array. The experimental results show that an efficacious integration is achieved, proving that the designed communication module can be used to efficiently incorporate off-the-shelf processors as a processing element for MPSoC architectures designed using our framework.

  12. Ada 9X Project Report, A Study of Implementation-Dependent Pragmas and Attributes in Ada

    DTIC Science & Technology

    1989-11-01

    here communications with the vendor were often required to firmly establish the behavior of some implementation-dependent features CMU-SEI-SR-89-19...compilers), by potential market penetration (percent coverage of all surveyed implementations), and by cross-compiler influence (percentage of cross...operations in the context of a tightly integrated development environment, specific underlying operating system services (beneath the Ada run-time kernel

  13. qtcm 0.1.2: A Python Implementation of the Neelin-Zeng Quasi-Equilibrium Tropical Circulation model

    NASA Astrophysics Data System (ADS)

    Lin, J. W.-B.

    2008-10-01

    Historically, climate models have been developed incrementally and in compiled languages like Fortran. While the use of legacy compiled languages results in fast, time-tested code, the resulting model is limited in its modularity and cannot take advantage of functionality available with modern computer languages. Here we describe an effort at using the open-source, object-oriented language Python to create more flexible climate models: the package qtcm, a Python implementation of the intermediate-level Neelin-Zeng Quasi-Equilibrium Tropical Circulation model (QTCM1) of the atmosphere. The qtcm package retains the core numerics of QTCM1, written in Fortran to optimize model performance, but uses Python structures and utilities to wrap the QTCM1 Fortran routines and manage model execution. The resulting "mixed language" modeling package allows order and choice of subroutine execution to be altered at run time, and model analysis and visualization to be integrated interactively with model execution at run time. This flexibility facilitates more complex scientific analysis using less complex code than would be possible using traditional languages alone, and provides tools to transform the traditional "formulate hypothesis → write and test code → run model → analyze results" sequence into a feedback loop that can be executed automatically by the computer.

  14. qtcm 0.1.2: a Python implementation of the Neelin-Zeng Quasi-Equilibrium Tropical Circulation Model

    NASA Astrophysics Data System (ADS)

    Lin, J. W.-B.

    2009-02-01

    Historically, climate models have been developed incrementally and in compiled languages like Fortran. While the use of legacy compiled languages results in fast, time-tested code, the resulting model is limited in its modularity and cannot take advantage of functionality available with modern computer languages. Here we describe an effort at using the open-source, object-oriented language Python to create more flexible climate models: the package qtcm, a Python implementation of the intermediate-level Neelin-Zeng Quasi-Equilibrium Tropical Circulation model (QTCM1) of the atmosphere. The qtcm package retains the core numerics of QTCM1, written in Fortran to optimize model performance, but uses Python structures and utilities to wrap the QTCM1 Fortran routines and manage model execution. The resulting "mixed language" modeling package allows order and choice of subroutine execution to be altered at run time, and model analysis and visualization to be integrated interactively with model execution at run time. This flexibility facilitates more complex scientific analysis using less complex code than would be possible using traditional languages alone, and provides tools to transform the traditional "formulate hypothesis → write and test code → run model → analyze results" sequence into a feedback loop that can be executed automatically by the computer.

  15. TU-AB-BRC-12: Optimized Parallel MonteCarlo Dose Calculations for Secondary MU Checks

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    French, S; Nazareth, D; Bellor, M

    Purpose: Secondary MU checks are an important tool used during a physics review of a treatment plan. Commercial software packages offer varying degrees of theoretical dose calculation accuracy, depending on the modality involved. Dose calculations of VMAT plans are especially prone to error due to the large approximations involved. Monte Carlo (MC) methods are not commonly used due to their long run times. We investigated two methods to increase the computational efficiency of MC dose simulations with the BEAMnrc code. Distributed computing resources, along with optimized code compilation, will allow for accurate and efficient VMAT dose calculations. Methods: The BEAMnrc package was installed on a high performance computing cluster accessible to our clinic. MATLAB and PYTHON scripts were developed to convert a clinical VMAT DICOM plan into BEAMnrc input files. The BEAMnrc installation was optimized by running the VMAT simulations through profiling tools which indicated the behavior of the constituent routines in the code, e.g. the bremsstrahlung splitting routine, and the specified random number generator. This information aided in determining the most efficient compiling parallel configuration for the specific CPUs available on our cluster, resulting in the fastest VMAT simulation times. Our method was evaluated with calculations involving 10^8 – 10^9 particle histories which are sufficient to verify patient dose using VMAT. Results: Parallelization allowed the calculation of patient dose on the order of 10 – 15 hours with 100 parallel jobs. Due to the compiler optimization process, further speed increases of 23% were achieved when compared with the open-source compiler BEAMnrc packages. Conclusion: Analysis of the BEAMnrc code allowed us to optimize the compiler configuration for VMAT dose calculations. In future work, the optimized MC code, in conjunction with the parallel processing capabilities of BEAMnrc, will be applied to provide accurate and efficient secondary MU checks.

  16. Model compilation: An approach to automated model derivation

    NASA Technical Reports Server (NTRS)

    Keller, Richard M.; Baudin, Catherine; Iwasaki, Yumi; Nayak, Pandurang; Tanaka, Kazuo

    1990-01-01

    An approach is introduced to automated model derivation for knowledge based systems. The approach, model compilation, involves procedurally generating the set of domain models used by a knowledge based system. An implemented example illustrates how this approach can be used to derive models of different precision and abstraction, tailored to different tasks, from a given set of base domain models. In particular, two implemented model compilers are described, each of which takes as input a base model that describes the structure and behavior of a simple electromechanical device, the Reaction Wheel Assembly of NASA's Hubble Space Telescope. The compilers transform this relatively general base model into simple task specific models for troubleshooting and redesign, respectively, by applying a sequence of model transformations. Each transformation in this sequence produces an increasingly more specialized model. The compilation approach lessens the burden of updating and maintaining consistency among models by enabling their automatic regeneration.

  17. Compiler-assisted static checkpoint insertion

    NASA Technical Reports Server (NTRS)

    Long, Junsheng; Fuchs, W. K.; Abraham, Jacob A.

    1992-01-01

    This paper describes a compiler-assisted approach for static checkpoint insertion. Instead of fixing the checkpoint location before program execution, a compiler-enhanced polling mechanism is utilized to maintain both the desired checkpoint intervals and reproducible checkpoint locations. The technique has been implemented in a GNU CC compiler for Sun 3 and Sun 4 (Sparc) processors. Experiments demonstrate that the approach provides for stable checkpoint intervals and reproducible checkpoint placements with performance overhead comparable to a previously presented compiler-assisted dynamic scheme (CATCH) utilizing the system clock.

  18. 77 FR 15587 - Privacy Act of 1974; Implementation

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-03-16

    ... in DMDC 11, entitled ``Investigative Records Repository'', when investigatory material is compiled... Records Repository. (i) Exemptions: (A) Investigatory material compiled for law enforcement purposes may...

  19. A high performance data parallel tensor contraction framework: Application to coupled electro-mechanics

    NASA Astrophysics Data System (ADS)

    Poya, Roman; Gil, Antonio J.; Ortigosa, Rogelio

    2017-07-01

    The paper presents aspects of implementation of a new high performance tensor contraction framework for the numerical analysis of coupled and multi-physics problems on streaming architectures. In addition to explicit SIMD instructions and smart expression templates, the framework introduces domain specific constructs for the tensor cross product and its associated algebra recently rediscovered by Bonet et al. (2015, 2016) in the context of solid mechanics. The two key ingredients of the presented expression template engine are as follows. First, the capability to mathematically transform complex chains of operations to simpler equivalent expressions, while potentially avoiding routes with higher levels of computational complexity and, second, to perform a compile time depth-first or breadth-first search to find the optimal contraction indices of a large tensor network in order to minimise the number of floating point operations. For optimisations of tensor contraction such as loop transformation, loop fusion and data locality optimisations, the framework relies heavily on compile time technologies rather than source-to-source translation or JIT techniques. Every aspect of the framework is examined through relevant performance benchmarks, including the impact of data parallelism on the performance of isomorphic and nonisomorphic tensor products, the FLOP and memory I/O optimality in the evaluation of tensor networks, the compilation cost and memory footprint of the framework and the performance of tensor cross product kernels. The framework is then applied to finite element analysis of coupled electro-mechanical problems to assess the speed-ups achieved in kernel-based numerical integration of complex electroelastic energy functionals. In this context, domain-aware expression templates combined with SIMD instructions are shown to provide a significant speed-up over the classical low-level style programming techniques.
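
    The search for a contraction order that minimises floating point operations can be sketched for the simplest tensor network, a chain of matrices. The code below is a memoised depth-first search written in run-time Python; it is not the framework's compile-time expression template engine, and the function name `min_flops_chain` is hypothetical.

```python
from functools import lru_cache


def min_flops_chain(dims):
    """Find the cheapest pairwise contraction order for a chain of matrices.

    Matrix i has shape dims[i] x dims[i + 1]. The cost counted is the number
    of scalar multiplications. Returns (cost, parenthesisation string).
    """
    n = len(dims) - 1  # number of matrices in the chain

    @lru_cache(maxsize=None)
    def search(i, j):
        # Best way to contract matrices i..j (inclusive) into one matrix.
        if i == j:
            return 0, f"A{i}"
        best = None
        for k in range(i, j):  # split into (i..k) and (k+1..j)
            left_cost, left = search(i, k)
            right_cost, right = search(k + 1, j)
            cost = left_cost + right_cost + dims[i] * dims[k + 1] * dims[j + 1]
            if best is None or cost < best[0]:
                best = (cost, f"({left} {right})")
        return best

    return search(0, n - 1)


if __name__ == "__main__":
    # Shapes 10x30, 30x5, 5x60: contracting the first pair first is cheaper.
    print(min_flops_chain([10, 30, 5, 60]))
```

    For these shapes the two possible orders cost 4500 and 27000 multiplications, so the choice of contraction indices alone changes the work by a factor of six.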

  20. Toward Abstracting the Communication Intent in Applications to Improve Portability and Productivity

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mintz, Tiffany M; Hernandez, Oscar R; Kartsaklis, Christos

    Programming with communication libraries such as the Message Passing Interface (MPI) obscures the high-level intent of the communication in an application and makes static communication analysis difficult to do. Compilers are unaware of the specifics of communication libraries, leading to the exclusion of communication patterns from any automated analysis and optimizations. To overcome this, communication patterns can be expressed at higher levels of abstraction and incrementally added to existing MPI applications. In this paper, we propose the use of directives to clearly express the communication intent of an application in a way that is not specific to a given communication library. Our communication directives allow programmers to express communication among processes in a portable way, giving hints to the compiler on regions of computations that can be overlapped with communication and relaxing communication constraints on the ordering, completion and synchronization of the communication imposed by specific libraries such as MPI. The directives can then be translated by the compiler into message passing calls that efficiently implement the intended pattern and be targeted to multiple communication libraries. Thus far, we have used the directives to express point-to-point communication patterns in C, C++ and Fortran applications, and have translated them to MPI and SHMEM.

  1. Optimization technique of wavefront coding system based on ZEMAX externally compiled programs

    NASA Astrophysics Data System (ADS)

    Han, Libo; Dong, Liquan; Liu, Ming; Zhao, Yuejin; Liu, Xiaohua

    2016-10-01

    Wavefront coding is applied to infrared imaging systems as a means of athermalization, and the design of the phase plate is the key to system performance. This paper applies the externally compiled programs of ZEMAX to the optimization of the phase mask within the normal optical design process: the evaluation function of the wavefront coding system is defined based on the consistency of the modulation transfer function (MTF), and the optimization is accelerated by introducing mathematical software. The user writes an external program that computes the evaluation function using the computing power of the mathematical software in order to find the optimal parameters of the phase mask, accelerates convergence through a genetic algorithm (GA), and then uses the dynamic data exchange (DDE) interface between ZEMAX and the mathematical software to achieve high-speed data exchange. The rotationally symmetric phase mask and the cubic phase mask have both been optimized by this method. The depth of focus increases nearly 3 times by inserting the rotationally symmetric phase mask, while that of the system with the cubic phase mask increases up to 10 times; the consistency of the MTF decreases markedly, and the operating temperature of the optimized system ranges from -40° to 60°. Results show that, owing to its externally compiled function and DDE, this optimization method makes it more convenient to define unconventional optimization goals and to quickly optimize optical systems with special properties, which is of considerable significance for the optimization of unconventional optical systems.

  2. Spiral: Automated Computing for Linear Transforms

    NASA Astrophysics Data System (ADS)

    Püschel, Markus

    2010-09-01

    Writing fast software has become extraordinarily difficult. For optimal performance, programs and their underlying algorithms have to be adapted to take full advantage of the platform's parallelism, memory hierarchy, and available instruction set. To make things worse, the best implementations are often platform-dependent and platforms are constantly evolving, which quickly renders libraries obsolete. We present Spiral, a domain-specific program generation system for important functionality used in signal processing and communication including linear transforms, filters, and other functions. Spiral completely replaces the human programmer. For a desired function, Spiral generates alternative algorithms, optimizes them, compiles them into programs, and intelligently searches for the best match to the computing platform. The main idea behind Spiral is a mathematical, declarative, domain-specific framework to represent algorithms and the use of rewriting systems to generate and optimize algorithms at a high level of abstraction. Experimental results show that the code generated by Spiral competes with, and sometimes outperforms, the best available human-written code.

  3. GLISP User's Manual. Revised.

    ERIC Educational Resources Information Center

    Novak, Gordon S., Jr.

    GLISP is a LISP-based language which provides high-level language features not found in ordinary LISP. The GLISP language is implemented by means of a compiler which accepts GLISP as input and produces ordinary LISP as output. This output can be further compiled to machine code by the LISP compiler. GLISP is available for several LISP dialects,…

  4. Using MaxCompiler for the high level synthesis of trigger algorithms

    NASA Astrophysics Data System (ADS)

    Summers, S.; Rose, A.; Sanders, P.

    2017-02-01

    Firmware for FPGA trigger applications at the CMS experiment is conventionally written using hardware description languages such as Verilog and VHDL. MaxCompiler is an alternative, Java based, tool for developing FPGA applications which uses a higher level of abstraction from the hardware than a hardware description language. An implementation of the jet and energy sum algorithms for the CMS Level-1 calorimeter trigger has been written using MaxCompiler to benchmark against the VHDL implementation in terms of accuracy, latency, resource usage, and code size. A Kalman Filter track fitting algorithm has been developed using MaxCompiler for a proposed CMS Level-1 track trigger for the High-Luminosity LHC upgrade. The design achieves a low resource usage, and has a latency of 187.5 ns per iteration.

  5. Portuguese food composition database quality management system.

    PubMed

    Oliveira, L M; Castanheira, I P; Dantas, M A; Porto, A A; Calhau, M A

    2010-11-01

    The harmonisation of food composition databases (FCDB) has been a recognised need among users, producers and stakeholders of food composition data (FCD). To reach harmonisation of FCDBs among the national compiler partners, the European Food Information Resource (EuroFIR) Network of Excellence set up a series of guidelines and quality requirements, together with recommendations to implement quality management systems (QMS) in FCDBs. The Portuguese National Institute of Health (INSA) is the national FCDB compiler in Portugal and is also a EuroFIR partner. INSA's QMS complies with ISO/IEC (International Organization for Standardisation/International Electrotechnical Commission) 17025 requirements. The purpose of this work is to report on the strategy used and progress made for extending INSA's QMS to the Portuguese FCDB in alignment with EuroFIR guidelines. A stepwise approach was used to extend INSA's QMS to the Portuguese FCDB. The approach included selection of reference standards and guides and the collection of relevant quality documents directly or indirectly related to the compilation process; selection of the adequate quality requirements; assessment of adequacy and level of requirement implementation in the current INSA's QMS; implementation of the selected requirements; and EuroFIR's preassessment 'pilot' auditing. The strategy used to design and implement the extension of INSA's QMS to the Portuguese FCDB is reported in this paper. The QMS elements have been established by consensus. ISO/IEC 17025 management requirements (except 4.5) and 5.2 technical requirements, as well as all EuroFIR requirements (including technical guidelines, FCD compilation flowchart and standard operating procedures), have been selected for implementation. 
The results indicate that the quality management requirements of ISO/IEC 17025 in place in INSA fit the needs for document control, audits, contract review, non-conformity work and corrective actions, and users' (customers') comments, complaints and satisfaction, with minor adaptation. Implementation of the FCDB QMS proved to be a way of reducing the subjectivity of the compilation process and fully documenting it, and also facilitates training of new compilers. Furthermore, it has strengthened cooperation and trust among FCDB actors, as all of them were called to be involved in the process. On the basis of our practical results, we can conclude that ISO/IEC 17025 management requirements are an adequate reference for the implementation of INSA's FCDB QMS with the advantages of being well known to all members of staff and also being a common quality language among laboratories producing FCD. Combining quality systems and food composition activities endows the FCDB compilation process with flexibility, consistency and transparency, and facilitates its monitoring and assessment, providing the basis for strengthening confidence among users, data producers and compilers.

  6. Parallel processing and expert systems

    NASA Technical Reports Server (NTRS)

    Lau, Sonie; Yan, Jerry C.

    1991-01-01

    Whether it be monitoring the thermal subsystem of Space Station Freedom, or controlling the navigation of the autonomous rover on Mars, NASA missions in the 1990s cannot enjoy an increased level of autonomy without the efficient implementation of expert systems. Merely increasing the computational speed of uniprocessors may not be able to guarantee that real-time demands are met for larger systems. Speedup via parallel processing must be pursued alongside the optimization of sequential implementations. Prototypes of parallel expert systems have been built at universities and industrial laboratories in the U.S. and Japan. The state-of-the-art research in progress related to parallel execution of expert systems is surveyed. The survey discusses multiprocessors for expert systems, parallel languages for symbolic computations, and mapping expert systems to multiprocessors. Results to date indicate that the parallelism achieved for these systems is small. The main reasons are (1) the body of knowledge applicable in any given situation and the amount of computation executed by each rule firing are small, (2) dividing the problem solving process into relatively independent partitions is difficult, and (3) implementation decisions that enable expert systems to be incrementally refined hamper compile-time optimization. In order to obtain greater speedups, data parallelism and application parallelism must be exploited.

  7. HAL/S-360 compiler test activity report

    NASA Technical Reports Server (NTRS)

    Helmers, C. T.

    1974-01-01

    The levels of testing employed in verifying the HAL/S-360 compiler were as follows: (1) typical applications program case testing; (2) functional testing of the compiler system and its generated code; and (3) machine oriented testing of compiler implementation on operational computers. Details of the initial test plan and subsequent adaptation are reported, along with complete test results for each phase which examined the production of object codes for every possible source statement.

  8. SCORE user's manual

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Brown, S.A.

    SABrE is a set of tools to facilitate the development of portable scientific software and to visualize scientific data. As with most constructs, SABRE has a foundation. In this case that foundation is SCORE. SCORE (SABRE CORE) has two main functions. The first and perhaps most important is to smooth over the differences between different C implementations and define the parameters which drive most of the conditional compilations in the rest of SABRE. Secondly, it contains several groups of functionality that are used extensively throughout SABRE. Although C is highly standardized now, that has not always been the case. Roughly speaking C compilers fall into three categories: ANSI standard; derivative of the Portable C Compiler (Kernighan and Ritchie); and the rest. SABRE has been successfully ported to many ANSI and PCC systems. It has never been successfully ported to a system in the last category. The reason is mainly that the "standard" C library supplied with such implementations is so far from true ANSI or PCC standard that SABRE would have to include its own version of the standard C library in order to work at all. Even with standardized compilers life is not dead simple. The ANSI standard leaves several crucial points ambiguous as "implementation defined." Under these conditions one can find significant differences in going from one ANSI standard compiler to another. SCORE's job is to include the requisite standard headers and ensure that certain key standard library functions exist and function correctly (there are bugs in the standard library functions supplied with some compilers) so that, to applications which include the SCORE header(s) and load with SCORE, all C implementations look the same.

  9. SCORE user's manual

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Brown, S.A.

    SABrE is a set of tools to facilitate the development of portable scientific software and to visualize scientific data. As with most constructs, SABRE has a foundation. In this case that foundation is SCORE. SCORE (SABRE CORE) has two main functions. The first and perhaps most important is to smooth over the differences between different C implementations and define the parameters which drive most of the conditional compilations in the rest of SABRE. Secondly, it contains several groups of functionality that are used extensively throughout SABRE. Although C is highly standardized now, that has not always been the case. Roughly speaking C compilers fall into three categories: ANSI standard; derivative of the Portable C Compiler (Kernighan and Ritchie); and the rest. SABRE has been successfully ported to many ANSI and PCC systems. It has never been successfully ported to a system in the last category. The reason is mainly that the "standard" C library supplied with such implementations is so far from true ANSI or PCC standard that SABRE would have to include its own version of the standard C library in order to work at all. Even with standardized compilers life is not dead simple. The ANSI standard leaves several crucial points ambiguous as "implementation defined." Under these conditions one can find significant differences in going from one ANSI standard compiler to another. SCORE's job is to include the requisite standard headers and ensure that certain key standard library functions exist and function correctly (there are bugs in the standard library functions supplied with some compilers) so that, to applications which include the SCORE header(s) and load with SCORE, all C implementations look the same.

  10. Optimization of the coherence function estimation for multi-core central processing unit

    NASA Astrophysics Data System (ADS)

    Cheremnov, A. G.; Faerman, V. A.; Avramchuk, V. S.

    2017-02-01

    The paper considers the use of parallel processing on multi-core central processing units to optimize the evaluation of the coherence function arising in digital signal processing. The coherence function, along with other methods of spectral analysis, is commonly used for vibration diagnosis of rotating machinery and its particular nodes. An algorithm is given for evaluating the function for signals represented by digital samples. The algorithm is analyzed with respect to its software implementation and computational problems. Optimization measures are described, including algorithmic, architecture and compiler optimization, and their results are assessed for multi-core processors from different manufacturers. The speed-up of parallel execution with respect to sequential execution was studied, and results are presented for Intel Core i7-4720HQ and AMD FX-9590 processors. The results show comparatively high efficiency of the optimization measures taken. In particular, acceleration indicators and average CPU utilization have been significantly improved, showing a high degree of parallelism in the constructed calculating functions. The developed software underwent state registration and will be used as part of a software and hardware solution for rotating machinery fault diagnosis and pipeline leak location with the acoustic correlation method.
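
    The abstract does not reproduce the algorithm itself, so the following is only a minimal sketch of one common way to estimate the magnitude-squared coherence from digital samples, not the authors' implementation. It assumes non-overlapping segments, a rectangular window and a naive DFT; the names `half_spectrum` and `coherence` and the parameter `seg_len` are hypothetical.

```python
import cmath
import math


def half_spectrum(segment):
    """Naive DFT of one real-valued segment, returning bins 0..N/2."""
    n = len(segment)
    return [
        sum(segment[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]


def coherence(x, y, seg_len):
    """Magnitude-squared coherence |Pxy|^2 / (Pxx * Pyy) per frequency bin.

    Assumption: spectra are averaged over non-overlapping segments of
    length seg_len with no window function applied.
    """
    n_bins = seg_len // 2 + 1
    pxx = [0.0] * n_bins   # accumulated auto-spectrum of x
    pyy = [0.0] * n_bins   # accumulated auto-spectrum of y
    pxy = [0j] * n_bins    # accumulated cross-spectrum of x and y
    usable = min(len(x), len(y))
    for start in range(0, usable - seg_len + 1, seg_len):
        fx = half_spectrum(x[start:start + seg_len])
        fy = half_spectrum(y[start:start + seg_len])
        for k in range(n_bins):
            pxx[k] += abs(fx[k]) ** 2
            pyy[k] += abs(fy[k]) ** 2
            pxy[k] += fx[k] * fy[k].conjugate()
    # Bins with no power in either signal are reported as 0.0.
    return [
        abs(pxy[k]) ** 2 / (pxx[k] * pyy[k]) if pxx[k] > 0 and pyy[k] > 0 else 0.0
        for k in range(n_bins)
    ]
```

    Each segment's spectra are computed independently and only summed afterwards, so in a sketch like this the per-segment work could be handed to separate cores; whether the paper parallelizes at that level is not stated in the abstract.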

  11. C to VHDL compiler

    NASA Astrophysics Data System (ADS)

    Berdychowski, Piotr P.; Zabolotny, Wojciech M.

    2010-09-01

    The main goal of the C to VHDL compiler project is to make the FPGA platform more accessible to scientists and software developers. The FPGA platform offers the unique ability to configure the hardware to implement virtually any dedicated architecture, and modern devices provide sufficient hardware resources to implement parallel execution platforms with complex processing units. All this makes the FPGA platform very attractive for those looking for an efficient, heterogeneous computing environment. The current industry standard for developing digital systems on the FPGA platform is based on HDLs. Although very effective and expressive in the hands of hardware development specialists, these languages require specific knowledge and experience that are out of reach for most scientists and software programmers. The C to VHDL compiler project attempts to remedy that by creating an application that derives an initial VHDL description of a digital system (for further compilation and synthesis) from a purely algorithmic description in the C programming language. The idea itself is not new, and the C to VHDL compiler combines the best approaches from existing solutions developed over many years with some new, unique improvements.

  12. Modular implementation of a digital hardware design automation system

    NASA Astrophysics Data System (ADS)

    Masud, M.

    An automation system based on AHPL (A Hardware Programming Language) was developed. The project may be divided into three distinct phases: (1) upgrading of AHPL to make it more universally applicable; (2) implementation of a compiler for the language; and (3) illustration of how the compiler may be used to support several phases of design activities. Several new features were added to AHPL. These include: application-dependent parameters, multiple clocks, asynchronous results, functional registers and primitive functions. The new language, called Universal AHPL, has been defined rigorously. The compiler design is modular. The parsing is done by an automatic parser generated from the SLR(1) BNF grammar of the language. The compiler produces two data bases from the AHPL description of a circuit. The first one is a tabular representation of the circuit, and the second one is a detailed interconnection linked list. The two data bases provide a means to interface the compiler to application-dependent CAD systems.

  13. Automatic Compilation from High-Level Biologically-Oriented Programming Language to Genetic Regulatory Networks

    PubMed Central

    Beal, Jacob; Lu, Ting; Weiss, Ron

    2011-01-01

    Background: The field of synthetic biology promises to revolutionize our ability to engineer biological systems, providing important benefits for a variety of applications. Recent advances in DNA synthesis and automated DNA assembly technologies suggest that it is now possible to construct synthetic systems of significant complexity. However, while a variety of novel genetic devices and small engineered gene networks have been successfully demonstrated, the regulatory complexity of synthetic systems that have been reported recently has somewhat plateaued due to a variety of factors, including the complexity of biology itself and the lag in our ability to design and optimize sophisticated biological circuitry. Methodology/Principal Findings: To address the gap between DNA synthesis and circuit design capabilities, we present a platform that enables synthetic biologists to express desired behavior using a convenient high-level biologically-oriented programming language, Proto. The high level specification is compiled, using a regulatory motif based mechanism, to a gene network, optimized, and then converted to a computational simulation for numerical verification. Through several example programs we illustrate the automated process of biological system design with our platform, and show that our compiler optimizations can yield significant reductions in the number of genes (~50%) and latency of the optimized engineered gene networks. Conclusions/Significance: Our platform provides a convenient and accessible tool for the automated design of sophisticated synthetic biological systems, bridging an important gap between DNA synthesis and circuit design capabilities. Our platform is user-friendly and features biologically relevant compiler optimizations, providing an important foundation for the development of sophisticated biological systems. PMID:21850228

  14. Automatic compilation from high-level biologically-oriented programming language to genetic regulatory networks.

    PubMed

    Beal, Jacob; Lu, Ting; Weiss, Ron

    2011-01-01

    The field of synthetic biology promises to revolutionize our ability to engineer biological systems, providing important benefits for a variety of applications. Recent advances in DNA synthesis and automated DNA assembly technologies suggest that it is now possible to construct synthetic systems of significant complexity. However, while a variety of novel genetic devices and small engineered gene networks have been successfully demonstrated, the regulatory complexity of synthetic systems that have been reported recently has somewhat plateaued due to a variety of factors, including the complexity of biology itself and the lag in our ability to design and optimize sophisticated biological circuitry. To address the gap between DNA synthesis and circuit design capabilities, we present a platform that enables synthetic biologists to express desired behavior using a convenient high-level biologically-oriented programming language, Proto. The high level specification is compiled, using a regulatory motif based mechanism, to a gene network, optimized, and then converted to a computational simulation for numerical verification. Through several example programs we illustrate the automated process of biological system design with our platform, and show that our compiler optimizations can yield significant reductions in the number of genes (~ 50%) and latency of the optimized engineered gene networks. Our platform provides a convenient and accessible tool for the automated design of sophisticated synthetic biological systems, bridging an important gap between DNA synthesis and circuit design capabilities. Our platform is user-friendly and features biologically relevant compiler optimizations, providing an important foundation for the development of sophisticated biological systems.

  15. Co-arrays in the Next Fortran Standard

    DOE PAGES

    Reid, John; Numrich, Robert W.

    2007-01-01

    The WG5 committee, at its meeting in Delft, May 2005, decided to include co-arrays in the next Fortran Standard. A Fortran program containing co-arrays is interpreted as if it were replicated a fixed number of times and all copies were executed asynchronously. Each copy has its own set of data objects and is called an image. The array syntax of Fortran is extended with additional trailing subscripts in square brackets to give a clear and straightforward representation of access to data on other images. References without square brackets are to local data, so code that can run independently is uncluttered. Any occurrence of square brackets is a warning about communication between images. The additional syntax requires support in the compiler, but it has been designed to be easy to implement and to give the compiler scope both to apply its optimizations within each image and to optimize the communication between images. The extension includes execution control statements for synchronizing images and intrinsic procedures to return the number of images, to return the index of the current image, and to perform collective operations. The paper does not attempt to describe the full details of the feature as it now appears in the draft of the new standard. Instead, we describe a subset and demonstrate the use of this subset with examples.

  16. Domain Specific Language Support for Exascale

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mellor-Crummey, John

    A multi-institutional project known as D-TEC (short for “Domain-specific Technology for Exascale Computing”) set out to explore technologies to support the construction of Domain Specific Languages (DSLs) to map application programs to exascale architectures. DSLs employ automated code transformation to shift the burden of delivering portable performance from application programmers to compilers. Two chief properties contribute: DSLs permit expression at a high level of abstraction so that a programmer’s intent is clear to a compiler, and DSL implementations encapsulate human domain-specific optimization knowledge so that a compiler can be smart enough to achieve good results on specific hardware. Domain specificity is what makes these properties possible in a programming language. If leveraging domain specificity is the key to keeping exascale software tractable, a corollary is that many different DSLs will be needed to encompass the full range of exascale computing applications; moreover, a single application may well need to use several different DSLs in conjunction. As a result, developing a general toolkit for building domain-specific languages was a key goal for the D-TEC project. Different aspects of the D-TEC research portfolio were the focus of work at each of the partner institutions in the multi-institutional project. D-TEC research and development work at Rice University focused on three principal topics: understanding how to automate the tuning of code for complex architectures, research and development of the Rosebud DSL engine, and compiler technology to support complex execution platforms. This report provides a summary of the research and development work on the D-TEC project at Rice University.

  17. SUMC/MPOS/HAL interface study

    NASA Technical Reports Server (NTRS)

    Saponaro, J. A.; Kosmala, A. L.

    1973-01-01

    The implementation of the HAL/S language on the IBM-360, and in particular the mechanization of its real time, I/O, and error control statements within the OS-360 environment is described. The objectives are twofold: (1) An analysis and general description of HAL/S real time, I/O, and error control statements and the structure required to mechanize these statements. The emphasis is on describing the logical functions performed upon execution of each HAL statement rather than defining whether it is accomplished by the compiler or operating system. (2) An identification of the OS-360 facilities required during execution of HAL/S code as implemented for the current HAL/S-360 compiler; and an evaluation of the aspects involved with interfacing HAL/S with the SUMC operating system utilizing either the HAL/S-360 compiler or by designing a new HAL/S-SUMC compiler.

  18. Optimizing Interactive Development of Data-Intensive Applications

    PubMed Central

    Interlandi, Matteo; Tetali, Sai Deep; Gulzar, Muhammad Ali; Noor, Joseph; Condie, Tyson; Kim, Miryung; Millstein, Todd

    2017-01-01

    Modern Data-Intensive Scalable Computing (DISC) systems are designed to process data through batch jobs that execute programs (e.g., queries) compiled from a high-level language. These programs are often developed interactively by posing ad-hoc queries over the base data until a desired result is generated. We observe that there can be significant overlap in the structure of these queries used to derive the final program. Yet, each successive execution of a slightly modified query is performed anew, which can significantly increase the development cycle. Vega is an Apache Spark framework that we have implemented for optimizing a series of similar Spark programs, likely originating from a development or exploratory data analysis session. Spark developers (e.g., data scientists) can leverage Vega to significantly reduce the amount of time it takes to re-execute a modified Spark program, reducing the overall time to market for their Big Data applications. PMID:28405637

  19. A comparison of native GPU computing versus OpenACC for implementing flow-routing algorithms in hydrological applications

    NASA Astrophysics Data System (ADS)

    Rueda, Antonio J.; Noguera, José M.; Luque, Adrián

    2016-02-01

    In recent years GPU computing has gained wide acceptance as a simple low-cost solution for speeding up computationally expensive processing in many scientific and engineering applications. However, in most cases accelerating a traditional CPU implementation for a GPU is a non-trivial task that requires a thorough refactoring of the code and specific optimizations that depend on the architecture of the device. OpenACC is a promising technology that aims at reducing the effort required to accelerate C/C++/Fortran code on an attached multicore device. With this technology, the CPU code needs little more than a few added compiler directives that identify the areas to be accelerated and the way in which data has to be moved between the CPU and GPU. Its potential benefits are multiple: better code readability, less development time, lower risk of errors and less dependency on the underlying architecture and future evolution of the GPU technology. Our aim with this work is to evaluate the pros and cons of using OpenACC against native GPU implementations in computationally expensive hydrological applications, using the classic D8 algorithm of O'Callaghan and Mark for river network extraction as a case study. We implemented the flow accumulation step of this algorithm on the CPU, using OpenACC, and in two different CUDA versions, comparing the length and complexity of the code and its performance with different datasets. We find that although OpenACC cannot match the performance of an optimized CUDA implementation (3.5× slower on average), it provides a significant performance improvement over a CPU implementation (2-6×) with far simpler code and less implementation effort.
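
    To make the flow accumulation step concrete, here is a minimal sequential sketch of D8 on a small elevation grid. It is an illustration under stated assumptions, not the code evaluated in the paper: each cell drains to the steepest strictly lower neighbour among its 8 neighbours (diagonal distance taken as sqrt(2)), pits and flat cells drain nowhere, and every cell counts itself once. The function names `d8_directions` and `flow_accumulation` are hypothetical.

```python
import math

# Row/column offsets of the 8 neighbours of a cell.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def d8_directions(dem):
    """For each cell, return the neighbour it drains to, or None for a pit."""
    rows, cols = len(dem), len(dem[0])
    direction = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            best_slope = 0.0  # only strictly downhill neighbours qualify
            for dr, dc in NEIGHBOURS:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    slope = (dem[r][c] - dem[nr][nc]) / math.hypot(dr, dc)
                    if slope > best_slope:
                        best_slope = slope
                        direction[r][c] = (nr, nc)
    return direction


def flow_accumulation(dem):
    """Count, for each cell, the cells (itself included) draining through it."""
    rows, cols = len(dem), len(dem[0])
    direction = d8_directions(dem)
    acc = [[1] * cols for _ in range(rows)]
    # Visiting cells from highest to lowest guarantees that every upstream
    # cell is finished before its total is passed downstream.
    order = sorted(
        ((dem[r][c], r, c) for r in range(rows) for c in range(cols)),
        reverse=True,
    )
    for _, r, c in order:
        target = direction[r][c]
        if target is not None:
            tr, tc = target
            acc[tr][tc] += acc[r][c]
    return acc


dem = [[9, 8, 7],
       [8, 6, 5],
       [7, 5, 1]]
print(flow_accumulation(dem))  # [[1, 1, 1], [1, 2, 3], [1, 3, 9]]
```

    The elevation-ordered pass is a sequential formulation chosen here for brevity; a GPU version needs a different way to resolve the dependency between upstream and downstream cells, which is part of what makes the OpenACC and CUDA implementations compared in the paper non-trivial.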

  20. Tiled architecture of a CNN-mostly IP system

    NASA Astrophysics Data System (ADS)

    Spaanenburg, Lambert; Malki, Suleyman

    2009-05-01

    Multi-core architectures have been popularized with the advent of the IBM CELL. On a finer grain the problems in scheduling multi-cores have already existed in the tiled architectures, such as the EPIC and Da Vinci. It is not easy to evaluate the performance of a schedule on such architecture as historical data are not available. One solution is to compile algorithms for which an optimal schedule is known by analysis. A typical example is an algorithm that is already defined in terms of many collaborating simple nodes, such as a Cellular Neural Network (CNN). A simple node with a local register stack together with a 'rotating wheel' internal communication mechanism has been proposed. Though the basic CNN allows for a tiled implementation of a tiled algorithm on a tiled structure, a practical CNN system will have to disturb this regularity by the additional need for arithmetical and logical operations. Arithmetic operations are needed for instance to accommodate for low-level image processing, while logical operations are needed to fork and merge different data streams without use of the external memory. It is found that the 'rotating wheel' internal communication mechanism still handles such mechanisms without the need for global control. Overall the CNN system provides for a practical network size as implemented on a FPGA, can be easily used as embedded IP and provides a clear benchmark for a multi-core compiler.

  1. Design Principles for Fragment Libraries: Maximizing the Value of Learnings from Pharma Fragment-Based Drug Discovery (FBDD) Programs for Use in Academia.

    PubMed

    Keserű, György M; Erlanson, Daniel A; Ferenczy, György G; Hann, Michael M; Murray, Christopher W; Pickett, Stephen D

    2016-09-22

    Fragment-based drug discovery (FBDD) is well suited for discovering both drug leads and chemical probes of protein function; it can cover broad swaths of chemical space and allows the use of creative chemistry. FBDD is widely implemented for lead discovery in industry but is sometimes used less systematically in academia. Design principles and implementation approaches for fragment libraries are continually evolving, and the lack of up-to-date guidance may prevent more effective application of FBDD in academia. This Perspective explores many of the theoretical, practical, and strategic considerations that occur within FBDD programs, including the optimal size, complexity, physicochemical profile, and shape profile of fragments in FBDD libraries, as well as compound storage, evaluation, and screening technologies. This compilation of industry experience in FBDD will hopefully be useful for those pursuing FBDD in academia.

  2. GPU accelerated implementation of NCI calculations using promolecular density.

    PubMed

    Rubez, Gaëtan; Etancelin, Jean-Matthieu; Vigouroux, Xavier; Krajecki, Michael; Boisson, Jean-Charles; Hénon, Eric

    2017-05-30

    The NCI approach is a modern tool to reveal chemical noncovalent interactions. It is particularly attractive to describe ligand-protein binding. A custom implementation for NCI using promolecular density is presented. It is designed to leverage the computational power of NVIDIA graphics processing unit (GPU) accelerators through the CUDA programming model. The code performances of three versions are examined on a test set of 144 systems. NCI calculations are particularly well suited to the GPU architecture, which reduces drastically the computational time. On a single compute node, the dual-GPU version leads to a 39-fold improvement for the biggest instance compared to the optimal OpenMP parallel run (C code, icc compiler) with 16 CPU cores. Energy consumption measurements carried out on both CPU and GPU NCI tests show that the GPU approach provides substantial energy savings. © 2017 Wiley Periodicals, Inc.

  3. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Barbara Chapman

    OpenMP was not well recognized at the beginning of the project, around 2003, because of its limited use in DoE production applications and the immature hardware support for an efficient implementation. In recent years, however, it has been gradually adopted both in HPC applications, mostly in the form of MPI+OpenMP hybrid code, and in mid-scale desktop applications for scientific and experimental studies. We have observed this trend and worked diligently to improve our OpenMP compiler and runtimes, as well as to work with the OpenMP standard organization to make sure OpenMP evolves in a direction close to DoE missions. In the Center for Programming Models for Scalable Parallel Computing project, the HPCTools team at the University of Houston (UH), directed by Dr. Barbara Chapman, has been working with project partners, external collaborators and hardware vendors to increase the scalability and applicability of OpenMP for multi-core (and future manycore) platforms and for distributed memory systems by exploring different programming models, language extensions, compiler optimizations, as well as runtime library support.

  4. A programmable optimization environment using the GAMESS-US and MERLIN/MCL packages. Applications on intermolecular interaction energies

    NASA Astrophysics Data System (ADS)

    Kalatzis, Fanis G.; Papageorgiou, Dimitrios G.; Demetropoulos, Ioannis N.

    2006-09-01

    The Merlin/MCL optimization environment and the GAMESS-US package were combined so as to offer an extended and efficient quantum chemistry optimization system, capable of implementing complex optimization strategies for generic molecular modeling problems. A communication and data exchange interface was established between the two packages exploiting all Merlin features such as multiple optimizers, box constraints, user extensions and a high level programming language. An important feature of the interface is its ability to perform dimer computations by eliminating the basis set superposition error using the counterpoise (CP) method of Boys and Bernardi. Furthermore it offers CP-corrected geometry optimizations using analytic derivatives. The unified optimization environment was applied to construct portions of the intermolecular potential energy surface of the weakly bound H-bonded complex C6H6-H2O by utilizing the high level Merlin Control Language. The H-bonded dimer HF-H2O was also studied by CP-corrected geometry optimization. The ab initio electronic structure energies were calculated using the 6-31G** basis set at the Restricted Hartree-Fock and second-order Moller-Plesset levels, while all geometry optimizations were carried out using a quasi-Newton algorithm provided by Merlin.

    Program summary
    Title of program: MERGAM
    Catalogue identifier: ADYB_v1_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADYB_v1_0
    Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland
    Computer for which the program is designed and others on which it has been tested: The program is designed for machines running the UNIX operating system. It has been tested on the following architectures: IA32 (Linux with gcc/g77 v.3.2.3), AMD64 (Linux with the Portland group compilers v.6.0), SUN64 (SunOS 5.8 with the Sun Workshop compilers v.5.2) and SGI64 (IRIX 6.5 with the MIPSpro compilers v.7.4)
    Installations: University of Ioannina, Greece
    Operating systems or monitors under which the program has been tested: UNIX
    Programming language used: ANSI C, ANSI Fortran-77
    No. of lines in distributed program, including test data, etc.: 11 282
    No. of bytes in distributed program, including test data, etc.: 49 458
    Distribution format: tar.gz
    Memory required to execute with typical data: Memory requirements mainly depend on the selection of a GAMESS-US basis set and the number of atoms
    No. of bits in a word: 32
    No. of processors used: 1
    Has the code been vectorized or parallelized?: no
    Nature of physical problem: Multidimensional geometry optimization is of great importance in any ab initio calculation since it usually is one of the most CPU-intensive tasks, especially on large molecular systems. For example, the geometric and energetic description of van der Waals and weakly bound H-bonded complexes requires the construction of related important portions of the multidimensional intermolecular potential energy surface (IPES). So the various held views about the nature of these bonds can be quantitatively tested.
    Method of solution: The Merlin/MCL optimization environment was interconnected with the GAMESS-US package to facilitate geometry optimization in quantum chemistry problems. The important portions of the IPES require the capability to program optimization strategies. The Merlin/MCL environment was used for the implementation of such strategies. In this work, a CP-corrected geometry optimization was performed on the HF-H2O complex and an MCL program was developed to study portions of the potential energy surface of the C6H6-H2O complex.
    Restrictions on the complexity of the problem: The Merlin optimization environment and the GAMESS-US package must be installed. The MERGAM interface requires GAMESS-US input files that have been constructed in Cartesian coordinates. This restriction occurs from a design-time requirement to not allow reorientation of atomic coordinates; this rule holds always true when applying the COORD = UNIQUE keyword in a GAMESS-US input file.
    Typical running time: It depends on the size of the molecular system, the size of the basis set and the method of electron correlation. Execution of the test run took approximately 5 min on a 2.8 GHz Intel Pentium CPU.
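
    The counterpoise (CP) method of Boys and Bernardi mentioned in this record can be illustrated with a few lines of arithmetic. The sketch below is not taken from MERGAM; the function names and the energy values are hypothetical, and it assumes all monomer energies are evaluated at the dimer geometry. The CP-corrected interaction energy uses monomer energies computed in the full dimer basis (with "ghost" basis functions on the partner), so the basis set superposition error (BSSE) cancels.

```python
def cp_interaction_energy(e_dimer, e_a_ghost, e_b_ghost):
    """CP-corrected interaction energy.

    e_dimer   : energy of the dimer AB in the dimer basis
    e_a_ghost : energy of monomer A in the dimer basis (ghost functions on B)
    e_b_ghost : energy of monomer B in the dimer basis (ghost functions on A)
    """
    return e_dimer - e_a_ghost - e_b_ghost


def bsse_estimate(e_a_own, e_b_own, e_a_ghost, e_b_ghost):
    """BSSE estimate: how much each monomer energy drops when the partner's
    basis functions are added (own-basis energy minus dimer-basis energy)."""
    return (e_a_own - e_a_ghost) + (e_b_own - e_b_ghost)


# Made-up energies in hartree, for illustration only.
E_DIMER = -176.010
E_A_OWN, E_B_OWN = -100.000, -76.000
E_A_GHOST, E_B_GHOST = -100.001, -76.002

uncorrected = E_DIMER - E_A_OWN - E_B_OWN
corrected = cp_interaction_energy(E_DIMER, E_A_GHOST, E_B_GHOST)
bsse = bsse_estimate(E_A_OWN, E_B_OWN, E_A_GHOST, E_B_GHOST)
```

    With these definitions the corrected value equals the uncorrected value plus the BSSE estimate, so the correction makes the computed binding weaker.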

  5. Recent advances in PC-Linux systems for electronic structure computations by optimized compilers and numerical libraries.

    PubMed

    Yu, Jen-Shiang K; Yu, Chin-Hui

    2002-01-01

    One of the most frequently used packages for electronic structure research, GAUSSIAN 98, is compiled on Linux systems with various hardware configurations, including AMD Athlon (with the "Thunderbird" core), AthlonMP, and AthlonXP (with the "Palomino" core) systems as well as the Intel Pentium 4 (with the "Willamette" core) machines. The default PGI FORTRAN compiler (pgf77) and the Intel FORTRAN compiler (ifc) are respectively employed with different architectural optimization options to compile GAUSSIAN 98 and test the performance improvement. In addition to the BLAS library included in revision A.11 of this package, the Automatically Tuned Linear Algebra Software (ATLAS) library is linked against the binary executables to improve the performance. Various Hartree-Fock, density-functional theories, and the MP2 calculations are done for benchmarking purposes. It is found that the combination of ifc with ATLAS library gives the best performance for GAUSSIAN 98 on all of these PC-Linux computers, including AMD and Intel CPUs. Even on AMD systems, the Intel FORTRAN compiler invariably produces binaries with better performance than pgf77. The enhancement provided by the ATLAS library is more significant for post-Hartree-Fock calculations. The performance on one single CPU is potentially as good as that on an Alpha 21264A workstation or an SGI supercomputer. The floating-point marks by SpecFP2000 have similar trends to the results of GAUSSIAN 98 package.

  6. Optimal simultaneous superpositioning of multiple structures with missing data

    PubMed Central

    Theobald, Douglas L.; Steindel, Phillip A.

    2012-01-01

    Motivation: Superpositioning is an essential technique in structural biology that facilitates the comparison and analysis of conformational differences among topologically similar structures. Performing a superposition requires a one-to-one correspondence, or alignment, of the point sets in the different structures. However, in practice, some points are usually ‘missing’ from several structures, for example, when the alignment contains gaps. Current superposition methods deal with missing data simply by superpositioning a subset of points that are shared among all the structures. This practice is inefficient, as it ignores important data, and it fails to satisfy the common least-squares criterion. In the extreme, disregarding missing positions prohibits the calculation of a superposition altogether. Results: Here, we present a general solution for determining an optimal superposition when some of the data are missing. We use the expectation–maximization algorithm, a classic statistical technique for dealing with incomplete data, to find both maximum-likelihood solutions and the optimal least-squares solution as a special case. Availability and implementation: The methods presented here are implemented in THESEUS 2.0, a program for superpositioning macromolecular structures. ANSI C source code and selected compiled binaries for various computing platforms are freely available under the GNU open source license from http://www.theseus3d.org. Contact: dtheobald@brandeis.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:22543369

  7. OpenARC: Extensible OpenACC Compiler Framework for Directive-Based Accelerator Programming Study

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lee, Seyong; Vetter, Jeffrey S

    2014-01-01

    Directive-based, accelerator programming models such as OpenACC have arisen as an alternative solution to program emerging Scalable Heterogeneous Computing (SHC) platforms. However, the increased complexity in the SHC systems incurs several challenges in terms of portability and productivity. This paper presents an open-sourced OpenACC compiler, called OpenARC, which serves as an extensible research framework to address those issues in the directive-based accelerator programming. This paper explains important design strategies and key compiler transformation techniques needed to implement the reference OpenACC compiler. Moreover, this paper demonstrates the efficacy of OpenARC as a research framework for directive-based programming study, by proposing and implementing OpenACC extensions in the OpenARC framework to 1) support hybrid programming of the unified memory and separate memory and 2) exploit architecture-specific features in an abstract manner. Porting thirteen standard OpenACC programs and three extended OpenACC programs to CUDA GPUs shows that OpenARC performs similarly to a commercial OpenACC compiler, while it serves as a high-level research framework.

  8. An integrated runtime and compile-time approach for parallelizing structured and block structured applications

    NASA Technical Reports Server (NTRS)

    Agrawal, Gagan; Sussman, Alan; Saltz, Joel

    1993-01-01

    Scientific and engineering applications often involve structured meshes. These meshes may be nested (for multigrid codes) and/or irregularly coupled (called multiblock or irregularly coupled regular mesh problems). A combined runtime and compile-time approach for parallelizing these applications on distributed memory parallel machines in an efficient and machine-independent fashion was described. A runtime library which can be used to port these applications on distributed memory machines was designed and implemented. The library is currently implemented on several different systems. To further ease the task of application programmers, methods were developed for integrating this runtime library with compilers for HPF-like parallel programming languages. How this runtime library was integrated with the Fortran 90D compiler being developed at Syracuse University is discussed. Experimental results to demonstrate the efficacy of our approach are presented. A multiblock Navier-Stokes solver template and a multigrid code were experimented with. Our experimental results show that our primitives have low runtime communication overheads. Further, the compiler-parallelized codes perform within 20 percent of the code parallelized by manually inserting calls to the runtime library.

  9. Compiler Optimization Pass Visualization: The Procedural Abstraction Case

    ERIC Educational Resources Information Center

    Schaeckeler, Stefan; Shang, Weijia; Davis, Ruth

    2009-01-01

    There is an active research community concentrating on visualizations of algorithms taught in CS1 and CS2 courses. These visualizations can help students to create concrete visual images of the algorithms and their underlying concepts. Not only "fundamental algorithms" can be visualized, but also algorithms used in compilers. Visualizations that…

  10. Design and implementation of online automatic judging system

    NASA Astrophysics Data System (ADS)

    Liang, Haohui; Chen, Chaojie; Zhong, Xiuyu; Chen, Yuefeng

    2017-06-01

    Manual judging in programming training and competitions is inefficient and unreliable, so an Online Automatic Judging (OAJ) system was designed. The OAJ system, which comprises a sandbox judging side and a Web side, automatically compiles and runs the submitted code and generates evaluation scores and corresponding reports. To prevent malicious code from damaging the system, the OAJ system runs submissions in a sandbox. The OAJ system uses thread pools to run tests in parallel and adopts database optimization mechanisms, such as horizontal table splitting, to improve system performance and resource utilization. The test results show that the system has high performance, high reliability, high stability and excellent extensibility.

  11. A Note on Compiling Fortran

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Busby, L. E.

    Fortran modules tend to serialize compilation of large Fortran projects by introducing dependencies among the source files. If file A depends on file B (A uses a module defined by B), you must finish compiling B before you can begin compiling A. Some Fortran compilers (Intel ifort, GNU gfortran and IBM xlf, at least) offer an option to "verify syntax", with the side effect of also producing any associated Fortran module files. As it happens, this option usually runs much faster than the object code generation and optimization phases. For some projects on some machines, it can be advantageous to compile in two passes: the first pass generates the module files, quickly; the second pass produces the object files, in parallel. We achieve a 3.8× speedup in the case study below.
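
    The two-pass scheme described in this record can be sketched as a small command planner. This is an illustration, not the author's build system: the function name, the file names and the dependency map are hypothetical. It assumes GNU gfortran, whose `-fsyntax-only` option checks syntax and writes module files without producing objects; the equivalent options of other compilers are not shown. The commands are only built, not executed.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def two_pass_commands(deps, compiler="gfortran"):
    """Plan a two-pass Fortran build.

    deps maps each source file to the set of source files whose modules it
    uses. Pass 1 must run in dependency order; pass 2 has no ordering
    constraints because every module file already exists.
    """
    # Dependencies come first in the static order.
    order = list(TopologicalSorter(deps).static_order())
    # Pass 1: fast syntax check that also emits the .mod files.
    pass1 = [[compiler, "-fsyntax-only", src] for src in order]
    # Pass 2: full object code generation, safe to run concurrently.
    pass2 = [[compiler, "-c", src] for src in sorted(order)]
    return pass1, pass2


# Hypothetical project: a.f90 uses a module defined in b.f90.
pass1, pass2 = two_pass_commands({"a.f90": {"b.f90"}, "b.f90": set()})
```

    The pass 2 commands could then be handed to any parallel runner, for example a process pool or a parallel make.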

  12. Integrated Task and Data Parallel Programming

    NASA Technical Reports Server (NTRS)

    Grimshaw, A. S.

    1998-01-01

    This research investigates the combination of task and data parallel language constructs within a single programming language. There are a number of applications that exhibit properties which would be well served by such an integrated language. Examples include global climate models, aircraft design problems, and multidisciplinary design optimization problems. Our approach incorporates data parallel language constructs into an existing, object oriented, task parallel language. The language will support creation and manipulation of parallel classes and objects of both types (task parallel and data parallel). Ultimately, the language will allow data parallel and task parallel classes to be used either as building blocks or managers of parallel objects of either type, thus allowing the development of single and multi-paradigm parallel applications. 1995 Research Accomplishments In February I presented a paper at Frontiers 1995 describing the design of the data parallel language subset. During the spring I wrote and defended my dissertation proposal. Since that time I have developed a runtime model for the language subset. I have begun implementing the model and hand-coding simple examples which demonstrate the language subset. I have identified an astrophysical fluid flow application which will validate the data parallel language subset. 1996 Research Agenda Milestones for the coming year include implementing a significant portion of the data parallel language subset over the Legion system. Using simple hand-coded methods, I plan to demonstrate (1) concurrent task and data parallel objects and (2) task parallel objects managing both task and data parallel objects. My next steps will focus on constructing a compiler and implementing the fluid flow application with the language. Concurrently, I will conduct a search for a real-world application exhibiting both task and data parallelism within the same program.
Additional 1995 Activities During the fall I collaborated with Andrew Grimshaw and Adam Ferrari to write a book chapter which will be included in Parallel Processing in C++ edited by Gregory Wilson. I also finished two courses, Compilers and Advanced Compilers, in 1995. These courses complete my class requirements at the University of Virginia. I have only my dissertation research and defense to complete.

  13. Integrated Task And Data Parallel Programming: Language Design

    NASA Technical Reports Server (NTRS)

    Grimshaw, Andrew S.; West, Emily A.

    1998-01-01

    This research investigates the combination of task and data parallel language constructs within a single programming language. There are a number of applications that exhibit properties which would be well served by such an integrated language. Examples include global climate models, aircraft design problems, and multidisciplinary design optimization problems. Our approach incorporates data parallel language constructs into an existing, object oriented, task parallel language. The language will support creation and manipulation of parallel classes and objects of both types (task parallel and data parallel). Ultimately, the language will allow data parallel and task parallel classes to be used either as building blocks or managers of parallel objects of either type, thus allowing the development of single and multi-paradigm parallel applications. 1995 Research Accomplishments In February I presented a paper at Frontiers '95 describing the design of the data parallel language subset. During the spring I wrote and defended my dissertation proposal. Since that time I have developed a runtime model for the language subset. I have begun implementing the model and hand-coding simple examples which demonstrate the language subset. I have identified an astrophysical fluid flow application which will validate the data parallel language subset. 1996 Research Agenda Milestones for the coming year include implementing a significant portion of the data parallel language subset over the Legion system. Using simple hand-coded methods, I plan to demonstrate (1) concurrent task and data parallel objects and (2) task parallel objects managing both task and data parallel objects. My next steps will focus on constructing a compiler and implementing the fluid flow application with the language. Concurrently, I will conduct a search for a real-world application exhibiting both task and data parallelism within the same program.
Additional 1995 Activities During the fall I collaborated with Andrew Grimshaw and Adam Ferrari to write a book chapter which will be included in Parallel Processing in C++ edited by Gregory Wilson. I also finished two courses, Compilers and Advanced Compilers, in 1995. These courses complete my class requirements at the University of Virginia. I have only my dissertation research and defense to complete.

  14. Livermore Compiler Analysis Loop Suite

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hornung, R. D.

    2013-03-01

    LCALS is designed to evaluate compiler optimizations and performance of a variety of loop kernels and loop traversal software constructs. Some of the loop kernels are pulled directly from "Livermore Loops Coded in C", developed at LLNL (see item 11 below for details of earlier code versions). The older suites were used to evaluate floating-point performance of hardware platforms prior to porting larger application codes. The LCALS suite is geared toward assessing C++ compiler optimizations and platform performance related to SIMD vectorization, OpenMP threading, and advanced C++ language features. LCALS contains 20 of 24 loop kernels from the older Livermore Loop suites, plus various others representative of loops found in current production application codes at LLNL. The latter loops emphasize more diverse loop constructs and data access patterns than the others, such as multi-dimensional difference stencils. The loops are included in a configurable framework, which allows control of compilation, loop sampling for execution timing, which loops are run and their lengths. It generates timing statistics for analysis and comparing variants of individual loops. Also, it is easy to add loops to the suite as desired.

  15. Runtime support and compilation methods for user-specified data distributions

    NASA Technical Reports Server (NTRS)

    Ponnusamy, Ravi; Saltz, Joel; Choudhury, Alok; Hwang, Yuan-Shin; Fox, Geoffrey

    1993-01-01

    This paper describes two new ideas by which an HPF compiler can deal with irregular computations effectively. The first mechanism invokes a user specified mapping procedure via a set of compiler directives. The directives allow use of program arrays to describe graph connectivity, spatial location of array elements, and computational load. The second mechanism is a simple conservative method that in many cases enables a compiler to recognize that it is possible to reuse previously computed information from inspectors (e.g. communication schedules, loop iteration partitions, information that associates off-processor data copies with on-processor buffer locations). We present performance results for these mechanisms from a Fortran 90D compiler implementation.
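
    The second mechanism in this record, conservative reuse of inspector results, can be illustrated roughly as follows. The sketch does not reproduce the paper's compiler analysis: the class names, the version counter and the notion of a "schedule" as a sorted list of off-processor indices are all assumptions made for illustration. The idea shown is only that any write to an indirection array is treated as invalidating the cached schedule, and an untouched array lets the inspector be skipped.

```python
class TrackedArray:
    """Indirection array that counts writes (hypothetical helper)."""

    def __init__(self, values):
        self._values = list(values)
        self.version = 0

    def __setitem__(self, index, value):
        self._values[index] = value
        self.version += 1  # conservative: every write may change the pattern

    def __iter__(self):
        return iter(self._values)


class ScheduleCache:
    """Caches the inspector result for one indirection array."""

    def __init__(self, local_lo, local_hi):
        self.local_lo, self.local_hi = local_lo, local_hi  # owned range [lo, hi)
        self._seen_version = None
        self._schedule = None
        self.inspector_runs = 0

    def schedule_for(self, indirection):
        if self._seen_version != indirection.version:
            # Inspector: find the off-processor elements the loop will touch.
            self._schedule = sorted(
                {i for i in indirection
                 if not (self.local_lo <= i < self.local_hi)}
            )
            self._seen_version = indirection.version
            self.inspector_runs += 1
        return self._schedule


ia = TrackedArray([0, 5, 2, 7, 5])
cache = ScheduleCache(0, 4)        # this processor owns indices 0..3
first = cache.schedule_for(ia)     # inspector runs
second = cache.schedule_for(ia)    # reused, array untouched
ia[0] = 9
third = cache.schedule_for(ia)     # array was written, inspector reruns
```

    The check is conservative in the sense that a write which leaves the access pattern unchanged still triggers a rerun; it never reuses a stale schedule.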

  16. Design and Implementation of a Basic Cross-Compiler and Virtual Memory Management System for the TI-59 Programmable Calculator.

    DTIC Science & Technology

    1983-06-01

    previously stated requirements to construct the framework for a software solution. It is during this phase of design that many of the most critical...the linker would have to be deferred until the compiler was formalized and in the implementation phase of design. The second problem involved...memory limit was encountered. At this point a segmentation occurred. The memory limits were reset and the combining process continued until another

  17. Manycore Performance-Portability: Kokkos Multidimensional Array Library

    DOE PAGES

    Edwards, H. Carter; Sunderland, Daniel; Porter, Vicki; ...

    2012-01-01

    Large, complex scientific and engineering application codes have a significant investment in computational kernels to implement their mathematical models. Porting these computational kernels to the collection of modern manycore accelerator devices is a major challenge in that these devices have diverse programming models, application programming interfaces (APIs), and performance requirements. The Kokkos Array programming model provides a library-based approach to implement computational kernels that are performance-portable to CPU-multicore and GPGPU accelerator devices. This programming model is based upon three fundamental concepts: (1) manycore compute devices each with its own memory space, (2) data parallel kernels and (3) multidimensional arrays. Kernel execution performance is, especially for NVIDIA® devices, extremely dependent on data access patterns. The optimal data access pattern can be different for different manycore devices, potentially leading to different implementations of computational kernels specialized for different devices. The Kokkos Array programming model supports performance-portable kernels by (1) separating data access patterns from computational kernels through a multidimensional array API and (2) introducing device-specific data access mappings when a kernel is compiled. An implementation of Kokkos Array is available through Trilinos [Trilinos website, http://trilinos.sandia.gov/, August 2011].
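
    The separation of data access patterns from kernels described in this record can be illustrated with a toy array type. This is not the Kokkos API: `View2D`, the layout names and `fill` are hypothetical, and the layout is chosen at run time here rather than at compile time. The point of the sketch is that the kernel is written once against `view[i, j]`, while the mapping from `(i, j)` to a memory offset is a property of the array.

```python
class View2D:
    """Toy 2-D array whose index-to-offset mapping is selectable."""

    def __init__(self, rows, cols, layout):
        if layout not in ("right", "left"):
            raise ValueError("layout must be 'right' or 'left'")
        self.rows, self.cols, self.layout = rows, cols, layout
        self.data = [0] * (rows * cols)

    def _offset(self, i, j):
        if self.layout == "right":      # row-major: j varies fastest
            return i * self.cols + j
        return j * self.rows + i        # column-major: i varies fastest

    def __getitem__(self, ij):
        i, j = ij
        return self.data[self._offset(i, j)]

    def __setitem__(self, ij, value):
        i, j = ij
        self.data[self._offset(i, j)] = value


def fill(view):
    """Kernel written once; it never mentions the memory layout."""
    for i in range(view.rows):
        for j in range(view.cols):
            view[i, j] = 10 * i + j


a = View2D(2, 3, "right")
b = View2D(2, 3, "left")
fill(a)
fill(b)
```

    Both views give the same answer for any `(i, j)`, but the underlying storage order differs, which is the degree of freedom a device-specific mapping would exploit.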

  18. NPDES CAFO Regulations Implementation Status Reports

    EPA Pesticide Factsheets

    EPA compiles annual summaries on the implementation status of the NPDES CAFO regulations. Reports include, for each state: total number of CAFOs, number and percentage of CAFOs with NPDES permits, and other information associated with implementation of the

  19. The Optimizing Patient Transfers, Impacting Medical Quality, and Improving Symptoms: Transforming Institutional Care approach: preliminary data from the implementation of a Centers for Medicare and Medicaid Services nursing facility demonstration project.

    PubMed

    Unroe, Kathleen T; Nazir, Arif; Holtz, Laura R; Maurer, Helen; Miller, Ellen; Hickman, Susan E; La Mantia, Michael A; Bennett, Merih; Arling, Greg; Sachs, Greg A

    2015-01-01

    The Optimizing Patient Transfers, Impacting Medical Quality, and Improving Symptoms: Transforming Institutional Care (OPTIMISTIC) project aims to reduce avoidable hospitalizations of long-stay residents enrolled in 19 central Indiana nursing facilities. This clinical demonstration project, funded by the Centers for Medicare and Medicaid Services Innovations Center, places a registered nurse in each nursing facility to implement an evidence-based quality improvement program with clinical support from nurse practitioners. A description of the model is presented, and early implementation experiences during the first year of the project are reported. Important elements include better medical care through implementation of Interventions to Reduce Acute Care Transfers tools and chronic care management, enhanced transitional care, and better palliative care with a focus on systematic advance care planning. There were 4,035 long-stay residents in 19 facilities enrolled in OPTIMISTIC between February 2013 and January 2014. Root-cause analyses were performed for all 910 acute transfers of these long stay residents. Of these transfers, the project RN evaluated 29% as avoidable (57% were not avoidable and 15% were missing), and opportunities for quality improvement were identified in 54% of transfers. Lessons learned in early implementation included defining new clinical roles, integrating into nursing facility culture, managing competing facility priorities, communicating with multiple stakeholders, and developing a system for collecting and managing data. The success of the overall initiative will be measured primarily according to reduction in avoidable hospitalizations of long-stay nursing facility residents. © 2014, Copyright the Authors Journal compilation © 2014, The American Geriatrics Society.

  20. Continued advancement of the programming language HAL to an operational status

    NASA Technical Reports Server (NTRS)

    1971-01-01

    The continued advancement of the programming language HAL to operational status is reported. It is demonstrated that the compiler itself can be written in HAL. A HAL-in-HAL experiment proves conclusively that HAL can be used successfully as a compiler implementation tool.

  1. Perspex machine: V. Compilation of C programs

    NASA Astrophysics Data System (ADS)

    Spanner, Matthew P.; Anderson, James A. D. W.

    2006-01-01

    The perspex machine arose from the unification of the Turing machine with projective geometry. The original, constructive proof used four special, perspective transformations to implement the Turing machine in projective geometry. These four transformations are now generalised and applied in a compiler, implemented in Pop11, that converts a subset of the C programming language into perspexes. This is interesting both from a geometrical and a computational point of view. Geometrically, it is interesting that program source can be converted automatically to a sequence of perspective transformations and conditional jumps, though we find that the product of homogeneous transformations with normalisation can be non-associative. Computationally, it is interesting that program source can be compiled for a Reduced Instruction Set Computer (RISC), the perspex machine, that is a Single Instruction, Zero Exception (SIZE) computer.

  2. High-Performance Design Patterns for Modern Fortran

    DOE PAGES

    Haveraaen, Magne; Morris, Karla; Rouson, Damian; ...

    2015-01-01

    This paper presents ideas for using coordinate-free numerics in modern Fortran to achieve code flexibility in the partial differential equation (PDE) domain. We also show how Fortran, over the last few decades, has changed to become a language well-suited for state-of-the-art software development. Fortran’s new coarray distributed data structure, the language’s class mechanism, and its side-effect-free, pure procedure capability provide the scaffolding on which we implement HPC software. These features empower compilers to organize parallel computations with efficient communication. We present some programming patterns that support asynchronous evaluation of expressions comprised of parallel operations on distributed data. We implemented these patterns using coarrays and the message passing interface (MPI). We compared the codes’ complexity and performance. The MPI code is much more complex and depends on external libraries. The MPI code on Cray hardware using the Cray compiler is 1.5–2 times faster than the coarray code on the same hardware. The Intel compiler implements coarrays atop Intel’s MPI library with the result apparently being 2–2.5 times slower than manually coded MPI despite exhibiting nearly linear scaling efficiency. As compilers mature and further improvements to coarrays come in Fortran 2015, we expect this performance gap to narrow.

  3. Preliminary Design and Implementation of a Method for Validating Evolving ADA Compilers.

    DTIC Science & Technology

    1983-03-01

    Goodenough, John B. "The Ada Compiler Validation Capability," Computer. 14 (6): 57-64 (June 1981). 7. Pressman, Roger S. Software Engineering: A Practi...COMPILERS THESIS Presented to the faculty of the School of Engineering of the Air Force Institute of Technology Air University in Partial Fulfillment...support and encouragement they have given me. ii Contents Page 1. INTRODUCTION 1 1.1 Background -- DoD's Software Problem 1 1.1.1 The proliferation of

  4. Programming languages and compiler design for realistic quantum hardware.

    PubMed

    Chong, Frederic T; Franklin, Diana; Martonosi, Margaret

    2017-09-13

    Quantum computing sits at an important inflection point. For years, high-level algorithms for quantum computers have shown considerable promise, and recent advances in quantum device fabrication offer hope of utility. A gap still exists, however, between the hardware size and reliability requirements of quantum computing algorithms and the physical machines foreseen within the next ten years. To bridge this gap, quantum computers require appropriate software to translate and optimize applications (toolflows) and abstraction layers. Given the stringent resource constraints in quantum computing, information passed between layers of software and implementations will differ markedly from in classical computing. Quantum toolflows must expose more physical details between layers, so the challenge is to find abstractions that expose key details while hiding enough complexity.

  5. Ada Compiler Validation Summary Report: Certificate Number: 910626S1. 11179 U.S. Navy Ada/M, Version 4.0 (OPTIMIZE) VAX 11/785 = AN/AYK-14 (Bare Board).

    DTIC Science & Technology

    1991-07-30

    AN/AYK-14 (Bare Board)(Target), 910626S1.11179 6 AUTHOR(S) National Institute of Standards and Technology Gaithersburg, MD USA 7 PERFORMING...Capability (ACVC). This Validation Summary Report (VSR) gives an account of the testing of this Ada implementation. For any technical terms used in this...EE3203A EE3204A CE3207A CE3208A CE3301A EE3301B CE3302A CE3304A CE3%05A CE3401A CE3402A EE3402B CE3402C. .D (2) CE3403A. .C (3) CE3403E. - 7 (2) CE3404B

  6. Ada Compiler Validation Summary Report: Certificate Number: 910626S1. 11175 U.S. Navy Ada/M, Version 4.0 (/Optimize), VAX 8550 (Host) to AN/AYK-14 (Bare Board) (Target).

    DTIC Science & Technology

    1991-07-30

    cs A, 7 ,M ;instr’(clsr,A,M); I cm A,Y,M instr’(cm,A,M,Y); cmi A,M4 instr’(cmi,A,M); cmk A,Y,M instr’(cmk,A,M,Y); cm : A,M1 instr’(cmr,A,M); cnt A...to AN/AYK-14 (Bare Board)(Target), 910626S1.11175 6 AUTHOR(S) National Institute of Standards and Technology Gaithersburg, MD USA 7 PERFORMING...Validation Summary Report (VSR) gives an account of the testing of this Ada implementation. For any technical terms used in this report, the reader is

  7. Programming languages and compiler design for realistic quantum hardware

    NASA Astrophysics Data System (ADS)

    Chong, Frederic T.; Franklin, Diana; Martonosi, Margaret

    2017-09-01

    Quantum computing sits at an important inflection point. For years, high-level algorithms for quantum computers have shown considerable promise, and recent advances in quantum device fabrication offer hope of utility. A gap still exists, however, between the hardware size and reliability requirements of quantum computing algorithms and the physical machines foreseen within the next ten years. To bridge this gap, quantum computers require appropriate software to translate and optimize applications (toolflows) and abstraction layers. Given the stringent resource constraints in quantum computing, information passed between layers of software and implementations will differ markedly from in classical computing. Quantum toolflows must expose more physical details between layers, so the challenge is to find abstractions that expose key details while hiding enough complexity.

  8. Compiler-based code generation and autotuning for geometric multigrid on GPU-accelerated supercomputers

    DOE PAGES

    Basu, Protonu; Williams, Samuel; Van Straalen, Brian; ...

    2017-04-05

    GPUs, with their high bandwidths and computational capabilities, are an increasingly popular target for scientific computing. Unfortunately, to date, harnessing the power of the GPU has required use of a GPU-specific programming model like CUDA, OpenCL, or OpenACC. Thus, in order to deliver portability across CPU-based and GPU-accelerated supercomputers, programmers are forced to write and maintain two versions of their applications or frameworks. In this paper, we explore the use of a compiler-based autotuning framework based on CUDA-CHiLL to deliver not only portability, but also performance portability across CPU- and GPU-accelerated platforms for the geometric multigrid linear solvers found in many scientific applications. We also show that with autotuning we can attain near Roofline (a performance bound for a computation and target architecture) performance across the key operations in the miniGMG benchmark for both CPU- and GPU-based architectures as well as for multiple stencil discretizations and smoothers. We show that our technology is readily interoperable with MPI, resulting in performance at scale equal to that obtained via hand-optimized MPI+CUDA implementation.
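
    The Roofline bound mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes the textbook form of the model, in which attainable performance is the lesser of peak compute throughput and arithmetic intensity times memory bandwidth. The function name and the machine numbers are hypothetical and are not taken from the paper.

```python
def roofline_gflops(peak_gflops, bandwidth_gbs, flops_per_byte):
    """Attainable GFLOP/s under the basic Roofline model (assumed textbook form)."""
    # Memory-bound ceiling: bytes moved per second times flops done per byte.
    memory_ceiling = bandwidth_gbs * flops_per_byte
    # The kernel cannot exceed either the compute peak or the memory ceiling.
    return min(peak_gflops, memory_ceiling)


# Hypothetical machine: 1000 GFLOP/s peak, 200 GB/s memory bandwidth.
stencil_bound = roofline_gflops(1000.0, 200.0, 0.5)   # low intensity: bandwidth-bound
dense_bound = roofline_gflops(1000.0, 200.0, 10.0)    # high intensity: compute-bound
```

    Stencil smoothers typically have low arithmetic intensity, so under this model their ceiling is set by memory bandwidth rather than by peak compute.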

  9. Compiler-based code generation and autotuning for geometric multigrid on GPU-accelerated supercomputers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Basu, Protonu; Williams, Samuel; Van Straalen, Brian

    GPUs, with their high bandwidths and computational capabilities, are an increasingly popular target for scientific computing. Unfortunately, to date, harnessing the power of the GPU has required use of a GPU-specific programming model like CUDA, OpenCL, or OpenACC. Thus, in order to deliver portability across CPU-based and GPU-accelerated supercomputers, programmers are forced to write and maintain two versions of their applications or frameworks. In this paper, we explore the use of a compiler-based autotuning framework based on CUDA-CHiLL to deliver not only portability, but also performance portability across CPU- and GPU-accelerated platforms for the geometric multigrid linear solvers found in many scientific applications. We also show that with autotuning we can attain near Roofline (a performance bound for a computation and target architecture) performance across the key operations in the miniGMG benchmark for both CPU- and GPU-based architectures as well as for multiple stencil discretizations and smoothers. We show that our technology is readily interoperable with MPI, resulting in performance at scale equal to that obtained via hand-optimized MPI+CUDA implementation.

  10. Software Issues at the User Interface

    DTIC Science & Technology

    1991-05-01

    successful integration of parallel computers into mainstream scientific computing. Clearly a compiler is the most important software tool available to a...Computer Science University of Colorado Boulder, CO 80309 ABSTRACT We review software issues that are critical to the successful integration of parallel...The development of an optimizing compiler of this quality, addressing communication instructions as well as computational instructions is a major

  11. Praxis language reference manual

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Walker, J.H.

    1981-01-01

    This document is a language reference manual for the programming language Praxis. The document contains the specifications that must be met by any compiler for the language. The Praxis language was designed for systems programming in real-time process applications. Goals for the language and its implementations are: (1) highly efficient code generated by the compiler; (2) program portability; (3) completeness, that is, all programming requirements can be met by the language without needing an assembler; and (4) separate compilation to aid in design and management of large systems. The language does not provide any facilities for input/output, stack and queue handling, string operations, parallel processing, or coroutine processing. These features can be implemented as routines in the language, using machine-dependent code to take advantage of facilities in the control environment on different machines.

  12. Power-Aware Compiler Controllable Chip Multiprocessor

    NASA Astrophysics Data System (ADS)

    Shikano, Hiroaki; Shirako, Jun; Wada, Yasutaka; Kimura, Keiji; Kasahara, Hironori

    A power-aware compiler controllable chip multiprocessor (CMP) is presented and its performance and power consumption are evaluated with the optimally scheduled advanced multiprocessor (OSCAR) parallelizing compiler. The CMP is equipped with power control registers that change clock frequency and power supply voltage to functional units including processor cores, memories, and an interconnection network. The OSCAR compiler carries out coarse-grain task parallelization of programs and reduces power consumption using architectural power control support and the compiler's power saving scheme. The performance evaluation shows that MPEG-2 encoding on the proposed CMP with four CPUs results in 82.6% power reduction in real-time execution mode with a deadline constraint on its sequential execution time. Furthermore, MP3 encoding on a heterogeneous CMP with four CPUs and four accelerators results in 53.9% power reduction at 21.1-fold speed-up in performance against its sequential execution in the fastest execution mode.

  13. On algorithmic optimization of histogramming functions for GEM systems

    NASA Astrophysics Data System (ADS)

    Krawczyk, Rafał D.; Czarski, Tomasz; Kolasinski, Piotr; Poźniak, Krzysztof T.; Linczuk, Maciej; Byszuk, Adrian; Chernyshova, Maryna; Juszczyk, Bartlomiej; Kasprowicz, Grzegorz; Wojenski, Andrzej; Zabolotny, Wojciech

    2015-09-01

    This article concerns optimization methods for data analysis for the X-ray GEM detector system. The offline analysis of collected samples was optimized for MATLAB computations. Compiled C functions were called through the MEX library. A significant speedup was obtained for both the ordering-preprocessing and the histogramming of samples. The techniques used and the results obtained are presented.

  14. Optimizing Instruction Scheduling and Register Allocation for Register-File-Connected Clustered VLIW Architectures

    PubMed Central

    Tang, Haijing; Wang, Siye; Zhang, Yanjun

    2013-01-01

    Clustering has become a common trend in very long instruction word (VLIW) architectures to address the problems of area, energy consumption, and design complexity. Register-file-connected clustered (RFCC) VLIW architecture uses a global register file to accomplish inter-cluster data communication, thus eliminating the performance and energy consumption penalty caused by explicit inter-cluster data move operations in traditional bus-connected clustered (BCC) VLIW architecture. However, the limited number of access ports to the global register file has become an issue that must be addressed; otherwise performance and energy consumption suffer. In this paper, we present compiler optimization techniques for an RFCC VLIW architecture called Lily, which is designed for encryption systems. These techniques aim at optimizing performance and energy consumption for the Lily architecture by manipulating the code generation process to better manage accesses to the global register file. All the techniques have been implemented and evaluated. The results show that our techniques can significantly reduce the performance and energy consumption penalty caused by the access port limitation of the global register file. PMID:23970841

  15. The Design of a Templated C++ Small Vector Class for Numerical Computing

    NASA Technical Reports Server (NTRS)

    Moran, Patrick J.

    2000-01-01

    We describe the design and implementation of a templated C++ class for vectors. The vector class is templated both for vector length and vector component type; the vector length is fixed at template instantiation time. The vector implementation is such that for a vector of N components of type T, the total number of bytes required by the vector is equal to N * sizeof(T), where sizeof is the built-in C operator. The property of having a size no bigger than that required by the components themselves is key in many numerical computing applications, where one may allocate very large arrays of small, fixed-length vectors. In addition to the design trade-offs motivating our fixed-length vector design choice, we review some of the C++ template features essential to an efficient, succinct implementation. In particular, we highlight some of the standard C++ features, such as partial template specialization, that are not supported by all compilers currently. This report provides an inventory listing the relevant support currently provided by some key compilers, as well as test code one can use to verify compiler capabilities.

  16. An Open Source modular platform for hydrological model implementation

    NASA Astrophysics Data System (ADS)

    Kolberg, Sjur; Bruland, Oddbjørn

    2010-05-01

    An implementation framework for setup and evaluation of spatio-temporal models is developed, forming a highly modularized distributed model system. The ENKI framework allows building space-time models for hydrological or other environmental purposes, from a suite of separately compiled subroutine modules. The approach makes it easy for students, researchers and other model developers to implement, exchange, and test single routines in a fixed framework. The open-source license and modular design of ENKI will also facilitate rapid dissemination of new methods to institutions engaged in operational hydropower forecasting or other water resource management. Written in C++, ENKI uses a plug-in structure to build a complete model from separately compiled subroutine implementations. These modules contain very little code apart from the core process simulation, and are compiled as dynamic-link libraries (DLLs). A narrow interface allows the main executable to recognise the number and type of the different variables in each routine. The framework then exposes these variables to the user within the proper context, ensuring that time series exist for input variables, initialisation for states, GIS data sets for static map data, manually or automatically calibrated values for parameters etc. ENKI is designed to meet three different levels of involvement in model construction:
    • Model application: Running and evaluating a given model. Regional calibration against arbitrary data using a rich suite of objective functions, including likelihood and Bayesian estimation. Uncertainty analysis directed towards input or parameter uncertainty.
      o Need not: Know the model's composition of subroutines, or the internal variables in the model, or the creation of method modules.
    • Model analysis: Link together different process methods, including parallel setup of alternative methods for solving the same task. Investigate the effect of different spatial discretization schemes.
      o Need not: Write or compile computer code, or handle file I/O for each module.
    • Routine implementation and testing: Implementation of new process-simulating methods/equations, specialised objective functions or quality control routines, and testing of these in an existing framework.
      o Need not: Implement user or model interface for the new routine, I/O handling, administration of model setup and run, calibration and validation routines etc.
    Originally developed for Norway's largest hydropower producer, Statkraft, ENKI is now being turned into an Open Source project. At the time of writing, the licence and the project administration are not yet established. Also, it remains to port the application to other compilers and computer platforms. However, we hope that ENKI will prove useful for both academic and operational users.

  17. Secure and Resilient Functional Modeling for Navy Cyber-Physical Systems

    DTIC Science & Technology

    2017-05-24

    Functional Modeling Compiler (SCCT) FM Compiler and Key Performance Indicators (KPI) May 2018 Pending. Model Management Backbone (SCCT) MMB Demonstration...implement the agent-based distributed runtime. - KPIs for single/multicore controllers and temporal/spatial domains. - Integration of the model management...Distributed Runtime (UCI) Not started. Model Management Backbone (SCCT) Not started. Siemens Corporation Corporate Technology Unrestricted

  18. Ada Compiler Validation Summary Report. Certificate Number: 920918S1. 11275 U.S. Navy Ada/M, Version 4.5 (/NO OPTIMIZE) VAX 8550/8600/8650 (Cluster) = VHSIC Processor Module (VPM) AN/AYK-14 (Bare Board)

    DTIC Science & Technology

    1992-10-27

    REPORT DOCUMENTATION PAGE...AD-A265 437...Processor Module (VPM) AN/AYK-14 (Bare Board) (target), 920918S1.11275 6. AUTHOR(S) National Institute of Standards and Technology Gaithersburg, MD USA 7 ...Summary Report (VSR) gives an account of the testing of this Ada implementation. For any technical terms used in this report, the reader is referred

  19. Jagged Tiling for Intra-tile Parallelism and Fine-Grain Multithreading

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shrestha, Sunil; Manzano Franco, Joseph B.; Marquez, Andres

    In this paper, we have developed a novel methodology that takes into consideration multithreaded many-core designs to better utilize memory/processing resources and improve memory residence on tileable applications. It takes advantage of polyhedral analysis and transformation in the form of PLUTO, combined with a highly optimized fine-grain tile runtime to exploit parallelism at all levels. The main contributions of this paper include the introduction of multi-hierarchical tiling techniques that increase intra-tile parallelism, and a data-flow inspired runtime library that allows the expression of parallel tiles with an efficient synchronization registry. Our current implementation shows performance improvements on an Intel Xeon Phi board of up to 32.25% against instances produced by state-of-the-art compiler frameworks for selected stencil applications.
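
    For readers unfamiliar with tiling of stencil codes, the following Python sketch shows plain rectangular loop tiling of a 5-point Jacobi sweep. It is not the jagged or multi-hierarchical scheme of the paper and does not use PLUTO; the function names and tile sizes are hypothetical and chosen only for illustration.

```python
def tiles(n_rows, n_cols, tile_rows, tile_cols):
    """Yield (row_start, row_end, col_start, col_end) for each rectangular tile."""
    for r0 in range(0, n_rows, tile_rows):
        for c0 in range(0, n_cols, tile_cols):
            yield r0, min(r0 + tile_rows, n_rows), c0, min(c0 + tile_cols, n_cols)


def jacobi_step_tiled(grid, tile_rows=2, tile_cols=2):
    """One 5-point Jacobi sweep over interior points, visited tile by tile."""
    n, m = len(grid), len(grid[0])
    out = [row[:] for row in grid]  # boundary values are copied unchanged
    for r0, r1, c0, c1 in tiles(n, m, tile_rows, tile_cols):
        # Clamp each tile to the interior so the stencil never reads out of bounds.
        for i in range(max(r0, 1), min(r1, n - 1)):
            for j in range(max(c0, 1), min(c1, m - 1)):
                out[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                    + grid[i][j - 1] + grid[i][j + 1])
    return out


# Hypothetical 4x5 grid; tiling changes the visiting order, not the result.
grid = [[float(i * 5 + j) for j in range(5)] for i in range(4)]
tiled = jacobi_step_tiled(grid, 2, 2)
untiled = jacobi_step_tiled(grid, 4, 5)  # a single tile covering the whole grid
```

    Because each sweep reads only from the input grid and writes to a separate output, the tiles in this sketch are independent of one another, which is the property a tile runtime can exploit to run them in parallel.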

  20. Extension of the AMBER molecular dynamics software to Intel's Many Integrated Core (MIC) architecture

    NASA Astrophysics Data System (ADS)

    Needham, Perri J.; Bhuiyan, Ashraf; Walker, Ross C.

    2016-04-01

    We present an implementation of explicit solvent particle mesh Ewald (PME) classical molecular dynamics (MD) within the PMEMD molecular dynamics engine, that forms part of the AMBER v14 MD software package, that makes use of Intel Xeon Phi coprocessors by offloading portions of the PME direct summation and neighbor list build to the coprocessor. We refer to this implementation as pmemd MIC offload and in this paper present the technical details of the algorithm, including basic models for MPI and OpenMP configuration, and analyze the resultant performance. The algorithm provides the best performance improvement for large systems (>400,000 atoms), achieving a ∼35% performance improvement for satellite tobacco mosaic virus (1,067,095 atoms) when 2 Intel E5-2697 v2 processors (2 ×12 cores, 30M cache, 2.7 GHz) are coupled to an Intel Xeon Phi coprocessor (Model 7120P-1.238/1.333 GHz, 61 cores). The implementation utilizes a two-fold decomposition strategy: spatial decomposition using an MPI library and thread-based decomposition using OpenMP. We also present compiler optimization settings that improve the performance on Intel Xeon processors, while retaining simulation accuracy.

  1. DyNAMiC Workbench: an integrated development environment for dynamic DNA nanotechnology

    PubMed Central

    Grun, Casey; Werfel, Justin; Zhang, David Yu; Yin, Peng

    2015-01-01

    Dynamic DNA nanotechnology provides a promising avenue for implementing sophisticated assembly processes, mechanical behaviours, sensing and computation at the nanoscale. However, design of these systems is complex and error-prone, because the need to control the kinetic pathway of a system greatly increases the number of design constraints and possible failure modes for the system. Previous tools have automated some parts of the design workflow, but an integrated solution is lacking. Here, we present software implementing a three ‘tier’ design process: a high-level visual programming language is used to describe systems, a molecular compiler builds a DNA implementation and nucleotide sequences are generated and optimized. Additionally, our software includes tools for analysing and ‘debugging’ the designs in silico, and for importing/exporting designs to other commonly used software systems. The software we present is built on many existing pieces of software, but is integrated into a single package—accessible using a Web-based interface at http://molecular-systems.net/workbench. We hope that the deep integration between tools and the flexibility of this design process will lead to better experimental results, fewer experimental design iterations and the development of more complex DNA nanosystems. PMID:26423437

  2. High assurance SPIRAL

    NASA Astrophysics Data System (ADS)

    Franchetti, Franz; Sandryhaila, Aliaksei; Johnson, Jeremy R.

    2014-06-01

    In this paper we introduce High Assurance SPIRAL to solve the last mile problem for the synthesis of high assurance implementations of controllers for vehicular systems that are executed in today's and future embedded and high performance embedded system processors. High Assurance SPIRAL is a scalable methodology to translate a high level specification of a high assurance controller into a highly resource-efficient, platform-adapted, verified control software implementation for a given platform in a language like C or C++. High Assurance SPIRAL proves that the implementation is equivalent to the specification written in the control engineer's domain language. Our approach scales to problems involving floating-point calculations and provides highly optimized synthesized code. It makes it possible to estimate the available headroom to enable assurance/performance trade-offs under real-time constraints, and to synthesize multiple implementation variants to make attacks harder. At the core of High Assurance SPIRAL is the Hybrid Control Operator Language (HCOL) that leverages advanced mathematical constructs expressing the controller specification to provide high quality translation capabilities. Combined with a verified/certified compiler, High Assurance SPIRAL provides a comprehensive solution to the efficient synthesis of verifiable high assurance controllers. We demonstrate High Assurance SPIRAL's capability by co-synthesizing proofs and implementations for attack detection and sensor spoofing algorithms and deploy the code as ROS nodes on the Landshark unmanned ground vehicle and on a Synthetic Car in a real-time simulator.

  3. The paradigm compiler: Mapping a functional language for the connection machine

    NASA Technical Reports Server (NTRS)

    Dennis, Jack B.

    1989-01-01

    The Paradigm Compiler implements a new approach to compiling programs written in high level languages for execution on highly parallel computers. The general approach is to identify the principal data structures constructed by the program and to map these structures onto the processing elements of the target machine. The mapping is chosen to maximize performance as determined through compile time global analysis of the source program. The source language is Sisal, a functional language designed for scientific computations, and the target language is Paris, the published low level interface to the Connection Machine. The data structures considered are multidimensional arrays whose dimensions are known at compile time. Computations that build such arrays usually offer opportunities for highly parallel execution; they are data parallel. The Connection Machine is an attractive target for these computations, and the parallel for construct of the Sisal language is a convenient high level notation for data parallel algorithms. The principles and organization of the Paradigm Compiler are discussed.

  4. Advanced compilation techniques in the PARADIGM compiler for distributed-memory multicomputers

    NASA Technical Reports Server (NTRS)

    Su, Ernesto; Lain, Antonio; Ramaswamy, Shankar; Palermo, Daniel J.; Hodges, Eugene W., IV; Banerjee, Prithviraj

    1995-01-01

    The PARADIGM compiler project provides an automated means to parallelize programs, written in a serial programming model, for efficient execution on distributed-memory multicomputers. A previous implementation of the compiler based on the PTD representation allowed symbolic array sizes, affine loop bounds and array subscripts, and a variable number of processors, provided that arrays were single or multi-dimensionally block distributed. The techniques presented here extend the compiler to also accept multidimensional cyclic and block-cyclic distributions within a uniform symbolic framework. These extensions demand more sophisticated symbolic manipulation capabilities. A novel aspect of our approach is to meet this demand by interfacing PARADIGM with a powerful off-the-shelf symbolic package, Mathematica. This paper describes some of the Mathematica routines that perform various transformations, shows how they are invoked and used by the compiler to overcome the new challenges, and presents experimental results for code involving cyclic and block-cyclic arrays as evidence of the feasibility of the approach.
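
    The three distribution kinds named in this abstract can be illustrated with a minimal one-dimensional Python sketch that maps an array index to the processor owning it. It uses the conventional definitions of block, cyclic, and block-cyclic distributions with concrete sizes; the function names are hypothetical, and the sketch does not reproduce PARADIGM's symbolic treatment.

```python
def owner_block(i, n, p):
    """Owner of index i when n elements are split into p contiguous blocks."""
    block = -(-n // p)  # ceil(n / p) using integer arithmetic
    return i // block


def owner_cyclic(i, p):
    """Owner of index i when elements are dealt out round-robin to p processors."""
    return i % p


def owner_block_cyclic(i, b, p):
    """Owner of index i when blocks of b elements are dealt out round-robin."""
    return (i // b) % p


# Hypothetical case: 8 elements on 2 processors, block size 2 for block-cyclic.
block_owners = [owner_block(i, 8, 2) for i in range(8)]
cyclic_owners = [owner_cyclic(i, 2) for i in range(8)]
block_cyclic_owners = [owner_block_cyclic(i, 2, 2) for i in range(8)]
```

    Block-cyclic contains the other two as special cases: a block size of 1 gives the cyclic mapping, and a block size of ceil(n / p) gives the block mapping.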

  5. SAN JUAN BAY ESTUARY PROGRAM IMPLEMENTATION REVIEW ATTACHMENTS

    EPA Science Inventory

    A compilation of attachments referenced in the San Juan Bay Estuary Program Implementation Review (2004). Materials include entity reports, water and sediment quality action plans, progress reports, correspondence with local municipalities and Puerto Rican governmental agencies,...

  6. Convis: A Toolbox to Fit and Simulate Filter-Based Models of Early Visual Processing

    PubMed Central

    Huth, Jacob; Masquelier, Timothée; Arleo, Angelo

    2018-01-01

    We developed Convis, a Python simulation toolbox for large scale neural populations which offers arbitrary receptive fields by 3D convolutions executed on a graphics card. The resulting software proves to be flexible and easily extensible in Python, while building on the PyTorch library (The Pytorch Project, 2017), which was previously used successfully in deep learning applications, for just-in-time optimization and compilation of the model onto CPU or GPU architectures. An alternative implementation based on Theano (Theano Development Team, 2016) is also available, although not fully supported. Through automatic differentiation, any parameter of a specified model can be optimized to approach a desired output which is a significant improvement over e.g., Monte Carlo or particle optimizations without gradients. We show that a number of models including even complex non-linearities such as contrast gain control and spiking mechanisms can be implemented easily. We show in this paper that we can in particular recreate the simulation results of a popular retina simulation software VirtualRetina (Wohrer and Kornprobst, 2009), with the added benefit of providing (1) arbitrary linear filters instead of the product of Gaussian and exponential filters and (2) optimization routines utilizing the gradients of the model. We demonstrate the utility of 3d convolution filters with a simple direction selective filter. Also we show that it is possible to optimize the input for a certain goal, rather than the parameters, which can aid the design of experiments as well as closed-loop online stimulus generation. Yet, Convis is more than a retina simulator. For instance it can also predict the response of V1 orientation selective cells. Convis is open source under the GPL-3.0 license and available from https://github.com/jahuth/convis/ with documentation at https://jahuth.github.io/convis/. PMID:29563867

  7. Ada compiler validation summary report. Certificate number: 891116W1. 10191. Intel Corporation, IPSC/2 Ada, Release 1. 1, IPSC/2 parallel supercomputer, system resource manager host and IPSC/2 parallel supercomputer, CX-1 nodes target

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Not Available

    1989-11-16

    This VSR documents the results of the validation testing performed on an Ada compiler. Testing was carried out for the following purposes: to attempt to identify any language constructs supported by the compiler that do not conform to the Ada Standard; to attempt to identify any language constructs not supported by the compiler but required by the Ada Standard; and to determine that the implementation-dependent behavior is allowed by the Ada Standard. Testing of this compiler was conducted by SofTech, Inc. under the direction of the AVF according to procedures established by the Ada Joint Program Office and administered by the Ada Validation Organization (AVO). On-site testing was completed 16 November 1989 at Aloha, OR.

  8. A translator writing system for microcomputer high-level languages and assemblers

    NASA Technical Reports Server (NTRS)

    Collins, W. R.; Knight, J. C.; Noonan, R. E.

    1980-01-01

    In order to implement high level languages whenever possible, a translator writing system of advanced design was developed. It is intended for routine production use by many programmers working on different projects. As well as a fairly conventional parser generator, it includes a system for the rapid generation of table driven code generators. The parser generator was developed from a prototype version. The translator writing system includes various tools for the management of the source text of a compiler under construction. In addition, it supplies various default source code sections so that its output is always compilable and executable. The system thereby encourages iterative enhancement as a development methodology by ensuring an executable program from the earliest stages of a compiler development project. The translator writing system includes a PASCAL/48 compiler, three assemblers, and two compilers for a subset of HAL/S.

  9. Architecture Adaptive Computing Environment

    NASA Technical Reports Server (NTRS)

    Dorband, John E.

    2006-01-01

    Architecture Adaptive Computing Environment (aCe) is a software system that includes a language, compiler, and run-time library for parallel computing. aCe was developed to enable programmers to write programs, more easily than was previously possible, for a variety of parallel computing architectures. Heretofore, it has been perceived to be difficult to write parallel programs for parallel computers and more difficult to port the programs to different parallel computing architectures. In contrast, aCe is supportable on all high-performance computing architectures. Currently, it is supported on LINUX clusters. aCe uses parallel programming constructs that facilitate writing of parallel programs. Such constructs were used in single-instruction/multiple-data (SIMD) programming languages of the 1980s, including Parallel Pascal, Parallel Forth, C*, *LISP, and MasPar MPL. In aCe, these constructs are extended and implemented for both SIMD and multiple-instruction/multiple-data (MIMD) architectures. Two new constructs incorporated in aCe are those of (1) scalar and virtual variables and (2) pre-computed paths. The scalar-and-virtual-variables construct increases flexibility in optimizing memory utilization in various architectures. The pre-computed-paths construct enables the compiler to pre-compute part of a communication operation once, rather than computing it every time the communication operation is performed.

  10. Compiler-Driven Performance Optimization and Tuning for Multicore Architectures

    DTIC Science & Technology

    2015-04-10

    develop a powerful system for auto-tuning of library routines and compute-intensive kernels, driven by the Pluto system for multicores that we are developing. The work here is motivated by recent advances in two major areas of...automatic C-to-CUDA code generator using a polyhedral compiler transformation framework. We have used and adapted PLUTO (our state-of-the-art tool

  11. The scope of additive manufacturing in cryogenics, component design, and applications

    NASA Astrophysics Data System (ADS)

    Stautner, W.; Vanapalli, S.; Weiss, K.-P.; Chen, R.; Amm, K.; Budesheim, E.; Ricci, J.

    2017-12-01

    Additive manufacturing techniques using composites or metals are rapidly gaining momentum in cryogenic applications. Small or large, complex structural components are now no longer limited to mere design studies but can now move into the production stream thanks to new machines on the market that allow for light-weight, cost optimized designs with short turnaround times. The potential for cost reductions from bulk materials machined to tight tolerances has become obvious. Furthermore, additive manufacturing opens doors and design space for cryogenic components that to date did not exist or were not possible in the past, using bulk materials along with elaborate and expensive machining processes, e.g. micromachining. The cryogenic engineer now faces the challenge to design toward those new additive manufacturing capabilities. Additionally, re-thinking designs toward cost optimization and fast implementation also requires detailed knowledge of mechanical and thermal properties at cryogenic temperatures. In the following we compile the information available to date and show a possible roadmap for additive manufacturing applications of parts and components typically used in cryogenic engineering designs.

  12. Spacelab user implementation assessment study. Volume 1: Concept development and evaluation

    NASA Technical Reports Server (NTRS)

    1975-01-01

    The total matrix of alternate Spacelab processing concepts and the rejection rationale utilized to reduce the matrix of 243 alternates to the final candidate processing concepts are developed. The work breakdown structure used for the systematic estimation and compilation of integration and checkout resources is presented along with descriptors of each element. Program models are provided of the space transportation system, the Spacelab, the orbiter, and the ATL that were used as the basis for the study trades, analyses, and optimizations. Resource requirements for all processing concepts are summarized along with the optimizations of the processing concepts. Concept evaluations including flight-rate sensitivities of the GSE, facilities, Spacelab hardware elements, and personnel are delineated. An analysis is presented of the applicability of the candidate concepts to potential spacelab users. The impact of the use of the western test range as an orbiter/spacelab launch site on the candidate processing concepts is evaluated. An assessment of the geographical co-location of experiment, Spacelab, and orbiter-cargo integration is included. Ownership options of the support module/system igloo are discussed.

  13. GPAW - massively parallel electronic structure calculations with Python-based software.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Enkovaara, J.; Romero, N.; Shende, S.

    2011-01-01

    Electronic structure calculations are a widely used tool in materials science and a large consumer of supercomputing resources. Traditionally, the software packages for these kinds of simulations have been implemented in compiled languages, where Fortran in its different versions has been the most popular choice. While dynamic, interpreted languages, such as Python, can increase the efficiency of the programmer, they cannot compete directly with the raw performance of compiled languages. However, by using an interpreted language together with a compiled language, it is possible to have most of the productivity-enhancing features together with good numerical performance. We have used this approach in implementing the electronic structure simulation software GPAW using the combination of the Python and C programming languages. While the chosen approach works well in standard workstations and Unix environments, massively parallel supercomputing systems can present some challenges in porting, debugging and profiling the software. In this paper we describe some details of the implementation and discuss the advantages and challenges of the combined Python/C approach. We show that despite the challenges it is possible to obtain good numerical performance and good parallel scalability with Python-based software.

  14. Ada (Trademark) Compiler Validation Summary Report: Certificate Number: 880714N1,09135, GEC Software Ltd, VADS Version 5.5, SUN 3/50 Workstation X GEC 4195 Minicomputer

    DTIC Science & Technology

    1988-07-15

    floating-point accuracy that exceeds the maximum of 15 digits supported by this implementation: C24113L..Y (14 tests) C35705L..Y (14 tests) C35706L...declarative part or package specification, or after a library unit in a compilation, but before any subsequent compilation unit. When the first argument is a...INT : constant := 2147483647; MAX_DIGITS : constant := 15; MAX_MANTISSA : constant := 31; FINE_DELTA : constant := 2.0**(-31); TICK : constant := 0.01; -- Other

  15. Parallel tiled Nussinov RNA folding loop nest generated using both dependence graph transitive closure and loop skewing.

    PubMed

    Palkowski, Marek; Bielecki, Wlodzimierz

    2017-06-02

    RNA secondary structure prediction is a compute-intensive task that lies at the core of several search algorithms in bioinformatics. Fortunately, RNA folding approaches, such as the Nussinov base pair maximization, involve mathematical operations over affine control loops whose iteration space can be represented by the polyhedral model. Polyhedral compilation techniques have proven to be a powerful tool for optimization of dense array codes. However, classical affine loop nest transformations used with these techniques do not effectively optimize dynamic programming codes for RNA structure prediction. The purpose of this paper is to present a novel approach allowing for generation of a parallel tiled Nussinov RNA loop nest exhibiting significantly higher performance than that of known related code. This effect is achieved by improving code locality and calculation parallelization. In order to improve code locality, we apply our previously published technique of automatic loop nest tiling to all three loops of the Nussinov loop nest. This approach first forms original rectangular 3D tiles and then corrects them to establish their validity by means of applying the transitive closure of a dependence graph. To produce parallel code, we apply the loop skewing technique to a tiled Nussinov loop nest. The technique is implemented as a part of the publicly available polyhedral source-to-source TRACO compiler. Generated code was run on modern Intel multi-core processors and coprocessors. We present the speed-up factor of generated Nussinov RNA parallel code and demonstrate that it is considerably faster than related codes in which only the two outer loops of the Nussinov loop nest are tiled.
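    For orientation, the three-loop nest that this record refers to can be sketched as a plain, untiled and sequential Nussinov base pair maximization. This is a minimal illustration, not the TRACO-generated code: the function name `nussinov`, the `min_loop` parameter, and the restriction to Watson-Crick pairs (A-U, G-C) are assumptions made for the example.

```python
def nussinov(seq, min_loop=0):
    """Maximum number of nested base pairs in seq (untiled reference version).

    Assumptions for this sketch: only A-U and G-C pairs are allowed, and
    positions i < j may pair only if j - i > min_loop.
    """
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # S[i][j] holds the best score for the subsequence seq[i..j].
    S = [[0] * n for _ in range(n)]
    for i in range(n - 1, -1, -1):          # outer loop: start index, descending
        for j in range(i + 1, n):           # middle loop: end index
            best = 0
            if (seq[i], seq[j]) in pairs and j - i > min_loop:
                inner = S[i + 1][j - 1] if i + 1 <= j - 1 else 0
                best = inner + 1            # i pairs with j
            for k in range(i, j):           # inner loop: split point
                best = max(best, S[i][k] + S[k + 1][j])
            S[i][j] = best
    return S[0][n - 1]


if __name__ == "__main__":
    print(nussinov("GGGAAAUCC"))
```

    The innermost `k` loop reads `S[i][k]` from the current row and `S[k + 1][j]` from rows computed earlier, which is the kind of non-uniform dependence pattern that makes tiling all three loops harder than tiling a dense array kernel.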

  16. A ROSE-based OpenMP 3.0 Research Compiler Supporting Multiple Runtime Libraries

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liao, C; Quinlan, D; Panas, T

    2010-01-25

    OpenMP is a popular and evolving programming model for shared-memory platforms. It relies on compilers for optimal performance and to target modern hardware architectures. A variety of extensible and robust research compilers are key to OpenMP's sustainable success in the future. In this paper, we present our efforts to build an OpenMP 3.0 research compiler for C, C++, and Fortran using the ROSE source-to-source compiler framework. Our goal is to support OpenMP research for ourselves and others. We have extended ROSE's internal representation to handle all of the OpenMP 3.0 constructs and facilitate their manipulation. Since OpenMP research is often complicated by the tight coupling of the compiler translations and the runtime system, we present a set of rules to define a common OpenMP runtime library (XOMP) on top of multiple runtime libraries. These rules additionally define how to build a set of translations targeting XOMP. Our work demonstrates how to reuse OpenMP translations across different runtime libraries. This work simplifies OpenMP research by decoupling the problematic dependence between the compiler translations and the runtime libraries. We present an evaluation of our work by demonstrating an analysis tool for OpenMP correctness. We also show how XOMP can be defined using both GOMP and Omni and present comparative performance results against other OpenMP compilers.

  17. Current status of the HAL/S compiler on the Modcomp classic 7870 computer

    NASA Technical Reports Server (NTRS)

    Lytle, P. J.

    1981-01-01

    A brief history of the HAL/S language, including the experience of other users of the language at the Jet Propulsion Laboratory, is presented. The current status of the compiler, as implemented on the Modcomp Classic 7870 computer, and future applications in the Deep Space Network (DSN) are discussed. The primary applications in the DSN will be in the Mark IVA network.

  18. The Optimization of Automatically Generated Compilers.

    DTIC Science & Technology

    1987-01-01

    than their procedural counterparts, and are also easier to analyze for storage optimizations; (2) AGs can be algorithmically checked to be non-circular...Providing algorithms to move the storage for many attributes from the For structure tree into global stacks and variables. -Dd(2) Creating AEs which build and...54 3.5.2. Partitioning algorithm

  19. The Katydid system for compiling KEE applications to Ada

    NASA Technical Reports Server (NTRS)

    Filman, Robert E.; Bock, Conrad; Feldman, Roy

    1990-01-01

    Components of a system known as Katydid are developed in an effort to compile knowledge-based systems developed in a multimechanism integrated environment (KEE) to Ada. The Katydid core is an Ada library supporting KEE object functionality, and the other elements include a rule compiler, a LISP-to-Ada translator, and a knowledge-base dumper. Katydid employs translation mechanisms that convert LISP knowledge structures and rules to Ada and utilizes basic prototypes of a run-time KEE object-structure library module for Ada. Preliminary results include the semiautomatic compilation of portions of a simple expert system to run in an Ada environment with the described algorithms. It is suggested that Ada can be employed for AI programming and implementation, and the Katydid system is being developed to include concurrency and synchronization mechanisms.

  20. Compiling global name-space programs for distributed execution

    NASA Technical Reports Server (NTRS)

    Koelbel, Charles; Mehrotra, Piyush

    1990-01-01

    Distributed memory machines do not provide hardware support for a global address space. Thus programmers are forced to partition the data across the memories of the architecture and use explicit message passing to communicate data between processors. The compiler support required to allow programmers to express their algorithms using a global name-space is examined. A general method is presented for the analysis of a high-level source program and its translation to a set of independently executing tasks communicating via messages. If the compiler has enough information, this translation can be carried out at compile time. Otherwise, run-time code is generated to implement the required data movement. The analysis required in both situations is described and the performance of the generated code on the Intel iPSC/2 is presented.
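    To make the idea of deriving messages from a global name-space loop concrete, the following sketch computes which array elements must be sent between processors for a loop of the form `A[i] = B[access(i)]`. It is an illustration under stated assumptions rather than the method of the record: a one-dimensional block distribution, an owner-computes rule, and the hypothetical helper names `owner` and `communication_sets` are all choices made for the example.

```python
def owner(index, n, nprocs):
    """Processor that owns element `index` of an n-element block-distributed array."""
    block = -(-n // nprocs)  # ceiling division: elements per processor
    return index // block


def communication_sets(n, nprocs, access):
    """Messages needed to run `for i in range(n): A[i] = B[access(i)]`.

    Assumes A and B share the same block distribution and that the owner of
    A[i] executes iteration i. Returns {(src, dst): sorted indices of B that
    processor src must send to processor dst}.
    """
    messages = {}
    for i in range(n):
        j = access(i)
        if not 0 <= j < n:
            continue  # out-of-range reference: iteration skipped in this sketch
        src, dst = owner(j, n, nprocs), owner(i, n, nprocs)
        if src != dst:
            messages.setdefault((src, dst), set()).add(j)
    return {pair: sorted(idx) for pair, idx in messages.items()}


if __name__ == "__main__":
    # Shift by one on 8 elements over 4 processors: only block boundaries communicate.
    print(communication_sets(8, 4, lambda i: i + 1))
```

    When `access` is a simple affine expression, sets like these can be worked out symbolically before the program runs; when it depends on data known only during execution (for example an index array), an enumeration of this kind has to happen at run time.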

  1. Optimal simultaneous superpositioning of multiple structures with missing data.

    PubMed

    Theobald, Douglas L; Steindel, Phillip A

    2012-08-01

    Superpositioning is an essential technique in structural biology that facilitates the comparison and analysis of conformational differences among topologically similar structures. Performing a superposition requires a one-to-one correspondence, or alignment, of the point sets in the different structures. However, in practice, some points are usually 'missing' from several structures, for example, when the alignment contains gaps. Current superposition methods deal with missing data simply by superpositioning a subset of points that are shared among all the structures. This practice is inefficient, as it ignores important data, and it fails to satisfy the common least-squares criterion. In the extreme, disregarding missing positions prohibits the calculation of a superposition altogether. Here, we present a general solution for determining an optimal superposition when some of the data are missing. We use the expectation-maximization algorithm, a classic statistical technique for dealing with incomplete data, to find both maximum-likelihood solutions and the optimal least-squares solution as a special case. The methods presented here are implemented in THESEUS 2.0, a program for superpositioning macromolecular structures. ANSI C source code and selected compiled binaries for various computing platforms are freely available under the GNU open source license from http://www.theseus3d.org. Contact: dtheobald@brandeis.edu. Supplementary data are available at Bioinformatics online.

  2. An Ada programming support environment

    NASA Technical Reports Server (NTRS)

    Tyrrill, AL; Chan, A. David

    1986-01-01

    The toolset of an Ada Programming Support Environment (APSE) being developed at North American Aircraft Operations (NAAO) of Rockwell International is described. The APSE is resident on three different hosts and must support developments for the hosts and for embedded targets. Tools and developed software must be freely portable between the hosts. The toolset includes the usual editors, compilers, linkers, debuggers, configuration managers, and documentation tools. Generally, these are being supplied by the host computer vendors. Other tools, for example, a pretty printer, cross referencer, compilation order tool, and management tools, were obtained from public-domain sources, are implemented in Ada, and are being ported to the hosts. Several tools being implemented in-house are of interest; these include an Ada Design Language processor based on compilable Ada. A Standalone Test Environment Generator facilitates test tool construction and partially automates unit-level testing. A Code Auditor/Static Analyzer permits the Ada programs to be evaluated against measures of quality. An Ada Comment Box Generator partially automates generation of header comment boxes.

  3. Implementing the EuroFIR Document and Data Repositories as accessible resources of food composition information.

    PubMed

    Unwin, Ian; Jansen-van der Vliet, Martine; Westenbrink, Susanne; Presser, Karl; Infanger, Esther; Porubska, Janka; Roe, Mark; Finglas, Paul

    2016-02-15

    The EuroFIR Document and Data Repositories are being developed as accessible collections of source documents, including grey literature, and the food composition data reported in them. These Repositories will contain source information available to food composition database compilers when selecting their nutritional data. The Document Repository was implemented as searchable bibliographic records in the Europe PubMed Central database, which links to the documents online. The Data Repository will contain original data from source documents in the Document Repository. Testing confirmed the FoodCASE food database management system as a suitable tool for the input, documentation and quality assessment of Data Repository information. Data management requirements for the input and documentation of reported analytical results were established, including record identification and method documentation specifications. Document access and data preparation using the Repositories will provide information resources for compilers, eliminating duplicated work and supporting unambiguous referencing of data contributing to their compiled data. Copyright © 2014 Elsevier Ltd. All rights reserved.

  4. Approaching mathematical model of the immune network based DNA Strand Displacement system.

    PubMed

    Mardian, Rizki; Sekiyama, Kosuke; Fukuda, Toshio

    2013-12-01

    One of the biggest obstacles in molecular programming is that there is still no direct method to compile an existing mathematical model into biochemical reactions in order to solve a computational problem. In this paper, the implementation of a DNA Strand Displacement system based on nature-inspired computation is examined. Using the Immune Network Theory and Chemical Reaction Networks, the compilation of DNA-based operations is defined and the formulation of its mathematical model is derived. Furthermore, the implementation of this system is compared with the conventional implementation using silicon-based programming. From the obtained results, we can see a positive correlation between the two. One possible application of this DNA-based model is a decision-making scheme for an intelligent computer or molecular robot. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.

  5. Tectonic evaluation of the Nubian shield of Northeastern Sudan using thematic mapper imagery

    NASA Technical Reports Server (NTRS)

    1986-01-01

    Bechtel is nearing completion of a one-year program that uses digitally enhanced LANDSAT Thematic Mapper (TM) data to compile the first comprehensive regional tectonic map of the Proterozoic Nubian Shield exposed in the northern Red Sea Hills of northeastern Sudan. The status of the significant objectives of this study is given. Pertinent published and unpublished geologic literature and maps of the northern Red Sea Hills were reviewed to establish the geologic framework of the region. Thematic Mapper imagery was processed for optimal base-map enhancements. Photo mosaics of enhanced images to serve as base maps for compilation of geologic information were completed. Interpretation of TM imagery to define and delineate structural and lithologic provinces was completed. Geologic information (petrologic and radiometric data) was compiled from the literature review onto base-map overlays. Evaluation of the tectonic evolution of the Nubian Shield based on the image interpretation and the compiled tectonic maps is continuing.

  6. JiTTree: A Just-in-Time Compiled Sparse GPU Volume Data Structure.

    PubMed

    Labschütz, Matthias; Bruckner, Stefan; Gröller, M Eduard; Hadwiger, Markus; Rautek, Peter

    2016-01-01

    Sparse volume data structures enable the efficient representation of large but sparse volumes in GPU memory for computation and visualization. However, the choice of a specific data structure for a given data set depends on several factors, such as the memory budget, the sparsity of the data, and data access patterns. In general, there is no single optimal sparse data structure, but a set of several candidates with individual strengths and drawbacks. One solution to this problem is to use hybrid data structures, which locally adapt themselves to the sparsity. However, they typically suffer from increased traversal overhead, which limits their utility in many applications. This paper presents JiTTree, a novel sparse hybrid volume data structure that uses just-in-time compilation to overcome these problems. By combining multiple sparse data structures and reducing traversal overhead we leverage their individual advantages. We demonstrate that hybrid data structures adapt well to a large range of data sets. They are especially superior to other sparse data structures for data sets that locally vary in sparsity. Possible optimization criteria are memory, performance and a combination thereof. Through just-in-time (JIT) compilation, JiTTree reduces the traversal overhead of the resulting optimal data structure. As a result, our hybrid volume data structure enables efficient computations on the GPU, while being superior in terms of memory usage when compared to non-hybrid data structures.

  7. Protocol Programmability

    DTIC Science & Technology

    2013-12-01

    First, any subproject that involved an implementation shared some implementation infrastructure with other subprojects. For example, the Plaid backend ...very same language. We followed this advice in Plaid, and we therefore implemented the compiler backend in Plaid (code generation, type checker, Æminim...programming language aimed at enforcing security properties in web and mobile applications [Nistor et al., 2013]. Wyvern therefore provides an excellent

  8. A Separate Compilation Extension to Standard ML (Revised and Expanded)

    DTIC Science & Technology

    2006-09-17

    repetition of interfaces. The language is given a formal semantics, and we argue that this semantics is implementable in a variety of compilers. This...material is based on work supported in part by the National Science Foundation under grant 0121633 Language Technology for Trustless Software...Dissemination and by the Defense Advanced Research Projects Agency under contracts F196268-95-C-0050 The Fox Project: Advanced Languages for Systems Software

  9. spammpack, Version 2013-06-18

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    2014-01-17

    This library is an implementation of the Sparse Approximate Matrix Multiplication (SpAMM) algorithm. It provides a matrix data type and an approximate matrix product, which exhibits linear-scaling computational complexity for matrices with decay. The product error and the performance of the multiply can be tuned by choosing an appropriate tolerance. The library can be compiled for serial execution or for parallel execution on shared-memory systems with an OpenMP-capable compiler.
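    The role of the tolerance can be illustrated with a small recursive sketch of a SpAMM-style product: the matrices are split into quadrants, and a sub-block product is skipped when the product of the two Frobenius norms falls below the tolerance. This is a simplified illustration and not spammpack's API; the names `spamm` and `frobenius`, the plain list-of-lists storage, and the restriction to square matrices whose size is a power of two are assumptions of the example.

```python
import math


def frobenius(M, r0, c0, size):
    """Frobenius norm of the size x size block of M starting at (r0, c0)."""
    return math.sqrt(sum(M[r][c] ** 2
                         for r in range(r0, r0 + size)
                         for c in range(c0, c0 + size)))


def spamm(A, B, tol):
    """Approximate A @ B for square matrices whose size is a power of two.

    A block product is dropped when ||A_block||_F * ||B_block||_F < tol.
    Norms are recomputed on every call here for brevity; a real
    implementation would cache them in a quadtree.
    """
    n = len(A)
    C = [[0.0] * n for _ in range(n)]

    def multiply(i0, k0, j0, size):
        # Accumulates A[i0.., k0..] * B[k0.., j0..] into C[i0.., j0..].
        if frobenius(A, i0, k0, size) * frobenius(B, k0, j0, size) < tol:
            return  # contribution considered negligible
        if size == 1:
            C[i0][j0] += A[i0][k0] * B[k0][j0]
            return
        half = size // 2
        for di in (0, half):
            for dj in (0, half):
                for dk in (0, half):
                    multiply(i0 + di, k0 + dk, j0 + dj, half)

    multiply(0, 0, 0, n)
    return C


if __name__ == "__main__":
    A = [[1.0, 1e-6], [1e-6, 1.0]]
    print(spamm(A, A, 1e-3))  # small off-diagonal contributions are dropped
    print(spamm(A, A, 0.0))   # tolerance 0 keeps every term
```

    With a tolerance of zero no block is skipped and the result is the exact product; raising the tolerance trades accuracy for fewer block multiplications, which pays off when the matrix elements decay away from the diagonal.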

  10. Optimal Jet Finder (v1.0 C++)

    NASA Astrophysics Data System (ADS)

    Chumakov, S.; Jankowski, E.; Tkachov, F. V.

    2006-10-01

    We describe a C++ implementation of the Optimal Jet Definition for identification of jets in hadronic final states of particle collisions. We explain interface subroutines and provide a usage example. The source code is available from http://www.inr.ac.ru/~ftkachov/projects/jets/.

    Program summary
    Title of program: Optimal Jet Finder (v1.0 C++)
    Catalogue identifier: ADSB_v2_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADSB_v2_0
    Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland
    Computer: any computer with a standard C++ compiler
    Tested with: GNU gcc 3.4.2, Linux Fedora Core 3, Intel i686; Forte Developer 7 C++ 5.4, SunOS 5.9, UltraSPARC III+; Microsoft Visual C++ Toolkit 2003 (compiler 13.10.3077, linker 7.10.30777, option /EHsc), Windows XP, Intel i686.
    Programming language used: C++
    Memory required: ~1 MB (or more, depending on the settings)
    No. of lines in distributed program, including test data, etc.: 3047
    No. of bytes in distributed program, including test data, etc.: 17 884
    Distribution format: tar.gz
    Nature of physical problem: Analysis of hadronic final states in high energy particle collision experiments often involves identification of hadronic jets. A large number of hadrons detected in the calorimeter is reduced to a few jets by means of a jet finding algorithm. The jets are used in further analysis which would be difficult or impossible when applied directly to the hadrons. Grigoriev et al. [D.Yu. Grigoriev, E. Jankowski, F.V. Tkachov, Phys. Rev. Lett. 91 (2003) 061801] provide a brief introduction to the subject of jet finding algorithms, and a general review of the physics of jets can be found in [R. Barlow, Rep. Prog. Phys. 36 (1993) 1067].
    Method of solution: The software we provide is an implementation of the so-called Optimal Jet Definition (OJD). The theory of OJD was developed in [F.V. Tkachov, Phys. Rev. Lett. 73 (1994) 2405; Erratum, Phys. Rev. Lett. 74 (1995) 2618; F.V. Tkachov, Int. J. Modern Phys. A 12 (1997) 5411; F.V. Tkachov, Int. J. Modern Phys. A 17 (2002) 2783]. The desired jet configuration is obtained as the one that minimizes Ω, a certain function of the input particles and jet configuration. A FORTRAN 77 implementation of OJD is described in [D.Yu. Grigoriev, E. Jankowski, F.V. Tkachov, Comput. Phys. Comm. 155 (2003) 42].
    Restrictions on the complexity of the program: Memory required by the program is proportional to the number of particles in the input × the number of jets in the output. For example, for 650 particles and 20 jets ~300 KB memory is required.
    Typical running time: The running time (in the running mode with a fixed number of jets) is proportional to the number of particles in the input × the number of jets in the output × the number of different random initial configurations tried (ntries). For example, for 65 particles in the input and 4 jets in the output, the running time is ~4ṡ10 s per try (Pentium 4 2.8 GHz).

  11. LLVM Infrastructure and Tools Project Summary

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    McCormick, Patrick Sean

    2017-11-06

    This project works with the open source LLVM Compiler Infrastructure (http://llvm.org) to provide tools and capabilities that address needs and challenges faced by the ECP community (applications, libraries, and other components of the software stack). Our focus is on providing a more productive development environment that enables (i) improved compilation times and code generation for parallelism, (ii) additional features/capabilities within the design and implementations of LLVM components for improved platform/performance portability and (iii) improved aspects related to composition of the underlying implementation details of the programming environment, capturing resource utilization, overheads, etc. -- including runtime systems that are often not easily addressed by application and library developers.

  12. Pythran: enabling static optimization of scientific Python programs

    NASA Astrophysics Data System (ADS)

    Guelton, Serge; Brunet, Pierrick; Amini, Mehdi; Merlini, Adrien; Corbillon, Xavier; Raynaud, Alan

    2015-01-01

    Pythran is an open source static compiler that turns modules written in a subset of Python language into native ones. Assuming that scientific modules do not rely much on the dynamic features of the language, it trades them for powerful, possibly inter-procedural, optimizations. These optimizations include detection of pure functions, temporary allocation removal, constant folding, Numpy ufunc fusion and parallelization, explicit thread-level parallelism through OpenMP annotations, false variable polymorphism pruning, and automatic vector instruction generation such as AVX or SSE. In addition to these compilation steps, Pythran provides a C++ runtime library that leverages the C++ STL to provide generic containers, and the Numeric Template Toolbox for Numpy support. It takes advantage of modern C++11 features such as variadic templates, type inference, move semantics and perfect forwarding, as well as classical idioms such as expression templates. Unlike the Cython approach, Pythran input code remains compatible with the Python interpreter. Output code is generally as efficient as the annotated Cython equivalent, if not more, but without the backward compatibility loss.
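    The compatibility claim can be made concrete with a small module. Pythran reads type information from a `#pythran export` comment, so the file below is still ordinary Python and runs unchanged under the interpreter. The function `dot` and its signature are illustrative choices for this sketch, not taken from the record.

```python
#pythran export dot(float list, float list)
def dot(xs, ys):
    """Dot product of two equally long lists of floats."""
    # The export comment above is ignored by the Python interpreter;
    # only a Pythran build uses it to select the native signature.
    return sum(x * y for x, y in zip(xs, ys))


if __name__ == "__main__":
    print(dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```

    Assuming Pythran is installed and the file is saved as, say, `dot_module.py`, running the `pythran` command on it would typically produce a native extension module that is imported in the same way as the pure-Python version.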

  13. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hornung, Richard D.; Hones, Holger E.

    The RAJA Performance Suite is designed to evaluate performance of the RAJA performance portability library on a wide variety of important high performance computing (HPC) algorithmic kernels. These kernels assess compiler optimizations and various parallel programming model backends accessible through RAJA, such as OpenMP, CUDA, etc. The initial version of the suite contains 25 computational kernels, each of which appears in 6 variants: Baseline Sequential, RAJA Sequential, Baseline OpenMP, RAJA OpenMP, Baseline CUDA, RAJA CUDA. All variants of each kernel perform essentially the same mathematical operations and the loop body code for each kernel is identical across all variants. There are a few kernels, such as those that contain reduction operations, that require CUDA-specific coding for their CUDA variants. The actual computer instructions executed and how they run in parallel differ depending on the parallel programming model backend used and which optimizations are performed by the compiler used to build the Performance Suite executable. The Suite will be used primarily by RAJA developers to perform regular assessments of RAJA performance across a range of hardware platforms and compilers as RAJA features are being developed. It will also be used by LLNL hardware and software vendor partners for defining new requirements for future computing platform procurements and acceptance testing. In particular, the RAJA Performance Suite will be used for compiler acceptance testing of the upcoming CORAL/Sierra machine (initial LLNL delivery expected in late 2017/early 2018) and the CORAL-2 procurement. The Suite will also be used to generate concise source code reproducers of compiler and runtime issues we uncover so that we may provide them to relevant vendors to be fixed.

  14. Affinity Chart Analysis: A Method for Structured Collection, Aggregation, and Response to Customer Needs in Radiology.

    PubMed

    Boll, Daniel T; Rubin, Geoffrey D; Heye, Tobias; Pierce, Laura J

    2017-04-01

    The objective of this study is to analyze implementation of the voice-of-the-customer method to assess the current state of image postprocessing and reporting delivered by a radiology department and to plan improvements on the basis of referring physicians' preferences. The voice-of-the-customer method consisted of discovery, analysis, and optimization phases. Fifty referring physicians were invited to be interviewed. Interviews addressed the topics of structure, process, outcome, and support. Interviews were dissected into individual statements categorized as fact or feeling. Statements were grouped to find collective voices. Improvements were compiled from affinity charts and were processed by identifying insights. Ninety-four percent (47/50) of physicians participated, generating 352 statements (81 facts and 271 feelings) that subsequently underwent affinity chart clustering. The resultant affinity charts covered distinct themes: "we need you to know us better," "we need you to consider our workflow," "we need more from your services," "we want to review your data in certain ways," and "we want to do more with you." As a result of the insights gained, the following optimizations were implemented: a software application that improves study requesting, performance tracking, study prioritization, and longitudinal data archiving; six prototype reports containing tabulated data and annotated images; two prototype longitudinal reporting templates assessing aneurysm evolution and treatment-induced changes in organ size over time; and a teaching curriculum for trainees. This study has shown the clinical feasibility to assess the current state of image postprocessing and reporting and to implement improvements of and investments in image postprocessing and reporting infrastructure on the basis of referring physicians' preferences using the voice-of-the-customer method.

  15. A compiler and validator for flight operations on NASA space missions

    NASA Astrophysics Data System (ADS)

    Fonte, Sergio; Politi, Romolo; Capria, Maria Teresa; Giardino, Marco; De Sanctis, Maria Cristina

    2016-07-01

    In NASA missions the management and programming of the flight systems are performed through a specific scripting language, the SASF (Spacecraft Activity Sequence File). Checking the syntax and grammar requires a compiler that flags any errors found in the sequence file produced for an instrument on board the flight system. From our experience on the Dawn mission, we developed VIRV (VIR Validator), a tool that checks the syntax and grammar of SASF, runs a simulation of VIR acquisitions and finds any violations of the flight rules in the sequences produced. The SASF compiler project (SSC - Spacecraft Sequence Compiler) is ready for a new implementation: the generalization to different NASA missions. In fact, VIRV is a compiler for a dialect of SASF; it includes VIR commands as part of the SASF language. Our goal is to produce a general compiler for the SASF, in which every instrument has a library to be introduced into the compiler. The SSC can analyze a SASF, produce a log of events, perform a simulation of the instrument acquisition and check the flight rules for the instrument selected. The output of the program can be produced in GRASS GIS format and may help the operator to analyze the geometry of the acquisition.

  16. A comparative study of programming languages for next-generation astrodynamics systems

    NASA Astrophysics Data System (ADS)

    Eichhorn, Helge; Cano, Juan Luis; McLean, Frazer; Anderl, Reiner

    2018-03-01

    Due to the computationally intensive nature of astrodynamics tasks, astrodynamicists have relied on compiled programming languages such as Fortran for the development of astrodynamics software. Interpreted languages such as Python, on the other hand, offer higher flexibility and development speed thereby increasing the productivity of the programmer. While interpreted languages are generally slower than compiled languages, recent developments such as just-in-time (JIT) compilers or transpilers have been able to close this speed gap significantly. Another important factor for the usefulness of a programming language is its wider ecosystem which consists of the available open-source packages and development tools such as integrated development environments or debuggers. This study compares three compiled languages and three interpreted languages, which were selected based on their popularity within the scientific programming community and technical merit. The three compiled candidate languages are Fortran, C++, and Java. Python, Matlab, and Julia were selected as the interpreted candidate languages. All six languages are assessed and compared to each other based on their features, performance, and ease-of-use through the implementation of idiomatic solutions to classical astrodynamics problems. We show that compiled languages still provide the best performance for astrodynamics applications, but JIT-compiled dynamic languages have reached a competitive level of speed and offer an attractive compromise between numerical performance and programmer productivity.

  17. Optimized spatial priorities for biodiversity conservation in China: a systematic conservation planning perspective.

    PubMed

    Wu, Ruidong; Long, Yongcheng; Malanson, George P; Garber, Paul A; Zhang, Shuang; Li, Diqiang; Zhao, Peng; Wang, Longzhu; Duo, Hairui

    2014-01-01

    By addressing several key features overlooked in previous studies, i.e. human disturbance, integration of ecosystem- and species-level conservation features, and principles of complementarity and representativeness, we present the first national-scale systematic conservation planning for China to determine the optimized spatial priorities for biodiversity conservation. We compiled a spatial database on the distributions of ecosystem- and species-level conservation features, and modeled a human disturbance index (HDI) by aggregating information using several socioeconomic proxies. We ran Marxan with two scenarios (HDI-ignored and HDI-considered) to investigate the effects of human disturbance, and explored the geographic patterns of the optimized spatial conservation priorities. Compared to when HDI was ignored, the HDI-considered scenario resulted in (1) a marked reduction (∼9%) in the total HDI score and a slight increase (∼7%) in the total area of the portfolio of priority units, (2) a significant increase (∼43%) in the total irreplaceable area and (3) more irreplaceable units being identified in almost all environmental zones and highly-disturbed provinces. Thus the inclusion of human disturbance is essential for cost-effective priority-setting. Attention should be targeted to the areas that are characterized as moderately-disturbed, <2,000 m in altitude, and/or intermediately- to extremely-rugged in terrain to identify potentially important regions for implementing cost-effective conservation. We delineated 23 primary large-scale priority areas that are significant for conserving China's biodiversity, but those isolated priority units in disturbed regions are in more urgent need of conservation actions so as to prevent immediate and severe biodiversity loss. 
    This study presents a spatially optimized national-scale portfolio of conservation priorities, effectively representing the overall biodiversity of China while minimizing conflicts with economic development. Our results offer critical insights for current conservation and strategic land-use planning in China. The approach is transferable and easy to implement by end-users, and applicable for national- and local-scale systematic conservation prioritization practices.

  18. Optimized Spatial Priorities for Biodiversity Conservation in China: A Systematic Conservation Planning Perspective

    PubMed Central

    Wu, Ruidong; Long, Yongcheng; Malanson, George P.; Garber, Paul A.; Zhang, Shuang; Li, Diqiang; Zhao, Peng; Wang, Longzhu; Duo, Hairui

    2014-01-01

    By addressing several key features overlooked in previous studies, i.e. human disturbance, integration of ecosystem- and species-level conservation features, and principles of complementarity and representativeness, we present the first national-scale systematic conservation planning for China to determine the optimized spatial priorities for biodiversity conservation. We compiled a spatial database on the distributions of ecosystem- and species-level conservation features, and modeled a human disturbance index (HDI) by aggregating information using several socioeconomic proxies. We ran Marxan with two scenarios (HDI-ignored and HDI-considered) to investigate the effects of human disturbance, and explored the geographic patterns of the optimized spatial conservation priorities. Compared to when HDI was ignored, the HDI-considered scenario resulted in (1) a marked reduction (∼9%) in the total HDI score and a slight increase (∼7%) in the total area of the portfolio of priority units, (2) a significant increase (∼43%) in the total irreplaceable area and (3) more irreplaceable units being identified in almost all environmental zones and highly-disturbed provinces. Thus the inclusion of human disturbance is essential for cost-effective priority-setting. Attention should be targeted to the areas that are characterized as moderately-disturbed, <2,000 m in altitude, and/or intermediately- to extremely-rugged in terrain to identify potentially important regions for implementing cost-effective conservation. We delineated 23 primary large-scale priority areas that are significant for conserving China's biodiversity, but those isolated priority units in disturbed regions are in more urgent need of conservation actions so as to prevent immediate and severe biodiversity loss. 
    This study presents a spatially optimized national-scale portfolio of conservation priorities – effectively representing the overall biodiversity of China while minimizing conflicts with economic development. Our results offer critical insights for current conservation and strategic land-use planning in China. The approach is transferable and easy to implement by end-users, and applicable for national- and local-scale systematic conservation prioritization practices. PMID:25072933

  19. Ada Compiler Validation Summary Report: Certificate Number: 911107W1.11228 Hewlett-Packard HP 9000 Series 700/800 Ada Compiler, Version 5.35 HP 9000 Series 800 Model 835 = HP 9000 Series 800 Model 835

    DTIC Science & Technology

    1991-11-07

    OPTIONS The linker options of this Ada implementation, as described in this Appendix, are provided by the customer. Unless specifically noted otherwise...Packard Company Print History The following table lists the printings of this document, together with the respective release dates for each edition. The

  20. Exploiting loop level parallelism in nonprocedural dataflow programs

    NASA Technical Reports Server (NTRS)

    Gokhale, Maya B.

    1987-01-01

    Discussed are how loop level parallelism is detected in a nonprocedural dataflow program, and how a procedural program with concurrent loops is scheduled. Also discussed is a program restructuring technique which may be applied to recursive equations so that concurrent loops may be generated for a seemingly iterative computation. A compiler which generates C code for the language described below has been implemented. The scheduling component of the compiler and the restructuring transformation are described.

  1. Compilation of Abstracts of Theses Submitted by Candidates for Degrees.

    DTIC Science & Technology

    1984-06-01

    Management System for the TI-59 Programmable Calculator Kersh, T. B. Signal Processor Interface 65 CPT, USA Simulation of the AN/SPY-1A Radar...DESIGN AND IMPLEMENTATION OF A BASIC CROSS-COMPILER AND VIRTUAL MEMORY MANAGEMENT SYSTEM FOR THE TI-59 PROGRAMMABLE CALCULATOR Mark R. Kindl Captain...Academy, 1974 The instruction set of the TI-59 Programmable Calculator bears a close similarity to that of an assembler. Though most of the calculator

  2. The preliminary SOL (Sizing and Optimization Language) reference manual

    NASA Technical Reports Server (NTRS)

    Lucas, Stephen H.; Scotti, Stephen J.

    1989-01-01

    The Sizing and Optimization Language, SOL, a high-level special-purpose computer language has been developed to expedite application of numerical optimization to design problems and to make the process less error-prone. This document is a reference manual for those wishing to write SOL programs. SOL is presently available for DEC VAX/VMS systems. A SOL package is available which includes the SOL compiler and runtime library routines. An overview of SOL appears in NASA TM 100565.

  3. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bales, Benjamin B; Barrett, Richard F

    In almost all modern scientific applications, developers achieve the greatest performance gains by tuning algorithms, communication systems, and memory access patterns, while leaving low level instruction optimizations to the compiler. Given the increasingly varied and complicated x86 architectures, the value of these optimizations is unclear, and, due to time and complexity constraints, it is difficult for many programmers to experiment with them. In this report we explore the potential gains of these 'last mile' optimization efforts on an AMD Barcelona processor, providing readers with relevant information so that they can decide whether investment in the presented optimizations is worthwhile.

  4. Elementary Science Curriculum Implementation: As It Was and As It Should Be.

    ERIC Educational Resources Information Center

    Horn, Jerry G.; Marsh, Marilyn A.

    School districts were identified that were involved in implementation of recent National Science Foundation (NSF) elementary school science curricula and in corresponding in-service work. Questionnaires sent to 6 school districts, selected somewhat randomly from across the 50 states and the District of Columbia, compiled information regarding…

  5. Combining constraint satisfaction and local improvement algorithms to construct anaesthetists' rotas

    NASA Technical Reports Server (NTRS)

    Smith, Barbara M.; Bennett, Sean

    1992-01-01

    A system is described which was built to compile weekly rotas for the anaesthetists in a large hospital. The rota compilation problem is an optimization problem (the number of tasks which cannot be assigned to an anaesthetist must be minimized) and was formulated as a constraint satisfaction problem (CSP). The forward checking algorithm is used to find a feasible rota, but because of the size of the problem, it cannot find an optimal (or even a good enough) solution in an acceptable time. Instead, an algorithm was devised which makes local improvements to a feasible solution. The algorithm makes use of the constraints as expressed in the CSP to ensure that feasibility is maintained, and produces very good rotas which are being used by the hospital involved in the project. It is argued that formulation as a constraint satisfaction problem may be a good approach to solving discrete optimization problems, even if the resulting CSP is too large to be solved exactly in an acceptable time. A CSP algorithm may be able to produce a feasible solution which can then be improved, giving a good, if not provably optimal, solution.
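    The two-phase idea in this abstract (find a feasible rota first, then improve it locally while staying feasible) can be sketched roughly as below. This is only an illustration under stated assumptions: the data model (`eligible`, `slot_of`), the single constraint (one task per person per slot), and the function names are all hypothetical, and a simple greedy pass stands in for the forward checking search that the paper actually uses.

```python
# Hypothetical toy model: every task sits in one time slot, only some
# anaesthetists may cover it, and nobody covers two tasks in the same slot.

def conflicts(person, task, rota, slot_of):
    """True if `person` already covers a different task in `task`'s slot."""
    return any(p == person and t != task and slot_of[t] == slot_of[task]
               for t, p in rota.items())


def greedy_feasible(tasks, eligible, slot_of):
    """Stand-in for the feasibility phase: assign each task to the first
    eligible person who is free; tasks nobody can take stay unassigned."""
    rota = {}
    for task in tasks:
        for person in eligible[task]:
            if not conflicts(person, task, rota, slot_of):
                rota[task] = person
                break
    return rota


def improve(tasks, eligible, slot_of, rota):
    """Local improvement: for each unassigned task, try to free an eligible
    person by handing that person's blocking task to somebody else.
    Every accepted move keeps the rota feasible and assigns one more task."""
    rota = dict(rota)
    changed = True
    while changed:
        changed = False
        for task in tasks:
            if task in rota:
                continue
            for person in eligible[task]:
                blockers = [t for t, p in rota.items()
                            if p == person and slot_of[t] == slot_of[task]]
                if not blockers:
                    rota[task] = person          # person became free earlier
                    changed = True
                    break
                blocker = blockers[0]            # at most one task per slot
                for other in eligible[blocker]:
                    if other != person and not conflicts(other, blocker, rota, slot_of):
                        rota[blocker] = other    # move the blocking task
                        rota[task] = person      # and take the freed person
                        changed = True
                        break
                if task in rota:
                    break
    return rota


if __name__ == "__main__":
    tasks = ["theatre_1", "theatre_2"]
    slot_of = {"theatre_1": "mon_am", "theatre_2": "mon_am"}
    eligible = {"theatre_1": ["ann", "bob"], "theatre_2": ["ann"]}
    start = greedy_feasible(tasks, eligible, slot_of)
    print(len(tasks) - len(start), "unassigned before improvement")
    better = improve(tasks, eligible, slot_of, start)
    print(len(tasks) - len(better), "unassigned after improvement")
```

    The objective here matches the one named in the abstract (the number of unassigned tasks); a real rota would carry many more constraints, which the improvement step would have to check in the same way `conflicts` does.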

  6. 76 FR 57644 - Privacy Act of 1974; Implementation

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-09-16

    ... in DMDC 13, entitled ``Investigative Records Repository'', when investigatory material is compiled... exemptions. * * * * * (c) * * * (17) System identifier and name: DMDC 13, Investigative Records Repository...

  7. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lee, Seyong; Vetter, Jeffrey S

    Computer architecture experts expect that non-volatile memory (NVM) hierarchies will play a more significant role in future systems including mobile, enterprise, and HPC architectures. With this expectation in mind, we present NVL-C: a novel programming system that facilitates the efficient and correct programming of NVM main memory systems. The NVL-C programming abstraction extends C with a small set of intuitive language features that target NVM main memory, and can be combined directly with traditional C memory model features for DRAM. We have designed these new features to enable compiler analyses and run-time checks that can improve performance and guard against a number of subtle programming errors, which, when left uncorrected, can corrupt NVM-stored data. Moreover, to enable recovery of data across application or system failures, these NVL-C features include a flexible directive for specifying NVM transactions. So that our implementation might be extended to other compiler front ends and languages, the majority of our compiler analyses are implemented in an extended version of LLVM's intermediate representation (LLVM IR). We evaluate NVL-C on a number of applications to show its flexibility, performance, and correctness.

  8. Proving Correctness for Pointer Programs in a Verifying Compiler

    NASA Technical Reports Server (NTRS)

    Kulczycki, Gregory; Singh, Amrinder

    2008-01-01

    This research describes a component-based approach to proving the correctness of programs involving pointer behavior. The approach supports modular reasoning and is designed to be used within the larger context of a verifying compiler. The approach consists of two parts. When a system component requires the direct manipulation of pointer operations in its implementation, we implement it using a built-in component specifically designed to capture the functional and performance behavior of pointers. When a system component requires pointer behavior via a linked data structure, we ensure that the complexities of the pointer operations are encapsulated within the data structure and are hidden from the client component. In this way, programs that rely on pointers can be verified modularly, without requiring special rules for pointers. The ultimate objective of a verifying compiler is to prove, with as little human intervention as possible, that proposed program code is correct with respect to a full behavioral specification. Full verification for software is especially important for an agency like NASA that is routinely involved in the development of mission critical systems.

  9. From Classification to Causality: Advancing Understanding of Mechanisms of Change in Implementation Science.

    PubMed

    Lewis, Cara C; Klasnja, Predrag; Powell, Byron J; Lyon, Aaron R; Tuzzio, Leah; Jones, Salene; Walsh-Bailey, Callie; Weiner, Bryan

    2018-01-01

    The science of implementation has offered little toward understanding how different implementation strategies work. To improve outcomes of implementation efforts, the field needs precise, testable theories that describe the causal pathways through which implementation strategies function. In this perspective piece, we describe a four-step approach to developing causal pathway models for implementation strategies. First, it is important to ensure that implementation strategies are appropriately specified. Some strategies in published compilations are well defined but may not be specified in terms of their core components that can have a reliable and measurable impact. Second, linkages between strategies and mechanisms need to be generated. Existing compilations do not offer mechanisms by which strategies act, or the processes or events through which an implementation strategy operates to affect desired implementation outcomes. Third, it is critical to identify proximal and distal outcomes the strategy is theorized to impact, with the former being direct, measurable products of the strategy and the latter being one of eight implementation outcomes (1). Finally, articulating effect modifiers, like preconditions and moderators, allows for an understanding of where, when, and why strategies have an effect on outcomes of interest. We argue for greater precision in use of terms for factors implicated in implementation processes; development of guidelines for selecting research design and study plans that account for practical constructs and allow for the study of mechanisms; psychometrically strong and pragmatic measures of mechanisms; and more robust curation of evidence for knowledge transfer and use.

  10. Hardware-Independent Proofs of Numerical Programs

    NASA Technical Reports Server (NTRS)

    Boldo, Sylvie; Nguyen, Thi Minh Tuyen

    2010-01-01

    On recent architectures, a numerical program may give different answers depending on the execution hardware and the compilation. Our goal is to formally prove properties about numerical programs that are true for multiple architectures and compilers. We propose an approach that states the rounding error of each floating-point computation whatever the environment. This approach is implemented in the Frama-C platform for static analysis of C code. Small case studies using this approach are entirely and automatically proved.

  11. Further developments in generating type-safe messaging

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Neswold, R.; King, C.; /Fermilab

    2011-11-01

    At ICALEPCS 09, we introduced a source code generator that allows processes to communicate safely using data types native to each host language. In this paper, we discuss further development that has occurred since the conference in Kobe, Japan, including the addition of three more client languages, an optimization in network packet size and the addition of a new protocol data type. The protocol compiler is continuing to prove itself as an easy and robust way to get applications written in different languages hosted on different computer architectures to communicate. We have two active Erlang projects that are using the protocol compiler to access ACNET data at high data rates. We also used the protocol compiler output to deliver ACNET data to an iPhone/iPad application. Since it takes an average of two weeks to support a new language, we're willing to expand the protocol compiler to support new languages that our community uses.

  12. Optimizing transformations of stencil operations for parallel cache-based architectures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bassetti, F.; Davis, K.

    This paper describes a new technique for optimizing serial and parallel stencil- and stencil-like operations for cache-based architectures. This technique takes advantage of the semantic knowledge implicit in stencil-like computations. The technique is implemented as a source-to-source program transformation; because of its specificity it could not be expected of a conventional compiler. Empirical results demonstrate a uniform factor of two speedup. The experiments clearly show the benefits of this technique to be a consequence, as intended, of the reduction in cache misses. The test codes are based on a 5-point stencil obtained by the discretization of the Poisson equation and applied to a two-dimensional uniform grid using the Jacobi method as an iterative solver. Results are presented for a 1-D tiling for a single processor, and in parallel using 1-D data partition. For the parallel case both blocking and non-blocking communication are tested. The same scheme of experiments has been performed for the 2-D tiling case. However, for the parallel case the 2-D partitioning is not discussed here, so the parallel case handled for 2-D is 2-D tiling with 1-D data partitioning.
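    To make the test problem concrete, here is a minimal sketch of one Jacobi sweep with the 5-point stencil, written once as a plain double loop and once with the column loop split into strips (1-D tiling). It is not the paper's source-to-source transformation, only ordinary loop tiling of a single sweep. The sign convention (-∇²u = f), the function names, and the `tile` parameter are assumptions made for this illustration, and pure Python shows the loop structure only, not the cache behaviour.

```python
def _update(u, f, h, i, j):
    """5-point Jacobi update for -laplace(u) = f on a uniform grid of spacing h."""
    return 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]
                   + h * h * f[i][j])


def jacobi_sweep(u, f, h):
    """One untiled sweep; boundary values are copied through unchanged."""
    n, m = len(u), len(u[0])
    new = [row[:] for row in u]
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            new[i][j] = _update(u, f, h, i, j)
    return new


def jacobi_sweep_tiled(u, f, h, tile=64):
    """Same sweep with 1-D tiling: interior columns are processed in strips
    of width `tile`, so each strip of the rows above and below is reused
    while it is still likely to be in cache (in a compiled language)."""
    n, m = len(u), len(u[0])
    new = [row[:] for row in u]
    for j0 in range(1, m - 1, tile):           # loop over column strips
        j1 = min(j0 + tile, m - 1)
        for i in range(1, n - 1):              # all rows within one strip
            for j in range(j0, j1):
                new[i][j] = _update(u, f, h, i, j)
    return new


if __name__ == "__main__":
    n, m, h = 6, 9, 0.1
    u = [[float(i * m + j) for j in range(m)] for i in range(n)]
    f = [[1.0] * m for _ in range(n)]
    same = jacobi_sweep(u, f, h) == jacobi_sweep_tiled(u, f, h, tile=3)
    print("tiled sweep matches untiled sweep:", same)
```

    Because a Jacobi sweep reads only the old grid and writes only the new one, the iteration order can be changed freely, which is why the tiled and untiled versions compute identical values.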

  13. Scalability study of parallel spatial direct numerical simulation code on IBM SP1 parallel supercomputer

    NASA Technical Reports Server (NTRS)

    Hanebutte, Ulf R.; Joslin, Ronald D.; Zubair, Mohammad

    1994-01-01

    The implementation and the performance of a parallel spatial direct numerical simulation (PSDNS) code are reported for the IBM SP1 supercomputer. The spatially evolving disturbances that are associated with laminar-to-turbulent transition in three-dimensional boundary-layer flows are computed with the PSDNS code. By remapping the distributed data structure during the course of the calculation, optimized serial library routines can be utilized that substantially increase the computational performance. Although the remapping incurs a high communication penalty, the parallel efficiency of the code remains above 40% for all performed calculations. By using appropriate compile options and optimized library routines, the serial code achieves 52-56 Mflops on a single node of the SP1 (45% of theoretical peak performance). The actual performance of the PSDNS code on the SP1 is evaluated with a 'real world' simulation that consists of 1.7 million grid points. One time step of this simulation is calculated on eight nodes of the SP1 in the same time as required by a Cray Y/MP for the same simulation. The scalability information provides estimated computational costs that match the actual costs relative to changes in the number of grid points.

  14. Compiling for Application Specific Computational Acceleration in Reconfigurable Architectures Final Report CRADA No. TSB-2033-01

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    De Supinski, B.; Caliga, D.

    2017-09-28

    The primary objective of this project was to develop memory optimization technology to efficiently deliver data to, and distribute data within, the SRC-6's Field Programmable Gate Array ("FPGA")-based Multi-Adaptive Processors (MAPs). The hardware/software approach was to explore efficient MAP configurations and generate the compiler technology to exploit those configurations. This memory accessing technology represents an important step towards making reconfigurable symmetric multi-processor (SMP) architectures that will be a cost-effective solution for large-scale scientific computing.

  15. National Land Use Policy: Objectives, Components, Implementation.

    ERIC Educational Resources Information Center

    Soil Conservation Society of America, Ankeny, IA.

    Proceedings of a special conference sponsored by the Soil Conservation Society of America are compiled in this report. The conference served as a forum for those involved in land use planning and implementation at all levels of government and private enterprise. Comments were directed to four main topics: (1) Objectives and Need for a National…

  16. PDP Implementation at English Universities: What Are the Issues?

    ERIC Educational Resources Information Center

    Quinton, Sarah; Smallbone, Teresa

    2008-01-01

    Personal development planning (PDP) is now a nationally required part of undergraduate and postgraduate education in the United Kingdom. Little is known about how universities in general are implementing personal development plans, nor how engaged students will become in compiling a set of records of their learning and progress, which they…

  17. Carl D. Perkins Vocational and Applied Technology Education Act of 1990: Selected Resources for Implementation.

    ERIC Educational Resources Information Center

    Kallembach, Sheri, Comp.; And Others

    This document compiles representative resources to assist state and local administrators of vocational special needs programs, special needs educators, counselors, researchers, and policymakers in implementing and complying with the Carl D. Perkins Vocational and Applied Technology Education Act of 1990. Entries in the publications section are…

  18. Post-Implementation Success Factors for Enterprise Resource Planning Student Administration Systems in Higher Education Institutions

    ERIC Educational Resources Information Center

    Sullivan, Linda; Bozeman, William

    2010-01-01

    Enterprise Resource Planning (ERP) systems can represent one of the largest investments of human and financial resources by a higher education institution. They also bring a significant process reengineering aspect to the institution and the associated implementation project through the integration of compiled industry best practices into the…

  19. Lean and Efficient Software: Whole-Program Optimization of Executables

    DTIC Science & Technology

    2015-09-30

    libraries. Many levels of library interfaces—where some libraries are dynamically linked and some are provided in binary form only—significantly limit...software at build time. The opportunity: Our objective in this project is to substantially improve the performance, size, and robustness of binary ...executables by using static and dynamic binary program analysis techniques to perform whole-program optimization directly on compiled programs

  20. Continuous-time quantum Monte Carlo impurity solvers

    NASA Astrophysics Data System (ADS)

    Gull, Emanuel; Werner, Philipp; Fuchs, Sebastian; Surer, Brigitte; Pruschke, Thomas; Troyer, Matthias

    2011-04-01

    Continuous-time quantum Monte Carlo impurity solvers are algorithms that sample the partition function of an impurity model using diagrammatic Monte Carlo techniques. The present paper describes codes that implement the interaction expansion algorithm originally developed by Rubtsov, Savkin, and Lichtenstein, as well as the hybridization expansion method developed by Werner, Millis, Troyer, et al. These impurity solvers are part of the ALPS-DMFT application package and are accompanied by an implementation of dynamical mean-field self-consistency equations for (single orbital single site) dynamical mean-field problems with arbitrary densities of states.

    Program summary
    Program title: dmft
    Catalogue identifier: AEIL_v1_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEIL_v1_0.html
    Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
    Licensing provisions: ALPS LIBRARY LICENSE version 1.1
    No. of lines in distributed program, including test data, etc.: 899 806
    No. of bytes in distributed program, including test data, etc.: 32 153 916
    Distribution format: tar.gz
    Programming language: C++
    Operating system: The ALPS libraries have been tested on the following platforms and compilers: Linux with GNU Compiler Collection (g++ version 3.1 and higher), and Intel C++ Compiler (icc version 7.0 and higher); MacOS X with GNU Compiler (g++ Apple-version 3.1, 3.3 and 4.0); IBM AIX with Visual Age C++ (xlC version 6.0) and GNU (g++ version 3.1 and higher) compilers; Compaq Tru64 UNIX with Compaq C++ Compiler (cxx); SGI IRIX with MIPSpro C++ Compiler (CC); HP-UX with HP C++ Compiler (aCC); Windows with Cygwin or coLinux platforms and GNU Compiler Collection (g++ version 3.1 and higher)
    RAM: 10 MB-1 GB
    Classification: 7.3
    External routines: ALPS [1], BLAS/LAPACK, HDF5
    Nature of problem: (See [2].) Quantum impurity models describe an atom or molecule embedded in a host material with which it can exchange electrons. They are basic to nanoscience as representations of quantum dots and molecular conductors and play an increasingly important role in the theory of "correlated electron" materials as auxiliary problems whose solution gives the "dynamical mean field" approximation to the self-energy and local correlation functions.
    Solution method: Quantum impurity models require a method of solution which provides access to both high and low energy scales and is effective for wide classes of physically realistic models. The continuous-time quantum Monte Carlo algorithms for which we present implementations here meet this challenge. Continuous-time quantum impurity methods are based on partition function expansions of quantum impurity models that are stochastically sampled to all orders using diagrammatic quantum Monte Carlo techniques. For a review of quantum impurity models and their applications and of continuous-time quantum Monte Carlo methods for impurity models we refer the reader to [2].
    Additional comments: Use of dmft requires citation of this paper. Use of any ALPS program requires citation of the ALPS [1] paper.
    Running time: 60 s-8 h per iteration.

  1. Optimization strategies for molecular dynamics programs on Cray computers and scalar work stations

    NASA Astrophysics Data System (ADS)

    Unekis, Michael J.; Rice, Betsy M.

    1994-12-01

    We present results of timing runs and different optimization strategies for a prototype molecular dynamics program that simulates shock waves in a two-dimensional (2-D) model of a reactive energetic solid. The performance of the program may be improved substantially by simple changes to the Fortran or by employing various vendor-supplied compiler optimizations. The optimum strategy varies among the machines used and will vary depending upon the details of the program. The effect of various compiler options and vendor-supplied subroutine calls is demonstrated. Comparison is made between two scalar workstations (IBM RS/6000 Model 370 and Model 530) and several Cray supercomputers (X-MP/48, Y-MP8/128, and C-90/16256). We find that for a scientific application program dominated by sequential, scalar statements, a relatively inexpensive high-end workstation such as the IBM RS/6000 RISC series will outperform single processor performance of the Cray X-MP/48 and perform competitively with single processor performance of the Y-MP8/128 and C-90/16256.

  2. Water Quality Standards Handbook

    EPA Pesticide Factsheets

    The Water Quality Standards Handbook is a compilation of the EPA's water quality standards (WQS) program guidance including recommendations for states, authorized tribes, and territories in reviewing, revising, and implementing WQS.

  3. Z2Pack: Numerical implementation of hybrid Wannier centers for identifying topological materials

    NASA Astrophysics Data System (ADS)

    Gresch, Dominik; Autès, Gabriel; Yazyev, Oleg V.; Troyer, Matthias; Vanderbilt, David; Bernevig, B. Andrei; Soluyanov, Alexey A.

    2017-02-01

    The intense theoretical and experimental interest in topological insulators and semimetals has established band structure topology as a fundamental material property. Consequently, identifying band topologies has become an important, but often challenging, problem, with no exhaustive solution at the present time. In this work we compile a series of techniques, some previously known, that allow for a solution to this problem for a large set of the possible band topologies. The method is based on tracking hybrid Wannier charge centers computed for relevant Bloch states, and it works at all levels of materials modeling: continuous k·p models, tight-binding models, and ab initio calculations. We apply the method to compute and identify Chern, Z2, and crystalline topological insulators, as well as topological semimetal phases, using real material examples. Moreover, we provide a numerical implementation of this technique (the Z2Pack software package) that is ideally suited for high-throughput screening of materials databases for compounds with nontrivial topologies. We expect that our work will allow researchers to (a) identify topological materials optimal for experimental probes, (b) classify existing compounds, and (c) reveal materials that host novel, not yet described, topological states.

  4. Implementation of the NAS Parallel Benchmarks in Java

    NASA Technical Reports Server (NTRS)

    Frumkin, Michael A.; Schultz, Matthew; Jin, Haoqiang; Yan, Jerry; Biegel, Bryan (Technical Monitor)

    2002-01-01

    Several features make Java an attractive choice for High Performance Computing (HPC). In order to gauge the applicability of Java to Computational Fluid Dynamics (CFD), we have implemented the NAS (NASA Advanced Supercomputing) Parallel Benchmarks in Java. The performance and scalability of the benchmarks point out the areas where improvement in Java compiler technology and in Java thread implementation would position Java closer to Fortran in the competition for CFD applications.

  5. Implementation of BT, SP, LU, and FT of NAS Parallel Benchmarks in Java

    NASA Technical Reports Server (NTRS)

    Schultz, Matthew; Frumkin, Michael; Jin, Hao-Qiang; Yan, Jerry

    2000-01-01

    A number of Java features make it an attractive but debatable choice for High Performance Computing. We have implemented the benchmarks that work on a single structured grid (BT, SP, LU, and FT) in Java. The performance and scalability of the Java code show that significant improvements in Java compiler technology and in Java thread implementation are necessary for Java to compete with Fortran in HPC applications.

  6. A Code Generation Approach for Auto-Vectorization in the Spade Compiler

    NASA Astrophysics Data System (ADS)

    Wang, Huayong; Andrade, Henrique; Gedik, Buğra; Wu, Kun-Lung

    We describe an auto-vectorization approach for the Spade stream processing programming language, comprising two ideas. First, we provide support for vectors as a primitive data type. Second, we provide a C++ library with architecture-specific implementations of a large number of pre-vectorized operations as the means to support language extensions. We evaluate our approach with several stream processing operators, contrasting Spade's auto-vectorization with the native auto-vectorization provided by the GNU gcc and Intel icc compilers.

  7. Model compilation for real-time planning and diagnosis with feedback

    NASA Technical Reports Server (NTRS)

    Barrett, Anthony

    2005-01-01

    This paper describes MEXEC, an implemented micro executive that compiles a device model that can have feedback into a structure for subsequent evaluation. This system computes both the most likely current device mode from n sets of sensor measurements and the n-1 step reconfiguration plan that is most likely to result in reaching a target mode - if such a plan exists. A user tunes the system by increasing n to improve system capability at the cost of real-time performance.

  8. Fault-Tree Compiler Program

    NASA Technical Reports Server (NTRS)

    Butler, Ricky W.; Martensen, Anna L.

    1992-01-01

    FTC, the Fault-Tree Compiler program, is a reliability-analysis software tool used to calculate the probability of the top event of a fault tree. Five different types of gates are allowed in a fault tree: AND, OR, EXCLUSIVE OR, INVERT, and M OF N. The high-level input language of FTC is easy to understand and use. The program supports a hierarchical fault-tree-definition feature that simplifies the description of the tree and reduces execution time. The solution technique is implemented in FORTRAN, and the user interface in Pascal. Written to run on a DEC VAX computer operating under the VMS operating system.
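
    As a rough illustration of the calculation this record describes, the sketch below evaluates the five gate types for statistically independent inputs. It is not FTC's input language or its FORTRAN solution technique; the function names, the two-input form of EXCLUSIVE OR, and the "at least M of N" reading of the M OF N gate are assumptions made for the example.

```python
from itertools import combinations
from math import prod


def gate_and(probs):
    """Probability that all independent inputs occur."""
    return prod(probs)


def gate_or(probs):
    """Probability that at least one independent input occurs."""
    return 1.0 - prod(1.0 - p for p in probs)


def gate_xor(p, q):
    """Probability that exactly one of two independent inputs occurs."""
    return p * (1.0 - q) + q * (1.0 - p)


def gate_invert(p):
    """Probability that the input does not occur."""
    return 1.0 - p


def gate_m_of_n(m, probs):
    """Probability that at least m of the n independent inputs occur
    (assumed meaning of the M OF N gate)."""
    n = len(probs)
    total = 0.0
    for k in range(m, n + 1):
        for chosen in combinations(range(n), k):
            term = 1.0
            for i, p in enumerate(probs):
                term *= p if i in chosen else 1.0 - p
            total += term
    return total


# Hypothetical tree: the top event occurs if both power supplies fail,
# or if at least 2 of 3 redundant sensors fail.
power = gate_and([0.1, 0.2])
sensors = gate_m_of_n(2, [0.1, 0.1, 0.1])
top_event = gate_or([power, sensors])
```

    The independence assumption holds only when no basic event feeds more than one branch; a tree with shared events needs a different solution method than this gate-by-gate product.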

  9. Aquarius Project: Research in the System Architecture of Accelerators for the High Performance Execution of Logic Programs.

    DTIC Science & Technology

    1991-05-31

    benchmarks ... 220 Appendix G: Source code of the Aquarius Prolog compiler ... 224 Chapter I Introduction "You’re given...notation, a tool that is used throughout the compiler’s implementation. Appendix F lists the source code of the C and Prolog benchmarks. Appendix G lists the...source code of the compiler. [Figure text, partly illegible: standard form Prolog; convert to kernel Prolog; symbolic execution.]

  10. Action Algebras and Model Algebras in Denotational Semantics

    NASA Astrophysics Data System (ADS)

    Guedes, Luiz Carlos Castro; Haeusler, Edward Hermann

    This article describes some results concerning the conceptual separation of model dependent and language inherent aspects in a denotational semantics of a programming language. Before going into the technical explanation, the authors wish to relate a story that illustrates how correctly and precisely posed questions can influence the direction of research. By means of his questions, Professor Mosses aided the PhD research of one of the authors of this article and taught the other, who at the time was a novice supervisor, the real meaning of careful PhD supervision. The student’s research had been partially developed towards the implementation of programming languages through denotational semantics specification, and the student had developed a prototype [12] that compared relatively well to some industrial compilers of the PASCAL language. During a visit to the BRICS lab in Aarhus, the student’s supervisor gave Professor Mosses a draft of an article describing the prototype and its implementation experiments. The next day, Professor Mosses asked the supervisor, “Why is the generated code so efficient when compared to that generated by an industrial compiler?” and “You claim that the efficiency is simply a consequence of the Object-Orientation mechanisms used by the prototype programming language (C++); this should be better investigated. Pay more attention to the class of programs that might have this good comparison profile.” As a result of these aptly chosen questions and comments, the student and supervisor made great strides in the subsequent research; the advice provided by Professor Mosses made them perceive that the code generated for certain semantic domains was efficient because it mapped to the “right aspect” of the language semantics. (Certain functional types, used to represent mappings such as Stores and Environments, were pushed to the level of the object language (as in gcc). This had the side-effect of generating code for arrays in the same way as that for functional denotational types. For example, PASCAL arrays belong to the “language inherent” aspect, while the Store domain seems to belong to the “model dependent” aspect. This distinction was important because it focussed attention on optimizing the model dependent semantic domains to obtain a more efficient implementation.) The research led to a nice conclusion: The guidelines of Action Semantics induce a clear separation of the model and language inherent aspects of a language’s semantics. A good implementation of facets, particularly the model dependent ones, leads to generation of an efficient compiler. In this article we discuss the separation of the language inherent and model-inherent domains at the theoretical and conceptual level. In doing so, the authors hope to show how Professor Mosses’s influence extended beyond his technical advice to his professional and personal examples on the supervision of PhD research.

  11. Retargeting of existing FORTRAN program and development of parallel compilers

    NASA Technical Reports Server (NTRS)

    Agrawal, Dharma P.

    1988-01-01

    The software models used in implementing the parallelizing compiler for the B-HIVE multiprocessor system are described. The various models and strategies used in the compiler development are: flexible granularity model, which allows a compromise between two extreme granularity models; communication model, which is capable of precisely describing the interprocessor communication timings and patterns; loop type detection strategy, which identifies different types of loops; critical path with coloring scheme, which is a versatile scheduling strategy for any multicomputer with some associated communication costs; and loop allocation strategy, which realizes optimum overlapped operations between computation and communication of the system. Using these models, several sample routines of the AIR3D package are examined and tested. The automatically generated codes are highly parallelized to maximize the degree of parallelism, obtaining speedup up to a 28- to 32-processor system. A comparison of parallel codes for both the existing and the proposed communication models is performed, and the corresponding expected speedup factors are obtained. The experimentation shows that the B-HIVE compiler produces more efficient codes than existing techniques. Work is progressing well in completing the final phase of the compiler. Numerous enhancements are needed to improve the capabilities of the parallelizing compiler.

  12. Use of concept mapping to characterize relationships among implementation strategies and assess their feasibility and importance: results from the Expert Recommendations for Implementing Change (ERIC) study.

    PubMed

    Waltz, Thomas J; Powell, Byron J; Matthieu, Monica M; Damschroder, Laura J; Chinman, Matthew J; Smith, Jeffrey L; Proctor, Enola K; Kirchner, JoAnn E

    2015-08-07

    Poor terminological consistency for core concepts in implementation science has been widely noted as an obstacle to effective meta-analyses. This inconsistency is also a barrier for those seeking guidance from the research literature when developing and planning implementation initiatives. The Expert Recommendations for Implementing Change (ERIC) study aims to address one area of terminological inconsistency: discrete implementation strategies involving one process or action used to support a practice change. The present report is on the second stage of the ERIC project that focuses on providing initial validation of the compilation of 73 implementation strategies that were identified in the first phase. Purposive sampling was used to recruit a panel of experts in implementation science and clinical practice (N = 35). These key stakeholders used concept mapping sorting and rating activities to place the 73 implementation strategies into similar groups and to rate each strategy's relative importance and feasibility. Multidimensional scaling analysis provided a quantitative representation of the relationships among the strategies, all but one of which were found to be conceptually distinct from the others. Hierarchical cluster analysis supported organizing the 73 strategies into 9 categories. The ratings data reflect those strategies identified as the most important and feasible. This study provides initial validation of the implementation strategies within the ERIC compilation as being conceptually distinct. The categorization and strategy ratings of importance and feasibility may facilitate the search for, and selection of, strategies that are best suited for implementation efforts in a particular setting.

  13. Area and power efficient DCT architecture for image compression

    NASA Astrophysics Data System (ADS)

    Dhandapani, Vaithiyanathan; Ramachandran, Seshasayanan

    2014-12-01

    The discrete cosine transform (DCT) is one of the major components in image and video compression systems. The final output of these systems is interpreted by the human visual system (HVS), which is not perfect. The limited perception of human visualization allows the algorithm to be numerically approximate rather than exact. In this paper, we propose a new matrix for discrete cosine transform. The proposed 8 × 8 transformation matrix contains only zeros and ones which requires only adders, thus avoiding the need for multiplication and shift operations. The new class of transform requires only 12 additions, which highly reduces the computational complexity and achieves a performance in image compression that is comparable to that of the existing approximated DCT. Another important aspect of the proposed transform is that it provides an efficient area and power optimization while implementing in hardware. To ensure the versatility of the proposal and to further evaluate the performance and correctness of the structure in terms of speed, area, and power consumption, the model is implemented on Xilinx Virtex 7 field programmable gate array (FPGA) device and synthesized with Cadence® RTL Compiler® using UMC 90 nm standard cell library. The analysis obtained from the implementation indicates that the proposed structure is superior to the existing approximation techniques with a 30% reduction in power and 12% reduction in area.

  14. Compiling Planning into Quantum Optimization Problems: A Comparative Study

    DTIC Science & Technology

    2015-06-07

    and Sipser, M. 2000. Quantum computation by adiabatic evolution. arXiv:quant-ph/0001106. Fikes, R. E., and Nilsson, N. J. 1972. STRIPS: A new...become available: quantum annealing. Quantum annealing is one of the most accessible quantum algorithms for a computer science audience not versed...in quantum computing because of its close ties to classical optimization algorithms such as simulated annealing. While large-scale universal quantum

  15. High-performance computing — an overview

    NASA Astrophysics Data System (ADS)

    Marksteiner, Peter

    1996-08-01

    An overview of high-performance computing (HPC) is given. Different types of computer architectures used in HPC are discussed: vector supercomputers, high-performance RISC processors, various parallel computers like symmetric multiprocessors, workstation clusters, massively parallel processors. Software tools and programming techniques used in HPC are reviewed: vectorizing compilers, optimization and vector tuning, optimization for RISC processors; parallel programming techniques like shared-memory parallelism, message passing and data parallelism; and numerical libraries.

  16. Subsampled open-reference clustering creates consistent, comprehensive OTU definitions and scales to billions of sequences.

    PubMed

    Rideout, Jai Ram; He, Yan; Navas-Molina, Jose A; Walters, William A; Ursell, Luke K; Gibbons, Sean M; Chase, John; McDonald, Daniel; Gonzalez, Antonio; Robbins-Pianka, Adam; Clemente, Jose C; Gilbert, Jack A; Huse, Susan M; Zhou, Hong-Wei; Knight, Rob; Caporaso, J Gregory

    2014-01-01

    We present a performance-optimized algorithm, subsampled open-reference OTU picking, for assigning marker gene (e.g., 16S rRNA) sequences generated on next-generation sequencing platforms to operational taxonomic units (OTUs) for microbial community analysis. This algorithm provides benefits over de novo OTU picking (clustering can be performed largely in parallel, reducing runtime) and closed-reference OTU picking (all reads are clustered, not only those that match a reference database sequence with high similarity). Because more of our algorithm can be run in parallel relative to "classic" open-reference OTU picking, it makes open-reference OTU picking tractable on massive amplicon sequence data sets (though on smaller data sets, "classic" open-reference OTU clustering is often faster). We illustrate that here by applying it to the first 15,000 samples sequenced for the Earth Microbiome Project (1.3 billion V4 16S rRNA amplicons). To the best of our knowledge, this is the largest OTU picking run ever performed, and we estimate that our new algorithm runs in less than 1/5 the time than would be required of "classic" open reference OTU picking. We show that subsampled open-reference OTU picking yields results that are highly correlated with those generated by "classic" open-reference OTU picking through comparisons on three well-studied datasets. An implementation of this algorithm is provided in the popular QIIME software package, which uses uclust for read clustering. All analyses were performed using QIIME's uclust wrappers, though we provide details (aided by the open-source code in our GitHub repository) that will allow implementation of subsampled open-reference OTU picking independently of QIIME (e.g., in a compiled programming language, where runtimes should be further reduced). Our analyses should generalize to other implementations of these OTU picking algorithms. 
Finally, we present a comparison of parameter settings in QIIME's OTU picking workflows and make recommendations on settings for these free parameters to optimize runtime without reducing the quality of the results. These optimized parameters can vastly decrease the runtime of uclust-based OTU picking in QIIME.

  17. Optimizing Performance of Combustion Chemistry Solvers on Intel's Many Integrated Core (MIC) Architectures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sitaraman, Hariswaran; Grout, Ray W

    This work investigates novel algorithm designs and optimization techniques for restructuring chemistry integrators in zero and multidimensional combustion solvers, which can then be effectively used on the emerging generation of Intel's Many Integrated Core/Xeon Phi processors. These processors offer increased computing performance via large number of lightweight cores at relatively lower clock speeds compared to traditional processors (e.g. Intel Sandybridge/Ivybridge) used in current supercomputers. This style of processor can be productively used for chemistry integrators that form a costly part of computational combustion codes, in spite of their relatively lower clock speeds. Performance commensurate with traditional processors is achieved here through the combination of careful memory layout, exposing multiple levels of fine grain parallelism and through extensive use of vendor supported libraries (Cilk Plus and Math Kernel Libraries). Important optimization techniques for efficient memory usage and vectorization have been identified and quantified. These optimizations resulted in a factor of ~ 3 speed-up using Intel 2013 compiler and ~ 1.5 using Intel 2017 compiler for large chemical mechanisms compared to the unoptimized version on the Intel Xeon Phi. The strategies, especially with respect to memory usage and vectorization, should also be beneficial for general purpose computational fluid dynamics codes.

  18. Tera-Op Reliable Intelligently Adaptive Processing System (TRIPS)

    DTIC Science & Technology

    2004-04-01

    flop creates a loadable FIFO queue, fifo pload. A prototype of the HML simulator is implemented using a functional language, OCaml. The language type...Flow ... 16 7.1.2 Hardware Meta Language ...operates on the TRIPS Intermediate Language (TIL) produced by the Scale compiler. We also adapted the GNU binary utilities to implement an assembler and

  19. Listen to Me: Report of the Ninth Annual NEA-CHR Conference, "Implementing Cultural Diversity in Instructional Programs".

    ERIC Educational Resources Information Center

    National Education Association, Washington, DC. Center for Human Relations.

    This publication is a compilation of speeches, seminar summaries, and participant reactions and recommendations from the Ninth Annual NEA-CHR Conference printed in both English and Spanish. The conference was designed to present the concept of cultural pluralism and to suggest ways of implementing this concept in instructional programs. The…

  20. Helium: lifting high-performance stencil kernels from stripped x86 binaries to Halide DSL code

    DOE PAGES

    Mendis, Charith; Bosboom, Jeffrey; Wu, Kevin; ...

    2015-06-03

    Highly optimized programs are prone to bit rot, where performance quickly becomes suboptimal in the face of new hardware and compiler techniques. In this paper we show how to automatically lift performance-critical stencil kernels from a stripped x86 binary and generate the corresponding code in the high-level domain-specific language Halide. Using Halide's state-of-the-art optimizations targeting current hardware, we show that new optimized versions of these kernels can replace the originals to rejuvenate the application for newer hardware. The original optimized code for kernels in stripped binaries is nearly impossible to analyze statically. Instead, we rely on dynamic traces to regenerate the kernels. We perform buffer structure reconstruction to identify input, intermediate and output buffer shapes. Here, we abstract from a forest of concrete dependency trees which contain absolute memory addresses to symbolic trees suitable for high-level code generation. This is done by canonicalizing trees, clustering them based on structure, inferring higher-dimensional buffer accesses and finally by solving a set of linear equations based on buffer accesses to lift them up to simple, high-level expressions. Helium can handle highly optimized, complex stencil kernels with input-dependent conditionals. We lift seven kernels from Adobe Photoshop giving a 75% performance improvement, four kernels from IrfanView, leading to 4.97x performance, and one stencil from the miniGMG multigrid benchmark netting a 4.25x improvement in performance. We manually rejuvenated Photoshop by replacing eleven of Photoshop's filters with our lifted implementations, giving 1.12x speedup without affecting the user experience.

  1. Implementation of NAS Parallel Benchmarks in Java

    NASA Technical Reports Server (NTRS)

    Frumkin, Michael; Schultz, Matthew; Jin, Hao-Qiang; Yan, Jerry

    2000-01-01

    A number of features make Java an attractive but debatable choice for High Performance Computing (HPC). In order to gauge the applicability of Java to Computational Fluid Dynamics (CFD), we have implemented the NAS Parallel Benchmarks in Java. The performance and scalability of the benchmarks point out the areas where improvement in Java compiler technology and in Java thread implementation would move Java closer to Fortran in the competition for CFD applications.

  2. Implementation of equity in resource allocation for regional earthquake risk mitigation using two-stage stochastic programming.

    PubMed

    Zolfaghari, Mohammad R; Peyghaleh, Elnaz

    2015-03-01

    This article presents a new methodology to implement the concept of equity in regional earthquake risk mitigation programs using an optimization framework. It presents a framework that could be used by decisionmakers (government and authorities) to structure budget allocation strategy toward different seismic risk mitigation measures, i.e., structural retrofitting for different building structural types in different locations and planning horizons. A two-stage stochastic model is developed here to seek optimal mitigation measures based on minimizing mitigation expenditures, reconstruction expenditures, and especially large losses in highly seismically active countries. To consider fairness in the distribution of financial resources among different groups of people, the equity concept is incorporated using constraints in model formulation. These constraints limit inequity to the user-defined level to achieve the equity-efficiency tradeoff in the decision-making process. To present practical application of the proposed model, it is applied to a pilot area in Tehran, the capital city of Iran. Building stocks, structural vulnerability functions, and regional seismic hazard characteristics are incorporated to compile a probabilistic seismic risk model for the pilot area. Results illustrate the variation of mitigation expenditures by location and structural type for buildings. These expenditures are sensitive to the amount of available budget and equity consideration for the constant risk aversion. Most significantly, equity is more easily achieved if the budget is unlimited. Conversely, increasing equity where the budget is limited decreases the efficiency. The risk-return tradeoff, equity-reconstruction expenditures tradeoff, and variation of per-capita expected earthquake loss in different income classes are also presented. © 2015 Society for Risk Analysis.

  3. LIBVERSIONINGCOMPILER: An easy-to-use library for dynamic generation and invocation of multiple code versions

    NASA Astrophysics Data System (ADS)

    Cherubin, S.; Agosta, G.

    2018-01-01

    We present LIBVERSIONINGCOMPILER, a C++ library designed to support the dynamic generation of multiple versions of the same compute kernel in an HPC scenario. It can be used to provide continuous optimization, code specialization based on the input data or on workload changes, or otherwise to dynamically adjust the application, without the burden of a full dynamic compiler. The library supports multiple underlying compilers but specifically targets the LLVM framework. We also provide examples of use, showing the overhead of the library, and providing guidelines for its efficient use.

  4. Ada Compiler Validation Summary Report. Certificate Number: 920918S1. 11273 U.S. Navy, Ada/M, Version 4.5 /OPTIMIZE) VAX 8550/8600/8650 (Cluster) = VHSIC Processor Module (VPM) AN/AYK-14 (Bare Board)

    DTIC Science & Technology

    1992-10-27

    Module (VPM) AN/AYK-14 (Bare Board) (target), 920918S1.11273 6. AUTHOR(S) National Institute of Standards and Technology Gaithersburg, MD USA 7 ...Validation Procedures [Pro90] against the Ada Standard [Ada83] using the current Ada Compiler Validation Capability (ACVC). This Validation Summary Report (VSR ...$MAXLENINTBASEDLITERAL ... $MAXLENREALBASEDLITERAL ... $MAXSTRINGLITERAL

  5. Writing and compiling code into biochemistry.

    PubMed

    Shea, Adam; Fett, Brian; Riedel, Marc D; Parhi, Keshab

    2010-01-01

    This paper presents a methodology for translating iterative arithmetic computation, specified as high-level programming constructs, into biochemical reactions. From an input/output specification, we generate biochemical reactions that produce output quantities of proteins as a function of input quantities performing operations such as addition, subtraction, and scalar multiplication. Iterative constructs such as "while" loops and "for" loops are implemented by transferring quantities between protein types, based on a clocking mechanism. Synthesis first is performed at a conceptual level, in terms of abstract biochemical reactions - a task analogous to high-level program compilation. Then the results are mapped onto specific biochemical reactions selected from libraries - a task analogous to machine language compilation. We demonstrate our approach through the compilation of a variety of standard iterative functions: multiplication, exponentiation, discrete logarithms, raising to a power, and linear transforms on time series. The designs are validated through transient stochastic simulation of the chemical kinetics. We are exploring DNA-based computation via strand displacement as a possible experimental chassis.

  6. Data Collection Answers - SEER Registrars

    Cancer.gov

    Read clarifications to existing coding rules, which should be implemented immediately. Data collection experts from American College of Surgeons Commission on Cancer, CDC National Program of Cancer Registries, and SEER Program compiled these answers.

  7. Supercomputer optimizations for stochastic optimal control applications

    NASA Technical Reports Server (NTRS)

    Chung, Siu-Leung; Hanson, Floyd B.; Xu, Huihuang

    1991-01-01

    Supercomputer optimizations for a computational method of solving stochastic, multibody, dynamic programming problems are presented. The computational method is valid for a general class of optimal control problems that are nonlinear, multibody dynamical systems, perturbed by general Markov noise in continuous time, i.e., nonsmooth Gaussian as well as jump Poisson random white noise. Optimization techniques for vector multiprocessors or vectorizing supercomputers include advanced data structures, loop restructuring, loop collapsing, blocking, and compiler directives. These advanced computing techniques and supercomputing hardware help alleviate Bellman's curse of dimensionality in dynamic programming computations, by permitting the solution of large multibody problems. Possible applications include lumped flight dynamics models for uncertain environments, such as large scale and background random aerospace fluctuations.
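
    Loop collapsing, one of the techniques named in this record, can be sketched as follows. The example shows only the index transformation (a doubly nested loop rewritten as one longer loop); the function names and the grid-scaling task are assumptions chosen for illustration and are not taken from the paper.

```python
def scale_nested(grid, factor):
    """Reference version: one loop per dimension."""
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            out[i][j] = grid[i][j] * factor
    return out


def scale_collapsed(grid, factor):
    """Collapsed version: a single loop over rows * cols iterations.

    The pair (i, j) is recovered from the flat index k, so the loop body
    is unchanged while the trip count becomes the product of both extents.
    """
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    out = [[0.0] * cols for _ in range(rows)]
    for k in range(rows * cols):
        i, j = divmod(k, cols)
        out[i][j] = grid[i][j] * factor
    return out


grid = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
collapsed = scale_collapsed(grid, 2.0)
```

    On a vector machine the benefit comes from the longer single loop, which gives the vectorizer more iterations per pass; interpreted Python gains nothing from the rewrite, so the code serves only to show the transformation.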

  8. Read buffer optimizations to support compiler-assisted multiple instruction retry

    NASA Technical Reports Server (NTRS)

    Alewine, N. J.; Fuchs, W. K.; Hwu, W. M.

    1993-01-01

    Multiple instruction retry is a recovery mechanism for transient processor faults. We previously developed a compiler-assisted approach to multiple instruction retry in which a read buffer of size 2N (where N represents the maximum instruction rollback distance) was used to resolve some data hazards while the compiler resolved the remaining hazards. The compiler-assisted scheme was shown to reduce the performance overhead and/or hardware complexity normally associated with hardware-only retry schemes. This paper examines the size and design of the read buffer. We establish a practical lower bound and average size requirement for the read buffer by modifying the scheme to save only the data required for rollback. The study measures the effect on the performance of a DECstation 3100 running ten application programs using six read buffer configurations with varying read buffer sizes. Two alternative configurations are shown to be the most efficient, and they differ depending on whether split-cycle-saves are assumed. Up to a 55 percent read buffer size reduction is achievable with an average reduction of 39 percent given the most efficient read buffer configuration and a variety of applications.

  9. Structure alerts for carcinogenicity, and the Salmonella assay system: a novel insight through the chemical relational databases technology.

    PubMed

    Benigni, Romualdo; Bossa, Cecilia

    2008-01-01

    In the past decades, chemical carcinogenicity has been the object of mechanistic studies that have been translated into valuable experimental (e.g., the Salmonella assays system) and theoretical (e.g., compilations of structure alerts for chemical carcinogenicity) models. These findings remain the basis of the science and regulation of mutagens and carcinogens. Recent advances in the organization and treatment of large databases consisting of both biological and chemical information nowadays allows for a much easier and more refined view of data. This paper reviews recent analyses on the predictive performance of various lists of structure alerts, including a new compilation of alerts that combines previous work in an optimized form for computer implementation. The revised compilation is part of the Toxtree 1.50 software (freely available from the European Chemicals Bureau website). The use of structural alerts for the chemical biological profiling of a large database of Salmonella mutagenicity results is also reported. Together with being a repository of the science on the chemical biological interactions at the basis of chemical carcinogenicity, the SAs have a crucial role in practical applications for risk assessment, for: (a) description of sets of chemicals; (b) preliminary hazard characterization; (c) formation of categories for e.g., regulatory purposes; (d) generation of subsets of congeneric chemicals to be analyzed subsequently with QSAR methods; (e) priority setting. An important aspect of SAs as predictive toxicity tools is that they derive directly from mechanistic knowledge. The crucial role of mechanistic knowledge in the process of applying (Q)SAR considerations to risk assessment should be strongly emphasized. 
Mechanistic knowledge provides a ground for interaction and dialogue between model developers, toxicologists and regulators, and permits the integration of the (Q)SAR results into a wider regulatory framework, where different types of evidence and data concur or complement each other as a basis for making decisions and taking actions.

  10. Adaptive Environment for Supercompiling with Optimized Parallelism (AESOP)

    DTIC Science & Technology

    2011-09-01

    DATES COVERED (From - To) September 2011 Final 09 March 2009 – 31 July 2011 4. TITLE AND SUBTITLE ADAPTIVE ENVIRONMENT FOR SUPERCOMPILING WITH... 4 2.1 System characterization loop...Integration Points for AESOP ... 10 4. LLVM and the AESOP Compiler

  11. Compilation of International Regulatory Guidance Documents for Neuropathology Assessment during Nonclinical Toxicity Studies

    EPA Science Inventory

    Neuropathology analysis as an endpoint during nonclinical efficacy and toxicity studies is a challenging prospect that requires trained personnel and particular equipment to achieve optimal results. Accordingly, many regulatory agencies have produced explicit guidelines for desig...

  12. Migration of legacy mumps applications to relational database servers.

    PubMed

    O'Kane, K C

    2001-07-01

    An extended implementation of the Mumps language is described that facilitates vendor neutral migration of legacy Mumps applications to SQL-based relational database servers. Implemented as a compiler, this system translates Mumps programs to operating system independent, standard C code for subsequent compilation to fully stand-alone, binary executables. Added built-in functions and support modules extend the native hierarchical Mumps database with access to industry standard, networked, relational database management servers (RDBMS) thus freeing Mumps applications from dependence upon vendor specific, proprietary, unstandardized database models. Unlike Mumps systems that have added captive, proprietary RDBMS access, the programs generated by this development environment can be used with any RDBMS system that supports common network access protocols. Additional features include a built-in web server interface and the ability to interoperate directly with programs and functions written in other languages.

  13. System for Configuring Modular Telemetry Transponders

    NASA Technical Reports Server (NTRS)

    Varnavas, Kosta A. (Inventor); Sims, William Herbert, III (Inventor)

    2014-01-01

    A system for configuring telemetry transponder cards uses a database of error checking protocol data structures, each containing data to implement at least one CCSDS protocol algorithm. Using a user interface, a user selects at least one telemetry specific error checking protocol from the database. A compiler configures an FPGA with the data from the data structures to implement the error checking protocol.

  14. Selected photographic techniques, a compilation

    NASA Technical Reports Server (NTRS)

    1971-01-01

    A selection has been made of methods, devices, and techniques developed in the field of photography during implementation of space and nuclear research projects. These items include many adaptations, variations, and modifications to standard hardware and practice, and should prove interesting to both amateur and professional photographers and photographic technicians. This compilation is divided into two sections. The first section presents techniques and devices that have been found useful in making photolab work simpler, more productive, and higher in quality. Section two deals with modifications to and special applications for existing photographic equipment.

  15. An implementation and analysis of the Abstract Syntax Notation One and the basic encoding rules

    NASA Technical Reports Server (NTRS)

    Harvey, James D.; Weaver, Alfred C.

    1990-01-01

    The details of abstract syntax notation one standard (ASN.1) and the basic encoding rules standard (BER) that collectively solve the problem of data transfer across incompatible host environments are presented, and a compiler that was built to automate their use is described. Experiences with this compiler are also discussed which provide a quantitative analysis of the performance costs associated with the application of these standards. An evaluation is offered as to how well suited ASN.1 and BER are in solving the common data representation problem.
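
    As an illustration of the kind of encoding such a compiler automates, the following minimal sketch encodes an ASN.1 INTEGER under BER as a tag-length-value triple. The function names are hypothetical and are not taken from the compiler described above; only the universal INTEGER tag (0x02) and the definite-length form are covered.

```python
def ber_length(n):
    """Encode a content length in BER definite form."""
    if n < 0x80:
        # Short form: a single octet holds the length.
        return bytes([n])
    # Long form: first octet is 0x80 | number of length octets.
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def ber_integer(value):
    """Encode an ASN.1 INTEGER as tag (0x02), length, content."""
    # Content is two's complement, big-endian, in the fewest octets.
    size = 1
    while not -(1 << (8 * size - 1)) <= value < (1 << (8 * size - 1)):
        size += 1
    content = value.to_bytes(size, "big", signed=True)
    return bytes([0x02]) + ber_length(len(content)) + content


if __name__ == "__main__":
    for v in (0, 127, 128, -1, -129):
        print(v, ber_integer(v).hex())
```

    Because the content octets are a fixed big-endian two's complement form, the same bytes can be decoded on a host with a different native integer layout, which is the incompatibility the abstract refers to.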

  16. Implementation of a 3D mixing layer code on parallel computers

    NASA Technical Reports Server (NTRS)

    Roe, K.; Thakur, R.; Dang, T.; Bogucz, E.

    1995-01-01

    This paper summarizes our progress and experience in the development of a Computational-Fluid-Dynamics code on parallel computers to simulate three-dimensional spatially-developing mixing layers. In this initial study, the three-dimensional time-dependent Euler equations are solved using a finite-volume explicit time-marching algorithm. The code was first programmed in Fortran 77 for sequential computers. The code was then converted for use on parallel computers using the conventional message-passing technique, while we have not been able to compile the code with the present version of HPF compilers.

  17. Reducing software security risk through an integrated approach

    NASA Technical Reports Server (NTRS)

    Gilliam, D.; Powell, J.; Kelly, J.; Bishop, M.

    2001-01-01

    The fourth-quarter FY'01 delivery for this RTOP is a Property-Based Testing (PBT) 'Tester's Assistant' (TA). The TA tool is to be used to check compiled and pre-compiled code for potential security weaknesses that could be exploited by hackers. The TA Instrumenter, implemented mostly in C++ (with a small part in Java), parses two types of files: Java and TASPEC. Security properties to be checked are written in TASPEC. The Instrumenter is used in conjunction with the Tester's Assistant Specification (TASpec) execution monitor to verify the security properties of a given program.

  18. Timing characterization and analysis of the Linux-based, closed loop control computer for the Subaru Telescope laser guide star adaptive optics system

    NASA Astrophysics Data System (ADS)

    Dinkins, Matthew; Colley, Stephen

    2008-07-01

    Hardware and software specialized for real time control reduce the timing jitter of executables when compared to off-the-shelf hardware and software. However, these specialized environments are costly in both money and development time. While conventional systems have a cost advantage, the jitter in these systems is much larger and potentially problematic. This study analyzes the timing characteristics of a standard Dell server running a fully featured Linux operating system to determine if such a system would be capable of meeting the timing requirements for closed loop operations. Investigations are performed on the effectiveness of tools designed to make off-the-shelf system performance closer to specialized real time systems. The GNU Compiler Collection (gcc) is compared to the Intel C Compiler (icc), compiler optimizations are investigated, and real-time extensions to Linux are evaluated.

  19. On the Efficacy of Source Code Optimizations for Cache-Based Systems

    NASA Technical Reports Server (NTRS)

    VanderWijngaart, Rob F.; Saphir, William C.

    1998-01-01

    Obtaining high performance without machine-specific tuning is an important goal of scientific application programmers. Since most scientific processing is done on commodity microprocessors with hierarchical memory systems, this goal of "portable performance" can be achieved if a common set of optimization principles is effective for all such systems. It is widely believed, or at least hoped, that portable performance can be realized. The rule of thumb for optimization on hierarchical memory systems is to maximize temporal and spatial locality of memory references by reusing data and minimizing memory access stride. We investigate the effects of a number of optimizations on the performance of three related kernels taken from a computational fluid dynamics application. Timing the kernels on a range of processors, we observe an inconsistent and often counterintuitive impact of the optimizations on performance. In particular, code variations that have a positive impact on one architecture can have a negative impact on another, and variations expected to be unimportant can produce large effects. Moreover, we find that cache miss rates - as reported by a cache simulation tool, and confirmed by hardware counters - only partially explain the results. By contrast, the compiler-generated assembly code provides more insight by revealing the importance of processor-specific instructions and of compiler maturity, both of which strongly, and sometimes unexpectedly, influence performance. We conclude that it is difficult to obtain performance portability on modern cache-based computers, and comment on the implications of this result.
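
    To make the "memory access stride" in the rule of thumb concrete, the sketch below lists the flat offsets visited when a row-major array is traversed row by row versus column by column. It only illustrates the notion of stride; the function names are hypothetical, and it makes no performance claim, in line with the abstract's finding that the effect of such optimizations varies by architecture.

```python
def flat_offsets(rows, cols, order):
    """Flat offsets visited while looping over a row-major rows x cols array."""
    if order == "row":
        # Inner loop over columns: consecutive memory locations.
        return [i * cols + j for i in range(rows) for j in range(cols)]
    if order == "col":
        # Inner loop over rows: each step jumps `cols` elements.
        return [i * cols + j for j in range(cols) for i in range(rows)]
    raise ValueError("order must be 'row' or 'col'")


def mean_stride(offsets):
    """Average absolute distance between successive accesses."""
    steps = [abs(b - a) for a, b in zip(offsets, offsets[1:])]
    return sum(steps) / len(steps)


if __name__ == "__main__":
    for order in ("row", "col"):
        print(order, mean_stride(flat_offsets(4, 8, order)))
```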

  20. On the Efficacy of Source Code Optimizations for Cache-Based Systems

    NASA Technical Reports Server (NTRS)

    VanderWijngaart, Rob F.; Saphir, William C.; Saini, Subhash (Technical Monitor)

    1998-01-01

    Obtaining high performance without machine-specific tuning is an important goal of scientific application programmers. Since most scientific processing is done on commodity microprocessors with hierarchical memory systems, this goal of "portable performance" can be achieved if a common set of optimization principles is effective for all such systems. It is widely believed, or at least hoped, that portable performance can be realized. The rule of thumb for optimization on hierarchical memory systems is to maximize temporal and spatial locality of memory references by reusing data and minimizing memory access stride. We investigate the effects of a number of optimizations on the performance of three related kernels taken from a computational fluid dynamics application. Timing the kernels on a range of processors, we observe an inconsistent and often counterintuitive impact of the optimizations on performance. In particular, code variations that have a positive impact on one architecture can have a negative impact on another, and variations expected to be unimportant can produce large effects. Moreover, we find that cache miss rates - as reported by a cache simulation tool, and confirmed by hardware counters - only partially explain the results. By contrast, the compiler-generated assembly code provides more insight by revealing the importance of processor-specific instructions and of compiler maturity, both of which strongly, and sometimes unexpectedly, influence performance. We conclude that it is difficult to obtain performance portability on modern cache-based computers, and comment on the implications of this result.

  1. Modular Expression Language for Ordinary Differential Equation Editing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Blake, Robert C.

    MELODEE is a system for describing systems of initial value problem ordinary differential equations, and a compiler for the language that produces optimized code to integrate the differential equations. Features include rational polynomial approximation for expensive functions and automatic differentiation for symbolic Jacobians.

  2. Spacelab user implementation assessment study. Volume 4: SUIAS appendixes

    NASA Technical Reports Server (NTRS)

    1975-01-01

    The capital investment for the integration and checkout of Spacelab payloads is assessed. Detailed data pertaining to this assessment and a computer cost model utilized in the compilation of programmatic resource requirements are delineated.

  3. The OpenMP Implementation of NAS Parallel Benchmarks and its Performance

    NASA Technical Reports Server (NTRS)

    Jin, Hao-Qiang; Frumkin, Michael; Yan, Jerry

    1999-01-01

    As the new ccNUMA architecture became popular in recent years, parallel programming with compiler directives on these machines has evolved to accommodate new needs. In this study, we examine the effectiveness of OpenMP directives for parallelizing the NAS Parallel Benchmarks. Implementation details will be discussed and performance will be compared with the MPI implementation. We have demonstrated that OpenMP can achieve very good results for parallelization on a shared memory system, but effective use of memory and cache is very important.

  4. DI: An interactive debugging interpreter for applicative languages

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Skedzielewski, S.K.; Yates, R.K.; Oldehoeft, R.R.

    1987-03-12

    The DI interpreter is both a debugger and interpreter of SISAL programs. Its use as a program interpreter is only a small part of its role; it is designed to be a tool for studying compilation techniques for applicative languages. DI interprets dataflow graphs expressed in the IF1 and IF2 languages, and is heavily instrumented to report dynamic storage activity, reference counting, copying and updating of structured data values. It also aids the SISAL language evaluation by providing an interim execution vehicle for SISAL programs. DI provides determinate, sequential interpretation of graph nodes for sequential and parallel operations in a canonical order. As a debugging aid, DI allows tracing, breakpointing, and interactive display of program data values. DI handles creation of SISAL and IF1 error values for each data type and propagates them according to a well-defined algebra. We have begun to implement IF1 optimizers and have measured the improvements with DI.

  5. Programs of religious/spiritual support in hospitals - five "Whies" and five "Hows".

    PubMed

    Saad, Marcelo; de Medeiros, Roberta

    2016-08-22

    A contemporary orientation of the hospital experience model must encompass the clients' religious-spiritual dimension. The objective of this paper is to share a previous experience, highlighting at least five reasons hospitals should invest in this direction, and an equal number of steps required to achieve it. In the first part, the text discourses about five reasons to invest in religious-spiritual support programs: 1. Religious-spiritual wellbeing is related to better health; 2. Religious-spiritual appreciation is a standard for hospital accreditation; 3. To undo religious-spiritual misunderstandings that can affect treatment; 4. Patients demand a religious-spiritual outlook from the institution; and 5. Costs may be reduced with religious-spiritual support. In the second part, the text suggests five steps to implement religious-spiritual support programs: 1. Deep institutional involvement; 2. Formal staff training; 3. Infrastructure and resources; 4. Adjustment of institutional politics; and 5. Agreement with religious-spiritual leaders. The authors hope the information compiled here can inspire hospitals to adopt actions toward optimization of the healing experience.

  6. OPTICON: Pro-Matlab software for large order controlled structure design

    NASA Technical Reports Server (NTRS)

    Peterson, Lee D.

    1989-01-01

    A software package for large order controlled structure design is described and demonstrated. The primary program, called OPTICON, uses both Pro-Matlab M-file routines and selected compiled FORTRAN routines linked into the Pro-Matlab structure. The program accepts structural model information in the form of state-space matrices and performs three basic design functions on the model: (1) open loop analyses; (2) closed loop reduced order controller synthesis; and (3) closed loop stability and performance assessment. The controller synthesis methods currently implemented in this software are based on the Generalized Linear Quadratic Gaussian theory of Bernstein. In particular, a reduced order Optimal Projection synthesis algorithm based on a homotopy solution method was successfully applied to an experimental truss structure using a 58-state dynamic model. These results are presented and discussed. Current plans to expand the practical size of the design model to several hundred states and the intention to interface Pro-Matlab to a supercomputing environment are discussed.

  7. Jellyfish Bioactive Compounds: Methods for Wet-Lab Work

    PubMed Central

    Frazão, Bárbara; Antunes, Agostinho

    2016-01-01

    The study of bioactive compounds from marine animals has provided, over time, an endless source of interesting molecules. Jellyfish are commonly targets of study due to their toxic proteins. However, there is a gap in reviewing successful wet-lab methods employed in these animals, which compromises the fast progress in the detection of related biomolecules. Here, we provide a compilation of the most effective wet-lab methodologies for jellyfish venom extraction prior to proteomic analysis—separation, identification and toxicity assays. This includes SDS-PAGE, 2DE, gel chromatography, HPLC, DEAE, LC-MS, MALDI, Western blot, hemolytic assay, antimicrobial assay and protease activity assay. For a more comprehensive approach, jellyfish toxicity studies should further consider transcriptome sequencing. We reviewed such methodologies and other genomic techniques used prior to the deep sequencing of transcripts, including RNA extraction, construction of cDNA libraries and RACE. Overall, we provide an overview of the most promising methods and their successful implementation for optimizing time and effort when studying jellyfish. PMID:27077869

  8. Jellyfish Bioactive Compounds: Methods for Wet-Lab Work.

    PubMed

    Frazão, Bárbara; Antunes, Agostinho

    2016-04-12

    The study of bioactive compounds from marine animals has provided, over time, an endless source of interesting molecules. Jellyfish are commonly targets of study due to their toxic proteins. However, there is a gap in reviewing successful wet-lab methods employed in these animals, which compromises the fast progress in the detection of related biomolecules. Here, we provide a compilation of the most effective wet-lab methodologies for jellyfish venom extraction prior to proteomic analysis-separation, identification and toxicity assays. This includes SDS-PAGE, 2DE, gel chromatography, HPLC, DEAE, LC-MS, MALDI, Western blot, hemolytic assay, antimicrobial assay and protease activity assay. For a more comprehensive approach, jellyfish toxicity studies should further consider transcriptome sequencing. We reviewed such methodologies and other genomic techniques used prior to the deep sequencing of transcripts, including RNA extraction, construction of cDNA libraries and RACE. Overall, we provide an overview of the most promising methods and their successful implementation for optimizing time and effort when studying jellyfish.

  9. Magnetocaloric Materials and the Optimization of Cooling Power Density

    NASA Technical Reports Server (NTRS)

    Wikus, Patrick; Canavan, Edgar; Heine, Sarah Trowbridge; Matsumoto, Koichi; Numazawa, Takenori

    2014-01-01

    The magnetocaloric effect is the thermal response of a material to an external magnetic field. This manuscript focuses on the physics and the properties of materials which are commonly used for magnetic refrigeration at cryogenic temperatures. After a brief overview of the magnetocaloric effect and associated thermodynamics, typical requirements on refrigerants are discussed from a standpoint of cooling power density optimization. Finally, a compilation of the most important properties of several common magnetocaloric materials is presented.

  10. Scout: high-performance heterogeneous computing made simple

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jablin, James; Mc Cormick, Patrick; Herlihy, Maurice

    2011-01-26

    Researchers must often write their own simulation and analysis software. During this process they simultaneously confront both computational and scientific problems. Current strategies for aiding the generation of performance-oriented programs do not abstract the software development from the science. Furthermore, the problem is becoming increasingly complex and pressing with the continued development of many-core and heterogeneous (CPU-GPU) architectures. To achieve high performance, scientists must expertly navigate both software and hardware. Co-design between computer scientists and research scientists can alleviate but not solve this problem. The science community requires better tools for developing, optimizing, and future-proofing codes, allowing scientists to focus on their research while still achieving high computational performance. Scout is a parallel programming language and extensible compiler framework targeting heterogeneous architectures. It provides the abstraction required to buffer scientists from the constantly shifting details of hardware while still realizing high performance by encapsulating software and hardware optimization within a compiler framework.

  11. Bellman's GAP--a language and compiler for dynamic programming in sequence analysis.

    PubMed

    Sauthoff, Georg; Möhl, Mathias; Janssen, Stefan; Giegerich, Robert

    2013-03-01

    Dynamic programming is ubiquitous in bioinformatics. Developing and implementing non-trivial dynamic programming algorithms is often error prone and tedious. Bellman's GAP is a new programming system, designed to ease the development of bioinformatics tools based on the dynamic programming technique. In Bellman's GAP, dynamic programming algorithms are described in a declarative style by tree grammars, evaluation algebras and products formed thereof. This bypasses the design of explicit dynamic programming recurrences and yields programs that are free of subscript errors, modular and easy to modify. The declarative modules are compiled into C++ code that is competitive to carefully hand-crafted implementations. This article introduces the Bellman's GAP system and its language, GAP-L. It then demonstrates the ease of development and the degree of re-use by creating variants of two common bioinformatics algorithms. Finally, it evaluates Bellman's GAP as an implementation platform of 'real-world' bioinformatics tools. Bellman's GAP is available under GPL license from http://bibiserv.cebitec.uni-bielefeld.de/bellmansgap. This Web site includes a repository of re-usable modules for RNA folding based on thermodynamics.

  12. MEDOF - MINIMUM EUCLIDEAN DISTANCE OPTIMAL FILTER

    NASA Technical Reports Server (NTRS)

    Barton, R. S.

    1994-01-01

    The Minimum Euclidean Distance Optimal Filter program, MEDOF, generates filters for use in optical correlators. The algorithm implemented in MEDOF follows theory put forth by Richard D. Juday of NASA/JSC. This program analytically optimizes filters on arbitrary spatial light modulators such as coupled, binary, full complex, and fractional 2pi phase. MEDOF optimizes these modulators on a number of metrics including: correlation peak intensity at the origin for the centered appearance of the reference image in the input plane, signal to noise ratio including the correlation detector noise as well as the colored additive input noise, peak to correlation energy defined as the fraction of the signal energy passed by the filter that shows up in the correlation spot, and the peak to total energy which is a generalization of PCE that adds the passed colored input noise to the input image's passed energy. The user of MEDOF supplies the functions that describe the following quantities: 1) the reference signal, 2) the realizable complex encodings of both the input and filter SLM, 3) the noise model, possibly colored, as it adds at the reference image and at the correlation detection plane, and 4) the metric to analyze, here taken to be one of the analytical ones like SNR (signal to noise ratio) or PCE (peak to correlation energy) rather than peak to secondary ratio. MEDOF calculates filters for arbitrary modulators and a wide range of metrics as described above. MEDOF examines the statistics of the encoded input image's noise (if SNR or PCE is selected) and the filter SLM's (Spatial Light Modulator) available values. These statistics are used as the basis of a range for searching for the magnitude and phase of k, a pragmatically based complex constant for computing the filter transmittance from the electric field. The filter is produced for the mesh points in those ranges and the value of the metric that results from these points is computed. 
    When the search is concluded, the values of amplitude and phase for the k whose metric was largest, as well as consistency checks, are reported. A finer search can be done in the neighborhood of the optimal k if desired. The filter finally selected is written to disk in terms of drive values, not in terms of the filter's complex transmittance. Optionally, the impulse response of the filter may be created to permit users to examine the response for the features the algorithm deems important to the recognition process under the selected metric, limitations of the filter SLM, etc. MEDOF uses the filter SLM to its greatest potential; therefore, filter competence is not compromised for simplicity of computation. MEDOF is written in C for Sun series computers running SunOS. With slight modifications, it has been implemented on DEC VAX series computers using the DEC-C v3.30 compiler, although the documentation does not currently support this platform. MEDOF can also be compiled using Borland International Inc.'s Turbo C++ v1.0, but IBM PC memory restrictions greatly reduce the maximum size of the reference images from which the filters can be calculated. MEDOF requires a two-dimensional Fast Fourier Transform (2DFFT). One 2DFFT routine which has been used successfully with MEDOF is a routine found in "Numerical Recipes in C: The Art of Scientific Computing," which is available from Cambridge University Press, New Rochelle, NY 10801. The standard distribution medium for MEDOF is a 0.25-inch streaming magnetic tape cartridge (Sun QIC-24) in UNIX tar format. MEDOF was developed in 1992-1993.
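
    The search over the magnitude and phase of k described above can be sketched as a plain mesh search. This is only an illustration under stated assumptions: the names `mesh` and `search_k` are hypothetical, and the metric is a toy stand-in supplied by the caller, not MEDOF's SNR or PCE computation.

```python
import cmath
import math


def mesh(lo, hi, n):
    """n evenly spaced points from lo to hi inclusive (n >= 2)."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def search_k(metric, mag_range, phase_range, n=21):
    """Evaluate metric(k) on a magnitude/phase mesh; return the best k."""
    best_k, best_val = None, float("-inf")
    for mag in mesh(mag_range[0], mag_range[1], n):
        for phase in mesh(phase_range[0], phase_range[1], n):
            k = cmath.rect(mag, phase)  # complex k from polar form
            val = metric(k)
            if val > best_val:
                best_k, best_val = k, val
    return best_k, best_val


if __name__ == "__main__":
    # Toy metric (an assumption): peaks at k = 1 + 1j.
    toy = lambda k: -abs(k - (1 + 1j)) ** 2
    k, val = search_k(toy, (0.0, 2.0), (0.0, math.pi / 2))
    print(k, val)
```

    A finer search, as the abstract mentions, would amount to calling `search_k` again with narrower ranges centred on the reported k.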

  13. Technical Data and Reports on Nitrogen Dioxide Measurements and SIP Status

    EPA Pesticide Factsheets

    EPA collects data from the states and regions on their air quality and state implementation plan (SIP) progress. This information is compiled in a database, and used to create reports, trend charts, and maps.

  14. Technical Data and Reports on Carbon Monoxide Measurements and SIP Status

    EPA Pesticide Factsheets

    EPA collects data from the states and regions on their air quality and state implementation plan (SIP) progress. This information is compiled in a database, and used to create reports, trend charts, and maps.

  15. The fault-tree compiler

    NASA Technical Reports Server (NTRS)

    Martensen, Anna L.; Butler, Ricky W.

    1987-01-01

    The Fault Tree Compiler Program is a new reliability tool used to predict the top event probability for a fault tree. Five different gate types are allowed in the fault tree: AND, OR, EXCLUSIVE OR, INVERT, and M OF N gates. The high level input language is easy to understand and use when describing the system tree. In addition, the use of the hierarchical fault tree capability can simplify the tree description and decrease program execution time. The current solution technique provides an answer precise to five digits (within the limits of double-precision floating-point arithmetic). The user may vary one failure rate or failure probability over a range of values and plot the results for sensitivity analyses. The solution technique is implemented in FORTRAN; the remaining program code is implemented in Pascal. The program is written to run on a Digital Equipment Corporation VAX with the VMS operating system.

  16. An inference engine for embedded diagnostic systems

    NASA Technical Reports Server (NTRS)

    Fox, Barry R.; Brewster, Larry T.

    1987-01-01

    The implementation of an inference engine for embedded diagnostic systems is described. The system consists of two distinct parts. The first is an off-line compiler which accepts a propositional logical statement of the relationship between facts and conclusions and produces data structures required by the on-line inference engine. The second part consists of the inference engine and interface routines which accept assertions of fact and return the conclusions which necessarily follow. Given a set of assertions, it will generate exactly the conclusions which logically follow. At the same time, it will detect any inconsistencies which may propagate from an inconsistent set of assertions or a poorly formulated set of rules. The memory requirements are fixed and the worst case execution times are bounded at compile time. The data structures and inference algorithms are very simple and well understood. The data structures and algorithms are described in detail. The system has been implemented in Lisp, Pascal, and Modula-2.
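
    A rough sketch of the on-line half of such a system is given below, assuming the rules have already been reduced to simple "premises imply conclusion" pairs and that a leading "~" marks a negated fact. Both conventions, and the function name `infer`, are assumptions for illustration; the paper's actual data structures and algorithms are not reproduced here.

```python
def infer(rules, assertions):
    """Forward-chain over (premises, conclusion) rules to a fixed point.

    Returns the newly derived facts and the set of facts that were
    derived or asserted both positively and negatively.
    """
    known = set(assertions)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # Fire a rule once all of its premises are known.
            if conclusion not in known and set(premises) <= known:
                known.add(conclusion)
                changed = True
    # An inconsistency is a fact present together with its negation.
    conflicts = {f for f in known
                 if not f.startswith("~") and "~" + f in known}
    return known - set(assertions), conflicts


if __name__ == "__main__":
    rules = [
        ({"low_pressure", "pump_on"}, "leak"),
        ({"leak"}, "shutdown"),
        ({"pump_on"}, "~shutdown"),
    ]
    print(infer(rules, {"low_pressure", "pump_on"}))
```

    Each pass either adds a fact or terminates, so the number of passes is bounded by the number of rules, which loosely mirrors the bounded worst-case execution time the abstract describes.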

  17. The Fault Tree Compiler (FTC): Program and mathematics

    NASA Technical Reports Server (NTRS)

    Butler, Ricky W.; Martensen, Anna L.

    1989-01-01

    The Fault Tree Compiler Program is a new reliability tool used to predict the top-event probability for a fault tree. Five different gate types are allowed in the fault tree: AND, OR, EXCLUSIVE OR, INVERT, and m OF n gates. The high-level input language is easy to understand and use when describing the system tree. In addition, the use of the hierarchical fault tree capability can simplify the tree description and decrease program execution time. The current solution technique provides an answer that is precise (within the limits of double-precision floating-point arithmetic) to a user-specified number of digits. The user may vary one failure rate or failure probability over a range of values and plot the results for sensitivity analyses. The solution technique is implemented in FORTRAN; the remaining program code is implemented in Pascal. The program is written to run on a Digital Equipment Corporation (DEC) VAX computer with the VMS operating system.
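
    For orientation, the five gate types can be evaluated as follows when every gate input is a statistically independent event. That independence, the reading of "m OF n" as "at least m of the n inputs occur", and all function names are assumptions of this sketch; it is not the FTC solution technique, which the abstract does not detail.

```python
from math import prod


def p_and(ps):
    """All inputs occur."""
    return prod(ps)


def p_or(ps):
    """At least one input occurs."""
    return 1.0 - prod(1.0 - p for p in ps)


def p_xor(p1, p2):
    """Exactly one of two inputs occurs."""
    return p1 * (1.0 - p2) + p2 * (1.0 - p1)


def p_invert(p):
    """The input does not occur."""
    return 1.0 - p


def p_m_of_n(m, ps):
    """At least m of the inputs occur (assumed gate semantics)."""
    dist = [1.0]  # dist[k] = probability that exactly k inputs occurred
    for p in ps:
        nxt = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            nxt[k] += q * (1.0 - p)
            nxt[k + 1] += q * p
        dist = nxt
    return sum(dist[m:])


if __name__ == "__main__":
    # Top event: (A AND B) OR (at least 2 of C, D, E)
    top = p_or([p_and([0.01, 0.02]), p_m_of_n(2, [0.1, 0.1, 0.1])])
    print(top)
```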

  18. Ontology patterns for tabular representations of biomedical knowledge on neglected tropical diseases

    PubMed Central

    Santana, Filipe; Schober, Daniel; Medeiros, Zulma; Freitas, Fred; Schulz, Stefan

    2011-01-01

    Motivation: Ontology-like domain knowledge is frequently published in a tabular format embedded in scientific publications. We explore the re-use of such tabular content in the process of building NTDO, an ontology of neglected tropical diseases (NTDs), where the representation of the interdependencies between hosts, pathogens and vectors plays a crucial role. Results: As a proof of concept we analyzed a tabular compilation of knowledge about pathogens, vectors and geographic locations involved in the transmission of NTDs. After a thorough ontological analysis of the domain of interest, we formulated a comprehensive design pattern, rooted in the biomedical domain upper level ontology BioTop. This pattern was implemented in a VBA script which takes cell contents of an Excel spreadsheet and transforms them into OWL-DL. After minor manual post-processing, the correctness and completeness of the ontology was tested using pre-formulated competence questions as description logics (DL) queries. The expected results could be reproduced by the ontology. The proposed approach is recommended for optimizing the acquisition of ontological domain knowledge from tabular representations. Availability and implementation: Domain examples, source code and ontology are freely available on the web at http://www.cin.ufpe.br/~ntdo. Contact: fss3@cin.ufpe.br PMID:21685092

  19. A Python Implementation of an Intermediate-Level Tropical Circulation Model and Implications for How Modeling Science is Done

    NASA Astrophysics Data System (ADS)

    Lin, J. W. B.

    2015-12-01

    Historically, climate models have been developed incrementally and in compiled languages like Fortran. While the use of legacy compiled languages results in fast, time-tested code, the resulting model is limited in its modularity and cannot take advantage of functionality available with modern computer languages. Here we describe an effort at using the open-source, object-oriented language Python to create more flexible climate models: the package qtcm, a Python implementation of the intermediate-level Neelin-Zeng Quasi-Equilibrium Tropical Circulation model (QTCM1) of the atmosphere. The qtcm package retains the core numerics of QTCM1, written in Fortran, to optimize model performance but uses Python structures and utilities to wrap the QTCM1 Fortran routines and manage model execution. The resulting "mixed language" modeling package allows order and choice of subroutine execution to be altered at run time, and model analysis and visualization to be integrated interactively with model execution at run time. This flexibility facilitates more complex scientific analysis using less complex code than would be possible using traditional languages alone and provides tools to transform the traditional "formulate hypothesis → write and test code → run model → analyze results" sequence into a feedback loop that can be executed automatically by the computer.

  20. Porting marine ecosystem model spin-up using transport matrices to GPUs

    NASA Astrophysics Data System (ADS)

    Siewertsen, E.; Piwonski, J.; Slawig, T.

    2013-01-01

    We have ported an implementation of the spin-up for marine ecosystem models based on transport matrices to graphics processing units (GPUs). The original implementation was designed for distributed-memory architectures and uses the Portable, Extensible Toolkit for Scientific Computation (PETSc) library that is based on the Message Passing Interface (MPI) standard. The spin-up computes a steady seasonal cycle of ecosystem tracers with climatological ocean circulation data as forcing. Since the transport is linear with respect to the tracers, the resulting operator is represented by matrices. Each iteration of the spin-up involves two matrix-vector multiplications and the evaluation of the used biogeochemical model. The original code was written in C and Fortran. On the GPU, we use the Compute Unified Device Architecture (CUDA) standard, a customized version of PETSc and a commercial CUDA Fortran compiler. We describe the extensions to PETSc and the modifications of the original C and Fortran codes that had to be done. Here we make use of freely available libraries for the GPU. We analyze the computational effort of the main parts of the spin-up for two exemplar ecosystem models and compare the overall computational time to those necessary on different CPUs. The results show that a consumer GPU can compete with a significant number of cluster CPUs without further code optimization.

  1. CUDAMPF: a multi-tiered parallel framework for accelerating protein sequence search in HMMER on CUDA-enabled GPU.

    PubMed

    Jiang, Hanyu; Ganesan, Narayan

    2016-02-27

    The HMMER software suite is widely used for analysis of homologous protein and nucleotide sequences with high sensitivity. The latest version of hmmsearch in HMMER 3.x utilizes a heuristic pipeline which consists of the MSV/SSV (Multiple/Single ungapped Segment Viterbi) stage, the P7Viterbi stage and the Forward scoring stage to accelerate homology detection. Since the latest version is highly optimized for performance on modern multi-core CPUs with SSE capabilities, only a few acceleration attempts report speedup. However, the most compute-intensive tasks within the pipeline (viz., MSV/SSV and P7Viterbi stages) still stand to benefit from the computational capabilities of massively parallel processors. A Multi-Tiered Parallel Framework (CUDAMPF) implemented on CUDA-enabled GPUs, presented here, offers a finer-grained parallelism for MSV/SSV and Viterbi algorithms. We couple the SIMT (Single Instruction Multiple Threads) mechanism with SIMD (Single Instruction Multiple Data) video instructions with warp-synchronism to achieve high-throughput processing and eliminate thread idling. We also propose a hardware-aware optimal allocation scheme of scarce resources like on-chip memory and caches in order to boost performance and scalability of CUDAMPF. In addition, runtime compilation via NVRTC, available with CUDA 7.0, is incorporated into the presented framework; it not only helps unroll the innermost loop to yield up to a 2- to 3-fold speedup over static compilation but also enables dynamic loading and switching of kernels depending on the query model size, in order to achieve optimal performance. CUDAMPF is designed as a hardware-aware parallel framework for accelerating computational hotspots within the hmmsearch pipeline as well as other sequence alignment applications. It achieves significant speedup by exploiting hierarchical parallelism on a single GPU and takes full advantage of limited resources based on their own performance features. In addition to exceeding the performance of other acceleration attempts, comprehensive evaluations against high-end CPUs (Intel i5, i7 and Xeon) show that CUDAMPF yields up to 440 GCUPS for SSV, 277 GCUPS for MSV and 14.3 GCUPS for P7Viterbi, all with 100 % accuracy, which translates to a maximum speedup of 37.5, 23.1 and 11.6-fold for MSV, SSV and P7Viterbi respectively. The source code is available at https://github.com/Super-Hippo/CUDAMPF.
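
    For readers unfamiliar with the GCUPS unit quoted above, the following sketch shows the usual way such a figure is computed: dynamic-programming cells updated per second, divided by 10^9. The function names and the sample numbers are hypothetical and are not measurements from the paper.

```python
# Sketch of the GCUPS (giga cell updates per second) metric, assuming the
# common definition: model length * database residues / run time / 1e9.

def gcups(model_length, total_residues, seconds):
    # Each database residue is scored against each model position once.
    cells = model_length * total_residues
    return cells / seconds / 1e9

def speedup(accelerated_gcups, baseline_gcups):
    # Ratio of throughputs measured on the same workload.
    return accelerated_gcups / baseline_gcups

# Hypothetical workload: a 400-state model against 1e9 residues in 2 seconds.
throughput = gcups(400, 1_000_000_000, 2.0)
```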

  2. FX-87 performance measurements: data-flow implementation. Technical report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hammel, R.T.; Gifford, D.K.

    1988-11-01

    This report documents a series of experiments performed to explore the thesis that the FX-87 effect system permits a compiler to schedule imperative programs (i.e., programs that may contain side-effects) for execution on a parallel computer. The authors analyze how much the FX-87 static effect system can improve the execution times of five benchmark programs on a parallel graph interpreter. Three of their benchmark programs do not use side-effects (factorial, fibonacci, and polynomial division) and thus did not have any effect-induced constraints. Their FX-87 performance was comparable to their performance in a purely functional language. Two of the benchmark programs use side effects (DNA sequence matching and Scheme interpretation) and the compiler was able to use effect information to reduce their execution times by factors of 1.7 to 5.4 when compared with sequential execution times. These results support the thesis that a static effect system is a powerful tool for compilation to multiprocessor computers. However, the graph interpreter we used was based on unrealistic assumptions, and thus our results may not accurately reflect the performance of a practical FX-87 implementation. The results also suggest that conventional loop analysis would complement the FX-87 effect system.
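
    How effect information can license parallel scheduling may be sketched as an interference check. This is a simplified illustration, not the FX-87 algorithm: effects are modelled here as (kind, region) pairs with only "read" and "write" kinds, and all names are hypothetical.

```python
# Simplified sketch: two computations may be scheduled in parallel when their
# declared effects do not interfere. Effects are (kind, region) pairs.

def interferes(effects_a, effects_b):
    # Two effects conflict if they touch the same region and one is a write.
    for kind_a, region_a in effects_a:
        for kind_b, region_b in effects_b:
            if region_a == region_b and "write" in (kind_a, kind_b):
                return True
    return False

def can_run_in_parallel(effects_a, effects_b):
    return not interferes(effects_a, effects_b)

pure = set()                      # no side effects, e.g. factorial
reads_r1 = {("read", "r1")}
writes_r1 = {("write", "r1")}
writes_r2 = {("write", "r2")}
```

    Under this model a pure computation never constrains the schedule, which matches the abstract's observation that the side-effect-free benchmarks had no effect-induced constraints.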

  3. SLEEC: Semantics-Rich Libraries for Effective Exascale Computation. Final Technical Report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Milind, Kulkarni

    SLEEC (Semantics-rich Libraries for Effective Exascale Computation) was a project funded by the Department of Energy X-Stack Program, award number DE-SC0008629. The initial project period was September 2012–August 2015. The project was renewed for an additional year, expiring August 2016. Finally, the project received a no-cost extension, leading to a final expiry date of August 2017. Modern applications, especially those intended to run at exascale, are not written from scratch. Instead, they are built by stitching together various carefully-written, hand-tuned libraries. Correctly composing these libraries is difficult, but traditional compilers are unable to effectively analyze and transform across abstraction layers. Domain specific compilers integrate semantic knowledge into compilers, allowing them to transform applications that use particular domain-specific languages, or domain libraries. But they do not help when new domains are developed, or applications span multiple domains. SLEEC aims to fix these problems. To do so, we are building generic compiler and runtime infrastructures that are semantics-aware but not domain-specific. By performing optimizations related to the semantics of a domain library, the same infrastructure can be made generic and apply across multiple domains.

  4. A distributed programming environment for Ada

    NASA Technical Reports Server (NTRS)

    Brennan, Peter; Mcdonnell, Tom; Mcfarland, Gregory; Timmins, Lawrence J.; Litke, John D.

    1986-01-01

    Despite considerable commercial exploitation of fault tolerance systems, significant and difficult research problems remain in such areas as fault detection and correction. A research project is described which constructs a distributed computing test bed for loosely coupled computers. The project is constructing a tool kit to support research into distributed control algorithms, including a distributed Ada compiler, distributed debugger, test harnesses, and environment monitors. The Ada compiler is being written in Ada and will implement distributed computing at the subsystem level. The design goal is to provide a variety of control mechanics for distributed programming while retaining total transparency at the code level.

  5. Industrial Automation Mechanic Model Curriculum Project. Final Report.

    ERIC Educational Resources Information Center

    Toledo Public Schools, OH.

    This document describes a demonstration program that developed secondary level competency-based instructional materials for industrial automation mechanics. Program activities included task list compilation, instructional materials research, learning activity packet (LAP) development, construction of lab elements, system implementation,…

  6. Technical Data and Reports on Sulfur Dioxide (SO2) Measurements and SIP Status

    EPA Pesticide Factsheets

    EPA collects data from the states and regions on their air quality and state implementation plan (SIP) progress. This information is compiled in a database, and used to create reports, trend charts, and maps.

  7. Performance and Scalability of the NAS Parallel Benchmarks in Java

    NASA Technical Reports Server (NTRS)

    Frumkin, Michael A.; Schultz, Matthew; Jin, Haoqiang; Yan, Jerry; Biegel, Bryan A. (Technical Monitor)

    2002-01-01

    Several features make Java an attractive choice for scientific applications. In order to gauge the applicability of Java to Computational Fluid Dynamics (CFD), we have implemented the NAS (NASA Advanced Supercomputing) Parallel Benchmarks in Java. The performance and scalability of the benchmarks point out the areas where improvement in Java compiler technology and in Java thread implementation would position Java closer to Fortran in the competition for scientific applications.

  8. Compilation of Abstracts of Theses Submitted by Candidates for Degrees.

    DTIC Science & Technology

    1986-09-30

    Musitano, J.R. Fin-line Horn Antennas 118 LCDR, USNR Muth, L.R. VLSI Tutorials Through the 119 LT, USN Video-computer Courseware Implementation...Engineer Allocation 432 CPT, USA Model Kiziltan, M. Cognitive Performance Degrada- 433 LTJG, Turkish Navy tion on Sonar Operator and Torpedo Data...and Computer Engineering 118 VLSI TUTORIALS THROUGH THE VIDEO-COMPUTER COURSEWARE IMPLEMENTATION SYSTEM Liesel R. Muth Lieutenant, United States Navy

  9. Ada Compiler Validation Summary Report: Certificate Number 890711W1. 10109 Concurrent Computer Corporation C(3) Ada, Version R02-02.00 Concurrent Computer Corporation 3280 MPS

    DTIC Science & Technology

    1989-07-11

    applicable because this implementation does not support temporary files with names. ag . EE2401D is inapplicable because this implementation does not...buffer. No spanned records with ASCII.NUL are output. A line terminator followed by a page terminator may be represented as: ASC::. CR ASCU :.FF ASCII.CR if

  10. Corn response to nitrogen management under fully-irrigated vs. water-stressed conditions

    USDA-ARS's Scientific Manuscript database

    Characterizing corn grain yield response to nitrogen (N) fertilizer rate is critical for maximizing profits, optimizing N use efficiency and minimizing environmental impacts. Although a large data base of yield response to N has been compiled for highly productive soils in the upper Midwest U.S., f...

  11. Becoming Little Scientists: Technologically-Enhanced Project-Based Language Learning

    ERIC Educational Resources Information Center

    Dooly, Melinda; Sadler, Randall

    2016-01-01

    This article outlines research into innovative language teaching practices that make optimal use of technology and Computer-Mediated Communication (CMC) for an integrated approach to Project-Based Learning. It is based on data compiled during a 10-week language project that employed videoconferencing and "machinima" (short video clips…

  12. Automated Analysis of Stateflow Models

    NASA Technical Reports Server (NTRS)

    Bourbouh, Hamza; Garoche, Pierre-Loic; Garion, Christophe; Gurfinkel, Arie; Kahsaia, Temesghen; Thirioux, Xavier

    2017-01-01

    Stateflow is a widely used modeling framework for embedded and cyber-physical systems where control software interacts with physical processes. In this work, we present a fully automated safety verification technique for Stateflow models. Our approach is twofold: (i) we faithfully compile Stateflow models into hierarchical state machines, and (ii) we use an automated logic-based verification engine to decide the validity of safety properties. The starting point of our approach is a denotational semantics of Stateflow. We propose a compilation process using continuation-passing style (CPS) denotational semantics. Our compilation technique preserves the structural and modal behavior of the system. The overall approach is implemented as an open source toolbox that can be integrated into the existing Mathworks Simulink Stateflow modeling framework. We present preliminary experimental evaluations that illustrate the effectiveness of our approach in code generation and safety verification of industrial scale Stateflow models.

  13. Fast 2D FWI on a multi and many-cores workstation.

    NASA Astrophysics Data System (ADS)

    Thierry, Philippe; Donno, Daniela; Noble, Mark

    2014-05-01

    Following the introduction of x86 co-processors (Xeon Phi) and the performance increase of standard 2-socket workstations using the latest 12-core E5-v2 x86-64 CPU, we present here an MPI + OpenMP implementation of an acoustic 2D FWI (full waveform inversion) code which simultaneously runs on the CPUs and on the co-processors installed in a workstation. The main advantage of running a 2D FWI on a workstation is to be able to quickly evaluate new features such as more complicated wave equations, new cost functions, finite-difference stencils or boundary conditions. Since the co-processor is made of 61 in-order x86 cores, each of them having up to 4 threads, this many-core can be seen as a shared memory SMP (symmetric multiprocessing) machine with its own IP address. Depending on the vendor, a single workstation can handle several co-processors, making the workstation a personal cluster under the desk. The original Fortran 90 CPU version of the 2D FWI code is just recompiled to get a Xeon Phi x86 binary. This multi and many-core configuration uses standard compilers and associated MPI as well as math libraries under Linux; therefore, the cost of code development remains constant, while improving computation time. We choose to implement the code with the so-called symmetric mode to fully use the capacity of the workstation, but we also evaluate the scalability of the code in native mode (i.e., running only on the co-processor) thanks to the Linux ssh and NFS capabilities. Usual care of optimization and SIMD vectorization is used to ensure optimal performance, and to analyze the application performance and bottlenecks on both platforms. The 2D FWI implementation uses finite-difference time-domain forward modeling and a quasi-Newton (with L-BFGS algorithm) optimization scheme for the model parameters update. Parallelization is achieved through standard MPI shot gathers distribution and OpenMP for domain decomposition within the co-processor. Taking advantage of the 16 GB of memory available on the co-processor we are able to keep wavefields in memory to achieve the gradient computation by cross-correlation of forward and back-propagated wavefields needed by our time-domain FWI scheme, without heavy traffic on the I/O subsystem and PCIe bus. In this presentation we will also review some simple methodologies to determine performance expectation compared to real performance in order to get an estimate of the optimization effort before starting any huge modification or rewriting of research codes. The key message is the ease of use and development of this hybrid configuration to reach not the absolute peak performance value but the optimal one that ensures the best balance between geophysical and computer developments.

  14. Fast computation of close-coupling exchange integrals using polynomials in a tree representation

    NASA Astrophysics Data System (ADS)

    Wallerberger, Markus; Igenbergs, Katharina; Schweinzer, Josef; Aumayr, Friedrich

    2011-03-01

    The semi-classical atomic-orbital close-coupling method is a well-known approach for the calculation of cross sections in ion-atom collisions. It strongly relies on the fast and stable computation of exchange integrals. We present an upgrade to earlier implementations of the Fourier-transform method. For this purpose, we implement an extensive library for symbolic storage of polynomials, relying on sophisticated tree structures to allow fast manipulation and numerically stable evaluation. Using this library, we considerably speed up creation and computation of exchange integrals. This enables us to compute cross sections for more complex collision systems.

    Program summary
    Program title: TXINT
    Catalogue identifier: AEHS_v1_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEHS_v1_0.html
    Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
    Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
    No. of lines in distributed program, including test data, etc.: 12 332
    No. of bytes in distributed program, including test data, etc.: 157 086
    Distribution format: tar.gz
    Programming language: Fortran 95
    Computer: All with a Fortran 95 compiler
    Operating system: All with a Fortran 95 compiler
    RAM: Depends heavily on input, usually less than 100 MiB
    Classification: 16.10
    Nature of problem: Analytical calculation of one- and two-center exchange matrix elements for the close-coupling method in the impact parameter model.
    Solution method: Similar to the code of Hansen and Dubois [1], we use the Fourier-transform method suggested by Shakeshaft [2] to compute the integrals. However, we heavily speed up the calculation using a library for symbolic manipulation of polynomials.
    Restrictions: We restrict ourselves to a defined collision system in the impact parameter model.
    Unusual features: A library for symbolic manipulation of polynomials, where polynomials are stored in a space-saving left-child right-sibling binary tree. This provides stable numerical evaluation and fast mutation while maintaining full compatibility with the original code.
    Additional comments: This program makes heavy use of the new features provided by the Fortran 90 standard, most prominently pointers, derived types and allocatable structures and a small portion of Fortran 95. Only newer compilers support these features. Following compilers support all features needed by the program. GNU Fortran Compiler "gfortran" from version 4.3.0 GNU Fortran 95 Compiler "g95" from version 4.2.0 Intel Fortran Compiler "ifort" from version 11.0
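
    A left-child right-sibling binary tree for polynomials, as mentioned in the program summary above, might look like the following Python sketch. The node layout is an assumption made for illustration (one tree level per variable, coefficients stored at the leaves); the layout actually used in TXINT is not described in the abstract.

```python
# Hypothetical sketch of a polynomial stored in a left-child right-sibling
# tree: each level corresponds to one variable, siblings are terms in that
# variable, and the child pointer leads to the terms in the next variable.

class Node:
    def __init__(self, exponent, coefficient=1.0):
        self.exponent = exponent        # power of the variable at this depth
        self.coefficient = coefficient  # used only at leaf nodes
        self.child = None               # "left" pointer: first term, next variable
        self.sibling = None             # "right" pointer: next term, same variable

def evaluate(node, point, depth=0):
    # Sum over the sibling chain, recursing into children for nested variables.
    total = 0.0
    while node is not None:
        factor = point[depth] ** node.exponent
        if node.child is None:
            total += node.coefficient * factor
        else:
            total += factor * evaluate(node.child, point, depth + 1)
        node = node.sibling
    return total

# Build 3*x^2*y + 2*x^2 + 5*y, stored as x^2*(3*y + 2) + x^0*(5*y).
x2 = Node(2)
x0 = Node(0)
x2.sibling = x0
x2.child = Node(1, 3.0)
x2.child.sibling = Node(0, 2.0)
x0.child = Node(1, 5.0)
value = evaluate(x2, (2.0, 3.0))   # evaluate at x = 2, y = 3
```

    Each node carries only two pointers regardless of how many terms share a factor, which is where the space saving of this representation comes from.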

  15. Testing New Programming Paradigms with NAS Parallel Benchmarks

    NASA Technical Reports Server (NTRS)

    Jin, H.; Frumkin, M.; Schultz, M.; Yan, J.

    2000-01-01

    Over the past decade, high performance computing has evolved rapidly, not only in hardware architectures but also with increasing complexity of real applications. Technologies have been developed that aim at scaling up to thousands of processors on both distributed and shared memory systems. Development of parallel programs on these computers is always a challenging task. Today, writing parallel programs with message passing (e.g. MPI) is the most popular way of achieving scalability and high performance. However, writing message passing programs is difficult and error prone. In recent years, new efforts have been made in defining new parallel programming paradigms. The best examples are HPF (based on data parallelism) and OpenMP (based on shared memory parallelism). Both provide simple and clear extensions to sequential programs, thus greatly simplifying the tedious tasks encountered in writing message passing programs. HPF is independent of memory hierarchy; however, due to the immaturity of compiler technology, its performance is still questionable. Although use of parallel compiler directives is not new, OpenMP offers a portable solution in the shared-memory domain. Another important development involves the tremendous progress in the internet and its associated technology. Although still in its infancy, Java promises portability in a heterogeneous environment and offers the possibility to "compile once and run anywhere." In light of testing these new technologies, we implemented new parallel versions of the NAS Parallel Benchmarks (NPBs) with HPF and OpenMP directives, and extended the work with Java and Java-threads. The purpose of this study is to examine the effectiveness of alternative programming paradigms. NPBs consist of five kernels and three simulated applications that mimic the computation and data movement of large scale computational fluid dynamics (CFD) applications. We started with the serial version included in NPB2.3. Optimization of memory and cache usage was applied to several benchmarks, notably BT and SP, resulting in better sequential performance. In order to overcome the lack of an HPF performance model and guide the development of the HPF codes, we employed an empirical performance model for several primitives found in the benchmarks. We encountered a few limitations of HPF, such as lack of support for the "REDISTRIBUTION" directive and no easy way to handle irregular computation. The parallelization with OpenMP directives was done at the outermost loop level to achieve the largest granularity. The performance of six HPF and OpenMP benchmarks is compared with their MPI counterparts for the Class-A problem size in the figure on the next page. These results were obtained on an SGI Origin2000 (195MHz) with MIPSpro-f77 compiler 7.2.1 for OpenMP and MPI codes and PGI pghpf-2.4.3 compiler with MPI interface for HPF programs.

  16. YAPPA: a Compiler-Based Parallelization Framework for Irregular Applications on MPSoCs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lovergine, Silvia; Tumeo, Antonino; Villa, Oreste

    Modern embedded systems include hundreds of cores. Because of the difficulty in providing a fast, coherent memory architecture, these systems usually rely on non-coherent, non-uniform memory architectures with private memories for each core. However, programming these systems poses significant challenges. The developer must extract large amounts of parallelism, while orchestrating communication among cores to optimize application performance. These issues become even more significant with irregular applications, which present data sets difficult to partition, unpredictable memory accesses, unbalanced control flow and fine grained communication. Hand-optimizing every single aspect is hard and time-consuming, and it often does not lead to the expected performance. There is a growing gap between such complex and highly-parallel architectures and the high level languages used to describe the specification, which were designed for simpler systems and do not consider these new issues. In this paper we introduce YAPPA (Yet Another Parallel Programming Approach), a compilation framework for the automatic parallelization of irregular applications on modern MPSoCs based on LLVM. We start by considering an efficient parallel programming approach for irregular applications on distributed memory systems. We then propose a set of transformations that can reduce the development and optimization effort. The results of our initial prototype confirm the correctness of the proposed approach.

  17. On program restructuring, scheduling, and communication for parallel processor systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Polychronopoulos, Constantine D.

    1986-08-01

    This dissertation discusses several software and hardware aspects of program execution on large-scale, high-performance parallel processor systems. The issues covered are program restructuring, partitioning, scheduling and interprocessor communication, synchronization, and hardware design issues of specialized units. All this work was performed focusing on a single goal: to maximize program speedup, or equivalently, to minimize parallel execution time. Parafrase, a Fortran restructuring compiler, was used to transform programs into a parallel form and conduct experiments. Two new program restructuring techniques are presented, loop coalescing and subscript blocking. Compile-time and run-time scheduling schemes are covered extensively. Depending on the program construct, these algorithms generate optimal or near-optimal schedules. For the case of arbitrarily nested hybrid loops, two optimal scheduling algorithms for dynamic and static scheduling are presented. Simulation results are given for a new dynamic scheduling algorithm. The performance of this algorithm is compared to that of self-scheduling. Techniques for program partitioning and minimization of interprocessor communication for idealized program models and for real Fortran programs are also discussed. The close relationship between scheduling, interprocessor communication, and synchronization becomes apparent at several points in this work. Finally, the impact of various types of overhead on program speedup and experimental results are presented.
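
    The general idea behind loop coalescing, one of the restructuring techniques named above, is to replace a loop nest with a single loop whose index is mapped back to the original indices. The Python sketch below illustrates that idea only; the dissertation's exact formulation may differ, and the function names are hypothetical.

```python
# Sketch of loop coalescing: a doubly nested loop and an equivalent single
# loop that recovers the original indices from one flat index.

def nested(n, m):
    out = []
    for i in range(n):
        for j in range(m):
            out.append((i, j))
    return out

def coalesced(n, m):
    out = []
    for k in range(n * m):      # one flat iteration space to schedule
        i, j = divmod(k, m)     # recover the original (i, j) from k
        out.append((i, j))
    return out
```

    A scheduler then only has to distribute the iterations of one loop with n*m independent iterations across processors, instead of deciding how to split two nested loops.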

  18. VINE-A NUMERICAL CODE FOR SIMULATING ASTROPHYSICAL SYSTEMS USING PARTICLES. II. IMPLEMENTATION AND PERFORMANCE CHARACTERISTICS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nelson, Andrew F.; Wetzstein, M.; Naab, T.

    2009-10-01

    We continue our presentation of VINE. In this paper, we begin with a description of relevant architectural properties of the serial and shared memory parallel computers on which VINE is intended to run, and describe their influences on the design of the code itself. We continue with a detailed description of a number of optimizations made to the layout of the particle data in memory and to our implementation of a binary tree used to access that data for use in gravitational force calculations and searches for smoothed particle hydrodynamics (SPH) neighbor particles. We describe the modifications to the code necessary to obtain forces efficiently from special purpose 'GRAPE' hardware, the interfaces required to allow transparent substitution of those forces in the code instead of those obtained from the tree, and the modifications necessary to use both tree and GRAPE together as a fused GRAPE/tree combination. We conclude with an extensive series of performance tests, which demonstrate that the code can be run efficiently and without modification in serial on small workstations or in parallel using the OpenMP compiler directives on large-scale, shared memory parallel machines. We analyze the effects of the code optimizations and estimate that they improve its overall performance by more than an order of magnitude over that obtained by many other tree codes. Scaled parallel performance of the gravity and SPH calculations, together the most costly components of most simulations, is nearly linear up to at least 120 processors on moderate sized test problems using the Origin 3000 architecture, and to the maximum machine sizes available to us on several other architectures. At similar accuracy, performance of VINE, used in GRAPE-tree mode, is approximately a factor 2 slower than that of VINE, used in host-only mode. Further optimizations of the GRAPE/host communications could improve the speed by as much as a factor of 3, but have not yet been implemented in VINE. Finally, we find that although parallel performance on small problems may reach a plateau beyond which more processors bring no additional speedup, performance never decreases, a factor important for running large simulations on many processors with individual time steps, where only a small fraction of the total particles require updates at any given moment.

  19. Implementing a national process for estimating growth, removals, and mortality at the Pacific Northwest’s Forest Inventory and Analysis’s Region: modeling diameter growth

    Treesearch

    Olaf. Kuegler

    2015-01-01

    The Pacific Northwest Research Station’s Forest Inventory and Analysis Unit began remeasurement of permanently located FIA plots under the annualized design in 2011. With remeasurement has come the need to implement the national FIA system for compiling estimates of forest growth, removals, and mortality. The national system requires regional diameter-growth models to...

  20. Harmonised information exchange between decentralised food composition database systems.

    PubMed

    Pakkala, H; Christensen, T; de Victoria, I Martínez; Presser, K; Kadvan, A

    2010-11-01

    The main aim of the European Food Information Resource (EuroFIR) project is to develop and disseminate a comprehensive, coherent and validated data bank for the distribution of food composition data (FCD). This can only be accomplished by harmonising food description and data documentation and by the use of standardised thesauri. The data bank is implemented through a network of local FCD storages (usually national) under the control and responsibility of the local (national) EuroFIR partner. The implementation of the system based on the EuroFIR specifications is under development. The data interchange happens through the EuroFIR Web Services interface, allowing the partners to implement their system using methods and software suitable for the local computer environment. The implementation uses common international standards, such as Simple Object Access Protocol, Web Service Description Language and Extensible Markup Language (XML). A specifically constructed EuroFIR search facility (eSearch) was designed for end users. The EuroFIR eSearch facility compiles queries using a specifically designed Food Data Query Language and sends a request to those network nodes linked to the EuroFIR Web Services that will most likely have the requested information. The retrieved FCD are compiled into a specifically designed data interchange format (the EuroFIR Food Data Transport Package) in XML, which is sent back to the EuroFIR eSearch facility as the query response. The same request-response operation happens in all the nodes that have been selected in the EuroFIR eSearch facility for a certain task. Finally, the FCD are combined by the EuroFIR eSearch facility and delivered to the food compiler. The implementation of FCD interchange using decentralised computer systems instead of traditional data-centre models has several advantages. First of all, the local partners have more control over their FCD, which will increase commitment and improve quality. 
Second, a multicentred solution is more economically viable than the creation of a centralised data bank, because of the lack of national political support for multinational systems.

  1. Using a multi-state Learning Community as an implementation strategy for immediate postpartum long-acting reversible contraception.

    PubMed

    DeSisto, Carla L; Estrich, Cameron; Kroelinger, Charlan D; Goodman, David A; Pliska, Ellen; Mackie, Christine N; Waddell, Lisa F; Rankin, Kristin M

    2017-11-21

    Implementation strategies are imperative for the successful adoption and sustainability of complex evidence-based public health practices. Creating a learning collaborative is one strategy that was part of a recently published compilation of implementation strategy terms and definitions. In partnership with the Centers for Disease Control and Prevention and other partner agencies, the Association of State and Territorial Health Officials recently convened a multi-state Learning Community to support cross-state collaboration and provide technical assistance for improving state capacity to increase access to long-acting reversible contraception (LARC) in the immediate postpartum period, an evidence-based practice with the potential for reducing unintended pregnancy and improving maternal and child health outcomes. During 2015-2016, the Learning Community included multi-disciplinary, multi-agency teams of state health officials, payers, clinicians, and health department staff from 13 states. This qualitative study was conducted to better understand the successes, challenges, and strategies that the 13 US states in the Learning Community used for increasing access to immediate postpartum LARC. We conducted telephone interviews with each team in the Learning Community. Interviews were semi-structured and organized by the eight domains of the Learning Community. We coded transcribed interviews for facilitators, barriers, and implementation strategies, using a recent compilation of expert-defined implementation strategies as a foundation for coding the latter. Data analysis showed three ways that the activities of the Learning Community helped in policy implementation work: structure and accountability, validity, and preparing for potential challenges and opportunities. 
Further, the qualitative data demonstrated that the Learning Community integrated six other implementation strategies from the literature: organize clinician implementation team meetings, conduct educational meetings, facilitation, promote network weaving, provide ongoing consultation, and distribute educational materials. Convening a multi-state learning collaborative is a promising approach for facilitating the implementation of new reimbursement policies for evidence-based practices complicated by systems challenges. By integrating several implementation strategies, the Learning Community serves as a meta-strategy for supporting implementation.

  2. Vector-matrix-quaternion, array and arithmetic packages: All HAL/S functions implemented in Ada

    NASA Technical Reports Server (NTRS)

    Klumpp, Allan R.; Kwong, David D.

    1986-01-01

    The HAL/S avionics programmers have enjoyed a variety of tools built into a language tailored to their special requirements. Ada is designed for a broader group of applications. Rather than providing built-in tools, Ada provides the elements with which users can build their own. Standard avionic packages remain to be developed. These must enable programmers to code in Ada as they have coded in HAL/S. The packages under development at JPL will provide all of the vector-matrix, array, and arithmetic functions described in the HAL/S manuals. In addition, the linear algebra package will provide all of the quaternion functions used in Shuttle steering and Galileo attitude control. Furthermore, using Ada's extensibility, many quaternion functions are being implemented as infix operations; equivalent capabilities were never implemented in HAL/S because doing so would entail modifying the compiler and expanding the language. With these packages, many HAL/S expressions will compile and execute in Ada, unchanged. Others can be converted simply by replacing the implicit HAL/S multiply operator with the Ada *. Errors will be trapped and identified. Input/output will be convenient and readable.
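
    The infix quaternion operations mentioned in this abstract can be illustrated with operator overloading. The sketch below is in Python rather than Ada, and the `Quaternion` type and its field names are hypothetical; it is not taken from the JPL packages. It only shows the general idea of exposing the Hamilton product through the `*` operator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quaternion:
    """Quaternion w + x*i + y*j + z*k (hypothetical illustration type)."""
    w: float
    x: float
    y: float
    z: float

    def __mul__(self, other: "Quaternion") -> "Quaternion":
        # Hamilton product, made available as the infix * operator.
        return Quaternion(
            self.w * other.w - self.x * other.x - self.y * other.y - self.z * other.z,
            self.w * other.x + self.x * other.w + self.y * other.z - self.z * other.y,
            self.w * other.y - self.x * other.z + self.y * other.w + self.z * other.x,
            self.w * other.z + self.x * other.y - self.y * other.x + self.z * other.w,
        )


# Basis quaternions; the product is non-commutative, so i * j differs from j * i.
i = Quaternion(0.0, 1.0, 0.0, 0.0)
j = Quaternion(0.0, 0.0, 1.0, 0.0)
k = Quaternion(0.0, 0.0, 0.0, 1.0)
```

    With these definitions, `i * j` evaluates to `k`, while `j * i` evaluates to the negation of `k`.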

  3. EPA Office of Water (OW): 2002 Impaired Waters Baseline NHDPlus Indexed Dataset

    EPA Pesticide Factsheets

    This dataset consists of geospatial and attribute data identifying the spatial extent of state-reported impaired waters (EPA's Integrated Reporting categories 4a, 4b, 4c and 5) available in EPA's Reach Address Database (RAD) at the time of extraction. For the 2002 baseline reporting year, EPA compiled state-submitted GIS data to create a seamless and nationally consistent picture of the Nation's impaired waters for measuring progress. EPA's Assessment and TMDL Tracking and Implementation System (ATTAINS) is a national compilation of states' 303(d) listings and TMDL development information, spanning several years of tracking over 40,000 impaired waters.

  4. Validation Summary Report: TLD Systems, Ltd., TLD HP 9000/MIL-STD-1750A Ada Compiler System, Ver 2.9.0, HP-UX Ver 7.0 (Host) to running TLDmps (Target), 920319W1.11243

    DTIC Science & Technology

    1992-04-30

    Washington, DC 20503. 1. AGENCY USE (Leave 2. REPORT 3. REPORT TYPE AND DATES Final: 30 April 92 4. TITLE AND 5. FUNDING Validation Summary...2-4 CHAPTER 3 PROCESSING INFORMATION 3.1 TESTING ENVIRONMENT...3-1 3.2 SUMMARY OF TEST RESULTS...ACVC an Ada implementation must process each test of the customized test suite according to the Ada Standard. 1.4 DEFINITION OF TERMS Ada Compiler

  5. Parallel Performance of a Combustion Chemistry Simulation

    DOE PAGES

    Skinner, Gregg; Eigenmann, Rudolf

    1995-01-01

    We used a description of a combustion simulation's mathematical and computational methods to develop a version for parallel execution. The result was a reasonable performance improvement on small numbers of processors. We applied several important programming techniques, which we describe, in optimizing the application. This work has implications for programming languages, compiler design, and software engineering.

  6. ROOT: A C++ framework for petabyte data storage, statistical analysis and visualization

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Antcheva, I.; /CERN; Ballintijn, M.

    2009-01-01

    ROOT is an object-oriented C++ framework conceived in the high-energy physics (HEP) community, designed for storing and analyzing petabytes of data in an efficient way. Any instance of a C++ class can be stored into a ROOT file in a machine-independent compressed binary format. In ROOT the TTree object container is optimized for statistical data analysis over very large data sets by using vertical data storage techniques. These containers can span a large number of files on local disks, the web or a number of different shared file systems. In order to analyze this data, the user can choose out of a wide set of mathematical and statistical functions, including linear algebra classes, numerical algorithms such as integration and minimization, and various methods for performing regression analysis (fitting). In particular, the RooFit package allows the user to perform complex data modeling and fitting while the RooStats library provides abstractions and implementations for advanced statistical tools. Multivariate classification methods based on machine learning techniques are available via the TMVA package. A central piece of these analysis tools is the set of histogram classes, which provide binning of one- and multi-dimensional data. Results can be saved in high-quality graphical formats like PostScript and PDF or in bitmap formats like JPG or GIF. The result can also be stored into ROOT macros that allow a full recreation and rework of the graphics. Users typically create their analysis macros step by step, making use of the interactive C++ interpreter CINT, while running over small data samples. Once the development is finished, they can run these macros at full compiled speed over large data sets, using on-the-fly compilation, or by creating a stand-alone batch program. Finally, if processing farms are available, the user can reduce the execution time of intrinsically parallel tasks - e.g. data mining in HEP - by using PROOF, which will take care of optimally distributing the work over the available resources in a transparent way.
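
    The binning of one-dimensional data mentioned in this abstract can be sketched in a few lines. The example below is plain Python and does not use the ROOT API; the function name `fill_histogram` and its parameters are hypothetical, and values outside the range are simply dropped, which is an assumption of this sketch.

```python
def fill_histogram(values, nbins, lo, hi):
    """Count values into nbins equal-width bins covering the range [lo, hi)."""
    counts = [0] * nbins
    width = (hi - lo) / nbins
    for v in values:
        if lo <= v < hi:
            # Clamp the index to guard against floating-point rounding at the upper edge.
            idx = min(int((v - lo) / width), nbins - 1)
            counts[idx] += 1
    return counts


# Two bins over [0, 2): the values 2.0 and -1.0 fall outside the range and are ignored.
counts = fill_histogram([0.1, 0.5, 0.9, 1.5, 2.0, -1.0], nbins=2, lo=0.0, hi=2.0)
```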

  7. Technical Data and Reports on Particulate Matter (PM) Measurements and SIP Status

    EPA Pesticide Factsheets

    EPA collects data from the states and regions on their air quality, including levels of pollutants such as PM, and state implementation plan (SIP) progress. This information is compiled in a database, and used to create reports, trend charts, and maps.

  8. Methodology to evaluate the performance of simulation models for alternative compiler and operating system configurations

    USDA-ARS's Scientific Manuscript database

    Simulation modelers increasingly require greater flexibility for model implementation on diverse operating systems, and they demand high computational speed for efficient iterative simulations. Additionally, model users may differ in preference for proprietary versus open-source software environment...

  9. Instructor Data Reporting Procedures.

    ERIC Educational Resources Information Center

    Mountain-Plains Education and Economic Development Program, Inc., Glasgow AFB, MT.

    The document has been compiled for reference use by instructors and others in need of information necessary to understand and implement the Mountain-Plains instructional and evaluation system. Included in detail are: (1) descriptions of the several forms used for student accounting, student progress monitoring, program evaluation, and ancillary…

  10. Verification Tests for Sierra/SM's Reproducing Kernel Particle Method

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Giffin, Brian D.

    2015-09-01

    This report seeks to verify the proper implementation of RKPM within Sierra by comparing the results from several basic example problems executed with RKPM against the analytical and FEM solutions for these same problems. This report was compiled as a summer student intern project.

  11. Fault tolerant, radiation hard, high performance digital signal processor

    NASA Technical Reports Server (NTRS)

    Holmann, Edgar; Linscott, Ivan R.; Maurer, Michael J.; Tyler, G. L.; Libby, Vibeke

    1990-01-01

    An architecture has been developed for a high-performance VLSI digital signal processor that is highly reliable, fault-tolerant, and radiation-hard. The signal processor, part of a spacecraft receiver designed to support uplink radio science experiments at the outer planets, organizes the connections between redundant arithmetic resources, register files, and memory through a shuffle exchange communication network. The configuration of the network and the state of the processor resources are all under microprogram control, which both maps the resources according to algorithmic needs and reconfigures the processing should a failure occur. In addition, the microprogram is reloadable through the uplink to accommodate changes in the science objectives throughout the course of the mission. The processor will be implemented with silicon compiler tools, and its design will be verified through silicon compilation simulation at all levels from the resources to full functionality. By blending reconfiguration with redundancy the processor implementation is fault-tolerant and reliable, and possesses the long expected lifetime needed for a spacecraft mission to the outer planets.
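
    The abstract does not describe the network topology in detail. As a rough illustration only, the textbook shuffle-exchange connection pattern on 2^bits nodes can be written as two address functions; the function names below are hypothetical, and nothing here is taken from the processor design itself.

```python
def shuffle(node: int, bits: int) -> int:
    """Perfect shuffle: rotate the bits-wide node address left by one position."""
    mask = (1 << bits) - 1
    return ((node << 1) | (node >> (bits - 1))) & mask


def exchange(node: int) -> int:
    """Exchange link: flip the least significant address bit."""
    return node ^ 1


# In an 8-node network (3 address bits), node 5 (binary 101) shuffles to node 3 (binary 011).
neighbor = shuffle(5, bits=3)
```

    Applying `shuffle` as many times as there are address bits returns every node to its starting address.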

  12. The Sizing and Optimization Language, (SOL): Computer language for design problems

    NASA Technical Reports Server (NTRS)

    Lucas, Stephen H.; Scotti, Stephen J.

    1988-01-01

    The Sizing and Optimization Language, (SOL), a new high level, special purpose computer language was developed to expedite application of numerical optimization to design problems and to make the process less error prone. SOL utilizes the ADS optimization software and provides a clear, concise syntax for describing an optimization problem, the OPTIMIZE description, which closely parallels the mathematical description of the problem. SOL offers language statements which can be used to model a design mathematically, with subroutines or code logic, and with existing FORTRAN routines. In addition, SOL provides error checking and clear output of the optimization results. Because of these language features, SOL is best suited to model and optimize a design concept when the model consists of mathematical expressions written in SOL. For such cases, SOL's unique syntax and error checking can be fully utilized. SOL is presently available for DEC VAX/VMS systems. A SOL package is available which includes the SOL compiler, runtime library routines, and a SOL reference manual.

  13. Kernel and System Procedures in Flex.

    DTIC Science & Technology

    1983-08-01

    System procedures on which the operating system for the Flex computer is based. These are the low level procedures which are used to implement the compilers, file-store, command interpreters etc. on Flex...privileged mode. They form the interface between the user and a particular operating system written on top of the Kernel.

  14. Large angle magnetic suspension test fixture

    NASA Technical Reports Server (NTRS)

    Britcher, Colin P.

    1995-01-01

    In lieu of a final report for this project for the period 1 April 1995 through 31 October 1995, a compilation of three reports is included herein. The three reports are: (1) 'Design and Implementation of a Digital Controller for a Magnetic Suspension and Vernier Pointing System', (2) 'Influence of Eddy Currents on the Dynamic Characteristics of Magnetic Suspensions and Magnetic Bearings', and (3) 'Design and Implementation of a Digital Controller for a Magnetic Suspension and Vernier Pointing System'.

  15. Should the Federal Government Implement a Program Which Guarantees Employment Opportunities for All U.S. Citizens in the Labor Force? Inter-Collegiate Debate Topic, 1978-1979, Pursuant to Public Law 88-246.

    ERIC Educational Resources Information Center

    Roth, Dennis M.

    This is a compilation of selected articles and a bibliography on the 1978-79 intercollegiate debate proposition: Resolved, that the Federal Government should implement a program which guarantees employment opportunities for all U.S. citizens in the labor force. The introduction briefly reviews the United States post-World War II history of…

  16. ADA Implementation Issues as Discovered through a Literature Survey of Applications Outside the United States

    DTIC Science & Technology

    1992-03-01

    compile time, ensuring that operations conducted are appropriate for the object type. Each implementation requires a database known as the program...Finnish bank being developed by Nokia • Oil drilling control system managed by Sedco-Forex • Vigile - an industrial installation supervisor project by...user interface and Oracle database backend control. The software is being developed in Ada under DOD-STD-2167 under OS/2. BELGIUM BATS S.A. Project title

  17. Stakeholder driven indicators for eHealth performance management.

    PubMed

    Vedlūga, Tomas; Mikulskienė, Birutė

    2017-08-01

    The goal of the present article is to compile a corpus of indicators of eHealth development evaluation that would essentially reflect stakeholder approaches and complement technical indicators of assessment of an eHealth system. Consequently, the assessment of the development of an eHealth system would reflect stakeholder approaches and become an innovative solution in attempting to improve productivity of IT projects in the field of health care. The compiled minimum set of indicators will be designed to monitor implementation of the national eHealth information system. To ensure reliability of the quality research, the respondents were grouped according to the geographical distribution and diversity of the levels and types of the represented jobs and institutions. The applied analysis implies several managerial insights on the hierarchy of eHealth indicators. These insights may be helpful in recommending priority activities in implementation of an eHealth data system on the national or international level. The research is practically useful as it is the first to deal with the topic in Lithuania, and its theoretical and practical aspects are particularly relevant in implementation of an eHealth data system in Lithuania. The eHealth assessment indicators presented in the article may be practically useful in two aspects: (1) as key implementation guidelines facilitating the general course of eHealth system development and (2) as a means to evaluate eHealth outcomes. Copyright © 2017 Elsevier Ltd. All rights reserved.

  18. NAS Parallel Benchmark. Results 11-96: Performance Comparison of HPF and MPI Based NAS Parallel Benchmarks. 1.0

    NASA Technical Reports Server (NTRS)

    Saini, Subash; Bailey, David; Chancellor, Marisa K. (Technical Monitor)

    1997-01-01

    High Performance Fortran (HPF), the high-level language for parallel Fortran programming, is based on Fortran 90. HPF was defined by an informal standards committee known as the High Performance Fortran Forum (HPFF) in 1993, and modeled on TMC's CM Fortran language. Several HPF features have since been incorporated into the draft ANSI/ISO Fortran 95, the next formal revision of the Fortran standard. HPF allows users to write a single parallel program that can execute on a serial machine, a shared-memory parallel machine, or a distributed-memory parallel machine. HPF eliminates the complex, error-prone task of explicitly specifying how, where, and when to pass messages between processors on distributed-memory machines, or when to synchronize processors on shared-memory machines. HPF is designed in a way that allows the programmer to code an application at a high level, and then selectively optimize portions of the code by dropping into message-passing or calling tuned library routines as 'extrinsics'. Compilers supporting High Performance Fortran features first appeared in late 1994 and early 1995 from Applied Parallel Research (APR), Digital Equipment Corporation, and The Portland Group (PGI). IBM introduced an HPF compiler for the IBM RS/6000 SP/2 in April of 1996. Over the past two years, these implementations have shown steady improvement in terms of both features and performance. The performance of various hardware/programming model (HPF and MPI (message passing interface)) combinations will be compared, based on latest NAS (NASA Advanced Supercomputing) Parallel Benchmark (NPB) results, thus providing a cross-machine and cross-model comparison. Specifically, HPF based NPB results will be compared with MPI based NPB results to provide perspective on performance currently obtainable using HPF versus MPI or versus hand-tuned implementations such as those supplied by the hardware vendors. In addition, we will also present NPB (Version 1.0) performance results for the following systems: DEC Alpha Server 8400 5/440, Fujitsu VPP Series (VX, VPP300, and VPP700), HP/Convex Exemplar SPP2000, IBM RS/6000 SP P2SC node (120 MHz), NEC SX-4/32, SGI/CRAY T3E, SGI Origin2000.

  19. 75 FR 64746 - Proposed Collection, Comment Request

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-10-20

    ... Injuries and Illnesses in the Workplace, and another report, Keystone National Policy Dialogue on Work... implementing Section 24(a) of the Occupational Safety and Health Act of 1970. This section states that ``the Secretary shall compile accurate statistics on work injuries and illnesses which shall include all disabling...

  20. Competency-Based Adult High School Curriculum Project.

    ERIC Educational Resources Information Center

    Singer, Elizabeth

    This compilation of program materials serves as an introduction to and overview of Florida's Brevard Community College's (BCC's) Competency-Based Adult High School Completion Project, which was conducted to teach administrators, counselors, and teachers how to organize and implement a competency-based adult education (CBAE) program; to critique…

  1. Competency-Based Adult Education: Florida Model.

    ERIC Educational Resources Information Center

    Singer, Elizabeth

    This compilation of program materials serves as an introduction to Florida's Brevard Community College's (BCC's) Competency-Based Adult High School Completion Project, a multi-year project designed to teach adult administrators, counselors, and teachers how to organize and implement a competency-based adult education (CBAE) program; to critique…

  2. Catalog of Exemplary Projects: 1984-85.

    ERIC Educational Resources Information Center

    Virginia Community Coll. System, Sterling. Inst. for Instructional Excellence.

    This compilation of abstracts represents 39 projects that were funded by the State Council of Higher Education for Virginia under Adapter Grants (which involve experimentation with instructional methods or techniques) or Developer Grants (which involve the implementation of a uniquely innovative teaching method or other instructional procedure).…

  3. Optimizability of OGC Standards Implementations - a Case Study

    NASA Astrophysics Data System (ADS)

    Misev, D.; Baumann, P.

    2012-04-01

    Why do we shop at Amazon? Because they have a unique offering that is nowhere else available? Certainly not. Rather, Amazon offers (i) simple, yet effective search; (ii) very simple payment; (iii) extremely rapid delivery. This is how scientific services will be distinguished in future: not for their data holding (there will be manifold choice), but for their service quality. We are facing the transition from data stewardship to service stewardship. One of the OGC standards which particularly enables flexible retrieval is the Web Coverage Processing Service (WCPS). It defines a high-level query language on large, multi-dimensional raster data, such as 1D timeseries, 2D EO imagery, 3D x/y/t image time series and x/y/z geophysical data, 4D x/y/z/t climate and ocean data. We have implemented WCPS based on an Array Database Management System, rasdaman, which is available in open source. In this demonstration, we study WCPS queries on 2D, 3D, and 4D data sets. Particular emphasis is placed on the computational load queries generate in such on-demand processing and filtering. We look at different techniques and their impact on performance, such as adaptive storage partitioning, query rewriting, and just-in-time compilation. Results show that there is significant potential for effective server-side optimization once a query language is sufficiently high-level and declarative.

  4. Model-based phase-shifting interferometer

    NASA Astrophysics Data System (ADS)

    Liu, Dong; Zhang, Lei; Shi, Tu; Yang, Yongying; Chong, Shiyao; Miao, Liang; Huang, Wei; Shen, Yibing; Bai, Jian

    2015-10-01

    A model-based phase-shifting interferometer (MPI) is developed, in which a novel calculation technique is proposed instead of the traditional complicated system structure, to achieve versatile, high precision and quantitative surface tests. In the MPI, the partial null lens (PNL) is employed to implement the non-null test. With some alternative PNLs, similar to the transmission spheres in ZYGO interferometers, the MPI provides a flexible test for general spherical and aspherical surfaces. Based on modern computer modeling technique, a reverse iterative optimizing construction (ROR) method is employed for the retrace error correction of non-null test, as well as figure error reconstruction. A self-compiled ray-tracing program is set up for the accurate system modeling and reverse ray tracing. The surface figure error then can be easily extracted from the wavefront data in the form of Zernike polynomials by the ROR method. Experiments of the spherical and aspherical tests are presented to validate the flexibility and accuracy. The test results are compared with those of a Zygo interferometer (null tests), which demonstrates the high accuracy of the MPI. With such accuracy and flexibility, the MPI would possess large potential in modern optical shop testing.

  5. Automatic Blocking Of QR and LU Factorizations for Locality

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yi, Q; Kennedy, K; You, H

    2004-03-26

    QR and LU factorizations for dense matrices are important linear algebra computations that are widely used in scientific applications. To efficiently perform these computations on modern computers, the factorization algorithms need to be blocked when operating on large matrices to effectively exploit the deep cache hierarchy prevalent in today's computer memory systems. Because both QR (based on Householder transformations) and LU factorization algorithms contain complex loop structures, few compilers can fully automate the blocking of these algorithms. Though linear algebra libraries such as LAPACK provide manually blocked implementations of these algorithms, by automatically generating blocked versions of the computations, more benefit can be gained such as automatic adaptation of different blocking strategies. This paper demonstrates how to apply an aggressive loop transformation technique, dependence hoisting, to produce efficient blockings for both QR and LU with partial pivoting. We present different blocking strategies that can be generated by our optimizer and compare the performance of auto-blocked versions with manually tuned versions in LAPACK, both using reference BLAS, ATLAS BLAS and native BLAS specially tuned for the underlying machine architectures.
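
    To make the notion of a blocked factorization concrete, the following is a minimal hand-written sketch of a right-looking blocked LU factorization in plain Python. It is not the dependence hoisting transformation described in the abstract, and it omits partial pivoting: it assumes every pivot is nonzero, as is the case for a diagonally dominant matrix. The function name `blocked_lu` and the `block` parameter are hypothetical.

```python
def blocked_lu(a, block):
    """Factor the square matrix a (list of lists) in place as L*U, without pivoting.

    On return, the strict lower triangle of a holds L (unit diagonal implied)
    and the upper triangle holds U. Assumes all pivots are nonzero.
    """
    n = len(a)
    for k0 in range(0, n, block):
        k1 = min(k0 + block, n)
        # Step 1: factor the panel (columns k0..k1-1, rows k0..n-1), unblocked.
        for k in range(k0, k1):
            for i in range(k + 1, n):
                a[i][k] /= a[k][k]
                for j in range(k + 1, k1):
                    a[i][j] -= a[i][k] * a[k][j]
        # Step 2: forward-substitute with the unit lower-triangular diagonal
        # block to obtain the rows of U to the right of the panel.
        for k in range(k0, k1):
            for i in range(k + 1, k1):
                for j in range(k1, n):
                    a[i][j] -= a[i][k] * a[k][j]
        # Step 3: update the trailing submatrix with a block matrix product.
        for i in range(k1, n):
            for j in range(k1, n):
                s = 0.0
                for k in range(k0, k1):
                    s += a[i][k] * a[k][j]
                a[i][j] -= s
    return a


# Small diagonally dominant example, factored with 2x2 blocks.
matrix = [[10.0, 2.0, 3.0, 1.0],
          [2.0, 12.0, 1.0, 4.0],
          [3.0, 1.0, 15.0, 2.0],
          [1.0, 4.0, 2.0, 11.0]]
lu = blocked_lu([row[:] for row in matrix], block=2)
```

    Most of the arithmetic lands in step 3, a matrix-matrix product over one block at a time, which is the part that benefits from cache reuse when the block size is chosen to fit the cache.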

  6. Functional Programming in Computer Science

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Anderson, Loren James; Davis, Marion Kei

    We explore functional programming through a 16-week internship at Los Alamos National Laboratory. Functional programming is a branch of computer science that has exploded in popularity over the past decade due to its high-level syntax, ease of parallelization, and abundant applications. First, we summarize functional programming by listing the advantages of functional programming languages over the usual imperative languages, and we introduce the concept of parsing. Second, we discuss the importance of lambda calculus in the theory of functional programming. Lambda calculus was invented by Alonzo Church in the 1930s to formalize the concept of effective computability, and every functional language is essentially some implementation of lambda calculus. Finally, we display the lasting products of the internship: additions to a compiler and runtime system for the pure functional language STG, including both a set of tests that indicate the validity of updates to the compiler and a compiler pass that checks for illegal instances of duplicate names.
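
    As a small illustration of how computation can be encoded in lambda calculus, the sketch below writes Church numerals using Python lambdas. It is a standard textbook encoding, not code from the internship or from the STG compiler, and the helper `to_int` is a hypothetical convenience for inspecting results.

```python
# Church numerals: the number n is the function that applies f to x exactly n times.
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))
add = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))
mul = lambda m: lambda n: lambda f: m(n(f))


def to_int(n):
    """Convert a Church numeral to a Python int by counting applications."""
    return n(lambda k: k + 1)(0)


one = succ(zero)
two = succ(one)
three = add(one)(two)
six = mul(two)(three)
```

    Here `to_int(three)` evaluates to 3 and `to_int(six)` evaluates to 6, with no built-in arithmetic used except inside `to_int`.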

  7. Programs for Testing Processor-in-Memory Computing Systems

    NASA Technical Reports Server (NTRS)

    Katz, Daniel S.

    2006-01-01

    The Multithreaded Microbenchmarks for Processor-In-Memory (PIM) Compilers, Simulators, and Hardware are computer programs arranged in a series for use in testing the performances of PIM computing systems, including compilers, simulators, and hardware. The programs at the beginning of the series test basic functionality; the programs at subsequent positions in the series test increasingly complex functionality. The programs are intended to be used while designing a PIM system, and can be used to verify that compilers, simulators, and hardware work correctly. The programs can also be used to enable designers of these system components to examine tradeoffs in implementation. Finally, these programs can be run on non-PIM hardware (either single-threaded or multithreaded) using the POSIX pthreads standard to verify that the benchmarks themselves operate correctly. [POSIX (Portable Operating System Interface for UNIX) is a set of standards that define how programs and operating systems interact with each other. pthreads is a library of pre-emptive thread routines that comply with one of the POSIX standards.]

  8. Teacher Educators Developing Professional Roles: Frictions between Current and Optimal Practices

    ERIC Educational Resources Information Center

    Meeus, Wil; Cools, Wouter; Placklé, Inge

    2018-01-01

    This article reports on a study of the professional learning of Flemish teacher educators. In the first part, an exemplary survey was conducted in order to compile an inventory of the existing types of education initiatives for teacher educators in Flanders. An electronic survey was then conducted in order to identify the professional needs of…

  9. A comparison of two rough mill cutting models

    Treesearch

    Steven Ruddell; Henry Huber; Powsiri Klinkhachorn

    1990-01-01

    A comparison of lumber yield using the Automated Lumber Processing System (ALPS) Cutting Program and the Optimal Furniture Cutting Program (OFCP) was conducted on eight cutting bills. No.1 Common grade hard maple data files were compiled using a board database collected and used by the USDA Forest Service's Forest Products Laboratory to develop standard hardwood...

  10. Automatic data partitioning on distributed memory multicomputers. Ph.D. Thesis

    NASA Technical Reports Server (NTRS)

    Gupta, Manish

    1992-01-01

    Distributed-memory parallel computers are increasingly being used to provide high levels of performance for scientific applications. Unfortunately, such machines are not very easy to program. A number of research efforts seek to alleviate this problem by developing compilers that take over the task of generating communication. The communication overheads and the extent of parallelism exploited in the resulting target program are determined largely by the manner in which data is partitioned across different processors of the machine. Most of the compilers provide no assistance to the programmer in the crucial task of determining a good data partitioning scheme. A novel approach is presented, the constraints-based approach, to the problem of automatic data partitioning for numeric programs. In this approach, the compiler identifies some desirable requirements on the distribution of various arrays being referenced in each statement, based on performance considerations. These desirable requirements are referred to as constraints. For each constraint, the compiler determines a quality measure that captures its importance with respect to the performance of the program. The quality measure is obtained through static performance estimation, without actually generating the target data-parallel program with explicit communication. Each data distribution decision is taken by combining all the relevant constraints. The compiler attempts to resolve any conflicts between constraints such that the overall execution time of the parallel program is minimized. This approach has been implemented as part of a compiler called Paradigm, that accepts Fortran 77 programs, and specifies the partitioning scheme to be used for each array in the program. We have obtained results on some programs taken from the Linpack and Eispack libraries, and the Perfect Benchmarks. 
These results are quite promising, and demonstrate the feasibility of automatic data partitioning for a significant class of scientific application programs with regular computations.
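
    The effect of a data partitioning choice on communication can be shown with a toy static estimate. The sketch below compares a block and a cyclic distribution of a one-dimensional array for a nearest-neighbor access pattern. It is not the Paradigm compiler's quality measure; the function names and the cost model (a count of references that cross processors) are assumptions made for illustration.

```python
def block_owner(i, n, p):
    """Owner of element i when n elements are split into p contiguous blocks."""
    size = -(-n // p)  # ceiling division: elements per processor
    return i // size


def cyclic_owner(i, p):
    """Owner of element i when elements are dealt out round-robin to p processors."""
    return i % p


def remote_references(n, owner):
    """Count references to a[i-1] and a[i+1] owned by a different processor than a[i]."""
    count = 0
    for i in range(n):
        for nb in (i - 1, i + 1):
            if 0 <= nb < n and owner(nb) != owner(i):
                count += 1
    return count


# 16 elements on 4 processors, with a stencil that reads both neighbors of each element.
block_cost = remote_references(16, lambda i: block_owner(i, 16, 4))
cyclic_cost = remote_references(16, lambda i: cyclic_owner(i, 4))
```

    For this access pattern the block distribution needs remote data only at the three block boundaries, while under the cyclic distribution every neighbor reference is remote, so a compiler weighing such constraints would favor the block distribution here.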

  11. TECHNOLOGIES FOR UPGRADING EXISTING OR DESIGNING NEW DRINKING WATER TREATMENT FACILITIES

    EPA Science Inventory

    The publication compiles material presented at a series of workshops and helps to focus attention on the many treatment and disinfection decisions that will be facing both ground water and surface source systems over the next several years, as implementation of the 1986 Safe Drin...

  12. H-Bridge Inverter Loading Analysis for an Energy Management System

    DTIC Science & Technology

    2013-06-01

    In order to accomplish the stated objectives, a physics-based model of the system was developed in MATLAB/Simulink. The system was also implemented ...functional architecture and then compile the high level design down to VHDL in order to program the designed functions to the FPGA. B. INSULATED

  13. Reinventing the Undergraduate Curriculum: Strategies To Enhance Student Learning in Mathematics and Science.

    ERIC Educational Resources Information Center

    Nelson, Barbara J., Comp.; Wallner, Barbara K., Comp.; Powers, Myra L., Ed.; Hartley, Nancy K., Ed.

    This publication is a compilation of examples of practical, easily implemented activities to help mathematics, science, and education faculty duplicate efforts by the Rocky Mountain Teacher Education Collaborative (RMTEC) to reform and revise curriculum for preservice educators. Activities are organized by content areas: mathematics; geology,…

  14. Education for Business in Iowa: Curriculum and Reference Guide.

    ERIC Educational Resources Information Center

    Iowa State Dept. of Public Instruction, Des Moines.

    In recognition of the need to strengthen schools' efforts in developing students' awareness of the technological, consumer, occupational, recreational, and cultural aspects of business, this guide was compiled to provide information assisting those who design and implement curricula relating to business. The first division consists of statements…

  15. 76 FR 59885 - Special Supplemental Nutrition Program for Women, Infants and Children (WIC): Implementation of...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-09-28

    ... participating in the WIC Program; increased emphasis on breastfeeding promotion and support; compiling and... the support and promotion of breastfeeding. WIC has historically promoted breastfeeding to all... promotion and support of breastfeeding as an integral element of WIC services and benefits. The specific...

  16. The Pentagonal E-Portfolio Model for Selecting, Adopting, Building, and Implementing an E-Portfolio

    ERIC Educational Resources Information Center

    Buzzetto-More, Nicole; Alade, Ayodele

    2008-01-01

    Electronic portfolios are a student-centered outcomes-based assessment regime involving learners in the gathering, selection, and organization of artifacts synthesized into a compilation purposed to demonstrate knowledge, skills, and/or achievements supported by reflections that articulate the relevance, credibility, and meaning of the artifacts…

  17. Building Blocks for School IPM: A Least-Toxic Pest Management Manual.

    ERIC Educational Resources Information Center

    Crouse, Becky, Ed.; Owens, Kagan, Ed.

    This publication is a compilation of original and republished materials from numerous individuals and organizations working on pesticide reform and integrated pest management (IPM)--using alternatives to prevailing chemical-intensive practices. The manual provides comprehensive information on implementing school IPM, including a practical guide to…

  18. Evaluation of Title I ESEA Projects, 1976-77: Technical Reports. Report #77140.

    ERIC Educational Resources Information Center

    Philadelphia School District, PA. Office of Research and Evaluation.

    This volume compiles technical reports of Title I Elementary and Secondary Education Act project evaluations conducted during the 1976-77 academic year in the school district of Philadelphia, Pennsylvania. The reports include rationale, expected outcomes, mode of operation, previous evaluative findings, current implementation, and attainment of…

  19. The Status of Crisis Management at NASPA Member Institutions

    ERIC Educational Resources Information Center

    Catullo, Linda A.; Walker, David A.; Floyd, Deborah L.

    2009-01-01

    This study assessed the level of crisis preparedness in higher education from the perspective of chief student affairs administrators at residential universities post-September 11, 2001 to pre-Virginia Tech shootings in April 2007. Crisis preparedness was determined by compiling and comparing data results derived from an instrument implemented in…

  20. Promising Practices in Florida: Integrating Academic and Vocational Education.

    ERIC Educational Resources Information Center

    Jones, Betty, Comp.

    This document is a compilation of 90 successful interdisciplinary projects and activities and integrated academic and vocational curriculum ideas implemented in Florida during the past 3 years. The activities and projects have been submitted by teachers and have not been officially evaluated or reviewed. Each description provides this information:…

  1. Real-World Examples: Developing a Departmental Alumni Network

    ERIC Educational Resources Information Center

    Ashline, George

    2017-01-01

    We describe the context for and implementation of a departmental alumni network. More than a database compiling facts about graduates, this network provides students with information and inspiration. It also offers a wonderful opportunity to support lifelong learning through the development of collaborative relationships between alumni and faculty…

  2. A bibliography for the northern Madrean Biogeographic Province

    Treesearch

    Peter F. Ffolliott; Leonard F. DeBano; Gerald J. Gottfried; Daniel P. Huebner; Carl B. Edminster

    1999-01-01

    An online bibliography was compiled to furnish a literature basis for implementing land management activities and planning research endeavors in the Madrean Biogeographic Province, which includes the Madrean Archipelago region in the southwestern United States. Citations are listed alphabetically by author in categories appropriate to the subject-matter presented....

  3. Space Station Technology, 1983

    NASA Technical Reports Server (NTRS)

    Wright, R. L. (Editor); Mays, C. R. (Editor)

    1984-01-01

    This publication is a compilation of the panel summaries presented in the following areas: systems/operations technology; crew and life support; EVA; crew and life support: ECLSS; attitude, control, and stabilization; human capabilities; auxiliary propulsion; fluid management; communications; structures and mechanisms; data management; power; and thermal control. The objective of the workshop was to aid the Space Station Technology Steering Committee in defining and implementing a technology development program to support the establishment of a permanent human presence in space. This compilation will provide the participants and their organizations with the information presented at this workshop in a referenceable format. This information will establish a stepping stone for users of space station technology to develop new technology and plan future tasks.

  4. The Concert system - Compiler and runtime technology for efficient concurrent object-oriented programming

    NASA Technical Reports Server (NTRS)

    Chien, Andrew A.; Karamcheti, Vijay; Plevyak, John; Sahrawat, Deepak

    1993-01-01

    Concurrent object-oriented languages, particularly fine-grained approaches, reduce the difficulty of large scale concurrent programming by providing modularity through encapsulation while exposing large degrees of concurrency. Despite these programmability advantages, such languages have historically suffered from poor efficiency. This paper describes the Concert project whose goal is to develop portable, efficient implementations of fine-grained concurrent object-oriented languages. Our approach incorporates aggressive program analysis and program transformation with careful information management at every stage from the compiler to the runtime system. The paper discusses the basic elements of the Concert approach along with a description of the potential payoffs. Initial performance results and specific plans for system development are also detailed.

  5. Real-time robot deliberation by compilation and monitoring of anytime algorithms

    NASA Technical Reports Server (NTRS)

    Zilberstein, Shlomo

    1994-01-01

    Anytime algorithms are algorithms whose quality of results improves gradually as computation time increases. Certainty, accuracy, and specificity are metrics useful in anytime algorithm construction. It is widely accepted that a successful robotic system must trade off between decision quality and the computational resources used to produce it. Anytime algorithms were designed to offer such a trade off. A model of compilation and monitoring mechanisms needed to build robots that can efficiently control their deliberation time is presented. This approach simplifies the design and implementation of complex intelligent robots, mechanizes the composition and monitoring processes, and provides independent real time robotic systems that automatically adjust resource allocation to yield optimum performance.
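
    The defining property in the abstract above (result quality improves as more computation is allowed) can be illustrated with a minimal Python sketch. It is not taken from the cited paper: the names `anytime_sqrt` and `run_with_budget` are hypothetical, Newton's method stands in for a real deliberation procedure, and a step count stands in for computation time.

```python
from typing import Iterator


def anytime_sqrt(x: float) -> Iterator[float]:
    """Yield successively better estimates of sqrt(x) using Newton's method.

    The caller may stop consuming estimates at any point and still hold a
    usable (if less accurate) answer, which is the anytime property.
    """
    estimate = max(x, 1.0)  # crude but valid starting point for x > 0
    while True:
        yield estimate
        estimate = 0.5 * (estimate + x / estimate)


def run_with_budget(algorithm: Iterator[float], budget_steps: int) -> float:
    """Hypothetical monitor: allow `budget_steps` refinements, return the latest result."""
    best = next(algorithm)  # the initial estimate is always available
    for _ in range(budget_steps):
        best = next(algorithm)
    return best


if __name__ == "__main__":
    for budget in (0, 1, 3, 5):
        result = run_with_budget(anytime_sqrt(2.0), budget)
        print(budget, result, abs(result - 2.0 ** 0.5))
```

    A larger budget never yields a worse estimate here, so a monitor can trade answer quality against the steps it is willing to spend.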

  6. Implementation of a Research Information Management System in a Pediatric Hospital.

    PubMed

    Kissling, Alison D; Ballinger, Kimberly D

    2018-01-01

    Faculty publications have been collected in universities, health, and medical institutions for many years, and Cincinnati Children's is no exception. Since 1949, a yearly list of faculty publications was manually compiled using multiple data sources and disseminated by the Edward L. Pratt Research Library. Products to centralize faculty publication collection and analysis with bibliometric tools are growing in popularity. This article will review the collaborative decision to choose a Research Information Management System and the implementation process including successes, challenges, and future opportunities.

  7. OpenMP-accelerated SWAT simulation using Intel C and FORTRAN compilers: Development and benchmark

    NASA Astrophysics Data System (ADS)

    Ki, Seo Jin; Sugimura, Tak; Kim, Albert S.

    2015-02-01

    We developed a practical method to accelerate execution of Soil and Water Assessment Tool (SWAT) using open (free) computational resources. The SWAT source code (rev 622) was recompiled using a non-commercial Intel FORTRAN compiler in Ubuntu 12.04 LTS Linux platform, and newly named iOMP-SWAT in this study. GNU utilities of make, gprof, and diff were used to develop the iOMP-SWAT package, profile memory usage, and check identicalness of parallel and serial simulations. Among 302 SWAT subroutines, the slowest routines were identified using GNU gprof, and later modified using Open Multiple Processing (OpenMP) library in an 8-core shared memory system. In addition, a C wrapping function was used to rapidly set large arrays to zero by cross compiling with the original SWAT FORTRAN package. A universal speedup ratio of 2.3 was achieved using input data sets of a large number of hydrological response units. As we specifically focus on acceleration of a single SWAT run, the use of iOMP-SWAT for parameter calibrations will significantly improve the performance of SWAT optimization.

  8. Use of statecharts in the modelling of dynamic behaviour in the ATLAS DAQ prototype-1

    NASA Astrophysics Data System (ADS)

    Croll, P.; Duval, P.-Y.; Jones, R.; Kolos, S.; Sari, R. F.; Wheeler, S.

    1998-08-01

    Many applications within the ATLAS DAQ prototype-1 system have complicated dynamic behaviour which can be successfully modelled in terms of states and transitions between states. Previously, state diagrams implemented as finite-state machines have been used. Although effective, they become ungainly as system size increases. Harel statecharts address this problem by implementing additional features such as hierarchy and concurrency. The CHSM object-oriented language system is freeware which implements Harel statecharts as concurrent, hierarchical, finite-state machines (CHSMs). An evaluation of this language system by the ATLAS DAQ group has shown it to be suitable for describing the dynamic behaviour of typical DAQ applications. The language is currently being used to model the dynamic behaviour of the prototype-1 run-control system. The design is specified by means of a CHSM description file, and C++ code is obtained by running the CHSM compiler on the file. In parallel with the modelling work, a code generator has been developed which translates statecharts, drawn using the StP CASE tool, into the CHSM language. C++ code, describing the dynamic behaviour of the run-control system, has been successfully generated directly from StP statecharts using the CHSM generator and compiler. The validity of the design was tested using the simulation features of the Statemate CASE tool.

  9. Bellman’s GAP—a language and compiler for dynamic programming in sequence analysis

    PubMed Central

    Sauthoff, Georg; Möhl, Mathias; Janssen, Stefan; Giegerich, Robert

    2013-01-01

    Motivation: Dynamic programming is ubiquitous in bioinformatics. Developing and implementing non-trivial dynamic programming algorithms is often error prone and tedious. Bellman’s GAP is a new programming system, designed to ease the development of bioinformatics tools based on the dynamic programming technique. Results: In Bellman’s GAP, dynamic programming algorithms are described in a declarative style by tree grammars, evaluation algebras and products formed thereof. This bypasses the design of explicit dynamic programming recurrences and yields programs that are free of subscript errors, modular and easy to modify. The declarative modules are compiled into C++ code that is competitive with carefully hand-crafted implementations. This article introduces the Bellman’s GAP system and its language, GAP-L. It then demonstrates the ease of development and the degree of re-use by creating variants of two common bioinformatics algorithms. Finally, it evaluates Bellman’s GAP as an implementation platform of ‘real-world’ bioinformatics tools. Availability: Bellman’s GAP is available under GPL license from http://bibiserv.cebitec.uni-bielefeld.de/bellmansgap. This Web site includes a repository of re-usable modules for RNA folding based on thermodynamics. Contact: robert@techfak.uni-bielefeld.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23355290
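
    The separation the abstract describes, one search-space description evaluated under interchangeable algebras, can be sketched in plain Python. This is an illustration only and is not GAP-L: the `Algebra` type, the `evaluate` function, and the toy RNA pairing rules (no minimum loop length, no energy model) are assumptions made for the example.

```python
from functools import lru_cache
from typing import Callable, NamedTuple, Sequence

# Toy pairing rules (assumption): Watson-Crick pairs plus G-U wobble pairs.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


class Algebra(NamedTuple):
    """Hypothetical evaluation algebra: one function per grammar rule plus a choice function."""
    nil: Callable[[], int]
    unpaired: Callable[[int], int]
    pair: Callable[[int, int], int]
    choice: Callable[[Sequence[int]], int]


# Counts all candidate structures in the search space.
COUNT = Algebra(lambda: 1, lambda rest: rest, lambda inner, rest: inner * rest, sum)
# Maximizes the number of base pairs (Nussinov-style scoring).
MAX_PAIRS = Algebra(lambda: 0, lambda rest: rest, lambda inner, rest: 1 + inner + rest, max)


def evaluate(seq: str, algebra: Algebra) -> int:
    """Evaluate the structure grammar over `seq` under the given algebra."""

    @lru_cache(maxsize=None)
    def struct(i: int, j: int) -> int:
        if i >= j:  # empty subsequence
            return algebra.nil()
        # Rule 1: the first base stays unpaired.
        candidates = [algebra.unpaired(struct(i + 1, j))]
        # Rule 2: the first base pairs with base k, splitting the rest in two.
        for k in range(i + 1, j):
            if (seq[i], seq[k]) in PAIRS:
                candidates.append(algebra.pair(struct(i + 1, k), struct(k + 1, j)))
        return algebra.choice(candidates)

    return struct(0, len(seq))


if __name__ == "__main__":
    print(evaluate("GGCC", COUNT))      # number of candidate structures
    print(evaluate("GGCC", MAX_PAIRS))  # best base-pair count
```

    Swapping `COUNT` for `MAX_PAIRS` changes what is computed without touching the recursion, which is the kind of modularity the abstract attributes to the declarative style.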

  10. GIS-assisted spatial analysis for urban regulatory detailed planning: designer's dimension in the Chinese code system

    NASA Astrophysics Data System (ADS)

    Yu, Yang; Zeng, Zheng

    2009-10-01

    By discussing the causes behind the high amendment ratio in the implementation of urban regulatory detailed plans in China despite their law-ensured status, the study aims to reconcile the conflict between the legal authority of regulatory detailed planning and the insufficient scientific support in its decision-making and compilation. It does so by introducing spatial analysis based on GIS technology and 3D modeling into the process, thus presenting a more scientific and flexible approach to regulatory detailed planning in China. The study first points out that the current compilation process of urban regulatory detailed plans in China mainly employs an empirical approach, which renders the plans constantly subject to amendments; the study then discusses the need for and current utilization of GIS in the Chinese system and proposes the framework of a GIS-assisted 3D spatial analysis process from the designer's perspective, which can be regarded as an alternating process between the descriptive codes and physical design in the compilation of regulatory detailed planning. With a case study of the processes and results from the application of the framework, the paper concludes that the proposed framework can be an effective instrument which provides more rationality, flexibility and thus more efficiency to the compilation and decision-making process of urban regulatory detailed plans in China.

  11. Milestones of mathematical model for business process management related to cost estimate documentation in petroleum industry

    NASA Astrophysics Data System (ADS)

    Khamidullin, R. I.

    2018-05-01

    The paper is devoted to milestones of the optimal mathematical model for a business process related to cost estimate documentation compiled during construction and reconstruction of oil and gas facilities. It describes the study and analysis of fundamental issues in petroleum industry, which are caused by economic instability and deterioration of a business strategy. Business process management is presented as business process modeling aimed at the improvement of the studied business process, namely main criteria of optimization and recommendations for the improvement of the above-mentioned business model.

  12. NASA Electronic Library System (NELS) optimization

    NASA Technical Reports Server (NTRS)

    Pribyl, William L.

    1993-01-01

    This is a compilation of NELS (NASA Electronic Library System) Optimization progress/problem, interim, and final reports for all phases. The NELS database was examined, particularly in the memory, disk contention, and CPU, to discover bottlenecks. Methods to increase the speed of NELS code were investigated. The tasks included restructuring the existing code to interact with others more effectively. An error reporting code to help detect and remove bugs in the NELS was added. Report writing tools were recommended to integrate with the ASV3 system. The Oracle database management system and tools were to be installed on a Sun workstation, intended for demonstration purposes.

  13. A methodology for generating a tailored implementation blueprint: an exemplar from a youth residential setting.

    PubMed

    Lewis, Cara C; Scott, Kelli; Marriott, Brigid R

    2018-05-16

    Tailored implementation approaches are touted as more likely to support the integration of evidence-based practices. However, to our knowledge, few methodologies for tailoring implementations exist. This manuscript will apply a model-driven, mixed methods approach to a needs assessment to identify the determinants of practice, and pilot a modified conjoint analysis method to generate an implementation blueprint using a case example of a cognitive behavioral therapy (CBT) implementation in a youth residential center. Our proposed methodology contains five steps to address two goals: (1) identify the determinants of practice and (2) select and match implementation strategies to address the identified determinants (focusing on barriers). Participants in the case example included mental health therapists and operations staff in two programs of Wolverine Human Services. For step 1, the needs assessment, they completed surveys (clinician N = 10; operations staff N = 58; other N = 7) and participated in focus groups (clinician N = 15; operations staff N = 38) guided by the domains of the Framework for Diffusion [1]. For step 2, the research team conducted mixed methods analyses following the QUAN + QUAL structure for the purpose of convergence and expansion in a connecting process, revealing 76 unique barriers. Step 3 consisted of a modified conjoint analysis. For step 3a, agency administrators prioritized the identified barriers according to feasibility and importance. For step 3b, strategies were selected from a published compilation and rated for feasibility and likelihood of impacting CBT fidelity. For step 4, sociometric surveys informed implementation team member selection and a meeting was held to identify officers and clarify goals and responsibilities. For step 5, blueprints for each of pre-implementation, implementation, and sustainment phases were generated. 
    Forty-five unique strategies were prioritized across the 5 years and three phases representing all nine categories. Our novel methodology offers a relatively low burden collaborative approach to generating a plan for implementation that leverages advances in implementation science including measurement, models, strategy compilations, and methods from other fields.

  14. Multiparadigm Design Environments

    DTIC Science & Technology

    1992-01-01

    following results: 1. New methods for programming in terms of conceptual models 2. Design of object-oriented languages 3. Compiler optimization and...experimented with object-based methods for programming directly in terms of conceptual models, object-oriented language design, computer program...expect these results to have a strong influence on future ... Conceptual Programming Conceptual

  15. General Algebraic Modeling System Tutorial | High-Performance Computing |

    Science.gov Websites

    power generation from two different fuels. The goal is to minimize the cost for one of the fuels while Here's a basic tutorial for modeling optimization problems with the General Algebraic Modeling System (GAMS). Overview The GAMS (General Algebraic Modeling System) package is essentially a compiler for a

  16. Quality standards for predialysis education: results from a consensus conference

    PubMed Central

    Isnard Bagnis, Corinne; Crepaldi, Carlo; Dean, Jessica; Goovaerts, Tony; Melander, Stefan; Nilsson, Eva-Lena; Prieto-Velasco, Mario; Trujillo, Carmen; Zambon, Roberto; Mooney, Andrew

    2015-01-01

    This position statement was compiled following an expert meeting in March 2013, Zurich, Switzerland. Attendees were invited from a spread of European renal units with established and respected renal replacement therapy option education programmes. Discussions centred around optimal ways of creating an education team, setting realistic and meaningful objectives for patient education, and assessing the quality of education delivered. PMID:24957808

  17. 76 FR 57636 - Privacy Act of 1974: Implementation and Amendment of Exemptions

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-09-16

    ..., application of the exemptions to the three new systems of records is necessary to protect information compiled... safety and security of employees in the workplace. Access to such information could allow the subject of...) Records; (8) SEC Security in the Workplace Incident Records; and (9) Investor Response Information System...

  18. Exercises to Accompany Mathematics 301. Curriculum Support Series.

    ERIC Educational Resources Information Center

    Manitoba Dept. of Education, Winnipeg.

    These sample problems, exercises, questions, and projects were compiled to supplement the guide for the Manitoba course Mathematics 301 in order to assist teachers in implementing the program. Arranged according to the modules of the course guide, they are coded to the objectives of the program. Review exercises follow either the subtopics within…

  19. Profiling under UNIX by patching

    NASA Technical Reports Server (NTRS)

    Bishop, Matt

    1986-01-01

    Profiling under UNIX is done by inserting counters into programs either before or during the compilation or assembly phases. A fourth type of profiling involves monitoring the execution of a program, and gathering relevant statistics during the run. This method and an implementation of this method are examined, and its advantages and disadvantages are discussed.

  20. Semantic Language Extensions for Implicit Parallel Programming

    DTIC Science & Technology

    2013-09-01

    mobile CPU interacts with a GPU on the same device and a cloud based backend at a remote location presents endless possibilities for solving com...for his contribution to the compiler infrastructure. His creativity in solving research problems and expertise in architecting and implementing...5.5.1 Frontend...5.5.2 Backend

  1. Peer Tutoring: A Guide to Program Design. Research and Development Series No. 260.

    ERIC Educational Resources Information Center

    Ashley, William L.; And Others

    This publication presents guidelines for planning, implementing, and evaluating a peer tutoring program within a vocational setting. Chapter 1 discusses benefits of peer tutoring and presents a compilation of guidelines, suggestions, and examples for planning, developing, and evaluating a peer tutoring program. Tasks in each area--program…

  2. Guidelines for Planning and Implementing a Comprehensive Community Environmental Inventory. Revised, 1972.

    ERIC Educational Resources Information Center

    Bennett, Dean B.; MacGown, Richard H.

    A comprehensive community environmental inventory is an ongoing process of investigation and study to compile and evaluate information about the natural and man-made environmental features and characteristics of an area, as well as related social, political, and economic information. Such information is important to the community in developmental…

  3. Resource Guide for Crisis Management in Schools.

    ERIC Educational Resources Information Center

    LaPointe, Richard T.; And Others

    A crisis can occur at any time, whether or not a school's staff plans for it. This resource guide is a compilation of user-friendly examples of policies, procedures, guidelines, checklists, and forms to help Virginia schools develop and implement a systematic crisis-management plan. Chapter 1 provides an introductory overview of the essential…

  4. Facing the Future--On the Edge of a New Millennium. University of Hawaii Community Colleges Report.

    ERIC Educational Resources Information Center

    Tsunoda, Joyce S.

    Compiled by the University of Hawaii Community Colleges (UHCC), this 1995 comprehensive report provides information about the seven UHCC campuses, focusing on educational programs, accomplishments, and enrollment. Following a message from the Chancellor, the report describes educational and employment training efforts implemented by the UHCC to…

  5. Implementing E-Government: A Case Study of Improving the Process for Transferring Conventional Ammunition Among the Military Services

    DTIC Science & Technology

    2003-03-01

    BUSINESS PROCESS REDESIGN...F. SECOND WAVE BPR...Receive/Review/Decide on Lot Data...f. Step 7 – Create Cross-Leveling Request...g. Step 8...Compile Cross-Leveling Request...h. Steps 9 and 10 – (No Change)...

  6. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Allgood, Tiffany L.; Sorter, Andy

    The Coeur d'Alene Tribe's Energy Efficiency Feasibility Study (EEFS) is the culminating document that compiles the energy efficiency and building performance assessment and project prioritization process completed on 36 Tribally owned and operated facilities within Tribal lands. The EEFS contains sections on initial findings, utility billing analyses, energy conservation measures and prioritization and funding sources and strategies for energy project implementation.

  7. 24 CFR 13.4 - Reports.

    Code of Federal Regulations, 2013 CFR

    2013-04-01

    ... 24 Housing and Urban Development 1 2013-04-01 2013-04-01 false Reports. 13.4 Section 13.4 Housing... PENALTY MAIL IN THE LOCATION AND RECOVERY OF MISSING CHILDREN § 13.4 Reports. HUD shall compile and submit... report on its experience in implementing S. 1195 Official Mail Use in the Location and Recovery of...

  8. 24 CFR 13.4 - Reports.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... 24 Housing and Urban Development 1 2011-04-01 2011-04-01 false Reports. 13.4 Section 13.4 Housing... PENALTY MAIL IN THE LOCATION AND RECOVERY OF MISSING CHILDREN § 13.4 Reports. HUD shall compile and submit... report on its experience in implementing S. 1195 Official Mail Use in the Location and Recovery of...

  9. 24 CFR 13.4 - Reports.

    Code of Federal Regulations, 2012 CFR

    2012-04-01

    ... 24 Housing and Urban Development 1 2012-04-01 2012-04-01 false Reports. 13.4 Section 13.4 Housing... PENALTY MAIL IN THE LOCATION AND RECOVERY OF MISSING CHILDREN § 13.4 Reports. HUD shall compile and submit... report on its experience in implementing S. 1195 Official Mail Use in the Location and Recovery of...

  10. 24 CFR 13.4 - Reports.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... 24 Housing and Urban Development 1 2010-04-01 2010-04-01 false Reports. 13.4 Section 13.4 Housing... PENALTY MAIL IN THE LOCATION AND RECOVERY OF MISSING CHILDREN § 13.4 Reports. HUD shall compile and submit... report on its experience in implementing S. 1195 Official Mail Use in the Location and Recovery of...

  11. 24 CFR 13.4 - Reports.

    Code of Federal Regulations, 2014 CFR

    2014-04-01

    ... 24 Housing and Urban Development 1 2014-04-01 2014-04-01 false Reports. 13.4 Section 13.4 Housing... PENALTY MAIL IN THE LOCATION AND RECOVERY OF MISSING CHILDREN § 13.4 Reports. HUD shall compile and submit... report on its experience in implementing S. 1195 Official Mail Use in the Location and Recovery of...

  12. International Reports on Literacy Research: Argentina, Mexico, France

    ERIC Educational Resources Information Center

    Malloy, Jacquelynn A., Comp.; Mallozzi, Christine, Comp.

    2007-01-01

    This is a compilation of reports on international literacy research. The report includes 3 separate reports on Argentina, Mexico, and France. In the first report, Melina Porto reports on a new implementation of a teacher-education program currently underway in the province of Buenos Aires, Argentina, under the leadership of teacher-researcher…

  13. Mileage-based user fees : defining a path toward implementation phase 2, an assessment of technology issues : final report

    DOT National Transportation Integrated Search

    2009-10-01

    This report reviews technology options for a mileage-based user fee system in the state of Texas. The report was : compiled based on input from a diverse range of sources, including a literature review of existing mileage-based : user fee technical w...

  14. Nutrition Education Development Project. ESEA Title IVC Project. Pamphlet File.

    ERIC Educational Resources Information Center

    Luzerne Intermediate Unit 18, Kingston, PA.

    This compilation of pamphlets and other educational materials on current issues in nutrition provides elementary and secondary teachers with a list of free or inexpensive materials to help them implement nutrition education into the existing curriculum. This list, organized in alphabetical order by topic, contains 293 entries and each one includes…

  15. Augmenting and updating NASA spacelink electronic information system

    NASA Technical Reports Server (NTRS)

    Blake, Jean A.

    1989-01-01

    The development of Spacelink during its gestation, birth, infancy, and childhood are described. In addition to compiling and developing more material for implementation in Spacelink, Summer 1989 was spent scanning the insignias of the various manned missions into Spacelink. Material for the above was extracted from existing NASA publications, documents and photographs.

  16. Implementation of a Compiler for the Functional Programming Language PHI.

    DTIC Science & Technology

    1987-06-01

    Chapter Three...his acceptance speech for the 1977 ACM Turing Award, Backus criticized traditional programming languages and programming styles. He went...ptr->type = type; if (head == NULL) {...tracer = head; while (tracer->link...NULL)...tracer = tracer->link

  17. Compiling probabilistic, bio-inspired circuits on a field programmable analog array

    PubMed Central

    Marr, Bo; Hasler, Jennifer

    2014-01-01

    A field programmable analog array (FPAA) is presented as an energy and computational efficiency engine: a mixed mode processor for which functions can be compiled at significantly lower energy cost using probabilistic computing circuits. More specifically, it will be shown that the core computation of any dynamical system can be computed on the FPAA at significantly less energy per operation than a digital implementation. A stochastic system that is dynamically controllable via voltage controlled amplifier and comparator thresholds is implemented, which computes Bernoulli random variables. It is shown that exponentially distributed random variables, and random variables of an arbitrary distribution, can be computed from Bernoulli variables. The Gillespie algorithm is simulated to show the utility of this system by calculating the trajectory of a biological system computed stochastically with this probabilistic hardware, where over a 127X performance improvement over current software approaches is shown. The relevance of this approach is extended to any dynamical system. The initial circuits and ideas for this work were generated at the 2008 Telluride Neuromorphic Workshop. PMID:24847199
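
    One standard way to obtain exponentially distributed values from Bernoulli trials, as the abstract above mentions, is to count trials until the first success and scale by a small time step. The Python sketch below shows that textbook construction in software; it is not the circuit or method of the cited paper, and the function names and the `dt` parameter are assumptions made for the example.

```python
import random


def bernoulli(p: float, rng: random.Random) -> bool:
    """Return True with probability p."""
    return rng.random() < p


def exponential_from_bernoulli(rate: float, dt: float, rng: random.Random) -> float:
    """Approximate an Exp(rate) sample from repeated Bernoulli(rate * dt) trials.

    The number of trials up to the first success is geometric; multiplied by
    the step dt it approaches an exponential distribution as dt shrinks.
    """
    p = rate * dt
    if not 0.0 < p <= 1.0:
        raise ValueError("rate * dt must lie in (0, 1]")
    trials = 1
    while not bernoulli(p, rng):
        trials += 1
    return trials * dt


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    samples = [exponential_from_bernoulli(2.0, 0.01, rng) for _ in range(5000)]
    print(sum(samples) / len(samples))  # should be near 1 / rate = 0.5
```

    Waiting times of this kind are what the Gillespie algorithm draws between reaction events, so a source of Bernoulli variables is enough, in principle, to drive such a simulation.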

  18. Intelligent microchip networks: an agent-on-chip synthesis framework for the design of smart and robust sensor networks

    NASA Astrophysics Data System (ADS)

    Bosse, Stefan

    2013-05-01

    Sensorial materials consisting of high-density, miniaturized, and embedded sensor networks require new robust and reliable data processing and communication approaches. Structural health monitoring is one major field of application for sensorial materials. Each sensor node provides some kind of sensor, electronics, data processing, and communication with a strong focus on microchip-level implementation to meet the goals of miniaturization and low-power energy environments, a prerequisite for autonomous behaviour and operation. Reliability requires robustness of the entire system in the presence of node, link, data processing, and communication failures. Interaction between nodes is required to manage and distribute information. One common interaction model is the mobile agent. An agent approach provides stronger autonomy than a traditional object or remote-procedure-call based approach. Agents can decide for themselves which actions are performed, and they are capable of flexible behaviour, reacting to the environment and other agents, providing some degree of robustness. Traditionally multi-agent systems are abstract programming models which are implemented in software and executed on program controlled computer architectures. This approach does not scale well to the microchip level and requires fully equipped computers and communication structures, and the hardware architecture does not consider and reflect the requirements for agent processing and interaction. We propose and demonstrate a novel design paradigm for reliable distributed data processing systems and a synthesis methodology and framework for multi-agent systems implementable entirely on microchip-level with resource and power constrained digital logic supporting Agent-On-Chip architectures (AoC). The agent behaviour and mobility are fully integrated on the microchip using pipelined communicating processes implemented with finite-state machines and register-transfer logic. 
The agent behaviour, interaction (communication), and mobility features are modelled and specified on a machine-independent abstract programming level using a state-based agent behaviour language (APL). With this APL a high-level agent compiler is able to synthesize a hardware model (RTL, VHDL), a software model (C, ML), or a simulation model (XML) suitable to simulate a multi-agent system using the SeSAm simulator framework. Agent communication is provided by a simple tuple-space database implemented on node level providing fault tolerant access of global data. A novel synthesis development kit (SynDK) based on a graph-structured database approach is introduced to support the rapid development of compilers and synthesis tools, used for example for the design and implementation of the APL compiler.
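
    The tuple-space communication model mentioned in this record can be illustrated with a minimal sketch. It is not the paper's node-level database: the class name TupleSpace, the operations out/rd/inp, and the use of None as a wildcard are assumptions borrowed from the general Linda-style tuple-space idea, and the sketch runs in software rather than in digital logic.

```python
import threading


class TupleSpace:
    """Minimal in-memory tuple space (hypothetical illustration only)."""

    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def out(self, tup):
        """Store a tuple and wake any agents waiting for a match."""
        with self._cond:
            self._tuples.append(tuple(tup))
            self._cond.notify_all()

    @staticmethod
    def _matches(pattern, tup):
        # A None field in the pattern acts as a wildcard.
        return len(pattern) == len(tup) and all(
            p is None or p == t for p, t in zip(pattern, tup)
        )

    def _find(self, pattern):
        for tup in self._tuples:
            if self._matches(pattern, tup):
                return tup
        return None

    def rd(self, pattern, timeout=None):
        """Return a matching tuple without removing it, or None on timeout."""
        with self._cond:
            self._cond.wait_for(lambda: self._find(pattern) is not None, timeout)
            return self._find(pattern)

    def inp(self, pattern, timeout=None):
        """Remove and return a matching tuple, or None on timeout."""
        with self._cond:
            self._cond.wait_for(lambda: self._find(pattern) is not None, timeout)
            tup = self._find(pattern)
            if tup is not None:
                self._tuples.remove(tup)
            return tup


# Usage: one agent publishes a sensor reading, another reads and then consumes it.
space = TupleSpace()
space.out(("TEMP", 3, 21.5))
reading = space.rd(("TEMP", 3, None), timeout=0)     # non-destructive read
taken = space.inp(("TEMP", None, None), timeout=0)   # destructive read
missing = space.rd(("TEMP", None, None), timeout=0)  # nothing left to match
```

    Matching on content rather than on a receiver address is what lets agents exchange data without knowing about each other; the fault-tolerance aspects described in the abstract are not modelled here.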

  19. Computer enhancement through interpretive techniques

    NASA Technical Reports Server (NTRS)

    Foster, G.; Spaanenburg, H. A. E.; Stumpf, W. E.

    1972-01-01

    The improvement in the usage of the digital computer through the use of the technique of interpretation rather than the compilation of higher ordered languages was investigated by studying the efficiency of coding and execution of programs written in FORTRAN, ALGOL, PL/I and COBOL. FORTRAN was selected as the high level language for examining programs which were compiled, and A Programming Language (APL) was chosen for the interpretive language. It is concluded that APL is competitive, not because it and the algorithms being executed are well written, but rather because the batch processing is less efficient than has been admitted. There is not a broad base of experience founded on trying different implementation strategies which have been targeted at open competition with traditional processing methods.

  20. A Compilation of Global Bio-Optical in Situ Data for Ocean-Colour Satellite Applications

    NASA Technical Reports Server (NTRS)

    Valente, Andre; Sathyendranath, Shubha; Brotas, Vanda; Groom, Steve; Grant, Michael; Taberner, Malcolm; Antoine, David; Arnone, Robert; Balch, William M.; Barker, Kathryn

    2016-01-01

    A compiled set of in situ data is important to evaluate the quality of ocean-colour satellite-data records. Here we describe the data compiled for the validation of the ocean-colour products from the ESA Ocean Colour Climate Change Initiative (OC-CCI). The data were acquired from several sources (MOBY, BOUSSOLE, AERONET-OC, SeaBASS, NOMAD, MERMAID, AMT, ICES, HOT, GePCO), span between 1997 and 2012, and have a global distribution. Observations of the following variables were compiled: spectral remote-sensing reflectances, concentrations of chlorophyll a, spectral inherent optical properties and spectral diffuse attenuation coefficients. The data were from multi-project archives acquired via the open internet services or from individual projects, acquired directly from data providers. Methodologies were implemented for homogenisation, quality control and merging of all data. No changes were made to the original data, other than averaging of observations that were close in time and space, elimination of some points after quality control and conversion to a standard format. The final result is a merged table designed for validation of satellite-derived ocean-colour products and available in text format. Metadata of each in situ measurement (original source, cruise or experiment, principal investigator) were preserved throughout the work and made available in the final table. Using all the data in a validation exercise increases the number of matchups and enhances the representativeness of different marine regimes. By making available the metadata, it is also possible to analyse each set of data separately. The compiled data are available at doi:10.1594/PANGAEA.854832 (Valente et al., 2015).

  1. An efficient, scalable, and adaptable framework for solving generic systems of level-set PDEs

    PubMed Central

    Mosaliganti, Kishore R.; Gelas, Arnaud; Megason, Sean G.

    2013-01-01

    In the last decade, level-set methods have been actively developed for applications in image registration, segmentation, tracking, and reconstruction. However, the development of a wide variety of level-set PDEs and their numerical discretization schemes, coupled with hybrid combinations of PDE terms, stopping criteria, and reinitialization strategies, has created a software logistics problem. In the absence of an integrative design, current toolkits support only specific types of level-set implementations which restrict future algorithm development since extensions require significant code duplication and effort. In the new NIH/NLM Insight Toolkit (ITK) v4 architecture, we implemented a level-set software design that is flexible to different numerical (continuous, discrete, and sparse) and grid representations (point, mesh, and image-based). Given that a generic PDE is a summation of different terms, we used a set of linked containers to which level-set terms can be added or deleted at any point in the evolution process. This container-based approach allows the user to explore and customize terms in the level-set equation at compile-time in a flexible manner. The framework is optimized so that repeated computations of common intensity functions (e.g., gradient and Hessians) across multiple terms are eliminated. The framework further enables the evolution of multiple level-sets for multi-object segmentation and processing of large datasets. To do so, we restrict level-set domains to subsets of the image domain and use multithreading strategies to process groups of subdomains or level-set functions. Users can also select from a variety of reinitialization policies and stopping criteria. Finally, we developed a visualization framework that shows the evolution of a level-set in real-time to help guide algorithm development and parameter optimization. 
We demonstrate the power of our new framework using confocal microscopy images of cells in a developing zebrafish embryo. PMID:24501592
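
    The container-of-terms idea in this record can be sketched as follows. This is not ITK code: the names TermContainer, shared_gradient, propagation, advection, and curvature are hypothetical, the level set is a 1D list rather than an image, and the terms are assembled at run time in Python rather than at compile time in C++. The sketch only shows a sum of weighted terms that can be added or removed, with a per-point cache so that a gradient needed by several terms is computed once.

```python
stats = {"gradient_evals": 0}  # counts how often the gradient is really computed


def shared_gradient(phi, i, cache):
    """Central-difference gradient, computed once per point and then reused."""
    if "grad" not in cache:
        stats["gradient_evals"] += 1
        cache["grad"] = (phi[i + 1] - phi[i - 1]) / 2.0
    return cache["grad"]


def propagation(phi, i, cache):
    return -abs(shared_gradient(phi, i, cache))


def advection(phi, i, cache):
    # Unit velocity is an assumption made for this illustration.
    return -1.0 * shared_gradient(phi, i, cache)


def curvature(phi, i, cache):
    # 1D stand-in for a curvature term: the discrete second derivative.
    return phi[i + 1] - 2.0 * phi[i] + phi[i - 1]


class TermContainer:
    """Weighted sum of level-set terms that can be edited between iterations."""

    def __init__(self):
        self._terms = {}

    def add_term(self, name, weight, fn):
        self._terms[name] = (weight, fn)

    def remove_term(self, name):
        self._terms.pop(name, None)

    def evaluate(self, phi, i):
        cache = {}  # shared by all terms evaluated at this point
        return sum(w * fn(phi, i, cache) for w, fn in self._terms.values())


phi = [4.0, 1.0, 0.0, 1.0, 4.0]  # samples of x**2 on -2..2
equation = TermContainer()
equation.add_term("propagation", 1.0, propagation)
equation.add_term("curvature", 0.5, curvature)
equation.add_term("advection", 0.25, advection)

full_update = equation.evaluate(phi, 3)
equation.remove_term("curvature")  # terms can be dropped mid-evolution
reduced_update = equation.evaluate(phi, 3)
```

    Two terms use the gradient in each evaluation, yet stats["gradient_evals"] increases by only one per call, which is the kind of redundancy the abstract says the framework eliminates.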

  2. An efficient, scalable, and adaptable framework for solving generic systems of level-set PDEs.

    PubMed

    Mosaliganti, Kishore R; Gelas, Arnaud; Megason, Sean G

    2013-01-01

    In the last decade, level-set methods have been actively developed for applications in image registration, segmentation, tracking, and reconstruction. However, the development of a wide variety of level-set PDEs and their numerical discretization schemes, coupled with hybrid combinations of PDE terms, stopping criteria, and reinitialization strategies, has created a software logistics problem. In the absence of an integrative design, current toolkits support only specific types of level-set implementations which restrict future algorithm development since extensions require significant code duplication and effort. In the new NIH/NLM Insight Toolkit (ITK) v4 architecture, we implemented a level-set software design that is flexible to different numerical (continuous, discrete, and sparse) and grid representations (point, mesh, and image-based). Given that a generic PDE is a summation of different terms, we used a set of linked containers to which level-set terms can be added or deleted at any point in the evolution process. This container-based approach allows the user to explore and customize terms in the level-set equation at compile-time in a flexible manner. The framework is optimized so that repeated computations of common intensity functions (e.g., gradient and Hessians) across multiple terms are eliminated. The framework further enables the evolution of multiple level-sets for multi-object segmentation and processing of large datasets. To do so, we restrict level-set domains to subsets of the image domain and use multithreading strategies to process groups of subdomains or level-set functions. Users can also select from a variety of reinitialization policies and stopping criteria. Finally, we developed a visualization framework that shows the evolution of a level-set in real-time to help guide algorithm development and parameter optimization. 
We demonstrate the power of our new framework using confocal microscopy images of cells in a developing zebrafish embryo.

  3. On the tradeoffs of programming language choice for numerical modelling in geoscience. A case study comparing modern Fortran, C++/Blitz++ and Python/NumPy.

    NASA Astrophysics Data System (ADS)

    Jarecka, D.; Arabas, S.; Fijalkowski, M.; Gaynor, A.

    2012-04-01

    The language of choice for numerical modelling in geoscience has long been Fortran. A choice of a particular language and coding paradigm comes with a different set of tradeoffs such as that between performance, ease of use (and ease of abuse), code clarity, maintainability and reusability, availability of open source compilers, debugging tools, adequate external libraries and parallelisation mechanisms. The availability of trained personnel and the scale and activeness of the developer community are of importance as well. We present a short comparison study aimed at identification and quantification of these tradeoffs for a particular example of an object oriented implementation of a parallel 2D-advection-equation solver in Python/NumPy, C++/Blitz++ and modern Fortran. The main angles of comparison will be complexity of implementation, performance of various compilers or interpreters and characterisation of the "added value" gained by a particular choice of the language. The choice of the numerical problem is dictated by the aim to make the comparison useful and meaningful to geoscientists. Python is chosen as a language that traditionally is associated with ease of use, elegant syntax but limited performance. C++ is chosen for its traditional association with high performance but even higher complexity and syntax obscurity. Fortran is included in the comparison for its widespread use in geoscience often attributed to its performance. We confront the validity of these traditional views. We point out how the usability of a particular language in geoscience depends on the characteristics of the language itself and the availability of pre-existing software libraries (e.g. NumPy, SciPy, PyNGL, PyNIO, MPI4Py for Python and Blitz++, Boost.Units, Boost.MPI for C++). 
Having in mind the limited complexity of the considered numerical problem, we present a tentative comparison of performance of the three implementations with different open source compilers including CPython and PyPy, Clang++ and GNU g++, and GNU gfortran.

  4. STAR- A SIMPLE TOOL FOR AUTOMATED REASONING SUPPORTING HYBRID APPLICATIONS OF ARTIFICIAL INTELLIGENCE (DEC VAX VERSION)

    NASA Technical Reports Server (NTRS)

    Borchardt, G. C.

    1994-01-01

    The Simple Tool for Automated Reasoning program (STAR) is an interactive, interpreted programming language for the development and operation of artificial intelligence (AI) application systems. STAR provides an environment for integrating traditional AI symbolic processing with functions and data structures defined in compiled languages such as C, FORTRAN and PASCAL. This type of integration occurs in a number of AI applications including interpretation of numerical sensor data, construction of intelligent user interfaces to existing compiled software packages, and coupling AI techniques with numerical simulation techniques and control systems software. The STAR language was created as part of an AI project for the evaluation of imaging spectrometer data at NASA's Jet Propulsion Laboratory. Programming in STAR is similar to other symbolic processing languages such as LISP and CLIP. STAR includes seven primitive data types and associated operations for the manipulation of these structures. A semantic network is used to organize data in STAR, with capabilities for inheritance of values and generation of side effects. The AI knowledge base of STAR can be a simple repository of records or it can be a highly interdependent association of implicit and explicit components. The symbolic processing environment of STAR may be extended by linking the interpreter with functions defined in conventional compiled languages. These external routines interact with STAR through function calls in either direction, and through the exchange of references to data structures. The hybrid knowledge base may thus be accessed and processed in general by either side of the application. STAR is initially used to link externally compiled routines and data structures. It is then invoked to interpret the STAR rules and symbolic structures. 
In a typical interactive session, the user enters an expression to be evaluated, STAR parses the input, evaluates the expression, performs any file input/output required, and displays the results. The STAR interpreter is written in the C language for interactive execution. It has been implemented on a VAX 11/780 computer operating under VMS, and the UNIX version has been implemented on a Sun Microsystems 2/170 workstation. STAR has a memory requirement of approximately 200K of 8 bit bytes, excluding externally compiled functions and application-dependent symbolic definitions. This program was developed in 1985.

  5. STAR- A SIMPLE TOOL FOR AUTOMATED REASONING SUPPORTING HYBRID APPLICATIONS OF ARTIFICIAL INTELLIGENCE (UNIX VERSION)

    NASA Technical Reports Server (NTRS)

    Borchardt, G. C.

    1994-01-01

    The Simple Tool for Automated Reasoning program (STAR) is an interactive, interpreted programming language for the development and operation of artificial intelligence (AI) application systems. STAR provides an environment for integrating traditional AI symbolic processing with functions and data structures defined in compiled languages such as C, FORTRAN and PASCAL. This type of integration occurs in a number of AI applications including interpretation of numerical sensor data, construction of intelligent user interfaces to existing compiled software packages, and coupling AI techniques with numerical simulation techniques and control systems software. The STAR language was created as part of an AI project for the evaluation of imaging spectrometer data at NASA's Jet Propulsion Laboratory. Programming in STAR is similar to other symbolic processing languages such as LISP and CLIP. STAR includes seven primitive data types and associated operations for the manipulation of these structures. A semantic network is used to organize data in STAR, with capabilities for inheritance of values and generation of side effects. The AI knowledge base of STAR can be a simple repository of records or it can be a highly interdependent association of implicit and explicit components. The symbolic processing environment of STAR may be extended by linking the interpreter with functions defined in conventional compiled languages. These external routines interact with STAR through function calls in either direction, and through the exchange of references to data structures. The hybrid knowledge base may thus be accessed and processed in general by either side of the application. STAR is initially used to link externally compiled routines and data structures. It is then invoked to interpret the STAR rules and symbolic structures. 
In a typical interactive session, the user enters an expression to be evaluated, STAR parses the input, evaluates the expression, performs any file input/output required, and displays the results. The STAR interpreter is written in the C language for interactive execution. It has been implemented on a VAX 11/780 computer operating under VMS, and the UNIX version has been implemented on a Sun Microsystems 2/170 workstation. STAR has a memory requirement of approximately 200K of 8 bit bytes, excluding externally compiled functions and application-dependent symbolic definitions. This program was developed in 1985.

  6. Programming models for energy-aware systems

    NASA Astrophysics Data System (ADS)

    Zhu, Haitao

    Energy efficiency is an important goal of modern computing, with direct impact on system operational cost, reliability, usability and environmental sustainability. This dissertation describes the design and implementation of two innovative programming languages for constructing energy-aware systems. First, it introduces ET, a strongly typed programming language to promote and facilitate energy-aware programming, with a novel type system design called Energy Types. Energy Types is built upon a key insight into today's energy-efficient systems and applications: despite the popular perception that energy and power can only be described in joules and watts, real-world energy management is often based on discrete phases and modes, which in turn can be reasoned about by type systems very effectively. A phase characterizes a distinct pattern of program workload, and a mode represents an energy state the program is expected to execute in. Energy Types is designed to reason about energy phases and energy modes, bringing programmers into the optimization of energy management. Second, the dissertation develops Eco, an energy-aware programming language centering around sustainability. A sustainable program built from Eco is able to adaptively adjust its own behaviors to stay on a given energy budget, avoiding both deficit that would lead to battery drain or CPU overheating, and surplus that could have been used to improve the quality of the program output. Sustainability is viewed as a form of supply and demand matching, and a sustainable program consistently maintains the equilibrium between supply and demand. ET is implemented as a prototyped compiler for smartphone programming on Android, and Eco is implemented as a minimal extension to Java. 
Programming practices and benchmarking experiments in these two new languages showed that ET can lead to significant energy savings for Android Apps and Eco can efficiently promote battery awareness and temperature awareness in real-world Java programs.
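
    The supply-and-demand view of sustainability in this record can be illustrated with a small sketch. It is not Eco syntax and does not reproduce the Eco runtime: the function run_on_budget, the quality levels, and their per-step energy costs are hypothetical. At each step the program picks the highest quality level whose cost fits an even share of the remaining budget, which avoids both deficit (overspending) and surplus (unused energy that could have bought better output).

```python
def run_on_budget(budget, steps, cost_per_level):
    """Choose a quality level per step so total demand tracks the energy supply.

    budget         -- total energy available for the whole run (arbitrary units)
    steps          -- number of work steps to execute
    cost_per_level -- mapping from quality level to energy cost per step
    """
    levels = sorted(cost_per_level)
    remaining = budget
    chosen = []
    for step in range(steps):
        # Supply for this step: an even share of what is left.
        allowance = remaining / (steps - step)
        affordable = [lv for lv in levels if cost_per_level[lv] <= allowance]
        # Fall back to the cheapest level when nothing fits the allowance.
        level = affordable[-1] if affordable else levels[0]
        remaining -= cost_per_level[level]
        chosen.append(level)
    return chosen, remaining


# Hypothetical costs: level 0 is the cheapest output quality, level 2 the best.
costs = {0: 1.0, 1: 2.0, 2: 4.0}
schedule, leftover = run_on_budget(budget=10.0, steps=4, cost_per_level=costs)
```

    With these numbers the run stays at level 1 for three steps and spends the accumulated slack on level 2 in the last step, ending with the budget exactly used.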

  7. Rapid algorithm prototyping and implementation for power quality measurement

    NASA Astrophysics Data System (ADS)

    Kołek, Krzysztof; Piątek, Krzysztof

    2015-12-01

    This article presents a Model-Based Design (MBD) approach to rapidly implement power quality (PQ) metering algorithms. Power supply quality is a very important aspect of modern power systems and will become even more important in future smart grids. In this case, maintaining the PQ parameters at the desired level will require efficient implementation methods of the metering algorithms. Currently, the development of new, advanced PQ metering algorithms requires new hardware with adequate computational capability and time intensive, cost-ineffective manual implementations. An alternative, considered here, is an MBD approach. The MBD approach focuses on the modelling and validation of the model by simulation, which is well-supported by Computer-Aided Engineering (CAE) packages. This paper presents two algorithms utilized in modern PQ meters: a phase-locked loop based on an Enhanced Phase Locked Loop (EPLL), and the flicker measurement according to the IEC 61000-4-15 standard. The algorithms were chosen because of their complexity and non-trivial development. They were first modelled in the MATLAB/Simulink package, then tested and validated in a simulation environment. The models, in the form of Simulink diagrams, were next used to automatically generate C code. The code was compiled and executed in real-time on the Zynq Xilinx platform that combines a reconfigurable Field Programmable Gate Array (FPGA) with a dual-core processor. The MBD development of PQ algorithms, automatic code generation, and compilation form a rapid algorithm prototyping and implementation path for PQ measurements. The main advantage of this approach is the ability to focus on the design, validation, and testing stages while skipping over implementation issues. The code generation process renders production-ready code that can be easily used on the target hardware. 
This is especially important when standards for PQ measurement are in constant development, and the PQ issues in emerging smart grids will require tools for rapid development and implementation of such algorithms.

  8. Large-Scale Linear Optimization through Machine Learning: From Theory to Practical System Design and Implementation

    DTIC Science & Technology

    2016-08-10

    AFRL-AFOSR-JP-TR-2016-0073. Large-scale Linear Optimization through Machine Learning: From Theory to Practical System Design and Implementation ... performances on various machine learning tasks and it naturally lends itself to fast parallel implementations. Despite this, very little work has been

  9. An Ada implementation of the network manager for the advanced information processing system

    NASA Technical Reports Server (NTRS)

    Nagle, Gail A.

    1986-01-01

    From an implementation standpoint, the Ada language provided many features which facilitated the data and procedure abstraction process. The language supported a design which was dynamically flexible (despite strong typing), modular, and self-documenting. Adequate training of programmers requires access to an efficient compiler which supports full Ada. When the performance issues for real time processing are finally addressed by more stringent requirements for tasking features and the development of efficient run-time environments for embedded systems, the full power of the language will be realized.

  10. Ada Implementation Guide. Software Engineering With Ada. Volume 2

    DTIC Science & Technology

    1994-04-01

    copy of the latest Ada Compiler Validation Capability (ACVC), the validation test suite ADA-BIB 10/15/91 2048 How to obtain the AJPO's Ada ... AF-INT9I 8/12/91 2048 Text of Air Force 1991 Interpretation of Congressional Mandate SAF-POL88 ... the Ada language ... CREASE 11/27/91 2048 How to obtain AJPO's April 1988 CREASE Version 5.0

  11. Automated Test for NASA CFS

    NASA Technical Reports Server (NTRS)

    McComas, David C.; Strege, Susanne L.; Carpenter, Paul B.; Hartman, Randy

    2015-01-01

    The core Flight System (cFS) is a flight software (FSW) product line developed by the Flight Software Systems Branch (FSSB) at NASA's Goddard Space Flight Center (GSFC). The cFS uses compile-time configuration parameters to implement variable requirements to enable portability across embedded computing platforms and to implement different end-user functional needs. The verification and validation of these requirements is proving to be a significant challenge. This paper describes the challenges facing the cFS and the results of a pilot effort to apply EXB Solution's testing approach to the cFS applications.

  12. The Telecommunications and Data Acquisition Report

    NASA Technical Reports Server (NTRS)

    Posner, Edward C. (Editor)

    1991-01-01

    A compilation is presented of articles on developments in programs managed by JPL's Office of Telecommunications and Data Acquisition. In space communications, radio navigation, radio science, and ground based radio and radar astronomy, activities of the Deep Space Network are reported in planning, in supporting research and technology, in implementation, and in operations. Also included is standards activity at JPL for space data and information systems and reimbursable DSN work performed for other space agencies through NASA. In the search for extraterrestrial intelligence (SETI), implementation and operations are reported for searching the microwave spectrum.

  13. A learning apprentice for software parts composition

    NASA Technical Reports Server (NTRS)

    Allen, Bradley P.; Holtzman, Peter L.

    1987-01-01

    An overview of the knowledge acquisition component of the Bauhaus, a prototype computer aided software engineering (CASE) workstation for the development of domain-specific automatic programming systems (D-SAPS) is given. D-SAPS use domain knowledge in the refinement of a description of an application program into a compilable implementation. The approach to the construction of D-SAPS was to automate the process of refining a description of a program, expressed in an object-oriented domain language, into a configuration of software parts that implement the behavior of the domain objects.

  14. Comparing Neuromorphic Solutions in Action: Implementing a Bio-Inspired Solution to a Benchmark Classification Task on Three Parallel-Computing Platforms

    PubMed Central

    Diamond, Alan; Nowotny, Thomas; Schmuker, Michael

    2016-01-01

    Neuromorphic computing employs models of neuronal circuits to solve computing problems. Neuromorphic hardware systems are now becoming more widely available and “neuromorphic algorithms” are being developed. As they are maturing toward deployment in general research environments, it becomes important to assess and compare them in the context of the applications they are meant to solve. This should encompass not just task performance, but also ease of implementation, speed of processing, scalability, and power efficiency. Here, we report our practical experience of implementing a bio-inspired, spiking network for multivariate classification on three different platforms: the hybrid digital/analog Spikey system, the digital spike-based SpiNNaker system, and GeNN, a meta-compiler for parallel GPU hardware. We assess performance using a standard hand-written digit classification task. We found that whilst a different implementation approach was required for each platform, classification performances remained in line. This suggests that all three implementations were able to exercise the model's ability to solve the task rather than exposing inherent platform limits, although differences emerged when capacity was approached. With respect to execution speed and power consumption, we found that for each platform a large fraction of the computing time was spent outside of the neuromorphic device, on the host machine. Time was spent in a range of combinations of preparing the model, encoding suitable input spiking data, shifting data, and decoding spike-encoded results. This is also where a large proportion of the total power was consumed, most markedly for the SpiNNaker and Spikey systems. We conclude that the simulation efficiency advantage of the assessed specialized hardware systems is easily lost in excessive host-device communication, or non-neuronal parts of the computation. 
These results emphasize the need to optimize the host-device communication architecture for scalability, maximum throughput, and minimum latency. Moreover, our results indicate that special attention should be paid to minimize host-device communication when designing and implementing networks for efficient neuromorphic computing. PMID:26778950

  15. Pipeline Optimization Program (PLOP)

    DTIC Science & Technology

    2006-08-01

    the framework of the Dredging Operations Decision Support System (DODSS, https://dodss.wes.army.mil/wiki/0). PLOP compiles industry standards and ... efficiency point (BEP). In the interest of acceptable wear rate on the pump, industrial standards dictate that the flow ... Figure 2. Pump class as a function of ... percentage of the flow rate corresponding to the BEP. Pump Acceptability Rules. The facts for pump performance, industrial standards and pipeline and

  16. The Hermod Behavioral Synthesis System

    DTIC Science & Technology

    1988-06-08

    [Figure residue: block diagram with labels including Description, Parser, tech-independent Transformation & Optimization, Hardware Generator, and Datapath] ... Proc. 22nd Design Automation Conference, ACM/IEEE, June 1985, pp. 475-481. [7] G. De Micheli, "Synthesis of Control Systems", in Design Systems for ... VLSI Circuits: Logic Synthesis and Silicon Compilation, G. De Micheli, A. Sangiovanni-Vincentelli, and P. Antognetti (editors), Martinus Nijhoff

  17. A portable approach for PIC on emerging architectures

    NASA Astrophysics Data System (ADS)

    Decyk, Viktor

    2016-03-01

    A portable approach for designing Particle-in-Cell (PIC) algorithms on emerging exascale computers is based on the recognition that three distinct programming paradigms are needed. They are: low level vector (SIMD) processing, middle level shared memory parallel programming, and high level distributed memory programming. In addition, there is a memory hierarchy associated with each level. Such algorithms can be initially developed using vectorizing compilers, OpenMP, and MPI. This is the approach recommended by Intel for the Phi processor. These algorithms can then be translated and possibly specialized to other programming models and languages, as needed. For example, the vector processing and shared memory programming might be done with CUDA instead of vectorizing compilers and OpenMP, but generally the algorithm itself is not greatly changed. The UCLA PICKSC web site at http://www.idre.ucla.edu/ contains example open source skeleton codes (mini-apps) illustrating each of these three programming models, individually and in combination. Fortran2003 now supports abstract data types, and design patterns can be used to support a variety of implementations within the same code base. Fortran2003 also supports interoperability with C so that implementations in C languages are also easy to use. Finally, main codes can be translated into dynamic environments such as Python, while still taking advantage of high performing compiled languages. Parallel languages are still evolving with interesting developments in co-Array Fortran, UPC, and OpenACC, among others, and these can also be supported within the same software architecture. Work supported by NSF and DOE Grants.

  18. Modeling Cooperative Threads to Project GPU Performance for Adaptive Parallelism

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Meng, Jiayuan; Uram, Thomas; Morozov, Vitali A.

    Most accelerators, such as graphics processing units (GPUs) and vector processors, are particularly suitable for accelerating massively parallel workloads. On the other hand, conventional workloads are developed for multi-core parallelism, which often scale to only a few dozen OpenMP threads. When hardware threads significantly outnumber the degree of parallelism in the outer loop, programmers are challenged with efficient hardware utilization. A common solution is to further exploit the parallelism hidden deep in the code structure. Such parallelism is less structured: parallel and sequential loops may be imperfectly nested within each other, neighboring inner loops may exhibit different concurrency patterns (e.g. Reduction vs. Forall), yet have to be parallelized in the same parallel section. Many input-dependent transformations have to be explored. A programmer often employs a larger group of hardware threads to cooperatively walk through a smaller outer loop partition and adaptively exploit any encountered parallelism. This process is time-consuming and error-prone, yet the risk of gaining little or no performance remains high for such workloads. To reduce risk and guide implementation, we propose a technique to model workloads with limited parallelism that can automatically explore and evaluate transformations involving cooperative threads. Eventually, our framework projects the best achievable performance and the most promising transformations without implementing GPU code or using physical hardware. We envision our technique to be integrated into future compilers or optimization frameworks for autotuning.

  19. Machine-learned and codified synthesis parameters of oxide materials

    NASA Astrophysics Data System (ADS)

    Kim, Edward; Huang, Kevin; Tomala, Alex; Matthews, Sara; Strubell, Emma; Saunders, Adam; McCallum, Andrew; Olivetti, Elsa

    2017-09-01

    Predictive materials design has rapidly accelerated in recent years with the advent of large-scale resources, such as materials structure and property databases generated by ab initio computations. In the absence of analogous ab initio frameworks for materials synthesis, high-throughput and machine learning techniques have recently been harnessed to generate synthesis strategies for select materials of interest. Still, a community-accessible, autonomously-compiled synthesis planning resource which spans across materials systems has not yet been developed. In this work, we present a collection of aggregated synthesis parameters computed using the text contained within over 640,000 journal articles using state-of-the-art natural language processing and machine learning techniques. We provide a dataset of synthesis parameters, compiled autonomously across 30 different oxide systems, in a format optimized for planning novel syntheses of materials.

  20. RTE: A UNIX library with on-line documentation and sample programs for microwave radiative transfer calculations

    NASA Astrophysics Data System (ADS)

    Reynolds, J. C.; Schroeder, J. A.

    1993-03-01

    The FORTRAN library that the NOAA Wave Propagation Laboratory (WPL) developed to perform radiative transfer calculations for an upward-looking microwave radiometer is described. Although the theory and algorithms have been used for many years in WPL radiometer research, the Radiative Transfer Equation (RTE) software has combined them into a toolbox that is portable, readable, application independent, and easy to update. RTE has been optimized for the UNIX environment. However, the FORTRAN source code can be compiled on any platform that provides a Standard FORTRAN 77 compiler. RTE allows a user to do cloud modeling, calibrate radiometers, simulate hypothetical radiometer systems, develop retrieval techniques, and compute weighting functions. The radiative transfer model used is valid for channel frequencies below 1000 GHz in clear conditions and for frequencies below 100 GHz when clouds are present.

  1. Compile-Time Schedulability Analysis of Communicating Concurrent Programs

    DTIC Science & Technology

    2006-06-28

    synchronize via the read and write operations on the FIFO channels. These operations have been implemented with the help of semaphores, which... [contents excerpt] 1.1.2 Synchronous Dataflow; 1.1.3 Boolean Dataflow... described by concurrent programs; 1.3 A synchronous dataflow model, its topology matrix, and repetition vector; 1.4 Select and

  2. Relationships between Federal Accountability Mandates and Principal Turnover within Georgia Public Elementary Schools

    ERIC Educational Resources Information Center

    Garbade, Amy Leigh

    2013-01-01

    This study examines principal turnover in Georgia public elementary schools during a time period prior to the existence of the No Child Left Behind Act of 2001, through the law's full implementation. Data was compiled for the fourteen-year period and examined to determine if a relationship existed between principal attrition rates and the…

  3. Documenting and Maintaining Native American Languages for the 21st Century: The Indiana University Model.

    ERIC Educational Resources Information Center

    Parks, Douglas R.; Kushner, Julia; Hooper, Wallace; Flavin, Francis; Yellow Bird, Delilah; Ditmar, Selena

    This document compiles five short papers that describe the history and implementation of the Arikara Language Project and the Nakoda Language Project, the development of computer tools for language documentation, and the creation of curriculum materials for these and other projects. These papers are: "Genesis of the Project" (Douglas R.…

  4. A Comparison of State-Funded Pre-K Programs: Lessons for Indiana

    ERIC Educational Resources Information Center

    Chesnut, Colleen; Mosier, Gina; Sugimoto, Thomas; Ruddy, Anne-Maree

    2017-01-01

    In order to inform the Indiana State Board of Education's decision-making on Indiana's On My Way Pre-K Pilot program, researchers at the Center for Evaluation and Education Policy (CEEP) at Indiana University compiled existing data on ten states that have implemented pilot pre-Kindergarten (pre-K) programs and subsequently expanded these programs…

  5. Corpus-Supported Academic Writing: How Can Technology Help?

    ERIC Educational Resources Information Center

    Chitez, Madalina; Rapp, Christian; Kruse, Otto

    2015-01-01

    Phraseology has long been used in L2 teaching of academic writing, and corpus linguistics has played a major role in the compilation and assessment of academic phrases. However, there are only a few interactive academic writing tools in which corpus methodology is implemented in a real-time design to support formulation processes. In this paper,…

  6. Linking Soils and Down Woody Material Inventories for Cohesive Assessments of Ecosystem Carbon Pools

    Treesearch

    Katherine P. O'Neill; Christopher Woodall; Michael Amacher; Geoffrey Holden

    2005-01-01

    The Soils and Down Woody Materials (DWM) indicators collected by the Forest Inventory and Analysis program provide the only data available for nationally consistent monitoring of carbon storage in soils, the forest floor, and down woody debris. However, these indicators were developed and implemented separately, resulting in field methods and compilation procedures...

  7. A Guide to Selected Curriculum Materials on Interdependence, Conflict, and Change: Teacher Comments on Classroom Use and Implementation.

    ERIC Educational Resources Information Center

    New York Friends Group, Inc., New York. Center for War/Peace Studies.

    The purpose of this compilation of teacher-developed descriptive evaluations of curriculum materials is to provide practical guidance to available materials dealing with the selected themes of interdependence, conflict, and change. Each of six conceptual units presented on change, conflict, identity, interdependence, power and authority, and…

  8. Handbook of Information Relevant to Manpower Agencies: A Compilation of Practice Principles and Strategies for Manpower Operations.

    ERIC Educational Resources Information Center

    Erfurt, John C.; And Others

    Concepts of internal agency structure and operations, agency-company relations, and agency-enrollee relations, with recommendations for their implementation, form the three main sections of this handbook developed for manpower agency administrators, supervisory staffs and program planners. It is designed to aid those who organize and develop…

  9. Using GIS technology to analyze and understand wet meadow ecosystems

    Treesearch

    Joy Rosen; Roy Jemison; David Pawelek; Daniel Neary

    1999-01-01

    A Cibola National Forest wet meadow restoration was implemented as part of the Forest Road 49 enhancement near Grants, New Mexico. An Arc/View 3.0 Geographic Information System (GIS) was used to track the recovery of this ecosystem. Layers on topography, hydrology, vegetation, soils and human alterations were compiled using a GPS and commonly available data....

  10. Increasing Special Library Collection Use in Very Computer Intensive Environments: Automatic Bibliographic Compilation and the Dissemination of Electronic Newsletters.

    ERIC Educational Resources Information Center

    Sanchez, James Joseph

    This paper describes the development and implementation of an automatic bibliographic facility and an electronic newsletter created for a special collection of aerospace and mechanical engineering monographs and articles at the University of Arizona. The project included the development of an online catalog, increasing the depth of bibliographic…

  11. Extending Automatic Parallelization to Optimize High-Level Abstractions for Multicore

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liao, C; Quinlan, D J; Willcock, J J

    2008-12-12

    Automatic introduction of OpenMP for sequential applications has attracted significant attention recently because of the proliferation of multicore processors and the simplicity of using OpenMP to express parallelism for shared-memory systems. However, most previous research has only focused on C and Fortran applications operating on primitive data types. C++ applications using high-level abstractions, such as STL containers and complex user-defined types, are largely ignored due to the lack of research compilers that are readily able to recognize high-level object-oriented abstractions and leverage their associated semantics. In this paper, we automatically parallelize C++ applications using ROSE, a multiple-language source-to-source compiler infrastructure which preserves the high-level abstractions and gives us access to their semantics. Several representative parallelization candidate kernels are used to explore semantic-aware parallelization strategies for high-level abstractions, combined with extended compiler analyses. Those kernels include an array-based computation loop, a loop with task-level parallelism, and a domain-specific tree traversal. Our work extends the applicability of automatic parallelization to modern applications using high-level abstractions and exposes more opportunities to take advantage of multicore processors.

  12. Optimizing a mobile robot control system using GPU acceleration

    NASA Astrophysics Data System (ADS)

    Tuck, Nat; McGuinness, Michael; Martin, Fred

    2012-01-01

    This paper describes our attempt to optimize a robot control program for the Intelligent Ground Vehicle Competition (IGVC) by running computationally intensive portions of the system on a commodity graphics processing unit (GPU). The IGVC Autonomous Challenge requires a control program that performs a number of different computationally intensive tasks ranging from computer vision to path planning. For the 2011 competition our Robot Operating System (ROS) based control system would not run comfortably on the multicore CPU on our custom robot platform. The process of profiling the ROS control program and selecting appropriate modules for porting to run on a GPU is described. A GPU-targeting compiler, Bacon, is used to speed up development and help optimize the ported modules. The impact of the ported modules on overall performance is discussed. We conclude that GPU optimization can free a significant amount of CPU resources with minimal effort for expensive user-written code, but that replacing heavily-optimized library functions is more difficult, and a much less efficient use of time.

  13. Agile Implementation: A Blueprint for Implementing Evidence-Based Healthcare Solutions.

    PubMed

    Boustani, Malaz; Alder, Catherine A; Solid, Craig A

    2018-03-07

    To describe the essential components of an Agile Implementation (AI) process, which rapidly and effectively implements evidence-based healthcare solutions, and present a case study demonstrating its utility. Case demonstration study. Integrated, safety net healthcare delivery system in Indianapolis. Interdisciplinary team of clinicians and administrators. Reduction in dementia symptoms and caregiver burden; inpatient and outpatient care expenditures. Implementation scientists were able to implement a collaborative care model for dementia care and sustain it for more than 9 years. The model was implemented and sustained by using the elements of the AI process: proactive surveillance and confirmation of clinical opportunities, selection of the right evidence-based healthcare solution, localization (i.e., tailoring to the local environment) of the selected solution, development of an evaluation plan and performance feedback loop, development of a minimally standardized operation manual, and updating such manual annually. The AI process provides an effective model to implement and sustain evidence-based healthcare solutions. © 2018, Copyright the Authors Journal compilation © 2018, The American Geriatrics Society.

  14. Ada issues in implementing ART-Ada

    NASA Technical Reports Server (NTRS)

    Lee, S. Daniel

    1990-01-01

    Due to the Ada mandate of a number of government agencies, interest in deploying expert systems in Ada has increased. Recently, several Ada-based expert system tools have been developed. According to a recent benchmark report, these tools do not perform as well as similar tools written in C. While poorly implemented Ada compilers contribute to the poor benchmark result, some fundamental problems of the Ada language itself have been uncovered. Here, the authors describe Ada language issues encountered during the deployment of ART-Ada, an expert system tool for Ada deployment. ART-Ada is being used to implement several prototype expert systems for the Space Station Freedom and the U.S. Air Force.

  15. The state of the Java universe

    ScienceCinema

    Gosling, James

    2018-05-22

    Speaker Bio: James Gosling received a B.Sc. in computer science from the University of Calgary, Canada in 1977. He received a Ph.D. in computer science from Carnegie-Mellon University in 1983. The title of his thesis was The Algebraic Manipulation of Constraints. He has built satellite data acquisition systems, a multiprocessor version of UNIX®, several compilers, mail systems, and window managers. He has also built a WYSIWYG text editor, a constraint-based drawing editor, and a text editor called Emacs, for UNIX systems. At Sun his early activity was as lead engineer of the NeWS window system. He did the original design of the Java programming language and implemented its original compiler and virtual machine. He has recently been a contributor to the Real-Time Specification for Java.

  16. Execution models for mapping programs onto distributed memory parallel computers

    NASA Technical Reports Server (NTRS)

    Sussman, Alan

    1992-01-01

    The problem of exploiting the parallelism available in a program to efficiently employ the resources of the target machine is addressed. The problem is discussed in the context of building a mapping compiler for a distributed memory parallel machine. The paper describes using execution models to drive the process of mapping a program in the most efficient way onto a particular machine. Through analysis of the execution models for several mapping techniques for one class of programs, we show that the selection of the best technique for a particular program instance can make a significant difference in performance. On the other hand, the results of benchmarks from an implementation of a mapping compiler show that our execution models are accurate enough to select the best mapping technique for a given program.

  17. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gosling, James

    Speaker Bio: James Gosling received a B.Sc. in computer science from the University of Calgary, Canada in 1977. He received a Ph.D. in computer science from Carnegie-Mellon University in 1983. The title of his thesis was The Algebraic Manipulation of Constraints. He has built satellite data acquisition systems, a multiprocessor version of UNIX®, several compilers, mail systems, and window managers. He has also built a WYSIWYG text editor, a constraint-based drawing editor, and a text editor called Emacs, for UNIX systems. At Sun his early activity was as lead engineer of the NeWS window system. He did the original design of the Java programming language and implemented its original compiler and virtual machine. He has recently been a contributor to the Real-Time Specification for Java.

  18. Resonator reset in circuit QED by optimal control for large open quantum systems

    NASA Astrophysics Data System (ADS)

    Boutin, Samuel; Andersen, Christian Kraglund; Venkatraman, Jayameenakshi; Ferris, Andrew J.; Blais, Alexandre

    2017-10-01

    We study an implementation of the open GRAPE (gradient ascent pulse engineering) algorithm well suited for large open quantum systems. While typical implementations of optimal control algorithms for open quantum systems rely on explicit matrix exponential calculations, our implementation avoids these operations, leading to a polynomial speedup of the open GRAPE algorithm in cases of interest. This speedup, as well as the reduced memory requirements of our implementation, is illustrated by comparison to a standard implementation of open GRAPE. As a practical example, we apply this open-system optimization method to active reset of a readout resonator in circuit QED. In this problem, the shape of a microwave pulse is optimized so as to empty the cavity of measurement photons as fast as possible. Using our open GRAPE implementation, we obtain pulse shapes leading to a reset time over 4 times faster than passive reset.

  19. Practical Aerodynamic Design Optimization Based on the Navier-Stokes Equations and a Discrete Adjoint Method

    NASA Technical Reports Server (NTRS)

    Grossman, Bernard

    1999-01-01

    Compressible and incompressible versions of a three-dimensional unstructured mesh Reynolds-averaged Navier-Stokes flow solver have been differentiated, and the resulting derivatives have been verified by comparisons with finite differences and a complex-variable approach. In this implementation, the turbulence model is fully coupled with the flow equations in order to achieve this consistency. The accuracy demonstrated in the current work represents the first time that such an approach has been successfully implemented. The accuracy of a number of simplifying approximations to the linearizations of the residual has been examined. A first-order approximation to the dependent variables in both the adjoint and design equations has been investigated. The effects of a "frozen" eddy viscosity and the ramifications of neglecting some mesh sensitivity terms were also examined. It has been found that none of the approximations yielded derivatives of acceptable accuracy, and the resulting derivatives were often of incorrect sign. However, numerical experiments indicate that an incomplete convergence of the adjoint system often yields sufficiently accurate derivatives, thereby significantly lowering the time required for computing sensitivity information. The convergence rate of the adjoint solver relative to the flow solver has been examined. Inviscid adjoint solutions typically require one to four times the cost of a flow solution, while for turbulent adjoint computations, this ratio can reach as high as eight to ten. Numerical experiments have shown that the adjoint solver can stall before converging the solution to machine accuracy, particularly for viscous cases. A possible remedy for this phenomenon would be to include the complete higher-order linearization in the preconditioning step, or to employ a simple form of mesh sequencing to obtain better approximations to the solution through the use of coarser meshes.
    An efficient surface parameterization based on a free-form deformation technique has been utilized and the resulting codes have been integrated with an optimization package. Lastly, sample optimizations have been shown for inviscid and turbulent flow over an ONERA M6 wing. Drag reductions have been demonstrated by reducing shock strengths across the span of the wing. In order for large scale optimization to become routine, the benefits of parallel architectures should be exploited. Although the flow solver has been parallelized using compiler directives, the parallel efficiency is under 50 percent. Clearly, parallel versions of the codes will have an immediate impact on the ability to design realistic configurations on fine meshes, and this effort is currently underway.
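
    The derivative checks mentioned in this abstract (finite differences versus a complex-variable approach) can be illustrated on a scalar function. The sketch below is not the flow solver's code: the test function `f`, the step sizes, and all function names are assumptions chosen for illustration. It uses the standard complex-step formula f'(x) ≈ Im f(x + ih) / h, which avoids the subtractive cancellation that limits finite differences.

```python
import cmath


def f(x):
    # Hypothetical smooth test function (an assumption, not from the abstract).
    # cmath accepts both real and complex arguments.
    return cmath.exp(x) * cmath.sin(x)


def exact_derivative(x):
    # Analytic derivative of f, used only as the reference value.
    return (cmath.exp(x) * (cmath.sin(x) + cmath.cos(x))).real


def central_difference(func, x, h=1e-6):
    # Second-order finite difference; accuracy is limited by cancellation
    # in the subtraction func(x + h) - func(x - h).
    return ((func(x + h) - func(x - h)) / (2.0 * h)).real


def complex_step(func, x, h=1e-30):
    # Complex-step derivative: perturb along the imaginary axis and read the
    # imaginary part. No subtraction occurs, so h can be made extremely small.
    return func(complex(x, h)).imag / h


if __name__ == "__main__":
    x0 = 1.0
    print("exact        ", exact_derivative(x0))
    print("central diff ", central_difference(f, x0))
    print("complex step ", complex_step(f, x0))
```

    The complex-step result typically agrees with the analytic value to near machine precision, which is why it is a useful independent check on hand-differentiated code. It does require that the function be evaluated with complex arithmetic throughout.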

  20. A compilation of global bio-optical in situ data for ocean-colour satellite applications

    NASA Astrophysics Data System (ADS)

    Valente, André; Sathyendranath, Shubha; Brotas, Vanda; Groom, Steve; Grant, Michael; Taberner, Malcolm; Antoine, David; Arnone, Robert; Balch, William M.; Barker, Kathryn; Barlow, Ray; Bélanger, Simon; Berthon, Jean-François; Beşiktepe, Şükrü; Brando, Vittorio; Canuti, Elisabetta; Chavez, Francisco; Claustre, Hervé; Crout, Richard; Frouin, Robert; García-Soto, Carlos; Gibb, Stuart W.; Gould, Richard; Hooker, Stanford; Kahru, Mati; Klein, Holger; Kratzer, Susanne; Loisel, Hubert; McKee, David; Mitchell, Brian G.; Moisan, Tiffany; Muller-Karger, Frank; O'Dowd, Leonie; Ondrusek, Michael; Poulton, Alex J.; Repecaud, Michel; Smyth, Timothy; Sosik, Heidi M.; Twardowski, Michael; Voss, Kenneth; Werdell, Jeremy; Wernand, Marcel; Zibordi, Giuseppe

    2016-06-01

    A compiled set of in situ data is important to evaluate the quality of ocean-colour satellite-data records. Here we describe the data compiled for the validation of the ocean-colour products from the ESA Ocean Colour Climate Change Initiative (OC-CCI). The data were acquired from several sources (MOBY, BOUSSOLE, AERONET-OC, SeaBASS, NOMAD, MERMAID, AMT, ICES, HOT, GeP&CO), span between 1997 and 2012, and have a global distribution. Observations of the following variables were compiled: spectral remote-sensing reflectances, concentrations of chlorophyll a, spectral inherent optical properties and spectral diffuse attenuation coefficients. The data were from multi-project archives acquired via the open internet services or from individual projects, acquired directly from data providers. Methodologies were implemented for homogenisation, quality control and merging of all data. No changes were made to the original data, other than averaging of observations that were close in time and space, elimination of some points after quality control and conversion to a standard format. The final result is a merged table designed for validation of satellite-derived ocean-colour products and available in text format. Metadata of each in situ measurement (original source, cruise or experiment, principal investigator) were preserved throughout the work and made available in the final table. Using all the data in a validation exercise increases the number of matchups and enhances the representativeness of different marine regimes. By making available the metadata, it is also possible to analyse each set of data separately. The compiled data are available at doi:10.1594/PANGAEA.854832 (Valente et al., 2015).

  1. Estimating Regional and National-Scale Greenhouse Gas Emissions in the Agriculture, Forestry, and Other Land Use (AFOLU) Sector using the 'Agricultural and Land Use (ALU) Tool'

    NASA Astrophysics Data System (ADS)

    Spencer, S.; Ogle, S. M.; Wirth, T. C.; Sivakami, G.

    2016-12-01

    The Intergovernmental Panel on Climate Change (IPCC) provides methods and guidance for estimating anthropogenic greenhouse gas emissions for reporting to the United Nations Framework Convention on Climate Change. The methods are comprehensive and require extensive data compilation, management, aggregation, documentation and calculations of source and sink categories to achieve robust emissions estimates. IPCC Guidelines describe three estimation tiers that require increasing levels of country-specific data and method complexity. Use of higher tiers should improve overall accuracy and reduce uncertainty in estimates. The AFOLU sector represents a complex set of methods for estimating greenhouse gas emissions and carbon sinks. Major AFOLU emissions and sinks include carbon dioxide (CO2) from carbon stock change in biomass, dead organic matter and soils, urea or lime application to soils, and oxidation of carbon in drained organic soils; nitrous oxide (N2O) and methane (CH4) emissions from livestock management and biomass burning; N2O from organic amendments and fertilizer application to soils; and CH4 emissions from rice cultivation. To assist inventory compilers with calculating AFOLU-sector estimates, the Agriculture and Land Use Greenhouse Gas Inventory Tool (ALU) was designed to implement Tier 1 and 2 methods using IPCC Good Practice Guidance. It guides the compiler through activity data entry, emission factor assignment, and emissions calculations while carefully maintaining data integrity. ALU also provides IPCC defaults and can estimate uncertainty. ALU was designed to simplify the AFOLU inventory compilation process at regional or national scales; disaggregating the process into a series of steps reduces the potential for errors in the compilation process. An example application has been developed using ALU to estimate methane emissions from rice production in the United States.
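
    The workflow named in this abstract (activity data entry, emission factor assignment, emissions calculation) follows the basic inventory pattern of multiplying activity data by an emission factor. The sketch below illustrates that pattern only. It is not ALU's interface: the class, the function names, and every numeric value are hypothetical placeholders, not IPCC defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceCategory:
    """One inventory line item (hypothetical structure, not ALU's data model)."""
    name: str
    activity: float         # activity data, e.g. hectares harvested
    emission_factor: float  # mass of gas emitted per unit of activity


def estimate_emissions(categories):
    """Return emissions per category as activity * emission_factor."""
    results = {}
    for cat in categories:
        # Basic integrity check: negative activity data is a data-entry error.
        if cat.activity < 0:
            raise ValueError(f"negative activity data for {cat.name!r}")
        results[cat.name] = cat.activity * cat.emission_factor
    return results


if __name__ == "__main__":
    # Placeholder numbers for illustration only.
    inventory = [
        SourceCategory("rice_region_a", activity=1000.0, emission_factor=2.5),
        SourceCategory("rice_region_b", activity=400.0, emission_factor=3.0),
    ]
    per_category = estimate_emissions(inventory)
    print(per_category)
    print("total:", sum(per_category.values()))
```

    Keeping each category as a separate record mirrors the step-by-step disaggregation the abstract describes: an error in one entry is caught where it is entered and does not silently propagate into the total.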

  2. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bergen, Ben; Moss, Nicholas; Charest, Marc Robert Joseph

    FleCSI is a compile-time configurable framework designed to support multi-physics application development. As such, FleCSI attempts to provide a very general set of infrastructure design patterns that can be specialized and extended to suit the needs of a broad variety of solver and data requirements. Current support includes multi-dimensional mesh topology, mesh geometry, and mesh adjacency information, n-dimensional hashed-tree data structures, graph partitioning interfaces, and dependency closures. FleCSI also introduces a functional programming model with control, execution, and data abstractions that are consistent with both MPI and state-of-the-art task-based runtimes such as Legion and Charm++. The FleCSI abstraction layer provides the developer with insulation from the underlying runtime, while allowing support for multiple runtime systems, including conventional models like asynchronous MPI. The intent is to give developers a concrete set of user-friendly programming tools that can be used now, while allowing flexibility in choosing runtime implementations and optimizations that can be applied to architectures and runtimes that arise in the future. The control and execution models in FleCSI also provide formal nomenclature for describing poorly understood concepts like kernels and tasks.

  3. Embedded Streaming Deep Neural Networks Accelerator With Applications.

    PubMed

    Dundar, Aysegul; Jin, Jonghoon; Martini, Berin; Culurciello, Eugenio

    2017-07-01

    Deep convolutional neural networks (DCNNs) have become a very powerful tool in visual perception. DCNNs have applications in autonomous robots, security systems, mobile phones, and automobiles, where high throughput of the feedforward evaluation phase and power efficiency are important. Because of this increased usage, many field-programmable gate array (FPGA)-based accelerators have been proposed. In this paper, we present an optimized streaming method for DCNNs' hardware accelerator on an embedded platform. The streaming method acts as a compiler, transforming a high-level representation of DCNNs into operation codes to execute applications in a hardware accelerator. The proposed method utilizes maximum computational resources available based on a novel-scheduled routing topology that combines data reuse and data concatenation. It is tested with a hardware accelerator implemented on the Xilinx Kintex-7 XC7K325T FPGA. The system fully explores weight-level and node-level parallelizations of DCNNs and achieves a peak performance of 247 G-ops while consuming less than 4 W of power. We test our system with applications on object classification and object detection in real-world scenarios. Our results indicate high-performance efficiency, outperforming all other presented platforms while running these applications.

  4. A new and inexpensive non-bit-for-bit solution reproducibility test based on time step convergence (TSC1.0)

    NASA Astrophysics Data System (ADS)

    Wan, Hui; Zhang, Kai; Rasch, Philip J.; Singh, Balwinder; Chen, Xingyuan; Edwards, Jim

    2017-02-01

    A test procedure is proposed for identifying numerically significant solution changes in evolution equations used in atmospheric models. The test issues a fail signal when any code modifications or computing environment changes lead to solution differences that exceed the known time step sensitivity of the reference model. Initial evidence is provided using the Community Atmosphere Model (CAM) version 5.3 that the proposed procedure can be used to distinguish rounding-level solution changes from impacts of compiler optimization or parameter perturbation, which are known to cause substantial differences in the simulated climate. The test is not exhaustive since it does not detect issues associated with diagnostic calculations that do not feed back to the model state variables. Nevertheless, it provides a practical and objective way to assess the significance of solution changes. The short simulation length implies low computational cost. The independence between ensemble members allows for parallel execution of all simulations, thus facilitating fast turnaround. The new method is simple to implement since it does not require any code modifications. We expect that the same methodology can be used for any geophysical model to which the concept of time step convergence is applicable.

  5. How can primary care providers manage pediatric obesity in the real world?

    PubMed

    Hopkins, Kristy F; Decristofaro, Claire; Elliott, Lydia

    2011-06-01

    To provide information regarding evidence-based interventions and clinical practice guidelines as a basis for a clinical toolkit utilizing a step management approach for the primary care provider in managing childhood obesity. Evidence-based literature including original clinical trials, literature reviews, and clinical practice guidelines. Interventions can be stratified based on initial screening of children and adolescents so that selection of treatment options is optimized. For all treatments, lifestyle modifications include attention to diet and activity level. Levels of initial success, as well as maintenance of target body mass index, may be related to the intensity and duration of interventions; involvement of family may increase success rates. For failed lifestyle interventions, or for patients with extreme obesity and/or certain comorbidities, pharmacologic or surgical options should be considered. Many intensive programs have shown success, but the resources required for these approaches may be unavailable to the typical community provider and family. However, using current guidelines, the primary care provider can initiate and manage ongoing interventions in pediatric obesity. A toolkit for primary care implementation and maintenance interventions is provided. ©2011 The Author(s) Journal compilation ©2011 American Academy of Nurse Practitioners.

  6. An improved approach of register allocation via graph coloring

    NASA Astrophysics Data System (ADS)

    Gao, Lei; Shi, Ce

    2005-03-01

    Register allocation is an important part of an optimizing compiler. Register allocation via graph coloring was first implemented by Chaitin and his colleagues and later improved by Briggs and others. Abstracting register allocation as graph coloring simplifies the allocation process. Because the number of physical registers is limited, coloring of the interference graph cannot succeed for every node, and the uncolored nodes must be spilled. Almost all allocation methods obey one assumption: when a register is allocated to a variable v, it can't be used by other variables until v's live range ends, even if v is not used for a long time. This may waste register resources. The authors relax this restriction under certain conditions. In their method, one register can be mapped to two or more interfering "living" live ranges at the same time if they satisfy some requirements. An operation named merge is defined that lets two interfering nodes occupy the same register at some cost. Thus, registers can be used more effectively and the cost of memory access can be reduced greatly.
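
    For context, the baseline that this abstract builds on can be sketched as a simplify/select coloring of an interference graph with k registers. This is a minimal illustration of the classic Chaitin-style scheme, not the authors' merge operation. The function name, the graph encoding, and the spill heuristic (highest remaining degree) are assumptions made for the example.

```python
def color_interference_graph(graph, k):
    """Chaitin-style simplify/select coloring (illustrative sketch).

    graph: dict mapping each live range to the set of live ranges it
           interferes with (assumed symmetric).
    k:     number of physical registers.
    Returns (assignment, spilled): a register index per colored node and
    the list of nodes chosen for spilling.
    """
    # Work on a copy so the caller's graph is left untouched.
    work = {node: set(adj) for node, adj in graph.items()}
    stack, spilled = [], []

    def remove(node):
        for neighbor in work.pop(node):
            work[neighbor].discard(node)

    while work:
        # Simplify: a node with fewer than k neighbors can always be colored.
        candidate = next((n for n in sorted(work) if len(work[n]) < k), None)
        if candidate is None:
            # No such node: spill the one with the most remaining conflicts
            # (a simple heuristic; real allocators weigh spill cost).
            victim = max(sorted(work), key=lambda n: len(work[n]))
            spilled.append(victim)
            remove(victim)
        else:
            stack.append(candidate)
            remove(candidate)

    # Select: pop nodes and give each the lowest register its already
    # colored neighbors do not use.
    assignment = {}
    while stack:
        node = stack.pop()
        used = {assignment[m] for m in graph[node] if m in assignment}
        assignment[node] = min(c for c in range(k) if c not in used)
    return assignment, spilled


if __name__ == "__main__":
    # Hypothetical interference graph: a, b, c form a triangle; d conflicts with c.
    g = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
    print(color_interference_graph(g, 3))
    print(color_interference_graph(g, 2))
```

    With two registers the triangle cannot be colored, so one of its nodes is spilled. That is the situation the abstract's merge operation aims to improve by letting two interfering live ranges share a register under certain conditions.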

  7. An Integrated Research Program for the Modeling, Analysis and Control of Aerospace Systems

    DTIC Science & Technology

    1992-03-03

    Fabiano, Jr. - Brown University Mitchell Feigenbaum - Rockefeller University Elena Fernandez - Instituto de Desarrollo Tecnologico para la Industria...system. The system runs under DEC Ultrix; we have installed the GKS graphics system and language compilers (FORTRAN and C). The DELIGHT.MIMO software...which links a sophisticated non-smooth optimization package to some linear system software, is on the system. The package was kindly furnished by

  8. An Integrated Research Program for the Modeling, Analysis and Control of Aerospace Systems

    DTIC Science & Technology

    1992-03-03

    Mitchell Feigenbaum - Rockefeller University Elena Fernandez - Instituto de Desarrollo Tecnologico para la Industria Quimica Wilfred M. Greenlee...Ultrix; we have installed the GKS graphics system and language compilers (FORTRAN and C). The DELIGHT.MIMO software, which links a sophisticated non...smooth optimization package to some linear system software, is on the system. The package was kindly furnished by Professor E. Polak, Electrical and

  9. Near Hartree-Fock quality GTO basis sets for the first- and third-row atoms

    NASA Technical Reports Server (NTRS)

    Partridge, Harry

    1989-01-01

    Energy-optimized Gaussian-type-orbital (GTO) basis sets of accuracy approaching that of numerical Hartree-Fock computations are compiled for the elements of the first and third rows of the periodic table. The methods employed in calculating the sets are explained; the applicability of the sets to electronic-structure calculations is discussed; and the results are presented in tables and briefly characterized.

  10. Case Studies of Successful Schoolwide Enrichment Model-Reading (SEM-R) Classroom Implementations. Research Monograph Series. RM10204

    ERIC Educational Resources Information Center

    Reis, Sally M.; Little, Catherine A.; Fogarty, Elizabeth; Housand, Angela M.; Housand, Brian C.; Sweeny, Sheelah M.; Eckert, Rebecca D.; Muller, Lisa M.

    2010-01-01

    The purpose of this qualitative study was to examine the scaling up of the Schoolwide Enrichment Model in Reading (SEM-R) in 11 elementary and middle schools in geographically diverse sites across the country. Qualitative comparative analysis was used in this study, with multiple data sources compiled into 11 in-depth school case studies…

  11. LBNL Laboratory Directed Research and Development Program FY2016

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ho, D.

    2017-03-01

    The Berkeley Lab Laboratory Directed Research and Development Program FY2016 report is compiled from annual reports submitted by principal investigators following the close of the fiscal year. This report describes the supported projects and summarizes their accomplishments. It constitutes a part of the LDRD program planning and documentation process that includes an annual planning cycle, project selection, implementation and review.

  12. Just Another Gibbs Sampler (JAGS): Flexible Software for MCMC Implementation

    ERIC Educational Resources Information Center

    Depaoli, Sarah; Clifton, James P.; Cobb, Patrice R.

    2016-01-01

    A review of the software Just Another Gibbs Sampler (JAGS) is provided. We cover aspects related to history and development and the elements a user needs to know to get started with the program, including (a) definition of the data, (b) definition of the model, (c) compilation of the model, and (d) initialization of the model. An example using a…

  13. An Interface Transformation Strategy for AF-IPPS

    DTIC Science & Technology

    2012-12-01

    Representational State Transfer (REST) and Java Enterprise Edition (Java EE) to implement a reusable "translation service." For SOAP and REST protocols, XML and...of best-of-breed open source software. The product baseline is summarized in the following table: Product Function Description Java Language...Compiler & Runtime JBoss Application Server Applications, Messaging, Translation Java EE Application Server Ruby on Rails Applications Ruby Web

  14. Follow-Up Evaluation Project. From July 1, 1981 to June 30, 1983. Final Report.

    ERIC Educational Resources Information Center

    Santa Fe Community Coll., Gainesville, FL.

    A project was undertaken to revise a model competency-based trade and industrial education program that had been developed for use in Florida schools in a project that was implemented earlier. During the followup evaluation, the project staff compiled task listings for each of the following trade and industrial education program areas: automotive;…

  15. A Resource Document for Implementing Recruitment of Minorities and Women at The Florida State University.

    ERIC Educational Resources Information Center

    Florida State Univ., Tallahassee.

    The suggestions and recruitment sources contained in this document are compiled with the idea of aiding in the search for minorities and women to fill positions at all levels in the universities. The document contains: (1) innovative approach to increasing the number of minority and women faculty; (2) predominately black colleges and universities;…

  16. Performance Comparison of HPF and MPI Based NAS Parallel Benchmarks

    NASA Technical Reports Server (NTRS)

    Saini, Subhash

    1997-01-01

    Compilers supporting High Performance Fortran (HPF) features first appeared in late 1994 and early 1995 from Applied Parallel Research (APR), Digital Equipment Corporation, and The Portland Group (PGI). IBM introduced an HPF compiler for the IBM RS/6000 SP2 in April of 1996. Over the past two years, these implementations have shown steady improvement in terms of both features and performance. The performance of various hardware/programming model (HPF and MPI) combinations will be compared, based on the latest NAS Parallel Benchmark (NPB) results, thus providing a cross-machine and cross-model comparison. Specifically, HPF-based NPB results will be compared with MPI-based NPB results to provide perspective on the performance currently obtainable using HPF versus MPI or versus hand-tuned implementations such as those supplied by the hardware vendors. In addition, we will present NPB (Version 1.0) performance results for the following systems: DEC Alpha Server 8400 5/440, Fujitsu CAPP Series (VX, VPP300, and VPP700), HP/Convex Exemplar SPP2000, IBM RS/6000 SP P2SC node (120 MHz), NEC SX-4/32, SGI/CRAY T3E, and SGI Origin2000. We will also present sustained performance per dollar for the Class B LU, SP and BT benchmarks.

  17. Run-time implementation issues for real-time embedded Ada

    NASA Technical Reports Server (NTRS)

    Maule, Ruth A.

    1986-01-01

    A motivating factor in the development of Ada as the Department of Defense standard language was the high cost of embedded system software development. It was with embedded system requirements in mind that many of the features of the language were incorporated. Yet it is the designers of embedded systems who seem to comprise the majority of the Ada community dissatisfied with the language. There are a variety of reasons for this dissatisfaction, but many seem to be related in some way to the Ada run-time support system. Some of the areas in which the inconsistencies were found to have the greatest impact on performance from the standpoint of real-time systems are presented. In particular, a large part of the duties of the tasking supervisor is subject to the design decisions of the implementer. These include scheduling, rendezvous, delay processing, and task activation and termination. Some of the more general issues presented include time and space efficiencies, generic expansions, memory management, pragmas, and tracing features. As validated compilers become available for bare computer targets, it is important for a designer to be aware that, at least for many real-time issues, not all validated Ada compilers are created equal.

  18. Situational analysis of infant and young child nutrition policies and programmatic activities in Mali.

    PubMed

    Wuehler, Sara E; Coulibaly, Mouctar

    2011-04-01

    Progress towards reducing mortality and malnutrition among children <5 years of age has been less than needed to achieve related Millennium Development Goals. Therefore, several international agencies joined to 'Reposition children's right to adequate nutrition in the Sahel', starting with a situational analysis of current activities related to infant and young child nutrition (IYCN). The main objectives of the situational analysis are to compile, analyse and interpret available information on infant and young child feeding, and the nutrition situation of children <2 years of age in Mali, as one of the six targeted countries. Between June and September 2008, key informants responsible for conducting IYCN-related activities in Mali were interviewed, and 117 documents were examined on the following themes: optimal breastfeeding and complementary feeding practices, prevention of micronutrient deficiencies, screening and management of acute malnutrition, prevention of mother-to-child transmission of HIV, food security, and hygienic practices. Most of the key IYCN topics were addressed in national policies, training materials, and programme documents. Information on the national coverage and impact of these programmes is generally not available. Exclusive breastfeeding (<6 months) has increased in Mali, but no studies identified the contributors to this increase. Despite improvements in breastfeeding practices, optimal infant, and young child feeding is still practiced among too few young children in Mali. Several research articles were identified, but few of these were linked to programme development. Some programme monitoring and evaluation reports were available, but few of these were rigorous enough to identify whether IYCN-specific programme components were implemented as designed or were achieving desired outcomes. Therefore, we could not confirm which programmes contributed to reported improvements. 
    Monitoring of programmes managing malnutrition identified gaps in human and institutional capacities to fully carry out intended interventions and the government has recognized the overall lack of adequate numbers of health care providers to carry out necessary programmes in Mali, of which nutrition programmes are a part. The policy and programme framework is well established for support of optimal IYCN practices, but greater resources and capacity building are needed to: (i) conduct necessary research to adapt training materials and programme protocols to programmatic needs; (ii) implement rigorous monitoring and evaluation that identify effective programme components; and (iii) apply these findings in developing, expanding, and improving effective programmes. © 2011 Blackwell Publishing Ltd.

  19. Analogy Mapping Development for Learning Programming

    NASA Astrophysics Data System (ADS)

    Sukamto, R. A.; Prabawa, H. W.; Kurniawati, S.

    2017-02-01

    Programming is an important skill for computer science students, yet many computer science students in Indonesia currently lack skills and knowledge in information technology. This is at odds with the implementation of the ASEAN Economic Community (AEC) since the end of 2015, which requires qualified workers. This study is an effort to strengthen programming skills by mapping program code to visual analogies as learning media. The developed media was based on state machine and compiler principles and was implemented in the C programming language. The state of every basic condition in programming was successfully determined as an analogy visualization.

  20. Automatic Implementation of TTEthernet-Based Time-Triggered Avionics Applications

    NASA Astrophysics Data System (ADS)

    Gorcitz, Raul Adrian; Carle, Thomas; Lesens, David; Monchaux, David; Potop-Butucaru, Dumitru; Sorel, Yves

    2015-09-01

    The design of safety-critical embedded systems such as those used in avionics still involves largely manual phases. But in avionics the definition of standard interfaces embodied in standards such as ARINC 653 or TTEthernet should allow the definition of fully automatic code generation flows that reduce the costs while improving the quality of the generated code, much like compilers have done when replacing manual assembly coding. In this paper, we briefly present such a fully automatic implementation tool, called Lopht, for ARINC653-based time-triggered systems, and then explain how it is currently extended to include support for TTEthernet networks.

  1. Insertion of operation-and-indicate instructions for optimized SIMD code

    DOEpatents

    Eichenberger, Alexander E; Gara, Alan; Gschwind, Michael K

    2013-06-04

    Mechanisms are provided for inserting indicated instructions for tracking and indicating exceptions in the execution of vectorized code. A portion of first code is received for compilation. The portion of first code is analyzed to identify non-speculative instructions performing designated non-speculative operations in the first code that are candidates for replacement by replacement operation-and-indicate instructions that perform the designated non-speculative operations and further perform an indication operation for indicating any exception conditions corresponding to special exception values present in vector register inputs to the replacement operation-and-indicate instructions. The replacement is performed and second code is generated based on the replacement of the at least one non-speculative instruction. The data processing system executing the compiled code is configured to store special exception values in vector output registers, in response to a speculative instruction generating an exception condition, without initiating exception handling.

  2. Barrier-breaking performance for industrial problems on the CRAY C916

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Graffunder, S.K.

    1993-12-31

    Nine applications, including third-party codes, were submitted to the Gordon Bell Prize committee showing the CRAY C916 supercomputer providing record-breaking time to solution for industrial problems in several disciplines. Performance was obtained by balancing raw hardware speed; effective use of large, real, shared memory; compiler vectorization and autotasking; hand optimization; asynchronous I/O techniques; and new algorithms. The highest GFLOPS performance for the submissions was 11.1 GFLOPS out of a peak advertised performance of 16 GFLOPS for the CRAY C916 system. One program achieved a 15.45 speedup from the compiler with just two hand-inserted directives to scope variables properly for the mathematical library. New I/O techniques hide tens of gigabytes of I/O behind parallel computations. Finally, new iterative solver algorithms have demonstrated times to solution on 1 CPU as high as 70 times faster than the best direct solvers.

  3. Automatic Generation of Algorithms for the Statistical Analysis of Planetary Nebulae Images

    NASA Technical Reports Server (NTRS)

    Fischer, Bernd

    2004-01-01

    Analyzing data sets collected in experiments or by observations is a core scientific activity. Typically, experimental and observational data are fraught with uncertainty, and the analysis is based on a statistical model of the conjectured underlying processes. The large data volumes collected by modern instruments make computer support indispensable for this. Consequently, scientists spend significant amounts of their time with the development and refinement of the data analysis programs. AutoBayes [GF+02, FS03] is a fully automatic synthesis system for generating statistical data analysis programs. Externally, it looks like a compiler: it takes an abstract problem specification and translates it into executable code. Its input is a concise description of a data analysis problem in the form of a statistical model as shown in Figure 1; its output is optimized and fully documented C/C++ code which can be linked dynamically into the Matlab and Octave environments. Internally, however, it is quite different: AutoBayes derives a customized algorithm implementing the given model using a schema-based process, and then further refines and optimizes the algorithm into code. A schema is a parameterized code template with associated semantic constraints which define and restrict the template's applicability. The schema parameters are instantiated in a problem-specific way during synthesis as AutoBayes checks the constraints against the original model or, recursively, against emerging sub-problems. The AutoBayes schema library contains problem decomposition operators (which are justified by theorems in a formal logic in the domain of Bayesian networks) as well as machine learning algorithms (e.g., EM, k-Means) and numeric optimization methods (e.g., Nelder-Mead simplex, conjugate gradient). AutoBayes augments this schema-based approach by symbolic computation to derive closed-form solutions whenever possible. This is a major advantage over other statistical data analysis systems which use numerical approximations even in cases where closed-form solutions exist. AutoBayes is implemented in Prolog and comprises approximately 75,000 lines of code. In this paper, we take one typical scientific data analysis problem, analyzing planetary nebulae images taken by the Hubble Space Telescope, and show how AutoBayes can be used to automate the implementation of the necessary analysis programs. We initially follow the analysis described by Knuth and Hajian [KH02] and use AutoBayes to derive code for the published models. We show the details of the code derivation process, including the symbolic computations and automatic integration of library procedures, and compare the results of the automatically generated and manually implemented code. We then go beyond the original analysis and use AutoBayes to derive code for a simple image segmentation procedure based on a mixture model which can be used to automate a manual preprocessing step. Finally, we combine the original approach with the simple segmentation which yields a more detailed analysis. This also demonstrates that AutoBayes makes it easy to combine different aspects of data analysis.

  4. Evaluation of the Intel Xeon Phi Co-processor to accelerate the sensitivity map calculation for PET imaging

    NASA Astrophysics Data System (ADS)

    Dey, T.; Rodrigue, P.

    2015-07-01

    We aim to evaluate the Intel Xeon Phi coprocessor for acceleration of 3D Positron Emission Tomography (PET) image reconstruction. We focus on the sensitivity map calculation as one computationally intensive part of PET image reconstruction, since it is a promising candidate for acceleration with the Many Integrated Core (MIC) architecture of the Xeon Phi. The computation of the voxels in the field of view (FoV) can be done in parallel, and the 10^3 to 10^4 samples needed to calculate the detection probability of each voxel can take advantage of vectorization. We use the ray tracing kernels of the Embree project to calculate the hit points of the sample rays with the detector, and in a second step the sum of the radiological path taking into account attenuation is determined. The core components are implemented using the Intel single instruction multiple data compiler (ISPC) to enable a portable implementation showing efficient vectorization on both the Xeon Phi and the host platform. On the Xeon Phi, the calculation of the radiological path is also implemented in hardware-specific intrinsic instructions (so-called 'intrinsics') to allow manually optimized vectorization. For parallelization, both OpenMP and ISPC tasking (based on pthreads) are evaluated. Our implementation achieved a scalability factor of 0.90 on the Xeon Phi coprocessor (model 5110P) with 60 cores at 1 GHz. Only minor differences were found between parallelization with OpenMP and the ISPC tasking feature. The implementation using intrinsics was found to be about 12% faster than the portable ISPC version. With this version, a speedup of 1.43 was achieved on the Xeon Phi coprocessor compared to the host system (HP SL250s Gen8) equipped with two Xeon (E5-2670) CPUs, with 8 cores at 2.6 to 3.3 GHz each. Using a second Xeon Phi card, the speedup could be further increased to 2.77. No significant differences were found between the results of the different Xeon Phi and host implementations. The examination showed that a reasonable speedup of the sensitivity map calculation could be achieved on the Xeon Phi by either a portable or a hardware-specific implementation.

  5. Implementing optimal thinning strategies

    Treesearch

    Kurt H. Riitters; J. Douglas Brodie

    1984-01-01

    Optimal thinning regimes for achieving several management objectives were derived from two stand-growth simulators by dynamic programming. Residual mean tree volumes were then plotted against stand density management diagrams. The results supported the use of density management diagrams for comparing, checking, and implementing the results of optimization analyses....

  6. Situational analysis of infant and young child nutrition policies and programmatic activities in Niger.

    PubMed

    Wuehler, Sara E; Biga Hassoumi, Abdoulazize

    2011-04-01

    Due to limited progress towards reducing mortality and malnutrition among children <5 years of age, an alliance of international agencies joined to 'Reposition children's right to adequate nutrition in the Sahel,' starting with a situational analysis of current activities related to infant and young child nutrition (IYCN). The main objectives of this analysis are to compile, analyse, and interpret available information on infant and child feeding and the nutrition situation of children <2 years of age in Niger, as one of the six targeted countries. Between August and November 2008, key informants responsible for conducting IYCN-related activities in Niger were interviewed, and 90 documents were examined on: optimal breastfeeding and complementary feeding practices, prevention of micronutrient deficiencies, prevention of mother-to-child transmission of HIV, management of acute malnutrition, food security, and hygienic practices. The results reported are limited by the availability of documents for review. Mortality rates are on track to reaching the Millennium Development Goal to reduce mortality among young children by two-thirds by 2015, but there has been no change in undernutrition, and total mortality rates are still high among young children. Nearly all of the key IYCN topics were addressed, specifically or generally, in national policy documents, training materials, and programmes. A national nutrition council meets regularly to coordinate programme activities nationally. Many of the IYCN-related programmes are intended for national coverage, but few reach this coverage. Monitoring and impact evaluations were conducted on some programmes, but few of these reported on whether the specific IYCN components of the programme were implemented as designed or compared outcomes with non-intervention sites. Human resources have been identified as inadequate to fully carry out nutrition programmes in Niger. 
    Due to these limitations, we could not confirm whether the lack of progress in reducing malnutrition was due to ineffective or inadequately implemented programmes, though both of these were likely contributors. The policy framework is well established for the promotion of optimal IYCN practices, but greater resources and capacity building are needed to: (i) increase human capacities to carry out nutrition programmes; (ii) expand and track the implementation of evidence-based programmes nationally; (iii) improve and carry out monitoring and evaluation that identify effective and ineffective programmes; and (iv) apply these findings in developing, expanding, and improving effective programmes. © 2011 Blackwell Publishing Ltd.

  7. SSR_pipeline--computer software for the identification of microsatellite sequences from paired-end Illumina high-throughput DNA sequence data

    USGS Publications Warehouse

    Miller, Mark P.; Knaus, Brian J.; Mullins, Thomas D.; Haig, Susan M.

    2013-01-01

    SSR_pipeline is a flexible set of programs designed to efficiently identify simple sequence repeats (SSRs; for example, microsatellites) from paired-end high-throughput Illumina DNA sequencing data. The program suite contains three analysis modules along with a fourth control module that can be used to automate analyses of large volumes of data. The modules are used to (1) identify the subset of paired-end sequences that pass quality standards, (2) align paired-end reads into a single composite DNA sequence, and (3) identify sequences that possess microsatellites conforming to user specified parameters. Each of the three separate analysis modules also can be used independently to provide greater flexibility or to work with FASTQ or FASTA files generated from other sequencing platforms (Roche 454, Ion Torrent, etc). All modules are implemented in the Python programming language and can therefore be used from nearly any computer operating system (Linux, Macintosh, Windows). The program suite relies on a compiled Python extension module to perform paired-end alignments. Instructions for compiling the extension from source code are provided in the documentation. Users who do not have Python installed on their computers or who do not have the ability to compile software also may choose to download packaged executable files. These files include all Python scripts, a copy of the compiled extension module, and a minimal installation of Python in a single binary executable. See program documentation for more information.
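
    The third analysis step (finding sequences that contain microsatellites matching user-specified parameters) can be illustrated with a minimal sketch. This is not SSR_pipeline's own code or interface, which the abstract does not describe; the function name, the parameter names, and the regular-expression approach are assumptions chosen for illustration.

```python
import re


def find_ssrs(sequence, min_unit=2, max_unit=6, min_repeats=4):
    """Return (start, unit, repeat_count) for each tandem repeat found.

    Hypothetical parameters: repeat units of min_unit..max_unit bases
    that occur at least min_repeats times in a row.
    """
    # The lazy quantifier prefers the shortest unit, so ACACACAC is
    # reported as (AC)4 rather than (ACAC)2.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(sequence.upper()):
        unit = match.group(1)
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits


# Hypothetical read containing five copies of the dinucleotide AC.
read = "TTG" + "AC" * 5 + "GG"
ssrs = find_ssrs(read)
```

    A production tool would additionally handle ambiguous bases and quality filtering, and would operate on the merged paired-end sequence produced by the earlier modules.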

  8. A Unified Statistical Rain-Attenuation Model for Communication Link Fade Predictions and Optimal Stochastic Fade Control Design Using a Location-Dependent Rain-Statistic Database

    NASA Technical Reports Server (NTRS)

    Manning, Robert M.

    1990-01-01

    A static and dynamic rain-attenuation model is presented which describes the statistics of attenuation on an arbitrarily specified satellite link for any location for which there are long-term rainfall statistics. The model may be used in the design of optimal stochastic control algorithms to mitigate the effects of attenuation and maintain link reliability. A rain-statistics database is compiled, which makes it possible to apply the model to any location in the continental U.S. with a resolution of 0.5 degrees in latitude and longitude. The model predictions are compared with experimental observations, showing good agreement.

  9. Experiences in using the CYBER 203 for three-dimensional transonic flow calculations

    NASA Technical Reports Server (NTRS)

    Melson, N. D.; Keller, J. D.

    1982-01-01

    In this paper, the authors report on some of their experiences modifying two three-dimensional transonic flow programs (FLO22 and FLO27) for use on the NASA Langley Research Center CYBER 203. Both of the programs discussed were originally written for use on serial machines. Several methods were attempted to optimize the execution of the two programs on the vector machine, including: (1) leaving the program in a scalar form (i.e., serial computation) with compiler software used to optimize and vectorize the program, (2) vectorizing parts of the existing algorithm in the program, and (3) incorporating a new vectorizable algorithm (ZEBRA I or ZEBRA II) in the program.

  10. Optimization of the Laser Properties of Polymer Films Doped with N,N´-Bis(3-methylphenyl)-N,N´-diphenylbenzidine

    PubMed Central

    Calzado, Eva M.; Boj, Pedro G.; Díaz-García, María A.

    2009-01-01

    This review compiles the work performed in the field of organic solid-state lasers with the hole-transporting organic molecule N,N´-bis(3-methylphenyl)-N,N´-diphenyl-benzidine system (TPD), in view of improving active laser material properties. The optimization of the amplified spontaneous emission characteristics, i.e., threshold, linewidth, emission wavelength and photostability, of polystyrene films doped with TPD in waveguide configuration has been achieved by investigating the influence of several materials parameters such as film thickness and TPD concentration. In addition, the influence in the emission properties of the inclusion of a second-order distributed feedback grating in the substrate is discussed.

  11. Automating quantum experiment control

    NASA Astrophysics Data System (ADS)

    Stevens, Kelly E.; Amini, Jason M.; Doret, S. Charles; Mohler, Greg; Volin, Curtis; Harter, Alexa W.

    2017-03-01

    The field of quantum information processing is rapidly advancing. As the control of quantum systems approaches the level needed for useful computation, the physical hardware underlying the quantum systems is becoming increasingly complex. It is already becoming impractical to manually code control for the larger hardware implementations. In this chapter, we will employ an approach to the problem of system control that parallels compiler design for a classical computer. We will start with a candidate quantum computing technology, the surface electrode ion trap, and build a system instruction language which can be generated from a simple machine-independent programming language via compilation. We incorporate compile time generation of ion routing that separates the algorithm description from the physical geometry of the hardware. Extending this approach to automatic routing at run time allows for automated initialization of qubit number and placement and additionally allows for automated recovery after catastrophic events such as qubit loss. To show that these systems can handle real hardware, we present a simple demonstration system that routes two ions around a multi-zone ion trap and handles ion loss and ion placement. While we will mainly use examples from transport-based ion trap quantum computing, many of the issues and solutions are applicable to other architectures.

  12. Quality standards for predialysis education: results from a consensus conference.

    PubMed

    Isnard Bagnis, Corinne; Crepaldi, Carlo; Dean, Jessica; Goovaerts, Tony; Melander, Stefan; Nilsson, Eva-Lena; Prieto-Velasco, Mario; Trujillo, Carmen; Zambon, Roberto; Mooney, Andrew

    2015-07-01

    This position statement was compiled following an expert meeting in March 2013, Zurich, Switzerland. Attendees were invited from a spread of European renal units with established and respected renal replacement therapy option education programmes. Discussions centred around optimal ways of creating an education team, setting realistic and meaningful objectives for patient education, and assessing the quality of education delivered. © The Author 2014. Published by Oxford University Press on behalf of ERA-EDTA.

  13. Kernel-based Linux emulation for Plan 9.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Minnich, Ronald G.

    2010-09-01

    CNKemu is a kernel-based system for the 9k variant of the Plan 9 kernel. It is designed to provide transparent binary support for programs compiled for IBM's Compute Node Kernel (CNK) on the Blue Gene series of supercomputers. This support allows users to build applications with the standard Blue Gene toolchain, including C++ and Fortran compilers. While the CNK is not Linux, IBM designed the CNK so that the user interface has much in common with the Linux 2.0 system call interface. The Plan 9 CNK emulator hence provides the foundation of kernel-based Linux system call support on Plan 9. In this paper we discuss cnkemu's implementation and some of its more interesting features, such as the ability to easily intermix Plan 9 and Linux system calls.

  14. The language parallel Pascal and other aspects of the massively parallel processor

    NASA Technical Reports Server (NTRS)

    Reeves, A. P.; Bruner, J. D.

    1982-01-01

    A high level language for the Massively Parallel Processor (MPP) was designed. This language, called Parallel Pascal, is described in detail. A description of the language design, a description of the intermediate language, Parallel P-Code, and details for the MPP implementation are included. Formal descriptions of Parallel Pascal and Parallel P-Code are given. A compiler was developed which converts programs in Parallel Pascal into the intermediate Parallel P-Code language. The code generator to complete the compiler for the MPP is being developed independently. A Parallel Pascal to Pascal translator was also developed. The architecture design for a VLSI version of the MPP was completed with a description of fault tolerant interconnection networks. The memory arrangement aspects of the MPP are discussed and a survey of other high level languages is given.

  15. The Role of Reformulation in the Automatic Design of Satisfiability Procedures

    NASA Technical Reports Server (NTRS)

    VanBaalen, Jeffrey

    1992-01-01

    Recently there has been increasing interest in the problem of knowledge compilation (Selman & Kautz91). This is the problem of identifying tractable techniques for determining the consequences of a knowledge base. We have developed and implemented a technique, called DRAT, that given a theory, i.e., a collection of first-order clauses, can often produce a type of decision procedure for that theory that can be used in the place of a general-purpose first-order theorem prover for determining many of the consequences of that theory. Hence, DRAT does a type of knowledge compilation. Central to the DRAT technique is a type of reformulation in which a problem's clauses are restated in terms of different nonlogical symbols. The reformulation is isomorphic in the sense that it does not change the semantics of a problem.

  16. The numerical solution of ordinary differential equations by the Taylor series method

    NASA Technical Reports Server (NTRS)

    Silver, A. H.; Sullivan, E.

    1973-01-01

    A programming implementation of the Taylor series method is presented for solving ordinary differential equations. The compiler is written in PL/1, and the target language is FORTRAN IV. The reduction of a differential system to rational form is described along with the procedures required for automatic numerical integration. The Taylor method is compared with two other methods for a number of differential equations. Algorithms using the Taylor method to find the zeroes of a given differential equation and to evaluate partial derivatives are presented. An annotated listing of the PL/1 program which performs the reduction and code generation is given. Listings of the FORTRAN routines used by the Taylor series method are included along with a compilation of all the recurrence formulas used to generate the Taylor coefficients for non-rational functions.
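    As a minimal sketch of the kind of recurrence such a method relies on, consider the rational system y' = y*y. Matching powers of t on both sides gives (k+1) a_{k+1} = sum_{i=0..k} a_i a_{k-i} for the Taylor coefficients a_k. The example equation and the function names are illustrative assumptions; they are not taken from the report's PL/1 or FORTRAN listings.

```python
from fractions import Fraction

def taylor_coeffs_y_squared(y0, order):
    """Taylor coefficients a_0..a_order of the solution of y' = y*y with y(0) = y0."""
    a = [Fraction(y0)]
    for k in range(order):
        # Cauchy product: k-th Taylor coefficient of the right-hand side y*y.
        conv = sum(a[i] * a[k - i] for i in range(k + 1))
        # Matching the t**k terms of y' and y*y: (k + 1) * a[k + 1] = conv.
        a.append(conv / (k + 1))
    return a

def taylor_step(coeffs, h):
    """Evaluate the truncated series at t = h using Horner's rule."""
    result = Fraction(0)
    for c in reversed(coeffs):
        result = result * h + c
    return result

coeffs = taylor_coeffs_y_squared(1, 10)
approx = taylor_step(coeffs, Fraction(1, 10))
exact = Fraction(10, 9)  # the exact solution is y(t) = 1 / (1 - t)
```

    With y(0) = 1 every coefficient equals 1, so the series is the geometric series for 1/(1 - t); exact rational arithmetic is used here only to make that easy to check.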

  17. A powerful graphical pulse sequence programming tool for magnetic resonance imaging.

    PubMed

    Jie, Shen; Ying, Liu; Jianqi, Li; Gengying, Li

    2005-12-01

    A powerful graphical pulse sequence programming tool has been designed for creating magnetic resonance imaging (MRI) applications. It allows rapid development of pulse sequences in graphical mode (allowing for the visualization of sequences), and consists of three modules: a graphical sequence editor, a parameter management module and a sequence compiler. Its key features are ease of use, flexibility and hardware independence. When graphic elements are combined with certain text expressions, graphical pulse sequence programming is as flexible as a text-based programming tool. In addition, a hardware-independent design is implemented by using a two-step compilation strategy. To demonstrate the flexibility and the capability of this graphical sequence programming tool, a multi-slice fast spin echo experiment is performed on our home-made 0.3 T permanent magnet MRI system.

  18. Efficient and portable acceleration of quantum chemical many-body methods in mixed floating point precision using OpenACC compiler directives

    NASA Astrophysics Data System (ADS)

    Eriksen, Janus J.

    2017-09-01

    It is demonstrated how the non-proprietary OpenACC standard of compiler directives may be used to compactly and efficiently accelerate the rate-determining steps of two of the most routinely applied many-body methods of electronic structure theory, namely the second-order Møller-Plesset (MP2) model in its resolution-of-the-identity approximated form and the (T) triples correction to the coupled cluster singles and doubles model (CCSD(T)). By means of compute directives as well as the use of optimised device math libraries, the operations involved in the energy kernels have been ported to graphics processing unit (GPU) accelerators, and the associated data transfers correspondingly optimised to such a degree that the final implementations (using either double and/or single precision arithmetics) are capable of scaling to as large systems as allowed for by the capacity of the host central processing unit (CPU) main memory. The performance of the hybrid CPU/GPU implementations is assessed through calculations on test systems of alanine amino acid chains using one-electron basis sets of increasing size (ranging from double- to pentuple-ζ quality). For all but the smallest problem sizes of the present study, the optimised accelerated codes (using a single multi-core CPU host node in conjunction with six GPUs) are found to be capable of reducing the total time-to-solution by at least an order of magnitude over optimised, OpenMP-threaded CPU-only reference implementations.

  19. Efficiently modeling neural networks on massively parallel computers

    NASA Technical Reports Server (NTRS)

    Farber, Robert M.

    1993-01-01

    Neural networks are a very useful tool for analyzing and modeling complex real world systems. Applying neural network simulations to real world problems generally involves large amounts of data and massive amounts of computation. To efficiently handle the computational requirements of large problems, we have implemented at Los Alamos a highly efficient neural network compiler for serial computers, vector computers, vector parallel computers, and fine grain SIMD computers such as the CM-2 connection machine. This paper describes the mapping used by the compiler to implement feed-forward backpropagation neural networks for a SIMD (Single Instruction Multiple Data) architecture parallel computer. Thinking Machines Corporation has benchmarked our code at 1.3 billion interconnects per second (approximately 3 gigaflops) on a 64,000 processor CM-2 connection machine (Singer 1990). This mapping is applicable to other SIMD computers and can be implemented on MIMD computers such as the CM-5 connection machine. Our mapping has virtually no communications overhead with the exception of the communications required for a global summation across the processors (which has a sub-linear runtime growth on the order of O(log(number of processors))). We can efficiently model very large neural networks which have many neurons and interconnects, and our mapping can extend to arbitrarily large networks (within memory limitations) by merging the memory space of separate processors with fast adjacent processor interprocessor communications. This paper will consider the simulation of only feed-forward neural networks, although this method is extendable to recurrent networks.
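    The logarithmic cost of the global summation can be seen in a minimal sketch of a tree reduction, simulated here serially with one list slot per processor. This is a generic pairwise scheme under that assumption, not the CM-2 implementation from the paper; `tree_global_sum` is a hypothetical name.

```python
def tree_global_sum(values):
    """Sum one value per simulated processor.

    Returns (total, parallel_steps), where parallel_steps counts the passes
    that a machine with one processor per value would need.
    """
    vals = list(values)
    n = len(vals)
    steps = 0
    stride = 1
    while stride < n:
        # Every addition in this pass touches disjoint slots, so on parallel
        # hardware the whole pass would count as a single step.
        for i in range(0, n - stride, 2 * stride):
            vals[i] += vals[i + stride]
        stride *= 2
        steps += 1
    return (vals[0] if vals else 0), steps

total_8, steps_8 = tree_global_sum(range(1, 9))
total_64, steps_64 = tree_global_sum(range(64))
```

    Doubling the number of simulated processors adds only one more pass, which is the O(log(number of processors)) growth mentioned in the abstract.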

  20. Flexible Environments for Grand-Challenge Simulation in Climate Science

    NASA Astrophysics Data System (ADS)

    Pierrehumbert, R.; Tobis, M.; Lin, J.; Dieterich, C.; Caballero, R.

    2004-12-01

    Current climate models are monolithic codes, generally in Fortran, aimed at high-performance simulation of the modern climate. Though they adequately serve their designated purpose, they present major barriers to application in other problems. Tailoring them to paleoclimate or planetary simulations, for instance, takes months of work. Theoretical studies, where one may want to remove selected processes or break feedback loops, are similarly hindered. Further, current climate models are of little value in education, since the implementation of textbook concepts and equations in the code is obscured by technical detail. The Climate Systems Center at the University of Chicago seeks to overcome these limitations by bringing modern object-oriented design into the business of climate modeling. Our ultimate goal is to produce an end-to-end modeling environment capable of configuring anything from a simple single-column radiative-convective model to a full 3-D coupled climate model using a uniform, flexible interface. Technically, the modeling environment is implemented as a Python-based software component toolkit: key number-crunching procedures are implemented as discrete, compiled-language components 'glued' together and co-ordinated by Python, combining the high performance of compiled languages and the flexibility and extensibility of Python. We are incrementally working towards this final objective following a series of distinct, complementary lines.
    We will present an overview of these activities, including PyOM, a Python-based finite-difference ocean model allowing run-time selection of different Arakawa grids and physical parameterizations; CliMT, an atmospheric modeling toolkit providing a library of 'legacy' radiative, convective and dynamical modules which can be knitted into dynamical models, and PyCCSM, a version of NCAR's Community Climate System Model in which the coupler and run-control architecture are re-implemented in Python, augmenting its flexibility and adaptability.

  1. "I got it on Ebay!": cost-effective approach to surgical skills laboratories.

    PubMed

    Schneider, Ethan; Schenarts, Paul J; Shostrom, Valerie; Schenarts, Kimberly D; Evans, Charity H

    2017-01-01

    Surgical education is witnessing a surge in the use of simulation. However, implementation of simulation is often cost-prohibitive. Online shopping offers a low budget alternative. The aim of this study was to implement cost-effective skills laboratories and analyze online versus manufacturers' prices to evaluate for savings. Four skills laboratories were designed for the surgery clerkship from July 2014 to June 2015. Skills laboratories were implemented using hand-built simulation and instruments purchased online. Trademarked simulation was priced online and instruments priced from a manufacturer. Costs were compiled, and a descriptive cost analysis of online and manufacturers' prices was performed. Learners rated their level of satisfaction for all educational activities, and levels of satisfaction were compared. A total of 119 third-year medical students participated. Supply lists and costs were compiled for each laboratory. A descriptive cost analysis of online and manufacturers' prices showed online prices were substantially lower than manufacturers, with a per laboratory savings of: $1779.26 (suturing), $1752.52 (chest tube), $2448.52 (anastomosis), and $1891.64 (laparoscopic), resulting in a year 1 savings of $47,285. Mean student satisfaction scores for the skills laboratories were 4.32, with statistical significance compared to live lectures at 2.96 (P < 0.05) and small group activities at 3.67 (P < 0.05). A cost-effective approach for implementation of skills laboratories showed substantial savings. By using hand-built simulation boxes and online resources to purchase surgical equipment, surgical educators overcome financial obstacles limiting the use of simulation and provide learning opportunities that medical students perceive as beneficial. Copyright © 2016 Elsevier Inc. All rights reserved.

  2. Parallel optimization algorithms and their implementation in VLSI design

    NASA Technical Reports Server (NTRS)

    Lee, G.; Feeley, J. J.

    1991-01-01

    Two new parallel optimization algorithms based on the simplex method are described. They may be executed by a SIMD parallel processor architecture and be implemented in VLSI design. Several VLSI design implementations are introduced. An application example is reported to demonstrate that the algorithms are effective.

  3. Financing and funding health care: Optimal policy and political implementability.

    PubMed

    Nuscheler, Robert; Roeder, Kerstin

    2015-07-01

    Health care financing and funding are usually analyzed in isolation. This paper combines the corresponding strands of the literature and thereby advances our understanding of the important interaction between them. We investigate the impact of three modes of health care financing, namely, optimal income taxation, proportional income taxation, and insurance premiums, on optimal provider payment and on the political implementability of optimal policies under majority voting. Considering a standard multi-task agency framework we show that optimal health care policies will generally differ across financing regimes when the health authority has redistributive concerns. We show that health care financing also has a bearing on the political implementability of optimal health care policies. Our results demonstrate that an isolated analysis of (optimal) provider payment rests on very strong assumptions regarding both the financing of health care and the redistributive preferences of the health authority. Copyright © 2015 Elsevier B.V. All rights reserved.

  4. Systems, methods and apparatus for implementation of formal specifications derived from informal requirements

    NASA Technical Reports Server (NTRS)

    Hinchey, Michael G. (Inventor); Rouff, Christopher A. (Inventor); Rash, James L. (Inventor); Erickson, John D. (Inventor); Gracinin, Denis (Inventor)

    2010-01-01

    Systems, methods and apparatus are provided through which in some embodiments an informal specification is translated without human intervention into a formal specification. In some embodiments the formal specification is a process-based specification. In some embodiments, the formal specification is translated into a high-level computer programming language which is further compiled into a set of executable computer instructions.

  5. Using Food as a Tool to Teach Science to 3rd Grade Students in Appalachian Ohio

    ERIC Educational Resources Information Center

    Duffrin, Melani W.; Hovland, Jana; Carraway-Stage, Virginia; McLeod, Sara; Duffrin, Christopher; Phillips, Sharon; Rivera, David; Saum, Diana; Johanson, George; Graham, Annette; Lee, Tammy; Bosse, Michael; Berryman, Darlene

    2010-01-01

    The Food, Math, and Science Teaching Enhancement Resource (FoodMASTER) Initiative is a compilation of programs aimed at using food as a tool to teach mathematics and science. In 2007 to 2008, a foods curriculum developed by professionals in nutrition and education was implemented in 10 3rd-grade classrooms in Appalachian Ohio; teachers in these…

  6. "I Never Know What I Think Until I See What I Say," Saul Bellow. (Honing Critical Thinking Skills through Writing).

    ERIC Educational Resources Information Center

    Hallock, Sylvia M.; Downie, Susan L.

    A compilation of program ideas and related newspaper articles, cartoons, and visual aids, this booklet describes the objectives and procedure for the implementation of a pilot program at Frank W. Cox High School in Virginia Beach, Virginia, that emphasizes ethics education and the value of writing as a means to promote critical thinking. The first…

  7. Preventive oral health intervention for pediatricians.

    PubMed

    2008-12-01

    This policy is a compilation of current concepts and scientific evidence required to understand and implement practice-based preventive oral health programs designed to improve oral health outcomes for all children and especially children at significant risk of dental decay. In addition, it reviews cariology and caries risk assessment and defines, through available evidence, appropriate recommendations for preventive oral health intervention by primary care pediatric practitioners.

  8. 27th Annual Report to Congress on the Implementation of the "Individuals with Disabilities Education Act," 2005. Volume 2

    ERIC Educational Resources Information Center

    US Department of Education, 2007

    2007-01-01

    This 2005 Annual Report to Congress has two volumes. This volume consists of tables that also were compiled from data provided by the states. Such data are required under the law. In fact, collection and analysis of these data are the primary means by which the Office of Special Education Programs (OSEP) monitors activities under the…

  9. Solid Waste Processing. A State-of-the-Art Report on Unit Operations and Processes.

    ERIC Educational Resources Information Center

    Engdahl, Richard B.

    The importance and intricacy of the solid wastes disposal problem and the need to deal with it effectively and economically led to the state-of-the-art survey covered by this report. The material presented here was compiled to be used by those in government and private industry who must make or implement decisions concerning the processing of…

  10. Bibliography On Multiprocessors And Distributed Processing

    NASA Technical Reports Server (NTRS)

    Miya, Eugene N.

    1988-01-01

    Multiprocessor and Distributed Processing Bibliography package consists of large machine-readable bibliographic data base, which in addition to usual keyword searches, used for producing citations, indexes, and cross-references. Data base contains UNIX(R) "refer" -formatted ASCII data and implemented on any computer running under UNIX(R) operating system. Easily convertible to other operating systems. Requires approximately one megabyte of secondary storage. Bibliography compiled in 1985.

  11. Automatic controls and regulators: A compilation

    NASA Technical Reports Server (NTRS)

    1974-01-01

    Devices, methods, and techniques for control and regulation of the mechanical/physical functions involved in implementing the space program are discussed. Section one deals with automatic controls considered to be, essentially, start-stop operations or those holding the activity in a desired constraint. Devices that may be used to regulate activities within desired ranges or subject them to predetermined changes are dealt with in section two.

  12. Compiled MPI: Cost-Effective Exascale Applications Development

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bronevetsky, G; Quinlan, D; Lumsdaine, A

    2012-04-10

    The complexity of petascale and exascale machines makes it increasingly difficult to develop applications that can take advantage of them. Future systems are expected to feature billion-way parallelism, complex heterogeneous compute nodes and poor availability of memory (Peter Kogge, 2008). This new challenge for application development is motivating a significant amount of research and development on new programming models and runtime systems designed to simplify large-scale application development. Unfortunately, DoE has significant multi-decadal investment in a large family of mission-critical scientific applications. Scaling these applications to exascale machines will require a significant investment that will dwarf the costs of hardware procurement. A key reason for the difficulty in transitioning today's applications to exascale hardware is their reliance on explicit programming techniques, such as the Message Passing Interface (MPI) programming model to enable parallelism. MPI provides a portable and high performance message-passing system that enables scalable performance on a wide variety of platforms. However, it also forces developers to lock the details of parallelization together with application logic, making it very difficult to adapt the application to significant changes in the underlying system. Further, MPI's explicit interface makes it difficult to separate the application's synchronization and communication structure, reducing the amount of support that can be provided by compiler and run-time tools. This is in contrast to the recent research on more implicit parallel programming models such as Chapel, OpenMP and OpenCL, which promise to provide significantly more flexibility at the cost of reimplementing significant portions of the application.
    We are developing CoMPI, a novel compiler-driven approach to enable existing MPI applications to scale to exascale systems with minimal modifications that can be made incrementally over the application's lifetime. It includes: (1) New set of source code annotations, inserted either manually or automatically, that will clarify the application's use of MPI to the compiler infrastructure, enabling greater accuracy where needed; (2) A compiler transformation framework that leverages these annotations to transform the original MPI source code to improve its performance and scalability; (3) Novel MPI runtime implementation techniques that will provide a rich set of functionality extensions to be used by applications that have been transformed by our compiler; and (4) A novel compiler analysis that leverages simple user annotations to automatically extract the application's communication structure and synthesize most complex code annotations.

  13. ROSE Version 1.0

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Quinlan, D.; Yi, Q.; Buduc, R.

    2005-02-17

    ROSE is an object-oriented software infrastructure for source-to-source translation that provides an interface for programmers to write their own specialized translators for optimizing scientific applications. ROSE is a part of current research on telescoping languages, which provides optimizations of the use of libraries in scientific applications. ROSE defines approaches to extend the optimization techniques, common in well defined languages, to the optimization of scientific applications using well defined libraries. ROSE includes a rich set of tools for generating customized transformations to support optimization of applications codes. We currently support full C and C++ (including template instantiation etc.), with Fortran 90 support under development as part of a collaboration and contract with Rice to use their version of the open source Open64 F90 front-end. ROSE represents an attempt to define an open compiler infrastructure to handle the full complexity of full scale DOE applications codes using the languages common to scientific computing within DOE. We expect that such an infrastructure will also be useful for the development of numerous tools that may then realistically expect to work on DOE full scale applications.

  14. Optimization of lattice surgery is NP-hard

    NASA Astrophysics Data System (ADS)

    Herr, Daniel; Nori, Franco; Devitt, Simon J.

    2017-09-01

    The traditional method for computation in either the surface code or in the Raussendorf model is the creation of holes or "defects" within the encoded lattice of qubits that are manipulated via topological braiding to enact logic gates. However, this is not the only way to achieve universal, fault-tolerant computation. In this work, we focus on the lattice surgery representation, which realizes transversal logic operations without destroying the intrinsic 2D nearest-neighbor properties of the braid-based surface code and achieves universality without defects and braid-based logic. For both techniques there are open questions regarding the compilation and resource optimization of quantum circuits. Optimization in braid-based logic is proving to be difficult and the classical complexity associated with this problem has yet to be determined. In the context of lattice-surgery-based logic, we can introduce an optimality condition, which corresponds to a circuit with the lowest resource requirements in terms of physical qubits and computational time, and prove that the complexity of optimizing a quantum circuit in the lattice surgery model is NP-hard.

  15. Optimization of topological quantum algorithms using Lattice Surgery is hard

    NASA Astrophysics Data System (ADS)

    Herr, Daniel; Nori, Franco; Devitt, Simon

    The traditional method for computation in the surface code or the Raussendorf model is the creation of holes or "defects" within the encoded lattice of qubits which are manipulated via topological braiding to enact logic gates. However, this is not the only way to achieve universal, fault-tolerant computation. In this work we turn attention to the Lattice Surgery representation, which realizes encoded logic operations without destroying the intrinsic 2D nearest-neighbor interactions sufficient for braided based logic and achieves universality without using defects for encoding information. In both braided and lattice surgery logic there are open questions regarding the compilation and resource optimization of quantum circuits. Optimization in braid-based logic is proving to be difficult to define and the classical complexity associated with this problem has yet to be determined. In the context of lattice surgery based logic, we can introduce an optimality condition, which corresponds to a circuit with lowest amount of physical qubit requirements, and prove that the complexity of optimizing the geometric (lattice surgery) representation of a quantum circuit is NP-hard.

  16. Evaluation of the Effectiveness of Stormwater Decision Support Tools for Infrastructure Selection and the Barriers to Implementation

    NASA Astrophysics Data System (ADS)

    Spahr, K.; Hogue, T. S.

    2016-12-01

    Selecting the most appropriate green, gray, and/or hybrid system for stormwater treatment and conveyance can prove challenging to decision makers across all scales, from site managers to large municipalities. To help streamline the selection process, a multi-disciplinary team of academics and professionals is developing an industry standard for selecting and evaluating the most appropriate stormwater management technology for different regions. To make the tool more robust and comprehensive, life-cycle cost assessment and optimization modules will be included to evaluate non-monetized and ecosystem benefits of selected technologies. Initial work includes surveying advisory board members based in cities that use existing decision support tools in their infrastructure planning process. These surveys will qualify the decisions currently being made and identify challenges within the current planning process across a range of hydroclimatic regions and city sizes. Analysis of social and other non-technical barriers to adoption of the existing tools is also being performed, with identification of regional differences and institutional challenges. Surveys will also gauge the regional appropriateness of certain stormwater technologies based on experiences in implementing stormwater treatment and conveyance plans. In addition to compiling qualitative data on existing decision support tools, a technical review of components of the decision support tool used will be performed. Gaps in each tool's analysis, like the lack of certain critical functionalities, will be identified and ease of use will be evaluated. Conclusions drawn from both the qualitative and quantitative analyses will be used to inform the development of the new decision support tool and its eventual dissemination.

  17. SIMPSON: A General Simulation Program for Solid-State NMR Spectroscopy

    NASA Astrophysics Data System (ADS)

    Bak, Mads; Rasmussen, Jimmy T.; Nielsen, Niels Chr.

    2000-12-01

    A computer program for fast and accurate numerical simulation of solid-state NMR experiments is described. The program is designed to emulate an NMR spectrometer by letting the user specify high-level NMR concepts such as spin systems, nuclear spin interactions, RF irradiation, free precession, phase cycling, coherence-order filtering, and implicit/explicit acquisition. These elements are implemented using the Tcl scripting language to ensure a minimum of programming overhead and direct interpretation without the need for compilation, while maintaining the flexibility of a full-featured programming language. Basically, there are no intrinsic limitations to the number of spins, types of interactions, sample conditions (static or spinning, powders, uniaxially oriented molecules, single crystals, or solutions), and the complexity or number of spectral dimensions for the pulse sequence. The applicability ranges from simple 1D experiments to advanced multiple-pulse and multiple-dimensional experiments, series of simulations, parameter scans, complex data manipulation/visualization, and iterative fitting of simulated to experimental spectra. A major effort has been devoted to optimizing the computation speed using state-of-the-art algorithms for the time-consuming parts of the calculations implemented in the core of the program using the C programming language. Modification and maintenance of the program are facilitated by releasing the program as open source software (General Public License) currently at http://nmr.imsb.au.dk. The general features of the program are demonstrated by numerical simulations of various aspects for REDOR, rotational resonance, DRAMA, DRAWS, HORROR, C7, TEDOR, POST-C7, CW decoupling, TPPM, F-SLG, SLF, SEMA-CP, PISEMA, RFDR, QCPMG-MAS, and MQ-MAS experiments.

  18. SIMPSON: A general simulation program for solid-state NMR spectroscopy

    NASA Astrophysics Data System (ADS)

    Bak, Mads; Rasmussen, Jimmy T.; Nielsen, Niels Chr.

    2011-12-01

    A computer program for fast and accurate numerical simulation of solid-state NMR experiments is described. The program is designed to emulate an NMR spectrometer by letting the user specify high-level NMR concepts such as spin systems, nuclear spin interactions, RF irradiation, free precession, phase cycling, coherence-order filtering, and implicit/explicit acquisition. These elements are implemented using the Tcl scripting language to ensure a minimum of programming overhead and direct interpretation without the need for compilation, while maintaining the flexibility of a full-featured programming language. Basically, there are no intrinsic limitations to the number of spins, types of interactions, sample conditions (static or spinning, powders, uniaxially oriented molecules, single crystals, or solutions), and the complexity or number of spectral dimensions for the pulse sequence. The applicability ranges from simple 1D experiments to advanced multiple-pulse and multiple-dimensional experiments, series of simulations, parameter scans, complex data manipulation/visualization, and iterative fitting of simulated to experimental spectra. A major effort has been devoted to optimizing the computation speed using state-of-the-art algorithms for the time-consuming parts of the calculations implemented in the core of the program using the C programming language. Modification and maintenance of the program are facilitated by releasing the program as open source software (General Public License) currently at http://nmr.imsb.au.dk. The general features of the program are demonstrated by numerical simulations of various aspects for REDOR, rotational resonance, DRAMA, DRAWS, HORROR, C7, TEDOR, POST-C7, CW decoupling, TPPM, F-SLG, SLF, SEMA-CP, PISEMA, RFDR, QCPMG-MAS, and MQ-MAS experiments.

  19. Feedback Implementation of Zermelo's Optimal Control by Sugeno Approximation

    NASA Technical Reports Server (NTRS)

    Clifton, C.; Homaifax, A.; Bikdash, M.

    1997-01-01

    This paper proposes an approach to implement optimal control laws of nonlinear systems in real time. Our methodology does not require solving two-point boundary value problems online and may not require it off-line either. The optimal control law is learned using the original Sugeno controller (OSC) from a family of optimal trajectories. We compare the trajectories generated by the OSC and the trajectories yielded by the optimal feedback control law when applied to Zermelo's ship steering problem.

  20. Architectural-level power estimation and experimentation

    NASA Astrophysics Data System (ADS)

    Ye, Wu

    With the emergence of a plethora of embedded and portable applications and ever increasing integration levels, power dissipation of integrated circuits has moved to the forefront as a design constraint. Recent years have also seen a significant trend towards designs starting at the architectural (or RT) level. These demand accurate yet fast RT level power estimation methodologies and tools. This thesis addresses issues and experiments associated with architectural level power estimation. An execution driven, cycle-accurate RT level power simulator, SimplePower, was developed using transition-sensitive energy models. It is based on the architecture of a five-stage pipelined RISC datapath for both 0.35 μm and 0.8 μm technology and can execute the integer subset of the instruction set of SimpleScalar. SimplePower measures the energy consumed in the datapath, memory and on-chip buses. During the development of SimplePower, a partitioning power modeling technique was proposed to model the energy consumed in complex functional units. The accuracy of this technique was validated with HSPICE simulation results for a register file and a shifter. A novel, selectively gated pipeline register optimization technique was proposed to reduce the datapath energy consumption. It uses the decoded control signals to selectively gate the data fields of the pipeline registers. Simulation results show that this technique can reduce the datapath energy consumption by 18-36% for a set of benchmarks. A low-level back-end compiler optimization, register relabeling, was applied to reduce the on-chip instruction cache data bus switching activity. Its impact was evaluated by SimplePower. Results show that it can reduce the energy consumed in the instruction data buses by 3.55-16.90%. A quantitative evaluation was conducted for the impact of six state-of-the-art high-level compilation techniques on both datapath and memory energy consumption.
The experimental results provide a valuable insight for designers to develop future power-aware compilation frameworks for embedded systems.
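
    To make the notion of bus switching activity concrete, the following minimal sketch counts bit transitions between consecutive words driven on a bus. It is a toy proxy under stated assumptions, not SimplePower's transition-sensitive energy model: the helper name `bus_transitions`, the 32-bit default width, and the sample word sequences are all hypothetical.

```python
def bus_transitions(words, width=32):
    """Count bit flips between consecutive words driven on a bus.

    Hypothetical helper: a toy proxy for switching activity, not the
    transition-sensitive energy model used by SimplePower.
    """
    mask = (1 << width) - 1
    total = 0
    for prev, cur in zip(words, words[1:]):
        # XOR marks the bit lines that change; count the set bits.
        total += bin((prev ^ cur) & mask).count("1")
    return total


# Two made-up word sequences standing in for instruction encodings.
seq_a = [0b1111, 0b0000, 0b1110, 0b0001]
seq_b = [0b1111, 0b1110, 0b0000, 0b0001]
print(bus_transitions(seq_a), bus_transitions(seq_b))
```

    An encoding change that makes consecutive words differ in fewer bits lowers this count, which is the kind of effect a relabeling optimization would aim for.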

  1. From Physics Model to Results: An Optimizing Framework for Cross-Architecture Code Generation

    DOE PAGES

    Blazewicz, Marek; Hinder, Ian; Koppelman, David M.; ...

    2013-01-01

    Starting from a high-level problem description in terms of partial differential equations using abstract tensor notation, the Chemora framework discretizes, optimizes, and generates complete high performance codes for a wide range of compute architectures. Chemora extends the capabilities of Cactus, facilitating the usage of large-scale CPU/GPU systems in an efficient manner for complex applications, without low-level code tuning. Chemora achieves parallelism through MPI and multi-threading, combining OpenMP and CUDA. Optimizations include high-level code transformations, efficient loop traversal strategies, dynamically selected data and instruction cache usage strategies, and JIT compilation of GPU code tailored to the problem characteristics. The discretization is based on higher-order finite differences on multi-block domains. Chemora's capabilities are demonstrated by simulations of black hole collisions. This problem provides an acid test of the framework, as the Einstein equations contain hundreds of variables and thousands of terms.

  2. DTS: Building custom, intelligent schedulers

    NASA Technical Reports Server (NTRS)

    Hansson, Othar; Mayer, Andrew

    1994-01-01

    DTS is a decision-theoretic scheduler, built on top of a flexible toolkit -- this paper focuses on how the toolkit might be reused in future NASA mission schedulers. The toolkit includes a user-customizable scheduling interface, and a 'Just-For-You' optimization engine. The customizable interface is built on two metaphors: objects and dynamic graphs. Objects help to structure problem specifications and related data, while dynamic graphs simplify the specification of graphical schedule editors (such as Gantt charts). The interface can be used with any 'back-end' scheduler, through dynamically-loaded code, interprocess communication, or a shared database. The 'Just-For-You' optimization engine includes user-specific utility functions, automatically compiled heuristic evaluations, and a postprocessing facility for enforcing scheduling policies. The optimization engine is based on BPS, the Bayesian Problem-Solver (1,2), which introduced a similar approach to solving single-agent and adversarial graph search problems.

  3. Spectral optimization and uncertainty quantification in combustion modeling

    NASA Astrophysics Data System (ADS)

    Sheen, David Allan

    Reliable simulations of reacting flow systems require a well-characterized, detailed chemical model as a foundation. Accuracy of such a model can be assured, in principle, by a multi-parameter optimization against a set of experimental data. However, the inherent uncertainties in the rate evaluations and experimental data leave a model still characterized by some finite kinetic rate parameter space. Without a careful analysis of how this uncertainty space propagates into the model's predictions, those predictions can at best be trusted only qualitatively. In this work, the Method of Uncertainty Minimization using Polynomial Chaos Expansions is proposed to quantify these uncertainties. In this method, the uncertainty in the rate parameters of the as-compiled model is quantified. Then, the model is subjected to a rigorous multi-parameter optimization, as well as a consistency-screening process. Lastly, the uncertainty of the optimized model is calculated using an inverse spectral optimization technique, and then propagated into a range of simulation conditions. An as-compiled, detailed H2/CO/C1-C4 kinetic model is combined with a set of ethylene combustion data to serve as an example. The idea that the hydrocarbon oxidation model should be understood and developed in a hierarchical fashion has been a major driving force in kinetics research for decades. How this hierarchical strategy works at a quantitative level, however, has never been addressed. In this work, we use ethylene and propane combustion as examples and explore the question of hierarchical model development quantitatively. The Method of Uncertainty Minimization using Polynomial Chaos Expansions is utilized to quantify the amount of information that a particular combustion experiment, and thereby each data set, contributes to the model. This knowledge is applied to explore the relationships among the combustion chemistry of hydrogen/carbon monoxide, ethylene, and larger alkanes. 
Frequently, new data will become available, and it will be desirable to know the effect that inclusion of these data has on the optimized model. Two cases are considered here. In the first, a study of H2/CO mass burning rates has recently been published, wherein the experimentally-obtained results could not be reconciled with any extant H2/CO oxidation model. It is shown that an optimized H2/CO model can be developed that will reproduce the results of the new experimental measurements. In addition, the high precision of the new experiments provides a strong constraint on the reaction rate parameters of the chemistry model, manifested in a significant improvement in the precision of simulations. In the second case, species time histories were measured during n-heptane oxidation behind reflected shock waves. The highly precise nature of these measurements is expected to impose critical constraints on chemical kinetic models of hydrocarbon combustion. The results show that while an as-compiled, prior reaction model of n-alkane combustion can be accurate in its prediction of the detailed species profiles, the kinetic parameter uncertainty in the model remains too large to obtain a precise prediction of the data. Constraining the prior model against the species time histories within the measurement uncertainties led to notable improvements in the precision of model predictions against the species data as well as the global combustion properties considered. Lastly, we show that while the capability of the multispecies measurement presents a step-change in our precise knowledge of the chemical processes in hydrocarbon combustion, accurate data of global combustion properties are still necessary to predict fuel combustion.

  4. Design of a Knowledge Driven HIS

    PubMed Central

    Pryor, T. Allan; Clayton, Paul D.; Haug, Peter J.; Wigertz, Ove

    1987-01-01

    Design of the software architecture for a knowledge driven HIS is presented. In our design the frame has been used as the basic unit of knowledge representation. The structure of the frame is being designed to be sufficiently universal to contain knowledge required to implement not only expert systems, but almost all traditional HIS functions including ADT, order entry and results review. The design incorporates a two level format for the knowledge. The first level, stored as ASCII records, is used to maintain the knowledge base, while the second level, converted by special knowledge compilers to standard computer languages, is used for efficient implementation of the knowledge applications.

  5. Extended computational kernels in a massively parallel implementation of the Trotter-Suzuki approximation

    NASA Astrophysics Data System (ADS)

    Wittek, Peter; Calderaro, Luca

    2015-12-01

    We extended a parallel and distributed implementation of the Trotter-Suzuki algorithm for simulating quantum systems to study a wider range of physical problems and to make the library easier to use. The new release allows periodic boundary conditions, many-body simulations of non-interacting particles, arbitrary stationary potential functions, and imaginary time evolution to approximate the ground state energy. The new release is more resilient to the computational environment: a wider range of compiler chains and more platforms are supported. To ease development, we provide a more extensive command-line interface, an application programming interface, and wrappers from high-level languages.

  6. Human factors in incident reporting

    NASA Technical Reports Server (NTRS)

    Jones, S. G.

    1993-01-01

    The paper proposes that a cooperative research effort be undertaken by academic institutions and industry organizations toward the compilation of a human factors data base in conjunction with technical information. Team members in any discipline can benefit and learn from observing positive examples of decision making and performance by crews under stressful or less than optimal circumstances. The opportunity to note trends in interpersonal and interactive behaviors and to categorize them in terms of more or less desirable outcomes should not be missed.

  7. [Limited access to the international medical literature in Russia].

    PubMed

    Jargin, Sergei V

    2012-06-01

    Limited access to foreign professional literature in the former Soviet Union had consequences for public health: persistence of some outdated methods and approaches. Several examples are discussed in this letter. The shortage of foreign literature has been partly compensated by domestic editions, sometimes containing compilations from foreign sources, borrowings without references, and mistranslations. International literature is on average scarcely quoted in Russian-language scientific publications. Today, however, there are grounds for optimism: the economic upturn must bring improvements.

  8. Approaches in highly parameterized inversion—PEST++ Version 3, a Parameter ESTimation and uncertainty analysis software suite optimized for large environmental models

    USGS Publications Warehouse

    Welter, David E.; White, Jeremy T.; Hunt, Randall J.; Doherty, John E.

    2015-09-18

    The PEST++ Version 3 software suite can be compiled for Microsoft Windows® and Linux® operating systems; the source code is available in a Microsoft Visual Studio® 2013 solution; Linux Makefiles are also provided. PEST++ Version 3 continues to build a foundation for an open-source framework capable of producing robust and efficient parameter estimation tools for large environmental models.

  9. Cedar-a large scale multiprocessor

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gajski, D.; Kuck, D.; Lawrie, D.

    1983-01-01

    This paper presents an overview of Cedar, a large scale multiprocessor being designed at the University of Illinois. This machine is designed to accommodate several thousand high performance processors which are capable of working together on a single job, or they can be partitioned into groups of processors where each group of one or more processors can work on separate jobs. Various aspects of the machine are described including the control methodology, communication network, optimizing compiler and plans for construction. 13 references.

  10. Using OpenMP vs. Threading Building Blocks for Medical Imaging on Multi-cores

    NASA Astrophysics Data System (ADS)

    Kegel, Philipp; Schellmann, Maraike; Gorlatch, Sergei

    We compare two parallel programming approaches for multi-core systems: the well-known OpenMP and the recently introduced Threading Building Blocks (TBB) library by Intel®. The comparison is made using the parallelization of a real-world numerical algorithm for medical imaging. We develop several parallel implementations, and compare them w.r.t. programming effort, programming style and abstraction, and runtime performance. We show that TBB requires a considerable program re-design, whereas with OpenMP simple compiler directives are sufficient. While TBB appears to be less appropriate for parallelizing existing implementations, it fosters a good programming style and higher abstraction level for newly developed parallel programs. Our experimental measurements on a dual quad-core system demonstrate that OpenMP slightly outperforms TBB in our implementation.

  11. Spiking neuron network Helmholtz machine.

    PubMed

    Sountsov, Pavel; Miller, Paul

    2015-01-01

    An increasing amount of behavioral and neurophysiological data suggests that the brain performs optimal (or near-optimal) probabilistic inference and learning during perception and other tasks. Although many machine learning algorithms exist that perform inference and learning in an optimal way, the complete description of how one of those algorithms (or a novel algorithm) can be implemented in the brain is currently incomplete. There have been many proposed solutions that address how neurons can perform optimal inference but the question of how synaptic plasticity can implement optimal learning is rarely addressed. This paper aims to unify the two fields of probabilistic inference and synaptic plasticity by using a neuronal network of realistic model spiking neurons to implement a well-studied computational model called the Helmholtz Machine. The Helmholtz Machine is amenable to neural implementation as the algorithm it uses to learn its parameters, called the wake-sleep algorithm, uses a local delta learning rule. Our spiking-neuron network implements both the delta rule and a small example of a Helmholtz machine. This neuronal network can learn an internal model of continuous-valued training data sets without supervision. The network can also perform inference on the learned internal models. We show how various biophysical features of the neural implementation constrain the parameters of the wake-sleep algorithm, such as the duration of the wake and sleep phases of learning and the minimal sample duration. We examine the deviations from optimal performance and tie them to the properties of the synaptic plasticity rule.
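
    The local delta learning rule mentioned in this abstract can be sketched in a few lines for a layer of sigmoid units. This is a generic, rate-based illustration under stated assumptions, not the authors' spiking-neuron implementation: the names `sigmoid` and `delta_update`, the weight layout, and the learning rate are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def delta_update(weights, bias, inputs, targets, rate):
    """One local delta-rule step for a layer of sigmoid units.

    weights[j][i] connects input i to unit j. Each weight changes by
    rate * input_i * (target_j - predicted_j), so the update uses only
    quantities available at that connection. Hypothetical sketch.
    """
    for j, target in enumerate(targets):
        drive = bias[j] + sum(w * x for w, x in zip(weights[j], inputs))
        error = target - sigmoid(drive)
        for i, x in enumerate(inputs):
            weights[j][i] += rate * x * error
        bias[j] += rate * error
    return weights, bias


w, b = delta_update([[0.0, 0.0]], [0.0], [1.0, 0.0], [1.0], rate=0.1)
print(w, b)
```

    In a wake-sleep setting, the same update would be applied with the roles of inputs and targets swapped between the wake phase (training generative weights) and the sleep phase (training recognition weights).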

  12. Spiking neuron network Helmholtz machine

    PubMed Central

    Sountsov, Pavel; Miller, Paul

    2015-01-01

    An increasing amount of behavioral and neurophysiological data suggests that the brain performs optimal (or near-optimal) probabilistic inference and learning during perception and other tasks. Although many machine learning algorithms exist that perform inference and learning in an optimal way, the complete description of how one of those algorithms (or a novel algorithm) can be implemented in the brain is currently incomplete. There have been many proposed solutions that address how neurons can perform optimal inference but the question of how synaptic plasticity can implement optimal learning is rarely addressed. This paper aims to unify the two fields of probabilistic inference and synaptic plasticity by using a neuronal network of realistic model spiking neurons to implement a well-studied computational model called the Helmholtz Machine. The Helmholtz Machine is amenable to neural implementation as the algorithm it uses to learn its parameters, called the wake-sleep algorithm, uses a local delta learning rule. Our spiking-neuron network implements both the delta rule and a small example of a Helmholtz machine. This neuronal network can learn an internal model of continuous-valued training data sets without supervision. The network can also perform inference on the learned internal models. We show how various biophysical features of the neural implementation constrain the parameters of the wake-sleep algorithm, such as the duration of the wake and sleep phases of learning and the minimal sample duration. We examine the deviations from optimal performance and tie them to the properties of the synaptic plasticity rule. PMID:25954191

  13. On optimal infinite impulse response edge detection filters

    NASA Technical Reports Server (NTRS)

    Sarkar, Sudeep; Boyer, Kim L.

    1991-01-01

    The authors outline the design of an optimal, computationally efficient, infinite impulse response edge detection filter. The optimal filter is computed based on Canny's high signal to noise ratio, good localization criteria, and a criterion on the spurious response of the filter to noise. An expression for the width of the filter, which is appropriate for infinite-length filters, is incorporated directly in the expression for spurious responses. The three criteria are maximized using the variational method and nonlinear constrained optimization. The optimal filter parameters are tabulated for various values of the filter performance criteria. A complete methodology for implementing the optimal filter using approximating recursive digital filtering is presented. The approximating recursive digital filter is separable into two linear filters operating in two orthogonal directions. The implementation is very simple and computationally efficient, has a constant time of execution for different sizes of the operator, and is readily amenable to real-time hardware implementation.
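
    The claims about recursive, separable filtering with constant cost per sample can be illustrated with a generic first-order recursive smoother. The sketch below is an assumption-laden stand-in, not the optimal edge detection filter derived in the paper: the function names and the parameter `alpha` are hypothetical.

```python
def recursive_smooth(signal, alpha):
    """Forward-backward first-order recursive (IIR) smoothing.

    Illustrative only: a generic exponential smoother, not the optimal
    filter from the paper. `alpha` in (0, 1] sets the effective width;
    the work per sample is the same for every alpha.
    """
    if not signal:
        return []
    # Causal pass: y[n] = alpha * x[n] + (1 - alpha) * y[n - 1]
    forward = [signal[0]]
    for x in signal[1:]:
        forward.append(alpha * x + (1 - alpha) * forward[-1])
    # Anti-causal pass over the forward output removes the phase lag.
    out = [forward[-1]]
    for x in reversed(forward[:-1]):
        out.append(alpha * x + (1 - alpha) * out[-1])
    out.reverse()
    return out


def smooth_rows_then_columns(image, alpha):
    """Separable 2-D use: filter each row, then each column."""
    rows = [recursive_smooth(r, alpha) for r in image]
    cols = [recursive_smooth(list(c), alpha) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]
```

    A smaller `alpha` widens the effective smoothing kernel without adding any operations per sample, which is the property that makes recursive filters attractive for real-time use.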

  14. The effect of Fisher information matrix approximation methods in population optimal design calculations.

    PubMed

    Strömberg, Eric A; Nyberg, Joakim; Hooker, Andrew C

    2016-12-01

    With the increasing popularity of optimal design in drug development it is important to understand how the approximations and implementations of the Fisher information matrix (FIM) affect the resulting optimal designs. The aim of this work was to investigate the impact on design performance when using two common approximations to the population model and the full or block-diagonal FIM implementations for optimization of sampling points. Sampling schedules for two example experiments based on population models were optimized using the FO and FOCE approximations and the full and block-diagonal FIM implementations. The number of support points was compared between the designs for each example experiment. The performance of these designs based on simulation/estimations was investigated by computing bias of the parameters as well as through the use of an empirical D-criterion confidence interval. Simulations were performed when the design was computed with the true parameter values as well as with misspecified parameter values. The FOCE approximation and the Full FIM implementation yielded designs with more support points and less clustering of sample points than designs optimized with the FO approximation and the block-diagonal implementation. The D-criterion confidence intervals showed no performance differences between the full and block diagonal FIM optimal designs when assuming true parameter values. However, the FO approximated block-reduced FIM designs had higher bias than the other designs. When assuming parameter misspecification in the design evaluation, the FO Full FIM optimal design was superior to the FO block-diagonal FIM design in both of the examples.
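
    For readers unfamiliar with D-optimal design, the following minimal sketch computes the determinant of the Fisher information matrix for a deliberately simple case: a fixed-effects straight-line model y = a + b*t with unit-variance noise. This is far simpler than the population models and FO/FOCE approximations studied in the paper; the function name `d_criterion` and the sampling schedules are hypothetical.

```python
def d_criterion(times):
    """Determinant of the FIM for y = a + b*t with unit-variance noise.

    The FIM is the sum over sampling times of [[1, t], [t, t*t]], so
    its determinant is n * sum(t^2) - (sum(t))^2. Toy model only.
    """
    n = len(times)
    s1 = sum(times)
    s2 = sum(t * t for t in times)
    return n * s2 - s1 * s1


spread = [0, 1, 2, 3]     # four distinct support points
clustered = [0, 0, 3, 3]  # two support points, each replicated
print(d_criterion(spread), d_criterion(clustered))
```

    In this toy case the clustered schedule has the larger determinant, which shows how an optimal design can end up with few support points and replicated samples.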

  15. WIS Implementation Study Report. Volume 2. Resumes.

    DTIC Science & Technology

    1983-10-01

    WIS modernization that major attention be paid to interface definition and design, system integration and test, and configuration management of the...Estimates -- Computer Corporation of America -- 155 Test Processing Systems -- Newburyport Computer Associates, Inc. -- 183 Cluster II Papers-- Standards...enhancements of the SPL/I compiler system, development of test systems for the verification of SDEX/M and the timing and architecture of the AN/UYK-20 and

  16. Trimming the Fat in America's Schools: Where Are We One Year Following Implementation of Federally Mandated Local School Wellness Plans (LSWPs)?

    ERIC Educational Resources Information Center

    Curriculum Review, 2007

    2007-01-01

    This article presents the results of a survey by the School Nutrition Association (SNA) on the year-long adoption of wellness policies of 15,000 local schools nationwide. Released September 5, SNA's "From Cupcakes to Carrots: Local Wellness Policies One Year Later" was compiled from a survey of 976 school nutrition directors conducted in May 2007.…

  17. Schedulers with load-store queue awareness

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chen, Tong; Eichenberger, Alexandre E.; Jacob, Arpith C.

    2017-02-07

    In one embodiment, a computer-implemented method includes tracking a size of a load-store queue (LSQ) during compile time of a program. The size of the LSQ is time-varying and indicates how many memory access instructions of the program are on the LSQ. The method further includes scheduling, by a computer processor, a plurality of memory access instructions of the program based on the size of the LSQ.
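
    The idea of scheduling against a modeled load-store queue can be sketched as a toy list scheduler. Everything here is an assumption made for illustration and not the patented method: instructions are treated as independent, one instruction issues per cycle, each memory access holds one LSQ entry for a fixed `mem_latency`, and the names `schedule`, `lsq_capacity`, and `mem_latency` are hypothetical.

```python
def schedule(instrs, lsq_capacity, mem_latency):
    """Assign issue cycles while tracking modeled LSQ occupancy.

    instrs: list of (name, is_memory_access) pairs, assumed independent.
    When the modeled queue is full, a non-memory instruction is issued
    instead of stalling, if one is pending. Toy model only.
    """
    if lsq_capacity < 1:
        raise ValueError("lsq_capacity must be at least 1")
    pending = list(instrs)
    release = []  # cycles at which occupied LSQ entries free up
    order = []
    cycle = 0
    while pending:
        # Drop entries that have already left the modeled queue.
        release = [r for r in release if r > cycle]
        full = len(release) >= lsq_capacity
        # Earliest pending instruction the modeled LSQ can accept.
        pick = next((i for i in pending if not (i[1] and full)), None)
        if pick is not None:
            pending.remove(pick)
            if pick[1]:
                release.append(cycle + mem_latency)
            order.append((pick[0], cycle))
        cycle += 1
    return order


program = [("ld1", True), ("ld2", True), ("ld3", True), ("add", False)]
print(schedule(program, lsq_capacity=2, mem_latency=5))
```

    With a two-entry queue, the third load is deferred until an entry drains and the independent `add` is hoisted into the otherwise idle slot.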

  18. Schedulers with load-store queue awareness

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chen, Tong; Eichenberger, Alexandre E.; Jacob, Arpith C.

    2017-01-24

    In one embodiment, a computer-implemented method includes tracking a size of a load-store queue (LSQ) during compile time of a program. The size of the LSQ is time-varying and indicates how many memory access instructions of the program are on the LSQ. The method further includes scheduling, by a computer processor, a plurality of memory access instructions of the program based on the size of the LSQ.

  19. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bonachea, Dan; Hargrove, P.

    GASNet is a language-independent, low-level networking layer that provides network-independent, high-performance communication primitives tailored for implementing parallel global address space SPMD languages and libraries such as UPC, UPC++, Co-Array Fortran, Legion, Chapel, and many others. The interface is primarily intended as a compilation target and for use by runtime library writers (as opposed to end users), and the primary goals are high performance, interface portability, and expressiveness. GASNet stands for "Global-Address Space Networking".

  20. Current and Future Applications of Machine Learning for the US Army

    DTIC Science & Technology

    2018-04-13

    designing from the unwieldy application of the first principles of flight controls, aerodynamics, blade propulsion, and so on, the designers turned...when the number of features runs into millions can become challenging. To overcome these issues, regularization techniques have been developed which...and compiled to run efficiently on either CPU or GPU architectures. 5) Keras is a library that contains numerous implementations of commonly used
