Sample records for phoenixcloud provisioning runtime

  1. Hierarchical and hybrid energy storage devices in data centers: Architecture, control and provisioning.

    PubMed

    Sun, Mengshu; Xue, Yuankun; Bogdan, Paul; Tang, Jian; Wang, Yanzhi; Lin, Xue

    2018-01-01

    Recently, a new approach has been introduced that leverages and over-provisions energy storage devices (ESDs) in data centers for performing power capping and facilitating capex/opex reductions, without performance overhead. To fully realize the potential benefits of the hierarchical ESD structure, we propose a comprehensive design, control, and provisioning framework including (i) designing a power delivery architecture supporting the hierarchical ESD structure and hybrid ESDs at some levels, as well as (ii) control and provisioning of the hierarchical ESD structure, including run-time ESD charging/discharging control and design-time determination of ESD types, homogeneous/hybrid options, and ESD provisioning at each level. Experiments have been conducted using real Google data center workloads based on realistic data center specifications.
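
    To make the run-time control concrete, here is a minimal threshold-based charge/discharge sketch in C++; every name and limit below is hypothetical and illustrative, not the authors' controller:

      #include <algorithm>

      // Hypothetical sketch of threshold-based ESD power capping: discharge
      // storage when load exceeds the cap, recharge when there is headroom,
      // clamped by the device's power rating and capacity.
      struct EsdState {
          double charge_kwh;    // current stored energy
          double capacity_kwh;  // provisioned capacity (a design-time choice)
          double max_rate_kw;   // charge/discharge power rating
      };

      // Returns ESD power: positive = discharge (supplements the grid feed),
      // negative = charge (absorbs headroom under the cap).
      double control_step(EsdState& esd, double load_kw, double cap_kw, double dt_h) {
          double p = 0.0;
          if (load_kw > cap_kw && esd.charge_kwh > 0.0)
              p = std::min(load_kw - cap_kw, esd.max_rate_kw);        // shave the peak
          else if (load_kw < cap_kw && esd.charge_kwh < esd.capacity_kwh)
              p = -std::min(cap_kw - load_kw, esd.max_rate_kw);       // recharge
          esd.charge_kwh = std::clamp(esd.charge_kwh - p * dt_h, 0.0, esd.capacity_kwh);
          return p;
      }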

  2. Hierarchical and hybrid energy storage devices in data centers: Architecture, control and provisioning

    PubMed Central

    Xue, Yuankun; Bogdan, Paul; Tang, Jian; Wang, Yanzhi; Lin, Xue

    2018-01-01

    Recently, a new approach has been introduced that leverages and over-provisions energy storage devices (ESDs) in data centers for performing power capping and facilitating capex/opex reductions, without performance overhead. To fully realize the potential benefits of the hierarchical ESD structure, we propose a comprehensive design, control, and provisioning framework including (i) designing a power delivery architecture supporting the hierarchical ESD structure and hybrid ESDs at some levels, as well as (ii) control and provisioning of the hierarchical ESD structure, including run-time ESD charging/discharging control and design-time determination of ESD types, homogeneous/hybrid options, and ESD provisioning at each level. Experiments have been conducted using real Google data center workloads based on realistic data center specifications. PMID:29351553

  3. Design for Run-Time Monitor on Cloud Computing

    NASA Astrophysics Data System (ADS)

    Kang, Mikyung; Kang, Dong-In; Yun, Mira; Park, Gyung-Leen; Lee, Junghoon

    Cloud computing is a new information technology trend that moves computing and data away from desktops and portable PCs into large data centers. The basic principle of cloud computing is to deliver applications as services over the Internet, as well as infrastructure. A cloud is a type of parallel and distributed system consisting of a collection of inter-connected and virtualized computers that are dynamically provisioned and presented as one or more unified computing resources. The large-scale distributed applications on a cloud require adaptive service-based software, which has the capability of monitoring system status changes, analyzing the monitored information, and adapting its service configuration while considering tradeoffs among multiple QoS features simultaneously. In this paper, we design a Run-Time Monitor (RTM), which is system software to monitor application behavior at run-time, analyze the collected information, and optimize resources on cloud computing. RTM monitors application software through library instrumentation, as well as underlying hardware through performance counters, optimizing its computing configuration based on the analyzed data.
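
    The record does not give RTM's interfaces, so the following C++ miniature only illustrates the library-instrumentation side described above: a wrapper times each call and hands the sample to a monitor (all names hypothetical; hardware counter readings would feed the same monitor). The wrapper assumes a value-returning call.

      #include <chrono>
      #include <cstdio>
      #include <string>

      // Hypothetical monitor sink, standing in for RTM's aggregation layer.
      struct Monitor {
          void record(const std::string& name, double usec) {
              std::printf("[rtm] %s took %.1f us\n", name.c_str(), usec);
          }
      };

      // Library-instrumentation shim: time a call and report the sample.
      template <typename F, typename... Args>
      auto instrumented(Monitor& m, const std::string& name, F f, Args... args) {
          auto t0 = std::chrono::steady_clock::now();
          auto result = f(args...);
          auto t1 = std::chrono::steady_clock::now();
          m.record(name, std::chrono::duration<double, std::micro>(t1 - t0).count());
          return result;
      }

      // Usage (hypothetical): instrumented(mon, "fft", run_fft, buffer);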

  4. Design and Development of a Run-Time Monitor for Multi-Core Architectures in Cloud Computing

    PubMed Central

    Kang, Mikyung; Kang, Dong-In; Crago, Stephen P.; Park, Gyung-Leen; Lee, Junghoon

    2011-01-01

    Cloud computing is a new information technology trend that moves computing and data away from desktops and portable PCs into large data centers. The basic principle of cloud computing is to deliver applications as services over the Internet, as well as infrastructure. A cloud is a type of parallel and distributed system consisting of a collection of inter-connected and virtualized computers that are dynamically provisioned and presented as one or more unified computing resources. The large-scale distributed applications on a cloud require adaptive service-based software, which has the capability of monitoring system status changes, analyzing the monitored information, and adapting its service configuration while considering tradeoffs among multiple QoS features simultaneously. In this paper, we design and develop a Run-Time Monitor (RTM), which is system software to monitor application behavior at run-time, analyze the collected information, and optimize cloud computing resources for multi-core architectures. RTM monitors application software through library instrumentation, as well as underlying hardware through a performance counter, optimizing its computing configuration based on the analyzed data. PMID:22163811

  5. Design and development of a run-time monitor for multi-core architectures in cloud computing.

    PubMed

    Kang, Mikyung; Kang, Dong-In; Crago, Stephen P; Park, Gyung-Leen; Lee, Junghoon

    2011-01-01

    Cloud computing is a new information technology trend that moves computing and data away from desktops and portable PCs into large data centers. The basic principle of cloud computing is to deliver applications as services over the Internet, as well as infrastructure. A cloud is a type of parallel and distributed system consisting of a collection of inter-connected and virtualized computers that are dynamically provisioned and presented as one or more unified computing resources. The large-scale distributed applications on a cloud require adaptive service-based software, which has the capability of monitoring system status changes, analyzing the monitored information, and adapting its service configuration while considering tradeoffs among multiple QoS features simultaneously. In this paper, we design and develop a Run-Time Monitor (RTM), which is system software to monitor application behavior at run-time, analyze the collected information, and optimize cloud computing resources for multi-core architectures. RTM monitors application software through library instrumentation, as well as underlying hardware through a performance counter, optimizing its computing configuration based on the analyzed data.

  6. Ada (trademark) projects at NASA. Runtime environment issues and recommendations

    NASA Technical Reports Server (NTRS)

    Roy, Daniel M.; Wilke, Randall W.

    1988-01-01

    Ada practitioners should use this document to discuss and establish common short-term requirements for Ada runtime environments. The major current Ada runtime environment issues are identified through the analysis of some of the Ada efforts at NASA and other research centers. The runtime environment characteristics of major compilers are compared, while alternate runtime implementations are reviewed. Modifications and extensions to the Ada Language Reference Manual to address some of these runtime issues are proposed. Three classes of projects focusing on the most critical runtime features of Ada are recommended, including a range of immediately feasible full-scale Ada development projects. Also, a list of runtime features and procurement issues is proposed for consideration by vendors, contractors, and the government.

  7. Apparatuses and Methods for Producing Runtime Architectures of Computer Program Modules

    NASA Technical Reports Server (NTRS)

    Abi-Antoun, Marwan Elia (Inventor); Aldrich, Jonathan Erik (Inventor)

    2013-01-01

    Apparatuses and methods for producing run-time architectures of computer program modules. One embodiment includes creating an abstract graph from the computer program module and from containment information corresponding to the computer program module, wherein the abstract graph has nodes including types and objects, and wherein the abstract graph relates an object to a type, and wherein for a specific object the abstract graph relates the specific object to a type containing the specific object; and creating a runtime graph from the abstract graph, wherein the runtime graph is a representation of the true runtime object graph, wherein the runtime graph represents containment information such that, for a specific object, the runtime graph relates the specific object to another object that contains the specific object.
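
    A minimal C++ rendering of the two data structures the claim describes (names invented; the patent defines them abstractly): an abstract graph relating objects to containing types, and a runtime graph relating objects to containing objects.

      #include <string>
      #include <vector>

      // Abstract graph: nodes are types and objects; each object is related
      // to the type that contains it, and each type lists its objects.
      struct TypeNode;
      struct ObjectNode {
          std::string name;
          TypeNode* containing_type = nullptr;  // object -> containing type
      };
      struct TypeNode {
          std::string name;
          std::vector<ObjectNode*> objects;     // type -> contained objects
      };

      // Runtime graph: object-to-object containment edges, representing an
      // approximation of the true runtime object graph.
      struct RuntimeEdge {
          ObjectNode* container;
          ObjectNode* contained;
      };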

  8. The HARNESS Workbench: Unified and Adaptive Access to Diverse HPC Platforms

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sunderam, Vaidy S.

    2012-03-20

    The primary goal of the Harness WorkBench (HWB) project is to investigate innovative software environments that will help enhance the overall productivity of applications science on diverse HPC platforms. Two complementary frameworks were designed: one, a virtualized command toolkit for application building, deployment, and execution that provides a common view across diverse HPC systems, in particular the DOE leadership computing platforms (Cray, IBM, SGI, and clusters); and two, a unified runtime environment that consolidates access to runtime services via an adaptive framework for execution-time and post-processing activities. A prototype of the first was developed based on the concept of a 'system-call virtual machine' (SCVM), to enhance portability of the HPC application deployment process across heterogeneous high-end machines. The SCVM approach to portable builds is based on the insertion of toolkit-interpretable directives into original application build scripts. Modifications resulting from these directives preserve the semantics of the original build instruction flow. The execution of the build script is controlled by our toolkit, which intercepts build script commands in a manner transparent to the end-user. We have applied this approach to a scientific production code (Gamess-US) on the Cray-XT5 machine. The second facet, termed Unibus, aims to facilitate provisioning and aggregation of multifaceted resources from the resource providers' and end-users' perspectives. To achieve that, Unibus proposes a Capability Model and mediators (resource drivers) to virtualize access to diverse resources, and soft and successive conditioning to enable automatic and user-transparent resource provisioning. A proof-of-concept implementation has demonstrated the viability of this approach on high-end machines, grid systems and computing clouds.

  9. Towards real-time photon Monte Carlo dose calculation in the cloud

    NASA Astrophysics Data System (ADS)

    Ziegenhein, Peter; Kozin, Igor N.; Kamerling, Cornelis Ph; Oelfke, Uwe

    2017-06-01

    Near real-time application of Monte Carlo (MC) dose calculation in clinic and research is hindered by the long computational runtimes of established software. Currently, fast MC software solutions are available utilising accelerators such as graphical processing units (GPUs) or clusters based on central processing units (CPUs). Both platforms are expensive in terms of purchase costs and maintenance and, in the case of the GPU, provide only limited scalability. In this work we propose a cloud-based MC solution, which offers high scalability of accurate photon dose calculations. The MC simulations run on a private virtual supercomputer that is formed in the cloud. Computational resources can be provisioned dynamically at low cost without upfront investment in expensive hardware. A client-server software solution has been developed which controls the simulations and transports data to and from the cloud efficiently and securely. The client application integrates seamlessly into a treatment planning system. It runs the MC simulation workflow automatically and securely exchanges simulation data with the server-side application that controls the virtual supercomputer. Advanced encryption standards were used to add an additional security layer, which encrypts and decrypts patient data on-the-fly at the processor register level. We could show that our cloud-based MC framework enables near real-time dose computation. It delivers excellent linear scaling for high-resolution datasets with absolute runtimes of 1.1 seconds to 10.9 seconds for simulating a clinical prostate and liver case to within 1% statistical uncertainty. The computation runtimes include the transportation of data to and from the cloud as well as process scheduling and synchronisation overhead. Cloud-based MC simulations offer a fast, affordable and easily accessible alternative for near real-time accurate dose calculations to currently used GPU or cluster solutions.
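
    Two pieces of standard Monte Carlo arithmetic, not taken from the paper, connect the 1% uncertainty target and the linear-scaling claim:

      % Statistical uncertainty falls with the square root of the number of
      % simulated histories N, so halving \sigma costs 4x the histories:
      \sigma \propto \frac{1}{\sqrt{N}}
      % Ideal linear scaling across n cloud nodes gives a wall-clock time of
      T(n) \approx \frac{T_1}{n} + t_{\mathrm{overhead}}
      % where t_overhead covers data transport and synchronisation; the
      % reported 1.1-10.9 s runtimes include exactly such overhead.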

  10. Towards real-time photon Monte Carlo dose calculation in the cloud.

    PubMed

    Ziegenhein, Peter; Kozin, Igor N; Kamerling, Cornelis Ph; Oelfke, Uwe

    2017-06-07

    Near real-time application of Monte Carlo (MC) dose calculation in clinic and research is hindered by the long computational runtimes of established software. Currently, fast MC software solutions are available utilising accelerators such as graphical processing units (GPUs) or clusters based on central processing units (CPUs). Both platforms are expensive in terms of purchase costs and maintenance and, in the case of the GPU, provide only limited scalability. In this work we propose a cloud-based MC solution, which offers high scalability of accurate photon dose calculations. The MC simulations run on a private virtual supercomputer that is formed in the cloud. Computational resources can be provisioned dynamically at low cost without upfront investment in expensive hardware. A client-server software solution has been developed which controls the simulations and transports data to and from the cloud efficiently and securely. The client application integrates seamlessly into a treatment planning system. It runs the MC simulation workflow automatically and securely exchanges simulation data with the server-side application that controls the virtual supercomputer. Advanced encryption standards were used to add an additional security layer, which encrypts and decrypts patient data on-the-fly at the processor register level. We could show that our cloud-based MC framework enables near real-time dose computation. It delivers excellent linear scaling for high-resolution datasets with absolute runtimes of 1.1 seconds to 10.9 seconds for simulating a clinical prostate and liver case to within 1% statistical uncertainty. The computation runtimes include the transportation of data to and from the cloud as well as process scheduling and synchronisation overhead. Cloud-based MC simulations offer a fast, affordable and easily accessible alternative for near real-time accurate dose calculations to currently used GPU or cluster solutions.

  11. Asynchronous Runtimes in Action: An Introspective Framework for a Next Gen Runtime

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Suetterlein, Joshua D.; Landwehr, Joshua B.; Marquez, Andres

    2016-05-23

    One of the most critical challenges that new high performance systems face is the lack of system software support for these large scale systems. Investment in system stack components is essential to the development, debugging and optimization of the new emerging programming models. These emerging models have the promise to better utilize the vast hardware resources available in current and future systems. To aid in the development of applications and new system stacks, runtimes, as instances of their respective execution models, need to provide facilities to introspect their inner workings and allow an in-depth attribution of performance bottlenecks and computational patterns. In other words, the runtime systems need to reduce their opacity to observers so that users of a novel program execution model can adapt their designs to fit the intended model usage, regardless of the layer that they are working on. This design/development loop (akin to co-design) enables synergistic opportunities across the entire computational stack. This paper presents the design and implementation of a simple "gray" box performance attribution harness running inside a fine grain runtime system: the Open Community Runtime (OCR). We showcase what such a framework can indicate regarding the runtime behavior while running at scale. To this end, we have designed a set of synthetic scenarios aimed to test the runtime at its best and worst cases. We present an analysis of the most important runtime features, properties and idiosyncrasies that will affect the development of new runtime features, algorithmic selection, and application development.

  12. Adaptive runtime for a multiprocessing API

    DOEpatents

    Antao, Samuel F.; Bertolli, Carlo; Eichenberger, Alexandre E.; O'Brien, John K.

    2016-11-15

    A computer-implemented method includes selecting a runtime for executing a program. The runtime includes a first combination of feature implementations, where each feature implementation implements a feature of an application programming interface (API). Execution of the program, which uses the runtime, is monitored, and monitor data is generated based on the monitoring. A second combination of feature implementations is selected, by a computer processor, where the selection is based at least in part on the monitor data. The runtime is modified by activating the second combination of feature implementations to replace the first combination of feature implementations.
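
    A compact C++ sketch of the claimed mechanism, with invented names: the runtime keeps one active implementation per API feature and activates a replacement combination when monitor data suggests it.

      #include <functional>
      #include <map>
      #include <string>

      // Hypothetical sketch, not the patent's actual implementation.
      using FeatureImpl = std::function<void()>;

      struct AdaptiveRuntime {
          std::map<std::string, FeatureImpl> active;  // feature -> implementation

          // Activate a combination: each named feature's implementation is
          // replaced, leaving unnamed features untouched.
          void activate(const std::map<std::string, FeatureImpl>& combination) {
              for (const auto& [feature, impl] : combination)
                  active[feature] = impl;
          }
          void call(const std::string& feature) { active.at(feature)(); }
      };

      // e.g., after monitor data shows contention, switch "barrier" from a
      // spin-wait implementation to a blocking one:
      //   rt.activate({{"barrier", blocking_barrier}});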

  13. Adaptive runtime for a multiprocessing API

    DOEpatents

    Antao, Samuel F.; Bertolli, Carlo; Eichenberger, Alexandre E.; O'Brien, John K.

    2016-10-11

    A computer-implemented method includes selecting a runtime for executing a program. The runtime includes a first combination of feature implementations, where each feature implementation implements a feature of an application programming interface (API). Execution of the program is monitored, and the execution uses the runtime. Monitor data is generated based on the monitoring. A second combination of feature implementations are selected, by a computer processor, where the selection is based at least in part on the monitor data. The runtime is modified by activating the second combination of feature implementations to replace the first combination of feature implementations.

  14. ASC ATDM Level 2 Milestone #5325: Asynchronous Many-Task Runtime System Analysis and Assessment for Next Generation Platforms.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Baker, Gavin Matthew; Bettencourt, Matthew Tyler; Bova, Steven W.

    2015-09-01

    This report provides in-depth information and analysis to help create a technical road map for developing next-generation programming models and runtime systems that support Advanced Simulation and Computing (ASC) workload requirements. The focus herein is on asynchronous many-task (AMT) models and runtime systems, which are of great interest in the context of exascale computing, as they hold the promise to address key issues associated with future extreme-scale computer architectures. This report includes a thorough qualitative and quantitative examination of three best-of-class AMT runtime systems: Charm++, Legion, and Uintah, all of which are in use as part of the Predictive Science Academic Alliance Program II (PSAAP-II) Centers. The studies focus on each of the runtimes' programmability, performance, and mutability. Through the experiments and analysis presented, several overarching findings emerge. From a performance perspective, AMT runtimes show tremendous potential for addressing extreme-scale challenges. Empirical studies show an AMT runtime can mitigate performance heterogeneity inherent to the machine itself, and that Message Passing Interface (MPI) and AMT runtimes perform comparably under balanced conditions. From a programmability and mutability perspective, however, none of the runtimes in this study are currently ready for use in developing production-ready Sandia ASC applications. The report concludes by recommending a co-design path forward, wherein application, programming model, and runtime system developers work together to define requirements and solutions. Such a requirements-driven co-design approach benefits the community as a whole, with widespread community engagement mitigating risk for both application developers and high-performance computing runtime system developers.

  15. DARMA v. Beta 0.5

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hollman, David; Lifflander, Jonathon; Wilke, Jeremiah

    2017-03-14

    DARMA is a portability layer for asynchronous many-task (AMT) runtime systems. AMT runtime systems show promise to mitigate challenges imposed by next generation high performance computing architectures. However, current runtime system technologies are not production-ready. DARMA is a portability layer that seeks to insulate application developers from idiosyncrasies of individual runtime systems, thereby facilitating application-developer use of these technologies. DARMA comprises a frontend application programming interface (API) for application developers, a backend API for runtime system developers, and a translation layer that translates frontend API calls into backend API calls. Application developers use C++ abstractions to annotate both data and tasks in their code. The DARMA translation layer uses C++ template metaprogramming to capture data-task dependencies, and provides this information to a potential backend runtime system via a series of backend API calls.
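
    DARMA's real frontend and backend APIs are not reproduced in this record; the invented miniature below only shows the general shape of a frontend that records data-task dependencies and forwards them through backend calls.

      #include <functional>
      #include <string>
      #include <vector>

      // Invented miniature, not DARMA's actual APIs.
      struct Handle { std::string name; };   // names a piece of annotated data

      struct TaskRecord {
          std::vector<Handle> reads, writes; // captured data-task dependencies
          std::function<void()> body;
      };

      struct Backend {                       // stand-in for a backend API
          void submit(const TaskRecord& t) { t.body(); }
      };

      // Frontend call: record what the task touches, then hand it to the
      // backend, which is free to schedule it once dependencies allow.
      template <typename F>
      void create_task(Backend& be, std::vector<Handle> reads,
                       std::vector<Handle> writes, F body) {
          be.submit({std::move(reads), std::move(writes), body});
      }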

  16. An integrated runtime and compile-time approach for parallelizing structured and block structured applications

    NASA Technical Reports Server (NTRS)

    Agrawal, Gagan; Sussman, Alan; Saltz, Joel

    1993-01-01

    Scientific and engineering applications often involve structured meshes. These meshes may be nested (for multigrid codes) and/or irregularly coupled (called multiblock or irregularly coupled regular mesh problems). A combined runtime and compile-time approach for parallelizing these applications on distributed memory parallel machines in an efficient and machine-independent fashion was described. A runtime library which can be used to port these applications to distributed memory machines was designed and implemented. The library is currently implemented on several different systems. To further ease the task of application programmers, methods were developed for integrating this runtime library with compilers for HPF-like parallel programming languages. How this runtime library was integrated with the Fortran 90D compiler being developed at Syracuse University is discussed. Experimental results to demonstrate the efficacy of our approach are presented. A multiblock Navier-Stokes solver template and a multigrid code were experimented with. Our experimental results show that our primitives have low runtime communication overheads. Further, the compiler-parallelized codes perform within 20 percent of the code parallelized by manually inserting calls to the runtime library.

  17. 2014 Runtime Systems Summit. Runtime Systems Report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sarkar, Vivek; Budimlic, Zoran; Kulkani, Milind

    2016-09-19

    This report summarizes runtime system challenges for exascale computing, which follow from the fundamental challenges for exascale systems that have been well studied in past reports, e.g., [6, 33, 34, 32, 24]. Some of the key exascale challenges that pertain to runtime systems include parallelism, energy efficiency, memory hierarchies, data movement, heterogeneous processors and memories, resilience, performance variability, dynamic resource allocation, performance portability, and interoperability with legacy code. In addition to summarizing these challenges, the report also outlines different approaches to addressing these significant challenges that have been pursued by research projects in the DOE-sponsored X-Stack and OS/R programs. Since there is often confusion as to what exactly the term "runtime system" refers to in the software stack, we include a section on taxonomy to clarify the terminology used by participants in these research projects. In addition, we include a section on deployment opportunities for vendors and government labs to build on the research results from these projects. Finally, this report is also intended to provide a framework for discussing future research and development investments for exascale runtime systems, and for clarifying the role of runtime systems in exascale software.

  18. COMP Superscalar, an interoperable programming framework

    NASA Astrophysics Data System (ADS)

    Badia, Rosa M.; Conejero, Javier; Diaz, Carlos; Ejarque, Jorge; Lezzi, Daniele; Lordan, Francesc; Ramon-Cortes, Cristian; Sirvent, Raul

    2015-12-01

    COMPSs is a programming framework that aims to facilitate the parallelization of existing applications written in Java, C/C++ and Python scripts. For that purpose, it offers a simple programming model based on sequential development in which the user is mainly responsible for (i) identifying the functions to be executed as asynchronous parallel tasks and (ii) annotating them with Java annotations or standard Python decorators. A runtime system is in charge of exploiting the inherent concurrency of the code, automatically detecting and enforcing the data dependencies between tasks and spawning these tasks to the available resources, which can be nodes in a cluster, clouds or grids. In cloud environments, COMPSs provides scalability and elasticity features allowing the dynamic provisioning of resources.

  19. MOLAR: Modular Linux and Adaptive Runtime Support for HEC OS/R Research

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Frank Mueller

    2009-02-05

    MOLAR is a multi-institution research effort that concentrates on adaptive, reliable, and efficient operating and runtime system solutions for ultra-scale high-end scientific computing on the next generation of supercomputers. This research addresses the challenges outlined by the FAST-OS (forum to address scalable technology for runtime and operating systems) and HECRTF (high-end computing revitalization task force) activities by providing a modular Linux and adaptable runtime support for high-end computing operating and runtime systems. The MOLAR research has the following goals to address these issues. (1) Create a modular and configurable Linux system that allows customized changes based on the requirements of the applications, runtime systems, and cluster management software. (2) Build runtime systems that leverage the OS modularity and configurability to improve efficiency, reliability, scalability, ease-of-use, and provide support to legacy and promising programming models. (3) Advance computer reliability, availability and serviceability (RAS) management systems to work cooperatively with the OS/R to identify and preemptively resolve system issues. (4) Explore the use of advanced monitoring and adaptation to improve application performance and predictability of system interruptions. The overall goal of the research conducted at NCSU is to develop scalable algorithms for high availability without single points of failure and without single points of control.

  20. Alternatives to Re-Planning: Methods for Plan Re-Evaluation at Runtime

    NASA Technical Reports Server (NTRS)

    Benazera, Emmanuel

    2005-01-01

    Current planning algorithms have difficulty handling the complexity that results from increased domain uncertainty, especially in the case of multi-dimensional continuous spaces. Therefore, they produce plans that do not take into account numerous situations that can occur at runtime, such as faults or other changes in the planning domain itself. Thus there is a gap between plan generation and the reality experienced at runtime. Here we present two methods that allow the plan conditionals to be revised with respect to uncertainty about the system as estimated at runtime.

  1. Preventing Run-Time Bugs at Compile-Time Using Advanced C++

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Neswold, Richard

    When writing software, we develop algorithms that tell the computer what to do at run-time. Our solutions are easier to understand and debug when they are properly modeled using class hierarchies, enumerations, and a well-factored API. Unfortunately, even with these design tools, we end up having to debug our programs at run-time. Worse still, debugging an embedded system changes its dynamics, making it tough to find and fix concurrency issues. This paper describes techniques using C++ to detect run-time bugs *at compile time*. A concurrency library, developed at Fermilab, is used for the examples illustrating these techniques.
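
    The Fermilab library itself is not shown in this record, so the following generic C++ illustration captures the idea of moving a concurrency bug to compile time: data that cannot be touched unless its lock is held, enforced by the type system rather than by run-time debugging.

      #include <mutex>

      // Generic illustration (not the Fermilab library): guarded data whose
      // unlocked access simply does not compile.
      template <typename T>
      class Guarded {
          std::mutex mtx;
          T value{};
      public:
          // The only path to `value` is through this method, which holds
          // the lock for the duration of the callback.
          template <typename F>
          auto with_lock(F f) {
              std::lock_guard<std::mutex> lock(mtx);
              return f(value);
          }
      };

      // Guarded<int> counter;
      // counter.with_lock([](int& v) { ++v; });  // OK: lock held
      // counter.value;                           // compile-time error: private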

  2. An Analytical Framework for Runtime of a Class of Continuous Evolutionary Algorithms.

    PubMed

    Zhang, Yushan; Hu, Guiwu

    2015-01-01

    Although there have been many studies on the runtime of evolutionary algorithms in discrete optimization, relatively few theoretical results have been proposed for continuous optimization, such as evolutionary programming (EP). This paper proposes an analysis of the runtime of two EP algorithms based on Gaussian and Cauchy mutations, using an absorbing Markov chain. Given a constant variation, we calculate the runtime upper bound of special Gaussian mutation EP and Cauchy mutation EP. Our analysis reveals that the upper bounds are impacted by the number of individuals, the problem dimension n, the search range, and the Lebesgue measure of the optimal neighborhood. Furthermore, we provide conditions under which the average runtime of the considered EP can be no more than a polynomial of n: the Lebesgue measure of the optimal neighborhood must be larger than a combinatorial calculation of an exponential and the given polynomial of n.
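
    As a hedged illustration of the style of argument (not the paper's exact bound): for an absorbing chain whose per-generation probability of hitting the optimal neighborhood is at least p, the expected runtime satisfies

      % Illustrative absorbing-chain bound, not the paper's exact result:
      \mathbb{E}[T] \;\le\; \frac{1}{p}
      % If the mutation distribution places probability on the optimal
      % neighborhood S^* roughly in proportion to its Lebesgue measure
      % m(S^*) within a search range of measure m(R),
      p \;\gtrsim\; \frac{m(S^*)}{m(R)},
      % then E[T] is polynomial in n whenever m(S^*) is not exponentially
      % small relative to m(R), matching the flavor of the stated condition.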

  3. Runtime Performance Monitoring Tool for RTEMS System Software

    NASA Astrophysics Data System (ADS)

    Cho, B.; Kim, S.; Park, H.; Kim, H.; Choi, J.; Chae, D.; Lee, J.

    2007-08-01

    RTEMS is a commercial-grade real-time operating system that supports multi-processor computers. However, there are not many development tools for RTEMS. In this paper, we report a new RTEMS-based runtime performance monitoring tool. We have implemented a lightweight runtime monitoring task with an extension to the RTEMS APIs. Using our tool, software developers can verify various performance-related parameters during runtime. Our tool can be used during the software development phase and in in-orbit operation as well. Our implemented target agent is lightweight and has small overhead using the SpaceWire interface. Efforts to reduce overhead and to add other monitoring parameters are currently under research.

  4. Towards Run-time Assurance of Advanced Propulsion Algorithms

    NASA Technical Reports Server (NTRS)

    Wong, Edmond; Schierman, John D.; Schlapkohl, Thomas; Chicatelli, Amy

    2014-01-01

    This paper covers the motivation and rationale for investigating the application of run-time assurance methods as a potential means of providing safety assurance for advanced propulsion control systems. Certification is becoming increasingly infeasible for such systems using current verification practices. Run-time assurance systems hold the promise of certifying these advanced systems by continuously monitoring the state of the feedback system during operation and reverting to a simpler, certified system if anomalous behavior is detected. The discussion will also cover initial efforts underway to apply a run-time assurance framework to NASA's model-based engine control approach. Preliminary experimental results are presented and discussed.
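
    A minimal sketch of the run-time assurance pattern described above (all names and limits invented, not NASA's framework): monitor the advanced controller's output and revert to a certified baseline whenever behavior leaves a safe envelope.

      // Hypothetical run-time assurance switch; limits are placeholders.
      struct Plant   { double speed; };
      struct Command { double thrust; };

      // Stand-ins: a high-performance uncertified controller and a simple
      // certified baseline.
      Command advanced_controller(const Plant& p) { return {0.9 - 0.1 * p.speed}; }
      Command baseline_controller(const Plant&)   { return {0.3}; }

      // The monitored safety envelope (invented limits).
      bool within_safe_envelope(const Plant& p, const Command& c) {
          return c.thrust >= 0.0 && c.thrust <= 1.0 && p.speed < 1.05;
      }

      // Each control step: use the advanced command unless the monitor
      // detects anomalous behavior, then fall back to the certified one.
      Command run_time_assured_step(const Plant& p) {
          Command c = advanced_controller(p);
          if (!within_safe_envelope(p, c))
              c = baseline_controller(p);
          return c;
      }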

  5. Compilation time analysis to minimize run-time overhead in preemptive scheduling on multiprocessors

    NASA Astrophysics Data System (ADS)

    Wauters, Piet; Lauwereins, Rudy; Peperstraete, J.

    1994-10-01

    This paper describes a scheduling method for hard real-time Digital Signal Processing (DSP) applications implemented on a multi-processor. Due to the very high operating frequencies of DSP applications (typically hundreds of kHz), run-time overhead should be kept as small as possible. Because static scheduling introduces very little run-time overhead, it is used as much as possible. Dynamic pre-emption of tasks is allowed if and only if it leads to better performance in spite of the extra run-time overhead. We essentially combine static scheduling with dynamic pre-emption using static priorities. Since we are dealing with hard real-time applications, we must be able to guarantee at compile-time that all timing requirements will be satisfied at run-time. We will show that our method performs at least as well as any static scheduling method. It also reduces the total number of dynamic pre-emptions compared with run-time methods like deadline monotonic scheduling.
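
    The compile-time guarantee the authors require is the kind usually established by a schedulability test; a standard fixed-priority response-time recurrence (illustrative here, not necessarily the paper's own test) is

      % Task i with worst-case execution time C_i, preempted by
      % higher-priority tasks j in hp(i) with periods T_j:
      R_i^{(k+1)} = C_i + \sum_{j \in \mathrm{hp}(i)}
                    \left\lceil \frac{R_i^{(k)}}{T_j} \right\rceil C_j
      % Iterate to a fixed point; the task set is schedulable if
      % R_i \le D_i (the deadline) holds for every task i.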

  6. Accountable Information Flow for Java-Based Web Applications

    DTIC Science & Technology

    2010-01-01

    AFRL-RI-RS-TR-2010-9, Final Technical Report, January 2010. Excerpt: "On the server, the Java application code links against Swift's server-side run-time library, which in turn sits on top of the standard Java servlet ..." (Figure 2, "The Swift architecture," shows the Web browser, HTTP Web server, Java servlet framework, Swift server runtime, and runtime library.)

  7. Application Characterization at Scale: Lessons learned from developing a distributed Open Community Runtime system for High Performance Computing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Landwehr, Joshua B.; Suetterlein, Joshua D.; Marquez, Andres

    2016-05-16

    Since 2012, the U.S. Department of Energy's X-Stack program has been developing solutions including runtime systems, programming models, languages, compilers, and tools for the Exascale system software to address crucial performance and power requirements. Fine grain programming models and runtime systems show a great potential to efficiently utilize the underlying hardware. Thus, they are essential to many X-Stack efforts. An abundant amount of small tasks can better utilize the vast parallelism available on current and future machines. Moreover, finer tasks can recover faster and adapt better, due to a decrease in state and control. Nevertheless, current applications have been written to exploit old paradigms (such as Communicating Sequential Processes and Bulk Synchronous Parallel processing). To fully utilize the advantages of these new systems, applications need to be adapted to these new paradigms. As part of the applications' porting process, in-depth characterization studies, focused on both application characteristics and runtime features, need to take place to fully understand the application performance bottlenecks and how to resolve them. This paper presents a characterization study for a novel high performance runtime system, called the Open Community Runtime, using key HPC kernels as its vehicle. This study has the following contributions: one of the first high performance, fine grain, distributed memory runtime systems implementing the OCR standard (version 0.99a); and a characterization study of key HPC kernels in terms of runtime primitives running in both intra- and inter-node environments. Running on a general purpose cluster, we have found up to a 1635x relative speed-up for a parallel tiled Cholesky kernel on 128 nodes with 16 cores each, and a 1864x relative speed-up for a parallel tiled Smith-Waterman kernel on 128 nodes with 30 cores.

  8. An overview of the Opus language and runtime system

    NASA Technical Reports Server (NTRS)

    Mehrotra, Piyush; Haines, Matthew

    1994-01-01

    We have recently introduced a new language, called Opus, which provides a set of Fortran language extensions that allow for integrated support of task and data parallelism. lt also provides shared data abstractions (SDA's) as a method for communication and synchronization among these tasks. In this paper, we first provide a brief description of the language features and then focus on both the language-dependent and language-independent parts of the runtime system that support the language. The language-independent portion of the runtime system supports lightweight threads across multiple address spaces, and is built upon existing lightweight thread and communication systems. The language-dependent portion of the runtime system supports conditional invocation of SDA methods and distributed SDA argument handling.

  9. Compiler analysis for irregular problems in FORTRAN D

    NASA Technical Reports Server (NTRS)

    Vonhanxleden, Reinhard; Kennedy, Ken; Koelbel, Charles; Das, Raja; Saltz, Joel

    1992-01-01

    We developed a dataflow framework which provides a basis for rigorously defining strategies to make use of runtime preprocessing methods for distributed memory multiprocessors. In many programs, several loops access the same off-processor memory locations. Our runtime support gives us a mechanism for tracking and reusing copies of off-processor data. A key aspect of our compiler analysis strategy is to determine when it is safe to reuse copies of off-processor data. Another crucial function of the compiler analysis is to identify situations which allow runtime preprocessing overheads to be amortized. This dataflow analysis will make it possible to effectively use the results of interprocedural analysis in our efforts to reduce interprocessor communication and the need for runtime preprocessing.

  10. A ROSE-based OpenMP 3.0 Research Compiler Supporting Multiple Runtime Libraries

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liao, C; Quinlan, D; Panas, T

    2010-01-25

    OpenMP is a popular and evolving programming model for shared-memory platforms. It relies on compilers for optimal performance and to target modern hardware architectures. A variety of extensible and robust research compilers are key to OpenMP's sustainable success in the future. In this paper, we present our efforts to build an OpenMP 3.0 research compiler for C, C++, and Fortran using the ROSE source-to-source compiler framework. Our goal is to support OpenMP research for ourselves and others. We have extended ROSE's internal representation to handle all of the OpenMP 3.0 constructs and facilitate their manipulation. Since OpenMP research is often complicated by the tight coupling of the compiler translations and the runtime system, we present a set of rules to define a common OpenMP runtime library (XOMP) on top of multiple runtime libraries. These rules additionally define how to build a set of translations targeting XOMP. Our work demonstrates how to reuse OpenMP translations across different runtime libraries. This work simplifies OpenMP research by decoupling the problematic dependence between the compiler translations and the runtime libraries. We present an evaluation of our work by demonstrating an analysis tool for OpenMP correctness. We also show how XOMP can be defined using both GOMP and Omni, and present comparative performance results against other OpenMP compilers.
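
    XOMP's real symbols are not listed in this record, so this sketch uses invented names to show the shape of a common runtime layer: one entry point that compiled translations target, dispatched to a backend chosen at build time. The adapter functions and build flags below are hypothetical, not the real XOMP/GOMP/Omni interfaces.

      #include <cstddef>

      // Hypothetical backend adapters (declarations only; each would wrap
      // the corresponding runtime's actual parallel-region entry point).
      void gomp_backend_run(void (*)(void*), void*, std::size_t);
      void omni_backend_run(void (*)(void*), void*, std::size_t);

      // Common entry point that compiler-generated translations call.
      void xomp_parallel_region(void (*body)(void*), void* data,
                                std::size_t nthreads) {
      #if defined(USE_GOMP_BACKEND)
          gomp_backend_run(body, data, nthreads);
      #elif defined(USE_OMNI_BACKEND)
          omni_backend_run(body, data, nthreads);
      #else
          for (std::size_t i = 0; i < nthreads; ++i)
              body(data);   // serial fallback: each "thread" runs the body
      #endif
      }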

  11. Transforming parts of a differential equations system to difference equations as a method for run-time savings in NONMEM.

    PubMed

    Petersson, K J F; Friberg, L E; Karlsson, M O

    2010-10-01

    Computer models of biological systems grow more complex as computing power increases. Often these models are defined as differential equations, and no analytical solutions exist. Numerical integration is used to approximate the solution; this can be computationally intensive, time consuming, and account for a large proportion of the total computer runtime. The performance of different integration methods depends on the mathematical properties of the differential equations system at hand. In this paper we investigate the possibility of runtime gains by calculating parts of, or the whole, differential equations system at given time intervals, outside of the differential equations solver. This approach was tested on nine models defined as differential equations, with the goal of reducing runtime while maintaining model fit, based on the objective function value. The software used was NONMEM. In four models the computational runtime was successfully reduced (by 59-96%). The differences in parameter estimates, compared to using only the differential equations solver, were less than 12% for all fixed effects parameters. For the variance parameters, estimates were within 10% for the majority of the parameters. Population and individual predictions were similar, and the differences in OFV were between 1 and -14 units. When computational runtime seriously affects the usefulness of a model, we suggest evaluating this approach for repetitive elements of model building and evaluation such as covariate inclusions or bootstraps.
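
    An illustrative C++ toy (not NONMEM code): the slowly varying part of the system is advanced by a difference equation at coarse intervals, outside the fine-grained integration of the fast part, which is where the reported runtime savings come from.

      // Hypothetical two-component system; dynamics are placeholders.
      struct State { double fast; double slow; };

      void simulate(State s, double t_end, double dt_fast, double dt_slow) {
          double next_slow_update = dt_slow;
          for (double t = 0.0; t < t_end; t += dt_fast) {
              // Fast component: numerical integration at every step
              // (simple Euler here, standing in for the ODE solver).
              s.fast += dt_fast * (-0.5 * s.fast + s.slow);
              // Slow component: difference-equation update at coarse
              // intervals, outside the solver, e.g. x_{k+1} = a * x_k.
              if (t >= next_slow_update) {
                  s.slow *= 0.99;
                  next_slow_update += dt_slow;
              }
          }
      }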

  12. Run-time parallelization and scheduling of loops

    NASA Technical Reports Server (NTRS)

    Saltz, Joel H.; Mirchandaney, Ravi; Crowley, Kay

    1991-01-01

    Run-time methods are studied to automatically parallelize and schedule iterations of a do loop in certain cases where compile-time information is inadequate. The methods presented involve execution time preprocessing of the loop. At compile-time, these methods set up the framework for performing a loop dependency analysis. At run-time, wavefronts of concurrently executable loop iterations are identified. Using this wavefront information, loop iterations are reordered for increased parallelism. Symbolic transformation rules are used to produce: inspector procedures that perform execution time preprocessing, and executors or transformed versions of source code loop structures. These transformed loop structures carry out the calculations planned in the inspector procedures. Performance results are presented from experiments conducted on the Encore Multimax. These results illustrate that run-time reordering of loop indexes can have a significant impact on performance.
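
    A minimal C++ sketch of the inspector/executor pattern the abstract describes, simplified to at most one dependency per iteration: the inspector builds wavefronts of independent iterations at run time, and the executor runs the wavefronts in order.

      #include <cstddef>
      #include <vector>

      // Inspector: depends_on[i] is the index of an earlier iteration
      // (index < i) that iteration i must follow, or -1 if none. Iterations
      // are grouped into wavefronts (levels) with no mutual dependencies.
      std::vector<std::vector<int>> inspector(const std::vector<int>& depends_on) {
          std::vector<int> level(depends_on.size(), 0);
          int max_level = 0;
          for (std::size_t i = 0; i < depends_on.size(); ++i) {
              if (depends_on[i] >= 0) level[i] = level[depends_on[i]] + 1;
              if (level[i] > max_level) max_level = level[i];
          }
          std::vector<std::vector<int>> wavefronts(max_level + 1);
          for (std::size_t i = 0; i < depends_on.size(); ++i)
              wavefronts[level[i]].push_back(static_cast<int>(i));
          return wavefronts;
      }

      // Executor: runs each wavefront in order; iterations within one
      // wavefront are independent and could run in parallel.
      template <typename Body>
      void executor(const std::vector<std::vector<int>>& wavefronts, Body body) {
          for (const auto& wave : wavefronts)
              for (int i : wave) body(i);
      }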

  13. Using Runtime Analysis to Guide Model Checking of Java Programs

    NASA Technical Reports Server (NTRS)

    Havelund, Klaus; Norvig, Peter (Technical Monitor)

    2001-01-01

    This paper describes how two runtime analysis algorithms, an existing data race detection algorithm and a new deadlock detection algorithm, have been implemented to analyze Java programs. Runtime analysis is based on the idea of executing the program once, and observing the generated run to extract various kinds of information. This information can then be used to predict whether other, different runs may violate some properties of interest, in addition, of course, to demonstrating whether the generated run itself violates such properties. These runtime analyses can be performed stand-alone to generate a set of warnings. It is furthermore demonstrated how these warnings can be used to guide a model checker, thereby reducing the search space. The described techniques have been implemented in the home-grown Java model checker called PathFinder.
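
    The paper's exact race detection algorithm is not given in this record; the sketch below shows the classic lockset idea in the same spirit: intersect, per shared variable, the locks held at each access, and warn when the intersection becomes empty.

      #include <map>
      #include <set>

      // Minimal Eraser-style lockset sketch (illustrative, not the paper's
      // algorithm). Variables and locks are identified by integer ids.
      using LockSet = std::set<int>;
      std::map<int, LockSet> candidates;   // variable -> candidate locks

      // Called on each access to `var` with the set of locks currently
      // held; returns true if a potential race should be reported.
      bool on_access(int var, const LockSet& held) {
          auto it = candidates.find(var);
          if (it == candidates.end()) {    // first access: seed candidates
              candidates[var] = held;
              return false;
          }
          LockSet inter;                    // intersect with locks held now
          for (int l : it->second)
              if (held.count(l)) inter.insert(l);
          it->second = inter;
          return inter.empty();             // no common lock protects var
      }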

  14. Quantified Event Automata: Towards Expressive and Efficient Runtime Monitors

    NASA Technical Reports Server (NTRS)

    Barringer, Howard; Falcone, Ylies; Havelund, Klaus; Reger, Giles; Rydeheard, David

    2012-01-01

    Runtime verification is the process of checking a property on a trace of events produced by the execution of a computational system. Runtime verification techniques have recently focused on parametric specifications where events take data values as parameters. These techniques exist on a spectrum inhabited by both efficient and expressive techniques. These characteristics are usually shown to be conflicting: in state-of-the-art solutions, efficiency is obtained at the cost of loss of expressiveness and vice-versa. To seek a solution to this conflict we explore a new point on the spectrum by defining an alternative runtime verification approach. We introduce a new formalism for concisely capturing expressive specifications with parameters. Our technique is more expressive than the currently most efficient techniques while at the same time allowing for optimizations.
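
    An invented miniature of parametric monitoring, far simpler than quantified event automata: one small automaton instance per parameter value (here, per file name), checking that every open of a file is matched by a close.

      #include <map>
      #include <string>

      // Hypothetical parametric monitor: the property "open(f) is always
      // followed by close(f)" is checked by one automaton per value of f.
      enum class St { Closed, Open, Error };
      std::map<std::string, St> instances;  // parameter value -> state

      void on_event(const std::string& op, const std::string& file) {
          St& s = instances.try_emplace(file, St::Closed).first->second;
          if (op == "open")       s = (s == St::Closed) ? St::Open   : St::Error;
          else if (op == "close") s = (s == St::Open)   ? St::Closed : St::Error;
          // St::Error marks a violation for this parameter binding.
      }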

  15. Optimized Temporal Monitors for SystemC

    NASA Technical Reports Server (NTRS)

    Tabakov, Deian; Rozier, Kristin Y.; Vardi, Moshe Y.

    2012-01-01

    SystemC is a modeling language built as an extension of C++. Its growing popularity and the increasing complexity of designs have motivated research efforts aimed at the verification of SystemC models using assertion-based verification (ABV), where the designer asserts properties that capture the design intent in a formal language such as PSL or SVA. The model then can be verified against the properties using runtime or formal verification techniques. In this paper we focus on automated generation of runtime monitors from temporal properties. Our focus is on minimizing runtime overhead, rather than monitor size or monitor-generation time. We identify four issues in monitor generation: state minimization, alphabet representation, alphabet minimization, and monitor encoding. We conduct extensive experimentation and identify a combination of settings that offers the best performance in terms of runtime overhead.

  16. Establish and Evaluate Ada Runtime Features of Interest for Real-Time Systems

    DTIC Science & Technology

    1989-02-15

    Establish and Evaluate Ada Runtime Features of Interest for Real-Time Systems. Contract Number: MDA 903-87-D-0056. IITRI Project Number: T06168. Excerpt from contents: 2.0 Selection Process Overview; 2.1 Real-Time Systems Identification.

  17. FleCSI

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bergen, Ben; Moss, Nicholas; Charest, Marc Robert Joseph

    FleCSI is a compile-time configurable framework designed to support multi-physics application development. As such, FleCSI attempts to provide a very general set of infrastructure design patterns that can be specialized and extended to suit the needs of a broad variety of solver and data requirements. Current support includes multi-dimensional mesh topology, mesh geometry, and mesh adjacency information, n-dimensional hashed-tree data structures, graph partitioning interfaces, and dependency closures. FleCSI also introduces a functional programming model with control, execution, and data abstractions that are consistent with both MPI and state-of-the-art task-based runtimes such as Legion and Charm++. The FleCSI abstraction layer provides the developer with insulation from the underlying runtime, while allowing support for multiple runtime systems, including conventional models like asynchronous MPI. The intent is to give developers a concrete set of user-friendly programming tools that can be used now, while allowing flexibility in choosing runtime implementations and optimizations that can be applied to architectures and runtimes that arise in the future. The control and execution models in FleCSI also provide formal nomenclature for describing poorly understood concepts like kernels and tasks.

  18. Application configuration selection for energy-efficient execution on multicore systems

    DOE PAGES

    Wang, Shinan; Luo, Bing; Shi, Weisong; ...

    2015-09-21

    Balanced performance and energy consumption are incorporated in the design of modern computer systems. Several runtime factors, such as concurrency levels, thread mapping strategies, and dynamic voltage and frequency scaling (DVFS), should be considered in order to achieve optimal energy efficiency for a workload. Selecting appropriate run-time factors, however, is one of the most challenging tasks because the run-time factors are architecture-specific and workload-specific. While most existing works concentrate on either static analysis of the workload or run-time prediction results, we present a hybrid two-step method that utilizes concurrency levels and DVFS settings to achieve the energy-efficient configuration for a workload. The experimental results, based on a Xeon E5620 server with the NPB and PARSEC benchmark suites, show that the model is able to predict the energy-efficient configuration accurately. On average, an additional 10% EDP (Energy Delay Product) saving is obtained by using run-time DVFS for the entire system. An off-line optimal solution is used for comparison with the proposed scheme. Finally, the experimental results show that the average extra EDP saved by the optimal solution is within 5% on selective parallel benchmarks.

  19. Framework for architecture-independent run-time reconfigurable applications

    NASA Astrophysics Data System (ADS)

    Lehn, David I.; Hudson, Rhett D.; Athanas, Peter M.

    2000-10-01

    Configurable Computing Machines (CCMs) have emerged as a technology with the computational benefits of custom ASICs as well as the flexibility and reconfigurability of general-purpose microprocessors. Significant effort from the research community has focused on techniques to move this reconfigurability from a rapid application development tool to a run-time tool. This requires the ability to change the hardware design while the application is executing, and is known as Run-Time Reconfiguration (RTR). Widespread acceptance of run-time reconfigurable custom computing depends upon the existence of high-level automated design tools. Such tools must reduce the designer's effort to port applications between different platforms as the architecture, hardware, and software evolve. A Java implementation of a high-level application framework, called Janus, is presented here. In this environment, developers create Java classes that describe the structural behavior of an application. The framework allows hardware and software modules to be freely mixed and interchanged. A compilation phase of the development process analyzes the structure of the application and adapts it to the target platform. Janus is capable of structuring the run-time behavior of an application to take advantage of the memory and computational resources available.

  20. Monitoring Distributed Real-Time Systems: A Survey and Future Directions

    NASA Technical Reports Server (NTRS)

    Goodloe, Alwyn E.; Pike, Lee

    2010-01-01

    Runtime monitors have been proposed as a means to increase the reliability of safety-critical systems. In particular, this report addresses runtime monitors for distributed hard real-time systems. This class of systems has had little attention from the monitoring community. The need for monitors is shown by discussing examples of avionic systems failure. We survey related work in the field of runtime monitoring. Several potential monitoring architectures for distributed real-time systems are presented along with a discussion of how they might be used to monitor properties of interest.

  1. Runtime Verification of Pacemaker Functionality Using Hierarchical Fuzzy Colored Petri-nets.

    PubMed

    Majma, Negar; Babamir, Seyed Morteza; Monadjemi, Amirhassan

    2017-02-01

    Today, implanted medical devices are increasingly used for many patients and in case of diverse health problems. However, several runtime problems and errors are reported by the relevant organizations, even resulting in patient death. One of those devices is the pacemaker. The pacemaker is a device helping the patient to regulate the heartbeat by connecting to the cardiac vessels. This device is directed by its software, so any failure in this software causes a serious malfunction. Therefore, this study aims at a better way to monitor the device's software behavior to decrease the failure risk. Accordingly, we supervise the runtime function and status of the software. Software verification here means examining, through the running software, whether the limitations and needs of the system's users are respected. In this paper, a method to verify the pacemaker software, based on the fuzzy function of the device, is presented. The function limitations of the device are identified and presented as fuzzy rules, and then the device is verified based on a hierarchical Fuzzy Colored Petri-net (FCPN), which is formed considering the software limits. Building on our previous studies, which used 1) Fuzzy Petri-nets (FPN) to verify insulin pumps, 2) Colored Petri-nets (CPN) to verify the pacemaker, and 3) a software agent with Petri-net-based knowledge to verify the pacemaker, in this paper the runtime behavior of the pacemaker software is examined by HFCPN. This is a developing step compared to the earlier work. The HFCPN in this paper reduces complexity compared to the FPN and CPN used in our previous studies. By presenting the Petri-net (PN) in a hierarchical form, the verification runtime decreased by 90.61% compared to the verification runtime in the earlier work. Since we need an inference engine in the runtime verification, we used the HFCPN to enhance the performance of the inference engine.

  2. The SERENITY Runtime Framework

    NASA Astrophysics Data System (ADS)

    Crespo, Beatriz Gallego-Nicasio; Piñuela, Ana; Soria-Rodriguez, Pedro; Serrano, Daniel; Maña, Antonio

    The SERENITY Runtime Framework (SRF) provides support for applications at runtime, by managing S&D Solutions and monitoring the systems' context. The main functionality of the SRF, amongst others, is to provide S&D Solutions, by means of Executable Components, in response to applications' security requirements. The runtime environment is defined in the SRF through the S&D Library and Context Manager components. The S&D Library is a local S&D Artefact repository, and stores S&D Classes, S&D Patterns and S&D Implementations. The Context Manager component is in charge of storing and managing the information used by the SRF to select the most appropriate S&D Pattern for a given scenario. Management of the execution of Executable Components, as running realizations of the S&D Patterns (including instantiation, de-activation and control), together with communication and monitoring mechanisms and the recovery and reconfiguration aspects, completes the list of tasks performed by the SRF.

  3. ATDM LANL FleCSI: Topology and Execution Framework

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bergen, Benjamin Karl

    FleCSI is a compile-time configurable C++ framework designed to support multi-physics application development. As such, FleCSI attempts to provide a very general set of infrastructure design patterns that can be specialized and extended to suit the needs of a broad variety of solver and data requirements. This means that FleCSI is potentially useful to many different ECP projects. Current support includes multidimensional mesh topology, mesh geometry, and mesh adjacency information, n-dimensional hashed-tree data structures, graph partitioning interfaces, and dependency closures (to identify data dependencies between distributed-memory address spaces). FleCSI introduces a functional programming model with control, execution, and data abstractions that are consistent with state-of-the-art task-based runtimes such as Legion and Charm++. The model also provides support for fine-grained, data-parallel execution with backend support for runtimes such as OpenMP and C++17. The FleCSI abstraction layer provides the developer with insulation from the underlying runtimes, while allowing support for multiple runtime systems, including conventional models like asynchronous MPI. The intent is to give developers a concrete set of user-friendly programming tools that can be used now, while allowing flexibility in choosing runtime implementations and optimizations that can be applied to architectures and runtimes that arise in the future. This project is essential to the ECP Ristra Next-Generation Code project, part of ASC ATDM, because it provides a hierarchically parallel programming model that is consistent with the design of modern system architectures, but which allows for the straightforward expression of algorithmic parallelism in a portably performant manner.

  4. Improved Air Combat Awareness; with AESA and Next-Generation Signal Processing

    DTIC Science & Technology

    2002-09-01

    Record abstract not recoverable from extraction; surviving fragments mention a software development environment, real-time programming, radar signal processing with skewed load/store memory access (3.2 GB/s bandwidth, 400 MFLOPS), and a runtime environment with custom runtime and driver routines.

  5. Runtime support for data parallel tasks

    NASA Technical Reports Server (NTRS)

    Haines, Matthew; Hess, Bryan; Mehrotra, Piyush; Vanrosendale, John; Zima, Hans

    1994-01-01

    We have recently introduced a set of Fortran language extensions that allow for integrated support of task and data parallelism, and provide for shared data abstractions (SDA's) as a method for communications and synchronization among these tasks. In this paper we discuss the design and implementation issues of the runtime system necessary to support these extensions, and discuss the underlying requirements for such a system. To test the feasibility of this approach, we implement a prototype of the runtime system and use this to support an abstract multidisciplinary optimization (MDO) problem for aircraft design. We give initial results and discuss future plans.
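
    The Fortran extensions themselves are not shown in the record, but an SDA behaves like a monitor: a shared object whose methods execute with exclusive access and whose callers block until a guard condition holds. A minimal sketch of that behavior, with hypothetical names and Python in place of the paper's Fortran:

      import threading

      class SDAQueue:
          """Monitor-style shared data abstraction: methods execute under
          mutual exclusion; get() blocks until its guard (non-empty) holds."""
          def __init__(self):
              self._items = []
              self._cond = threading.Condition()

          def put(self, item):
              with self._cond:
                  self._items.append(item)
                  self._cond.notify()

          def get(self):
              with self._cond:
                  while not self._items:    # guard condition
                      self._cond.wait()
                  return self._items.pop(0)

      # Two tasks communicating and synchronizing through the SDA.
      sda = SDAQueue()
      producer = threading.Thread(target=lambda: sda.put("aero_loads"))
      producer.start()
      print(sda.get())                      # blocks until the put arrives
      producer.join()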

  6. PARLO: PArallel Run-Time Layout Optimization for Scientific Data Explorations with Heterogeneous Access Pattern

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gong, Zhenhuan; Boyuka, David; Zou, X

    The size and scope of cutting-edge scientific simulations are growing much faster than the I/O and storage capabilities of their run-time environments. The growing gap is exacerbated by exploratory, data-intensive analytics, such as querying simulation data with multivariate, spatio-temporal constraints, which induces heterogeneous access patterns that stress the performance of the underlying storage system. Previous work addresses data layout and indexing techniques to improve query performance for a single access pattern, which is not sufficient for complex analytics jobs. We present PARLO, a parallel run-time layout optimization framework, to achieve multi-level data layout optimization for scientific applications at run-time before data is written to storage. The layout schemes optimize for heterogeneous access patterns with user-specified priorities. PARLO is integrated with ADIOS, a high-performance parallel I/O middleware for large-scale HPC applications, to achieve user-transparent, light-weight layout optimization for scientific datasets. It offers simple XML-based configuration for users to achieve flexible layout optimization without the need to modify or recompile application codes. Experiments show that PARLO improves performance by 2 to 26 times for queries with heterogeneous access patterns compared to state-of-the-art scientific database management systems. Compared to traditional post-processing approaches, its underlying run-time layout optimization achieves a 56% savings in processing time and a reduction in storage overhead of up to 50%. PARLO also exhibits a low run-time resource requirement, while also limiting the performance impact on running applications to a reasonable level.

  7. Specification-based Error Recovery: Theory, Algorithms, and Usability

    DTIC Science & Technology

    2013-02-01

    A suite of techniques and tools was designed…in the specification, thereby transmuting the specification to an implementation at run-time and reducing the performance overhead.

  8. The SERENITY Runtime Monitoring Framework

    NASA Astrophysics Data System (ADS)

    Spanoudakis, George; Kloukinas, Christos; Mahbub, Khaled

    This chapter describes SERENITY’s approach to runtime monitoring and the framework that has been developed to support it. Runtime monitoring is required in SERENITY in order to check for violations of security and dependability properties which are necessary for the correct operation of the security and dependability solutions that are available from the SERENITY framework. This chapter discusses how such properties are specified and monitored. The chapter focuses on the activation and execution of monitoring activities using S&D Patterns and the actions that may be undertaken following the detection of property violations. The approach is demonstrated in reference to one of the industrial case studies of the SERENITY project.

  9. A Compiler and Run-time System for Network Programming Languages

    DTIC Science & Technology

    2012-01-01

    Record abstract not recoverable from extraction; surviving fragments give the author affiliations (Christopher Monsanto, Princeton University; Nate Foster, Cornell University) and a reference to Frenetic: A network programming language (ICFP, September 2011).

  10. A Simplified Method for Implementing Run-Time Polymorphism in Fortran95

    DOE PAGES

    Decyk, Viktor K.; Norton, Charles D.

    2004-01-01

    This paper discusses a simplified technique for software emulation of inheritance and run-time polymorphism in Fortran95. This technique involves retaining the same type throughout an inheritance hierarchy, so that only functions which are modified in a derived class need to be implemented.

  11. Fast l₁-SPIRiT compressed sensing parallel imaging MRI: scalable parallel implementation and clinically feasible runtime.

    PubMed

    Murphy, Mark; Alley, Marcus; Demmel, James; Keutzer, Kurt; Vasanawala, Shreyas; Lustig, Michael

    2012-06-01

    We present l₁-SPIRiT, a simple algorithm for auto calibrating parallel imaging (acPI) and compressed sensing (CS) that permits an efficient implementation with clinically-feasible runtimes. We propose a CS objective function that minimizes cross-channel joint sparsity in the wavelet domain. Our reconstruction minimizes this objective via iterative soft-thresholding, and integrates naturally with iterative self-consistent parallel imaging (SPIRiT). Like many iterative magnetic resonance imaging reconstructions, l₁-SPIRiT's image quality comes at a high computational cost. Excessively long runtimes are a barrier to the clinical use of any reconstruction approach, and thus we discuss our approach to efficiently parallelizing l₁-SPIRiT and to achieving clinically-feasible runtimes. We present parallelizations of l₁-SPIRiT for both multi-GPU systems and multi-core CPUs, and discuss the software optimization and parallelization decisions made in our implementation. The performance of these alternatives depends on the processor architecture, the size of the image matrix, and the number of parallel imaging channels. Fundamentally, achieving fast runtime requires the correct trade-off between cache usage and parallelization overheads. We demonstrate image quality via a case from our clinical experimentation, using a custom 3DFT spoiled gradient echo (SPGR) sequence with up to 8× acceleration via Poisson-disc undersampling in the two phase-encoded directions.
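
    The numerical kernel named here, iterative soft-thresholding of cross-channel wavelet coefficients to promote joint sparsity, reduces to a small operator. The sketch below shows only that operator on synthetic data; the array shapes and threshold are illustrative, not the paper's parameters:

      import numpy as np

      def joint_soft_threshold(coeffs, lam):
          # Shrink each location by the l2 norm taken across channels;
          # locations whose joint norm falls below lam are zeroed, which
          # promotes the cross-channel joint sparsity the objective minimizes.
          norms = np.sqrt((np.abs(coeffs) ** 2).sum(axis=0, keepdims=True))
          scale = np.maximum(1.0 - lam / np.maximum(norms, 1e-12), 0.0)
          return coeffs * scale

      rng = np.random.default_rng(0)
      # 8 channels x 16 wavelet coefficients of a toy multi-coil image
      w = rng.standard_normal((8, 16)) + 1j * rng.standard_normal((8, 16))
      w_shrunk = joint_soft_threshold(w, lam=4.0)
      print(np.count_nonzero(np.abs(w_shrunk).sum(axis=0)))  # surviving locations

    In the full reconstruction this operator alternates with a data-consistency step (the SPIRiT calibration kernel), which is where the parallelization effort described in the paper is spent.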

  12. Fast ℓ1-SPIRiT Compressed Sensing Parallel Imaging MRI: Scalable Parallel Implementation and Clinically Feasible Runtime

    PubMed Central

    Murphy, Mark; Alley, Marcus; Demmel, James; Keutzer, Kurt; Vasanawala, Shreyas; Lustig, Michael

    2012-01-01

    We present ℓ1-SPIRiT, a simple algorithm for auto calibrating parallel imaging (acPI) and compressed sensing (CS) that permits an efficient implementation with clinically-feasible runtimes. We propose a CS objective function that minimizes cross-channel joint sparsity in the Wavelet domain. Our reconstruction minimizes this objective via iterative soft-thresholding, and integrates naturally with iterative Self-Consistent Parallel Imaging (SPIRiT). Like many iterative MRI reconstructions, ℓ1-SPIRiT’s image quality comes at a high computational cost. Excessively long runtimes are a barrier to the clinical use of any reconstruction approach, and thus we discuss our approach to efficiently parallelizing ℓ1-SPIRiT and to achieving clinically-feasible runtimes. We present parallelizations of ℓ1-SPIRiT for both multi-GPU systems and multi-core CPUs, and discuss the software optimization and parallelization decisions made in our implementation. The performance of these alternatives depends on the processor architecture, the size of the image matrix, and the number of parallel imaging channels. Fundamentally, achieving fast runtime requires the correct trade-off between cache usage and parallelization overheads. We demonstrate image quality via a case from our clinical experimentation, using a custom 3DFT Spoiled Gradient Echo (SPGR) sequence with up to 8× acceleration via Poisson-disc undersampling in the two phase-encoded directions. PMID:22345529

  13. Optimizing ROOT’s Performance Using C++ Modules

    NASA Astrophysics Data System (ADS)

    Vassilev, Vassil

    2017-10-01

    ROOT comes with a C++-compliant interpreter, cling. Cling needs to understand the content of the libraries in order to interact with them. Exposing the full shared library descriptors to the interpreter at runtime translates into an increased memory footprint. ROOT’s exploratory programming concepts allow implicit and explicit runtime shared library loading, requiring the interpreter to load the library descriptor. Re-parsing of descriptors’ content has a noticeable effect on runtime performance. The present state-of-the-art lazy parsing technique brings the runtime performance to reasonable levels but proves to be fragile and can introduce correctness issues. An elegant solution is to load information from the descriptor lazily and in a non-recursive way. The LLVM community advances its C++ Modules technology, providing an I/O-efficient, on-disk representation capable of reducing build times and peak memory usage. The feature is standardized as a C++ technical specification. C++ Modules are a flexible concept which can be employed to match CMS and other experiments’ requirements for ROOT: to optimize both runtime memory usage and performance. Cling technically “inherits” the feature; however, tweaking it to ROOT scale and beyond is a complex endeavor. The paper discusses the status of C++ Modules in the context of ROOT, supported by a few preliminary performance results. It shows a step-by-step migration plan and describes potential challenges which could appear.

  14. PISCES: An environment for parallel scientific computation

    NASA Technical Reports Server (NTRS)

    Pratt, T. W.

    1985-01-01

    The parallel implementation of scientific computing environment (PISCES) is a project to provide high-level programming environments for parallel MIMD computers. Pisces 1, the first of these environments, is a FORTRAN 77 based environment which runs under the UNIX operating system. The Pisces 1 user programs in Pisces FORTRAN, an extension of FORTRAN 77 for parallel processing. The major emphasis in the Pisces 1 design is on providing a carefully specified virtual machine that defines the run-time environment within which Pisces FORTRAN programs are executed. Each implementation then provides the same virtual machine, regardless of differences in the underlying architecture. The design is intended to be portable to a variety of architectures. Currently Pisces 1 is implemented on a network of Apollo workstations and on a DEC VAX uniprocessor via simulation of the task-level parallelism. An implementation for the Flexible Computing Corp. FLEX/32 is under construction. An introduction to the Pisces 1 virtual computer and the FORTRAN 77 extensions is presented. An example of an algorithm for the iterative solution of a system of equations is given. The most notable features of the design are the provision for several granularities of parallelism in programs and the provision of a window mechanism for distributed access to large arrays of data.

  15. CIFO 3.0

    NASA Technical Reports Server (NTRS)

    Rogers, Pat

    1992-01-01

    The Ada Runtime Environment Working Group has, since 1985, developed and published the Catalog of Interface Features and Options (CIFO) for Ada runtime environments. These interfaces, expressed in legal Ada, provide 'hooks' into the runtime system to export both functionality and enhanced performance beyond that of 'vanilla' Ada implementations. Such enhancements include high- and low-level scheduling control, asynchronous communications facilities, predictable storage management facilities, and fast interrupt response. CIFO 3.0 represents the latest release, which incorporates the efforts of the European real-time community as well as new interfaces and expansions of previous catalog entries. This presentation will give both an overview of the Catalog's contents and an 'insider's' view of the Catalog as a whole.

  16. Archer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Atzeni, Simone; Ahn, Dong; Gopalakrishnan, Ganesh

    2017-01-12

    Archer is built on top of the LLVM/Clang compilers that support OpenMP. It applies static and dynamic analysis techniques to detect data races in OpenMP programs while incurring very low runtime and memory overhead. Static analyses identify data-race-free OpenMP regions and exclude them from runtime analysis, which is performed by the ThreadSanitizer included in LLVM/Clang.

  17. Towards Just-In-Time Partial Evaluation of Prolog

    NASA Astrophysics Data System (ADS)

    Bolz, Carl Friedrich; Leuschel, Michael; Rigo, Armin

    We introduce a just-in-time specializer for Prolog. Just-in-time specialization attempts to unify the concepts and benefits of partial evaluation (PE) and just-in-time (JIT) compilation. It is a variant of PE that occurs purely at runtime, which lazily generates residual code and is constantly driven by runtime feedback.

  18. Final Report from The University of Texas at Austin for DEGAS: Dynamic Global Address Space programming environments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Erez, Mattan; Yelick, Katherine; Sarkar, Vivek

    The Dynamic, Exascale Global Address Space programming environment (DEGAS) project will develop the next generation of programming models and runtime systems to meet the challenges of Exascale computing. Our approach is to provide an efficient and scalable programming model that can be adapted to application needs through the use of dynamic runtime features and domain-specific languages for computational kernels. We address the following technical challenges: Programmability: a rich set of programming constructs based on a Hierarchical Partitioned Global Address Space (HPGAS) model, demonstrated in UPC++. Scalability: hierarchical locality control, lightweight communication (extended GASNet), and efficient synchronization mechanisms (Phasers). Performance Portability: just-in-time specialization (SEJITS) for generating hardware-specific code and scheduling libraries for domain-specific adaptive runtimes (Habanero). Energy Efficiency: communication-optimal code generation to optimize energy efficiency by reducing data movement. Resilience: Containment Domains for flexible, domain-specific resilience, using state capture mechanisms and lightweight, asynchronous recovery mechanisms. Interoperability: runtime and language interoperability with MPI and OpenMP to encourage broad adoption.

  19. AMRZone: A Runtime AMR Data Sharing Framework For Scientific Applications

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, Wenzhao; Tang, Houjun; Harenberg, Steven

    Frameworks that facilitate runtime data sharing across multiple applications are of great importance for scientific data analytics. Although existing frameworks work well over uniform mesh data, they cannot effectively handle adaptive mesh refinement (AMR) data. The challenges in constructing an AMR-capable framework include: (1) designing an architecture that facilitates online AMR data management; (2) achieving a load-balanced AMR data distribution for the data staging space at runtime; and (3) building an effective online index to support the unique spatial data retrieval requirements for AMR data. Towards addressing these challenges to support runtime AMR data sharing across scientific applications, we present the AMRZone framework. Experiments over real-world AMR datasets demonstrate AMRZone's effectiveness at achieving a balanced workload distribution, reading/writing large-scale datasets with thousands of parallel processes, and satisfying queries with spatial constraints. Moreover, AMRZone's performance and scalability are even comparable with existing state-of-the-art work when tested over uniform mesh data with up to 16384 cores; in the best case, our framework achieves a 46% performance improvement.

  20. Runtime and Architecture Support for Efficient Data Exchange in Multi-Accelerator Applications.

    PubMed

    Cabezas, Javier; Gelado, Isaac; Stone, John E; Navarro, Nacho; Kirk, David B; Hwu, Wen-Mei

    2015-05-01

    Heterogeneous parallel computing applications often process large data sets that require multiple GPUs to jointly meet their needs for physical memory capacity and compute throughput. However, the lack of high-level abstractions in previous heterogeneous parallel programming models forces programmers to resort to multiple code versions, complex data copy steps and synchronization schemes when exchanging data between multiple GPU devices, which results in high software development cost, poor maintainability, and even poor performance. This paper describes the HPE runtime system, and the associated architecture support, which enables a simple, efficient programming interface for exchanging data between multiple GPUs through either interconnects or cross-node network interfaces. The runtime and architecture support presented in this paper can also be used to support other types of accelerators. We show that the simplified programming interface reduces programming complexity. The research presented in this paper started in 2009. It has been implemented and tested extensively in several generations of HPE runtime systems as well as adopted into the NVIDIA GPU hardware and drivers for CUDA 4.0 and beyond since 2011. The availability of real hardware that supports key HPE features gives rise to a rare opportunity for studying the effectiveness of the hardware support by running important benchmarks on real runtime and hardware. Experimental results show that in an exemplar heterogeneous system, peer DMA and double-buffering, pinned buffers, and software techniques can improve the inter-accelerator data communication bandwidth by 2×. They can also improve the execution speed by 1.6× for a 3D finite difference, 2.5× for 1D FFT, and 1.6× for merge sort, all measured on real hardware. The proposed architecture support enables the HPE runtime to transparently deploy these optimizations under simple portable user code, allowing system designers to freely employ devices of different capabilities. We further argue that simple interfaces such as HPE are needed for most applications to benefit from advanced hardware features in practice.

  1. Runtime and Architecture Support for Efficient Data Exchange in Multi-Accelerator Applications

    PubMed Central

    Cabezas, Javier; Gelado, Isaac; Stone, John E.; Navarro, Nacho; Kirk, David B.; Hwu, Wen-mei

    2014-01-01

    Heterogeneous parallel computing applications often process large data sets that require multiple GPUs to jointly meet their needs for physical memory capacity and compute throughput. However, the lack of high-level abstractions in previous heterogeneous parallel programming models forces programmers to resort to multiple code versions, complex data copy steps and synchronization schemes when exchanging data between multiple GPU devices, which results in high software development cost, poor maintainability, and even poor performance. This paper describes the HPE runtime system, and the associated architecture support, which enables a simple, efficient programming interface for exchanging data between multiple GPUs through either interconnects or cross-node network interfaces. The runtime and architecture support presented in this paper can also be used to support other types of accelerators. We show that the simplified programming interface reduces programming complexity. The research presented in this paper started in 2009. It has been implemented and tested extensively in several generations of HPE runtime systems as well as adopted into the NVIDIA GPU hardware and drivers for CUDA 4.0 and beyond since 2011. The availability of real hardware that supports key HPE features gives rise to a rare opportunity for studying the effectiveness of the hardware support by running important benchmarks on real runtime and hardware. Experimental results show that in an exemplar heterogeneous system, peer DMA and double-buffering, pinned buffers, and software techniques can improve the inter-accelerator data communication bandwidth by 2×. They can also improve the execution speed by 1.6× for a 3D finite difference, 2.5× for 1D FFT, and 1.6× for merge sort, all measured on real hardware. The proposed architecture support enables the HPE runtime to transparently deploy these optimizations under simple portable user code, allowing system designers to freely employ devices of different capabilities. We further argue that simple interfaces such as HPE are needed for most applications to benefit from advanced hardware features in practice. PMID:26180487

  2. Secure and Resilient Functional Modeling for Navy Cyber-Physical Systems

    DTIC Science & Technology

    2017-05-24

    Record abstract not recoverable from extraction; surviving fragments list project deliverables: a Functional Modeling Compiler with key performance indicators (KPIs) for single/multicore controllers and temporal/spatial domains, an agent-based distributed runtime (UCI), and a Model Management Backbone (SCCT).

  3. A manual for PARTI runtime primitives

    NASA Technical Reports Server (NTRS)

    Berryman, Harry; Saltz, Joel

    1990-01-01

    Primitives are presented that are designed to help users efficiently program irregular problems (e.g., unstructured mesh sweeps, sparse matrix codes, adaptive mesh partial differential equations solvers) on distributed memory machines. These primitives are also designed for use in compilers for distributed memory multiprocessors. Communications patterns are captured at runtime, and the appropriate send and receive messages are automatically generated.
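
    Primitives of this kind follow the inspector/executor pattern: an inspector pass analyzes the indirection arrays once to build a communication schedule, which the executor then reuses on every sweep. A single-process sketch of that split, with a hypothetical API standing in for the real primitives, which generate actual send/receive pairs from the same analysis:

      def inspector(indices, my_lo, my_hi):
          """Examine the indirection array once: classify each reference as
          local or off-processor and record what must be fetched."""
          schedule = {"local": [], "remote": []}
          for pos, idx in enumerate(indices):
              (schedule["local"] if my_lo <= idx < my_hi
               else schedule["remote"]).append((pos, idx))
          return schedule

      def executor(schedule, local_data, my_lo, fetch_remote):
          """Reuse the precomputed schedule on every sweep: gather local
          values directly and remote values via the communication layer."""
          out = [None] * (len(schedule["local"]) + len(schedule["remote"]))
          for pos, idx in schedule["local"]:
              out[pos] = local_data[idx - my_lo]
          for pos, idx in schedule["remote"]:
              out[pos] = fetch_remote(idx)   # stand-in for a receive
          return out

      indices = [2, 7, 3, 9]                 # e.g. unstructured-mesh edges
      sched = inspector(indices, my_lo=0, my_hi=5)
      vals = executor(sched, local_data=[10, 11, 12, 13, 14], my_lo=0,
                      fetch_remote=lambda i: 100 + i)
      print(vals)                            # [12, 107, 13, 109]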

  4. Neuron splitting in compute-bound parallel network simulations enables runtime scaling with twice as many processors.

    PubMed

    Hines, Michael L; Eichner, Hubert; Schürmann, Felix

    2008-08-01

    Neuron tree topology equations can be split into two subtrees and solved on different processors with no change in accuracy, stability, or computational effort; communication costs involve only sending and receiving two double precision values by each subtree at each time step. Splitting cells is useful in attaining load balance in neural network simulations, especially when there is a wide range of cell sizes and the number of cells is about the same as the number of processors. For compute-bound simulations load balance results in almost ideal runtime scaling. Application of the cell splitting method to two published network models exhibits good runtime scaling on twice as many processors as could be effectively used with whole-cell balancing.
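
    The load-balance argument can be made concrete with a toy calculation: when whole cells are assigned to processors, runtime is bounded by the largest cell, whereas splitting each cell into two roughly equal subtrees lets twice as many processors reach near-ideal balance. A sketch under those assumptions (the per-cell costs are invented):

      import heapq

      def max_load(pieces, nproc):
          """Greedy largest-first assignment; returns the busiest
          processor's load, the quantity that bounds runtime scaling."""
          heap = [0.0] * nproc
          for w in sorted(pieces, reverse=True):
              heapq.heapreplace(heap, heap[0] + w)   # give w to least-loaded
          return max(heap)

      cells = [9.0, 7.0, 4.0, 3.0, 2.0, 2.0, 1.5, 1.5]  # per-cell costs (toy)
      halves = [w / 2 for w in cells for _ in (0, 1)]   # split each cell in two

      print(max_load(cells, 8))    # 9.0  (limited by the largest cell)
      print(max_load(cells, 16))   # 9.0  (extra processors sit idle)
      print(max_load(halves, 16))  # 4.5  (splitting restores scaling)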

  5. An enhanced Ada run-time system for real-time embedded processors

    NASA Technical Reports Server (NTRS)

    Sims, J. T.

    1991-01-01

    An enhanced Ada run-time system has been developed to support real-time embedded processor applications. The primary focus of this development effort has been on the tasking system and the memory management facilities of the run-time system. The tasking system has been extended to support efficient and precise periodic task execution as required for control applications. Event-driven task execution providing a means of task-asynchronous control and communication among Ada tasks is supported in this system. Inter-task control is even provided among tasks distributed on separate physical processors. The memory management system has been enhanced to provide object allocation and protected access support for memory shared between disjoint processors, each of which is executing a distinct Ada program.

  6. Runtime Verification in Context: Can Optimizing Error Detection Improve Fault Diagnosis

    NASA Technical Reports Server (NTRS)

    Dwyer, Matthew B.; Purandare, Rahul; Person, Suzette

    2010-01-01

    Runtime verification has primarily been developed and evaluated as a means of enriching the software testing process. While many researchers have pointed to its potential applicability in online approaches to software fault tolerance, there has been a dearth of work exploring the details of how that might be accomplished. In this paper, we describe how a component-oriented approach to software health management exposes the connections between program execution, error detection, fault diagnosis, and recovery. We identify both research challenges and opportunities in exploiting those connections. Specifically, we describe how recent approaches to reducing the overhead of runtime monitoring aimed at error detection might be adapted to reduce the overhead and improve the effectiveness of fault diagnosis.

  7. Toward real-time performance benchmarks for Ada

    NASA Technical Reports Server (NTRS)

    Clapp, Russell M.; Duchesneau, Louis; Volz, Richard A.; Mudge, Trevor N.; Schultze, Timothy

    1986-01-01

    The issue of real-time performance measurements for the Ada programming language through the use of benchmarks is addressed. First, the Ada notion of time is examined and a set of basic measurement techniques are developed. Then a set of Ada language features believed to be important for real-time performance are presented and specific measurement methods discussed. In addition, other important time-related features which are not explicitly part of the language but are part of the run-time system are also identified and measurement techniques developed. The measurement techniques are applied to the language and run-time system features and the results are presented.

  8. Usability of a Runtime Environment for the Use of IMS Learning Design in Mixed Mode Higher Education

    ERIC Educational Resources Information Center

    Klebl, Michael

    2006-01-01

    Starting from the first public draft of IMS Learning Design in November 2002, a research project at the Catholic University Eichstaett-Ingolstadt in Germany was dedicated to the conceptual examination and empirical review of IMS Learning Design Level A. A prototypical runtime environment called "lab005" was developed. It was built based…

  9. National Information Exchange Model (NIEM): DoD Adoption and Implications for C2 (Briefing Charts)

    DTIC Science & Technology

    2014-06-18

    Record abstract not recoverable from extraction; the surviving briefing-chart fragments define an Information Exchange Specification (IES) as the build-time description of the data to be exchanged between systems, and an Information Exchange Package (IEP) as the data actually exchanged at runtime between data producers and consumers.

  10. Implementation of a Learning Design Run-Time Environment for the .LRN Learning Management System

    ERIC Educational Resources Information Center

    del Cid, Jose Pablo Escobedo; de la Fuente Valentin, Luis; Gutierrez, Sergio; Pardo, Abelardo; Kloos, Carlos Delgado

    2007-01-01

    The IMS Learning Design specification aims at capturing the complete learning flow of courses, without being restricted to a particular pedagogical model. Such flow description for a course, called a Unit of Learning, must be able to be reproduced in different systems using a so called run-time environment. In the last few years there has been…

  11. A manual for PARTI runtime primitives, revision 1

    NASA Technical Reports Server (NTRS)

    Das, Raja; Saltz, Joel; Berryman, Harry

    1991-01-01

    Primitives are presented that are designed to help users efficiently program irregular problems (e.g., unstructured mesh sweeps, sparse matrix codes, adaptive mesh partial differential equations solvers) on distributed memory machines. These primitives are also designed for use in compilers for distributed memory multiprocessors. Communications patterns are captured at runtime, and the appropriate send and receive messages are automatically generated.

  12. MIC-SVM: Designing A Highly Efficient Support Vector Machine For Advanced Modern Multi-Core and Many-Core Architectures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    You, Yang; Song, Shuaiwen; Fu, Haohuan

    2014-08-16

    Support Vector Machine (SVM) has been widely used in data-mining and Big Data applications as modern commercial databases start to attach increasing importance to analytic capabilities. In recent years, SVM was adapted to the field of High Performance Computing for power/performance prediction, auto-tuning, and runtime scheduling. However, even at the risk of losing prediction accuracy due to insufficient runtime information, researchers can only afford to apply offline model training to avoid significant runtime training overhead. To address the challenges above, we designed and implemented MIC-SVM, a highly efficient parallel SVM for x86-based multi-core and many-core architectures, such as the Intel Ivy Bridge CPUs and the Intel Xeon Phi coprocessor (MIC).

  13. Implementation of an Ada real-time executive: A case study

    NASA Technical Reports Server (NTRS)

    Laird, James D.; Burton, Bruce A.; Koppes, Mary R.

    1986-01-01

    Current Ada language implementations and runtime environments are immature, unproven and are a key risk area for real-time embedded computer systems (ECS). A test-case environment is provided in which the concerns of the real-time ECS community are addressed. A priority-driven executive is selected to be implemented in the Ada programming language. The model selected is representative of real-time executives tailored for embedded systems used in missile, spacecraft, and avionics applications. An Ada-based design methodology is utilized, and two designs are considered. The first of these designs requires the use of vendor-supplied runtime and tasking support. An alternative high-level design is also considered for an implementation requiring no vendor-supplied runtime or tasking support. The former approach is carried through to implementation.

  14. Autonomic Management of Application Workflows on Hybrid Computing Infrastructure

    DOE PAGES

    Kim, Hyunjoo; el-Khamra, Yaakoub; Rodero, Ivan; ...

    2011-01-01

    In this paper, we present a programming and runtime framework that enables the autonomic management of complex application workflows on hybrid computing infrastructures. The framework is designed to address system and application heterogeneity and dynamics to ensure that application objectives and constraints are satisfied. The need for such autonomic system and application management is becoming critical as computing infrastructures become increasingly heterogeneous, integrating different classes of resources from high-end HPC systems to commodity clusters and clouds. For example, the framework presented in this paper can be used to provision the appropriate mix of resources based on application requirements and constraints. The framework also monitors the system/application state and adapts the application and/or resources to respond to changing requirements or environment. To demonstrate the operation of the framework and to evaluate its ability, we employ a workflow used to characterize an oil reservoir executing on a hybrid infrastructure composed of TeraGrid nodes and Amazon EC2 instances of various types. Specifically, we show how different application objectives such as acceleration, conservation and resilience can be effectively achieved while satisfying deadline and budget constraints, using an appropriate mix of dynamically provisioned resources. Our evaluations also demonstrate that public clouds can be used to complement and reinforce the scheduling and usage of traditional high performance computing infrastructure.

  15. Cooperative runtime monitoring

    NASA Astrophysics Data System (ADS)

    Hallé, Sylvain

    2013-11-01

    Requirements on message-based interactions can be formalised as an interface contract that specifies constraints on the sequence of possible messages that can be exchanged by multiple parties. At runtime, each peer can monitor incoming messages and check that the contract is correctly being followed by their respective senders. We introduce cooperative runtime monitoring, where a recipient 'delegates' its monitoring task to the sender, which is required to provide evidence that the message it sends complies with the contract. In turn, this evidence can be quickly checked by the recipient, which is then assured of the sender's compliance with the contract without doing the monitoring computation by itself. A particular application of this concept is shown on web services, where service providers can monitor and enforce contract compliance of third-party clients at a small cost on the server side, while avoiding the need to certify or digitally sign them.
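
    One way to realize this scheme, assuming the contract is a finite-state automaton over message types, is for the sender to attach its claimed state transition as evidence, so the recipient checks a single transition per message instead of re-running the monitor over the whole history. A minimal sketch with a hypothetical contract and message names:

      # The contract: a finite-state automaton over message types.
      TRANSITIONS = {("idle", "login"): "authed",
                     ("authed", "query"): "authed",
                     ("authed", "logout"): "idle"}

      def send(state, msg):
          """Sender side: do the monitoring work and emit evidence."""
          next_state = TRANSITIONS[(state, msg)]
          return msg, (state, next_state)

      def receive(trusted_state, msg, evidence):
          """Recipient side: O(1) check that the evidence is a real
          transition starting from the last state the recipient accepted."""
          prev, nxt = evidence
          if prev != trusted_state or TRANSITIONS.get((prev, msg)) != nxt:
              raise ValueError(f"contract violation on {msg!r}")
          return nxt                          # new trusted state

      state = "idle"
      for m in ["login", "query", "logout"]:
          msg, ev = send(state, m)
          state = receive(state, msg, ev)
          print(msg, "->", state)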

  16. Bypassing Races in Live Applications with Execution Filters

    DTIC Science & Technology

    2010-01-01

    LOOM creates the needed locks and semaphores on demand, the first time a lock or semaphore is referenced by one of the inserted synchronization operations at runtime. LOOM provides a flexible and safe language for developers to write execution filters that explicitly synchronize code. Developers first compile their application with LOOM; at runtime, to work around a race, an application developer writes an execution filter that synchronizes the racing code.

  17. Argobots: A Lightweight Low-Level Threading and Tasking Framework

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Seo, Sangmin; Amer, Abdelhalim; Balaji, Pavan

    In the past few decades, a number of user-level threading and tasking models have been proposed in the literature to address the shortcomings of OS-level threads, primarily with respect to cost and flexibility. Current state-of-the-art user-level threading and tasking models, however, are either too specific to applications or architectures or are not as powerful or flexible. In this paper, we present Argobots, a lightweight, low-level threading and tasking framework that is designed as a portable and performant substrate for high-level programming models or runtime systems. Argobots offers a carefully designed execution model that balances generality of functionality with providing a rich set of controls to allow specialization by the user or high-level programming model. We describe the design, implementation, and optimization of Argobots and present integrations with three example high-level models: OpenMP, MPI, and co-located I/O service. Evaluations show that (1) Argobots outperforms existing generic threading runtimes; (2) our OpenMP runtime offers more efficient interoperability capabilities than production OpenMP runtimes do; (3) when MPI interoperates with Argobots instead of Pthreads, it enjoys reduced synchronization costs and better latency hiding capabilities; and (4) I/O service with Argobots reduces interference with co-located applications, achieving performance competitive with that of the Pthreads version.

  18. Runtime Detection of C-Style Errors in UPC Code

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pirkelbauer, P; Liao, C; Panas, T

    2011-09-29

    Unified Parallel C (UPC) extends the C programming language (ISO C 99) with explicit parallel programming support for the partitioned global address space (PGAS), which provides a global memory space with localized partitions to each thread. Like its ancestor C, UPC is a low-level language that emphasizes code efficiency over safety. The absence of dynamic (and static) safety checks allows programmer oversights and software flaws that can be hard to spot. In this paper, we present an extension of a dynamic analysis tool, ROSE-Code Instrumentation and Runtime Monitor (ROSE-CIRM), for UPC to help programmers find C-style errors involving the global address space. Built on top of the ROSE source-to-source compiler infrastructure, the tool instruments source files with code that monitors operations and keeps track of changes to the system state. The resulting code is linked to a runtime monitor that observes the program execution and finds software defects. We describe the extensions to ROSE-CIRM that were necessary to support UPC. We discuss complications that arise from parallel code and our solutions. We test ROSE-CIRM against a runtime error detection test suite, and present performance results obtained from running error-free codes. ROSE-CIRM is released as part of the ROSE compiler under a BSD-style open source license.

  19. Optimization Strategies for Hardware-Based Cofactorization

    NASA Astrophysics Data System (ADS)

    Loebenberger, Daniel; Putzka, Jens

    We use the specific structure of the inputs to the cofactorization step in the general number field sieve (GNFS) in order to optimize the runtime of the cofactorization step on a hardware cluster. An optimal distribution of bitlength-specific ECM modules is proposed and compared to existing ones. With our optimizations we obtain a speedup of between 17% and 33% for the cofactorization step of the GNFS, compared to the runtime of an unoptimized cluster.

  20. Uniform Data Access Using GXD

    NASA Technical Reports Server (NTRS)

    Vanderbilt, Peter

    1999-01-01

    This paper gives an overview of GXD, a framework facilitating publication and use of data from diverse data sources. GXD defines an object-oriented data model designed to represent a wide range of things including data, its metadata, resources and query results. GXD also defines a data transport language, a dialect of XML, for representing instances of the data model. This language allows for a wide range of data source implementations by supporting both the direct incorporation of data and the specification of data by various rules. The GXD software library, prototyped in Java, includes client and server runtimes. The server runtime facilitates the generation of entities containing data encoded in the GXD transport language. The GXD client runtime interprets these entities (potentially from many data sources) to create an illusion of a globally interconnected data space, one that is independent of data source location and implementation.

  1. Increasing the Runtime Speed of Case-Based Plan Recognition

    DTIC Science & Technology

    2015-05-01

    Record abstract largely not recoverable from extraction; the surviving fragment notes that case-based plan recognition must cover the number of situations that the robot might reasonably be expected to encounter, which requires efficient indexing schemes to ensure fast plan retrieval.

  2. Determination of the Underlying Task Scheduling Algorithm for an Ada Runtime System

    DTIC Science & Technology

    1989-12-01

    I was also curious as to how well I could model the test cases with Ada programs; in particular, I wanted to see whether I could model the equal-arrival case and the parameter relationships required to detect the execution of individual algorithms. These test cases were modeled using Ada programs, and the results were analyzed to determine whether the Ada programs were capable of revealing the task scheduling algorithm used by the Ada run-time system.

  3. Accelerating semantic graph databases on commodity clusters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Morari, Alessandro; Castellana, Vito G.; Haglin, David J.

    We are developing a full software system for accelerating semantic graph databases on commodity clusters that scales to hundreds of nodes while maintaining constant query throughput. Our framework comprises a SPARQL-to-C++ compiler, a library of parallel graph methods and a custom multithreaded runtime layer, which provides a Partitioned Global Address Space (PGAS) programming model with fork/join parallelism and automatic load balancing over commodity clusters. We present preliminary results for the compiler and for the runtime.

  4. Semantically Aware Foundation Environment (SAFE) for Clean-Slate Design of Resilient, Adaptive Secure Hosts (CRASH)

    DTIC Science & Technology

    2016-02-01

    The system consists of a high-fidelity hardware simulation hosted on field programmable gate arrays (FPGAs), with a set of runtime services (ConcreteWare) running on the hardware. The conventional posture of "perimeter protection, patch, and pray" is not aligned with the threat, and programmers will not bail us out of this situation by writing defect-free code. Secure applications can be…

  5. Argobots: A Lightweight Low-Level Threading and Tasking Framework

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Seo, Sangmin; Amer, Abdelhalim; Balaji, Pavan

    In the past few decades, a number of user-level threading and tasking models have been proposed in the literature to address the shortcomings of OS-level threads, primarily with respect to cost and flexibility. Current state-of-the-art user-level threading and tasking models, however, either are too specific to applications or architectures or are not as powerful or flexible. In this paper, we present Argobots, a lightweight, low-level threading and tasking framework that is designed as a portable and performant substrate for high-level programming models or runtime systems. Argobots offers a carefully designed execution model that balances generality of functionality with providing a rich set of controls to allow specialization by end users or high-level programming models. We describe the design, implementation, and performance characterization of Argobots and present integrations with three high-level models: OpenMP, MPI, and colocated I/O services. Evaluations show that (1) Argobots, while providing richer capabilities, is competitive with existing simpler generic threading runtimes; (2) our OpenMP runtime offers more efficient interoperability capabilities than production OpenMP runtimes do; (3) when MPI interoperates with Argobots instead of Pthreads, it enjoys reduced synchronization costs and better latency-hiding capabilities; and (4) I/O services with Argobots reduce interference with colocated applications while achieving performance competitive with that of a Pthreads approach.

  6. MESA: Message-Based System Analysis Using Runtime Verification

    NASA Technical Reports Server (NTRS)

    Shafiei, Nastaran; Tkachuk, Oksana; Mehlitz, Peter

    2017-01-01

    In this paper, we present a novel approach and framework for run-time verification of large, safety-critical messaging systems. This work was motivated by verifying the System Wide Information Management (SWIM) project of the Federal Aviation Administration (FAA). SWIM provides live air traffic, site and weather data streams for the whole National Airspace System (NAS), which can easily amount to several hundred messages per second. Such safety-critical systems cannot be instrumented; therefore, verification and monitoring have to happen using a non-intrusive approach, by connecting to a variety of network interfaces. Due to the large number of potential properties to check, the verification framework needs to support efficient formulation of properties with a suitable Domain Specific Language (DSL). Our approach is to utilize a distributed system that is geared towards connectivity and scalability and interface it at the message-queue level to a powerful verification engine. We implemented our approach in the tool called MESA: Message-Based System Analysis, which leverages the open source projects RACE (Runtime for Airspace Concept Evaluation) and TraceContract. RACE is a platform for instantiating and running highly concurrent and distributed systems and enables connectivity to SWIM and scalability. TraceContract is a runtime verification tool that allows for checking traces against properties specified in a powerful DSL. We applied our approach to verify a SWIM service against several requirements. We found errors such as duplicate and out-of-order messages.
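
    The two violations reported, duplicate and out-of-order messages, correspond to simple trace properties. A minimal monitor over a message stream can be sketched as below; the field names are hypothetical, and MESA itself expresses such properties in TraceContract's DSL rather than in this form:

      def monitor(stream):
          """Yield (violation, message) pairs for duplicate message IDs
          and for non-increasing sequence numbers per channel."""
          seen_ids = set()
          last_seq = {}
          for msg in stream:
              mid, chan, seq = msg["id"], msg["channel"], msg["seq"]
              if mid in seen_ids:
                  yield ("duplicate", msg)
                  continue
              seen_ids.add(mid)
              if seq <= last_seq.get(chan, -1):
                  yield ("out-of-order", msg)
              last_seq[chan] = max(last_seq.get(chan, -1), seq)

      trace = [{"id": "a", "channel": "wx", "seq": 0},
               {"id": "b", "channel": "wx", "seq": 2},
               {"id": "b", "channel": "wx", "seq": 2},   # duplicate
               {"id": "c", "channel": "wx", "seq": 1}]   # out of order
      for violation in monitor(trace):
          print(violation)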

  7. Argobots: A Lightweight Low-Level Threading and Tasking Framework

    DOE PAGES

    Seo, Sangmin; Amer, Abdelhalim; Balaji, Pavan; ...

    2017-10-24

    In the past few decades, a number of user-level threading and tasking models have been proposed in the literature to address the shortcomings of OS-level threads, primarily with respect to cost and flexibility. Current state-of-the-art user-level threading and tasking models, however, are either too specific to applications or architectures or are not as powerful or flexible. In this article, we present Argobots, a lightweight, low-level threading and tasking framework that is designed as a portable and performant substrate for high-level programming models or runtime systems. Argobots offers a carefully designed execution model that balances generality of functionality with providing a rich set of controls to allow specialization by the user or high-level programming model. Here, we describe the design, implementation, and optimization of Argobots and present integrations with three example high-level models: OpenMP, MPI, and co-located I/O service. Evaluations show that (1) Argobots outperforms existing generic threading runtimes; (2) our OpenMP runtime offers more efficient interoperability capabilities than production OpenMP runtimes do; (3) when MPI interoperates with Argobots instead of Pthreads, it enjoys reduced synchronization costs and better latency hiding capabilities; and (4) I/O service with Argobots reduces interference with co-located applications, achieving performance competitive with that of the Pthreads version.

  8. Argobots: A Lightweight Low-Level Threading and Tasking Framework

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Seo, Sangmin; Amer, Abdelhalim; Balaji, Pavan

    In the past few decades, a number of user-level threading and tasking models have been proposed in the literature to address the shortcomings of OS-level threads, primarily with respect to cost and flexibility. Current state-of-the-art user-level threading and tasking models, however, are either too specific to applications or architectures or are not as powerful or flexible. In this article, we present Argobots, a lightweight, low-level threading and tasking framework that is designed as a portable and performant substrate for high-level programming models or runtime systems. Argobots offers a carefully designed execution model that balances generality of functionality with providing a rich set of controls to allow specialization by the user or high-level programming model. Here, we describe the design, implementation, and optimization of Argobots and present integrations with three example high-level models: OpenMP, MPI, and co-located I/O service. Evaluations show that (1) Argobots outperforms existing generic threading runtimes; (2) our OpenMP runtime offers more efficient interoperability capabilities than production OpenMP runtimes do; (3) when MPI interoperates with Argobots instead of Pthreads, it enjoys reduced synchronization costs and better latency hiding capabilities; and (4) I/O service with Argobots reduces interference with co-located applications, achieving performance competitive with that of the Pthreads version.

  9. Static and Dynamic Frequency Scaling on Multicore CPUs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bao, Wenlei; Hong, Changwan; Chunduri, Sudheer

    2016-12-28

    Dynamic voltage and frequency scaling (DVFS) adapts CPU power consumption by modifying a processor’s operating frequency (and the associated voltage). Typical approaches employing DVFS involve default strategies such as running at the lowest or the highest frequency, or observing the CPU’s runtime behavior and dynamically adapting the voltage/frequency configuration based on CPU usage. In this paper, we argue that many previous approaches suffer from inherent limitations, such as not accounting for the processor-specific impact of frequency changes on energy for different workload types. We first propose a lightweight runtime-based approach to automatically adapt the frequency based on the CPU workload, that is agnostic of the processor characteristics. We then show that further improvements can be achieved for affine kernels in the application, using a compile-time characterization instead of run-time monitoring to select the frequency and number of CPU cores to use. Our framework relies on a one-time energy characterization of CPU-specific DVFS profiles followed by a compile-time categorization of loop-based code segments in the application. These are combined to determine a priori the frequency and the number of cores to use to execute the application so as to optimize energy or energy-delay product, outperforming the runtime approach. Extensive evaluation on 60 benchmarks and five multi-core CPUs shows that our approach systematically outperforms the powersave Linux governor, while improving overall performance.
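
    The runtime half of such an approach can be caricatured in a few lines: sample a workload signal per phase and clock down during memory-bound phases, where cycles are spent waiting on DRAM. The policy, frequency steps and energy model below are invented for illustration and are not the paper's algorithm:

      # Toy runtime DVFS policy sketch (hypothetical thresholds and model;
      # a real implementation would read hardware counters and apply the
      # chosen frequency through the OS cpufreq interface).
      FREQS_GHZ = [1.2, 2.0, 2.8]

      def pick_frequency(mem_bound_ratio):
          """Memory-bound phases gain little from high frequency, so clock
          down; compute-bound phases get the highest frequency."""
          if mem_bound_ratio > 0.6:
              return FREQS_GHZ[0]
          if mem_bound_ratio > 0.3:
              return FREQS_GHZ[1]
          return FREQS_GHZ[2]

      def phase_energy(work_cycles, mem_bound_ratio, f_ghz):
          """Crude model: compute time scales with 1/f, memory time does
          not; dynamic power grows roughly with f**3."""
          compute_t = (1 - mem_bound_ratio) * work_cycles / f_ghz
          memory_t = mem_bound_ratio * work_cycles / max(FREQS_GHZ)
          power = (f_ghz / max(FREQS_GHZ)) ** 3
          return power * (compute_t + memory_t)

      phases = [(1e9, 0.8), (1e9, 0.1)]   # (cycles, memory-boundedness)
      for cycles, ratio in phases:
          f = pick_frequency(ratio)
          print(f"ratio={ratio:.1f} -> {f} GHz, "
                f"energy={phase_energy(cycles, ratio, f):.2e}")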

  10. A Modular Environment for Geophysical Inversion and Run-time Autotuning using Heterogeneous Computing Systems

    NASA Astrophysics Data System (ADS)

    Myre, Joseph M.

    Heterogeneous computing systems have recently come to the forefront of the High-Performance Computing (HPC) community's interest. HPC computer systems that incorporate special purpose accelerators, such as Graphics Processing Units (GPUs), are said to be heterogeneous. Large scale heterogeneous computing systems have consistently ranked highly on the Top500 list since the beginning of the heterogeneous computing trend. By using heterogeneous computing systems that consist of both general purpose processors and special-purpose accelerators, the speed and problem size of many simulations could be dramatically increased. Ultimately this results in enhanced simulation capabilities that allow, in some cases for the first time, the execution of parameter space and uncertainty analyses, model optimizations, and other inverse modeling techniques that are critical for scientific discovery and engineering analysis. However, simplifying the usage and optimization of codes for heterogeneous computing systems remains a challenge. This is particularly true for scientists and engineers for whom understanding HPC architectures and undertaking performance analysis may not be primary research objectives. To enable scientists and engineers to remain focused on their primary research objectives, a modular environment for geophysical inversion and run-time autotuning on heterogeneous computing systems is presented. This environment is composed of three major components: 1) CUSH, a framework for reducing the complexity of programming heterogeneous computer systems, 2) geophysical inversion routines which can be used to characterize physical systems, and 3) run-time autotuning routines designed to determine configurations of heterogeneous computing systems in an attempt to maximize the performance of scientific and engineering codes. Using three case studies, a lattice-Boltzmann method, a non-negative least squares inversion, and a finite-difference fluid flow method, it is shown that this environment provides scientists and engineers with means to reduce the programmatic complexity of their applications, to perform geophysical inversions for characterizing physical systems, and to determine high-performing run-time configurations of heterogeneous computing systems using a run-time autotuner.

  11. Challenges in High-Assurance Runtime Verification

    NASA Technical Reports Server (NTRS)

    Goodloe, Alwyn E.

    2016-01-01

    Safety-critical systems are growing more complex and becoming increasingly autonomous. Runtime Verification (RV) has the potential to provide protections when a system cannot be assured by conventional means, but only if the RV itself can be trusted. In this paper, we proffer a number of challenges to realizing high-assurance RV and illustrate how we have addressed them in our research. We argue that high-assurance RV provides a rich target for automated verification tools in hope of fostering closer collaboration among the communities.

  12. Angular-contact ball-bearing internal load estimation algorithm using runtime adaptive relaxation

    NASA Astrophysics Data System (ADS)

    Medina, H.; Mutu, R.

    2017-07-01

    An algorithm to estimate internal loads for single-row angular contact ball bearings under externally applied thrust loads and high operating speeds is presented. A new runtime adaptive relaxation procedure and blending function are proposed which ensure algorithm stability whilst also reducing the number of iterations needed to reach convergence, leading to an average reduction in computation time of approximately 80%. The model is validated on a 218 angular contact bearing and shows excellent agreement with published results.
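
    The record gives no equations, but the stabilizing idea, blending each new fixed-point iterate with the previous one while adapting the relaxation factor from the residual history, can be sketched on a generic scalar fixed-point problem (the g below is a stand-in, not the bearing model):

      import math

      def g(x):                     # stand-in nonlinear balance equation
          return math.cos(x)

      def solve(x=0.0, omega=0.5, tol=1e-10, max_iter=200):
          """Fixed-point iteration x = g(x) with runtime-adaptive
          relaxation: grow omega while the residual shrinks, cut it
          back when the iteration starts to diverge."""
          prev_res = float("inf")
          for it in range(max_iter):
              x_new = g(x)
              res = abs(x_new - x)
              if res < tol:
                  return x, it
              omega = (min(1.0, omega * 1.2) if res < prev_res
                       else max(0.05, omega * 0.5))
              x = (1 - omega) * x + omega * x_new   # blended update
              prev_res = res
          raise RuntimeError("no convergence")

      root, iters = solve()
      print(root, iters)            # ~0.7390851 (the cos fixed point)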

  13. Affordance Templates for Shared Robot Control

    NASA Technical Reports Server (NTRS)

    Hart, Stephen; Dinh, Paul; Hambuchen, Kim

    2014-01-01

    This paper introduces the Affordance Template framework used to supervise task behaviors on the NASA-JSC Valkyrie robot at the 2013 DARPA Robotics Challenge (DRC) Trials. This framework provides graphical interfaces to human supervisors that are adjustable based on the run-time environmental context (e.g., size, location, and shape of objects that the robot must interact with, etc.). Additional improvements, described below, inject degrees of autonomy into instantiations of affordance templates at run-time in order to enable efficient human supervision of the robot for accomplishing tasks.

  14. Deep learning for media analysis in defense scenariosan evaluation of an open source framework for object detection in intelligence related image sets

    DTIC Science & Technology

    2017-06-01

    Record abstract not recoverable from extraction; surviving fragments are list-of-tables entries for training-time and evaluation-runtime statistics from the theses of Jones, Camp, and Sharpe, comparing the training resources and time required for each algorithm tested.

  15. A Scala DSL for RETE-Based Runtime Verification

    NASA Technical Reports Server (NTRS)

    Havelund, Klaus

    2013-01-01

    Runtime verification (RV) consists in part of checking execution traces against formalized specifications. Several systems have emerged, most of which support specification notations based on state machines, regular expressions, temporal logic, or grammars. The field of Artificial Intelligence (AI) has for an even longer period of time studied rule-based production systems, which at a closer look appear to be relevant for RV, although seemingly focused on slightly different application domains, such as for example business processes and expert systems. The core algorithm in many of these systems is the Rete algorithm. We have implemented a Rete-based runtime verification system, named LogFire (originally intended for offline log analysis but also applicable to online analysis), as an internal DSL in the Scala programming language, using Scala's support for defining DSLs. This combination appears attractive from a practical point of view. Our contribution is in part conceptual in arguing that such rule-based frameworks originating from AI may be suited for RV.

  16. Colt: an experiment in wormhole run-time reconfiguration

    NASA Astrophysics Data System (ADS)

    Bittner, Ray; Athanas, Peter M.; Musgrove, Mark

    1996-10-01

    Wormhole run-time reconfiguration (RTR) is an attempt to create a refined computing paradigm for high performance computational tasks. By combining concepts from field programmable gate array (FPGA) technologies with data flow computing, the Colt/Stallion architecture achieves high utilization of hardware resources, and facilitates rapid run-time reconfiguration. Targeted mainly at DSP-type operations, the Colt integrated circuit -- a prototype wormhole RTR device -- compares favorably to contemporary DSP alternatives in terms of silicon area consumed per unit computation and in computing performance. Although emphasis has been placed on signal processing applications, general purpose computation has not been overlooked. Colt is a prototype that defines an architecture not only at the chip level but also in terms of an overall system design. As this system is realized, the concept of wormhole RTR will be applied to numerical computation and DSP applications including those common to image processing, communications systems, digital filters, acoustic processing, real-time control systems and simulation acceleration.

  17. Pattern Driven Selection and Configuration of S&D Mechanisms at Runtime

    NASA Astrophysics Data System (ADS)

    Crespo, Beatriz Gallego-Nicasio; Piñuela, Ana; Soria-Rodriguez, Pedro; Serrano, Daniel; Maña, Antonio

    In order to satisfy the requests of SERENITY-aware applications, the SERENITY Runtime Framework's main task is pattern selection: providing the application with the most suitable S&D Solution that satisfies the request. The result of this selection process depends on two main factors: the content of the S&D Library and the information stored and managed by the Context Manager. Three processes are involved: searching the S&D Library to obtain the initial set of candidates; filtering and ordering the collection, based on the SRF configuration; and looping over the remaining S&D Artifacts to check S&D Pattern preconditions, in order to select the most suitable S&D Pattern first and then the appropriate S&D Implementation for the environment conditions. Once the S&D Implementation is selected, the SERENITY Runtime Framework instantiates an Executable Component (EC) and provides the application with the necessary information and mechanisms to make use of the EC.
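
    The three-step selection described above can be paraphrased as a short Python sketch. Everything in it is hypothetical: the dictionary fields, the preference key, and the platform check stand in for the S&D Library contents, the SRF configuration, and the Context Manager, none of which are specified in the abstract.

      def select_solution(library, request, context, preference_key):
          # 1. search: candidates whose declared class matches the request
          candidates = [p for p in library if p["sd_class"] == request["sd_class"]]
          # 2. filter and order according to the SRF configuration
          candidates.sort(key=preference_key)
          # 3. precondition loop: first suitable pattern, then implementation
          for pattern in candidates:
              if all(pre(context) for pre in pattern["preconditions"]):
                  for impl in pattern["implementations"]:
                      if impl["platform"] == context["platform"]:
                          return pattern, impl   # basis for instantiating an EC
          return None

      library = [{"sd_class": "confidentiality",
                  "preconditions": [lambda ctx: ctx["network"] == "wifi"],
                  "implementations": [{"platform": "android"}],
                  "rank": 1}]
      print(select_solution(library, {"sd_class": "confidentiality"},
                            {"network": "wifi", "platform": "android"},
                            preference_key=lambda p: p["rank"]))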

  18. Design and Implementation of a Scalable Membership Service for Supercomputer Resiliency-Aware Runtime

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tock, Yoav; Mandler, Benjamin; Moreira, Jose

    2013-01-01

    As HPC systems and applications get bigger and more complex, we are approaching an era in which resiliency and run-time elasticity concerns become paramount. We offer a building block for an alternative resiliency approach in which computations will be able to make progress while components fail, in addition to enabling a dynamic set of nodes throughout a computation lifetime. The core of our solution is a hierarchical scalable membership service providing eventual consistency semantics. An attribute replication service is used for hierarchy organization, and is exposed to external applications. Our solution is based on P2P technologies and provides resiliency and elastic runtime support at ultra large scales. The resulting middleware is general purpose while exploiting HPC platform unique features and architecture. We have implemented and tested this system on BlueGene/P with Linux, and using worst-case analysis, evaluated the service scalability as effective for up to 1M nodes.

  19. An efficient algorithm for accurate computation of the Dirichlet-multinomial log-likelihood function.

    PubMed

    Yu, Peng; Shaw, Chad A

    2014-06-01

    The Dirichlet-multinomial (DMN) distribution is a fundamental model for multicategory count data with overdispersion. This distribution has many uses in bioinformatics including applications to metagenomics data, transcriptomics and alternative splicing. The DMN distribution reduces to the multinomial distribution when the overdispersion parameter ψ is 0. Unfortunately, numerical computation of the DMN log-likelihood function by conventional methods results in instability in the neighborhood of ψ = 0. An alternative formulation circumvents this instability, but it leads to long runtimes that make it impractical for the large count data common in bioinformatics. We have developed a new method for computation of the DMN log-likelihood to solve the instability problem without incurring long runtimes. The new approach is composed of a novel formula and an algorithm to extend its applicability. Our numerical experiments show that this new method improves both the accuracy of log-likelihood evaluation and the runtime by several orders of magnitude, especially in the high-count situations that are common in deep sequencing data. Using real metagenomic data, our method achieves manyfold runtime improvement. Our method increases the feasibility of using the DMN distribution to model many high-throughput problems in bioinformatics. We have included in our work an R package giving access to this method and a vignette applying this approach to metagenomic data.
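
    For orientation, the conventional stable formulation that the authors improve upon expresses the DMN log-likelihood entirely through log-gamma functions. The sketch below implements that standard formulation (in Python rather than the paper's R), parameterised by a Dirichlet vector alpha rather than the paper's ψ; it is the slow-but-stable baseline, not the paper's new method.

      import numpy as np
      from scipy.special import gammaln

      def dmn_loglik(x, alpha):
          # log P(x | alpha) for the Dirichlet-multinomial distribution,
          # computed with log-gamma terms for numerical stability
          x = np.asarray(x, dtype=float)
          alpha = np.asarray(alpha, dtype=float)
          n, A = x.sum(), alpha.sum()
          coeff = gammaln(n + 1) - gammaln(x + 1).sum()   # multinomial coefficient
          return (coeff + gammaln(A) - gammaln(n + A)
                  + (gammaln(x + alpha) - gammaln(alpha)).sum())

      print(dmn_loglik([3, 1, 0], [1.0, 1.0, 1.0]))       # ln(1/15) ~ -2.708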

  20. Rule Systems for Runtime Verification: A Short Tutorial

    NASA Astrophysics Data System (ADS)

    Barringer, Howard; Havelund, Klaus; Rydeheard, David; Groce, Alex

    In this tutorial, we introduce two rule-based systems for on- and off-line trace analysis, RuleR and LogScope. RuleR is a conditional rule-based system, which has a simple and easily implemented algorithm for effective runtime verification, and into which one can compile a wide range of temporal logics and other specification formalisms used for runtime verification. Specifications can be parameterized with data, or even with specifications, allowing temporal logic combinators to be defined. We outline a number of simple syntactic extensions of core RuleR that can lead to further conciseness of specification while still enabling easy and efficient implementation. RuleR is implemented in Java and we will demonstrate its ease of use in monitoring Java programs. LogScope is a derivation of RuleR that adds a simple, very user-friendly temporal logic. It was developed in Python, specifically for supporting testing of spacecraft flight software for NASA's 2011 Mars mission, the Mars Science Laboratory (MSL). The system has been applied by test engineers to the analysis of log files generated by running the flight software. Detailed logging is already part of the system design approach, and hence there is no added instrumentation overhead caused by this approach. While post-mortem log analysis prevents the autonomous reaction to problems possible with traditional runtime verification, it provides a powerful tool for test automation. A new system is being developed that integrates features from both RuleR and LogScope.

  1. Sam2bam: High-Performance Framework for NGS Data Preprocessing Tools

    PubMed Central

    Cheng, Yinhe; Tzeng, Tzy-Hwa Kathy

    2016-01-01

    This paper introduces a high-throughput software tool framework called sam2bam that enables users to significantly speed up pre-processing for next-generation sequencing data. sam2bam is especially efficient on single-node multi-core large-memory systems. It can reduce the runtime of data pre-processing in marking duplicate reads on a single-node system by 156–186x compared with de facto standard tools. sam2bam consists of parallel software components that can fully utilize multiple processors, available memory, high-bandwidth storage, and hardware compression accelerators, if available. sam2bam provides file format conversion between well-known genome file formats, from SAM to BAM, as a basic feature. Additional features such as analyzing, filtering, and converting input data are provided by plug-in tools, e.g., duplicate marking, which can be attached to sam2bam at runtime. We demonstrated that sam2bam could significantly reduce the runtime of next-generation sequencing (NGS) data pre-processing from about two hours to about one minute for a whole-exome data set on a 16-core single-node system using up to 130 GB of memory. sam2bam could reduce the runtime of NGS data pre-processing from about 20 hours to about nine minutes for a whole-genome sequencing data set on the same system using up to 711 GB of memory. PMID:27861637

  2. COVERT: A Framework for Finding Buffer Overflows in C Programs via Software Verification

    DTIC Science & Technology

    2010-08-01

    is greater than the allocated size of B. In the case of a type-safe language or a language with runtime bounds checking (such as Java), an overflow... leads either to a (compile-time) type error or a (runtime) exception. In such languages, a buffer overflow can lead to a denial of service attack (i.e... of current and legacy software is written in unsafe languages (such as C or C++) that allow buffers to be overflowed with impunity. For reasons such as

  3. A Runtime Performance Predictor for Selecting Tabu Tenures

    NASA Technical Reports Server (NTRS)

    Allen, John A.; Minton, Steven N.

    1997-01-01

    One of the drawbacks of parameter-based systems, such as tabu search, is the difficulty of finding the correct parameter for a particular problem. Often, rule-of-thumb advice is given which may have little or no applicability to the domain or problem instance at hand. This paper describes the application of a general technique, Runtime Performance Predictors (RPP), which can be used to determine, in an efficient manner, the correct tabu tenure for a particular problem instance. The details of the approach and a demonstration using a variant of GSAT are presented.

  4. FleCSPH notes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lim, Hyun; Loiseau, Julien

    FleCSI is a compile-time configurable framework designed to support multi-physics application development. As such, FleCSI provides a very general set of infrastructure design patterns that can be specialized and extended to suit the needs of a broad variety of solver and data requirements. FleCSI currently supports multi-dimensional mesh topology, geometry, and adjacency information, as well as n-dimensional hashed-tree data structures, graph partitioning interfaces, and dependency closures. FleCSI introduces a functional programming model with control, execution, and data abstractions that are consistent both with MPI and with state-of-the-art, task-based runtimes such as Legion and Charm++. The abstraction layer insulates developers from the underlying runtime, while allowing support for multiple runtime systems including conventional models like asynchronous MPI. The intent is to provide developers with a concrete set of user-friendly programming tools that can be used now, while allowing flexibility in choosing runtime implementations and optimizations that can be applied to future architectures and runtimes. FleCSI's control and execution models provide formal nomenclature for describing poorly understood concepts such as kernels and tasks. FleCSI's data model provides a low-buy-in approach that makes it an attractive option for many application projects, as developers are not locked into particular layouts or data structure representations. FleCSI currently provides a parallel but not distributed implementation of binary, quad- and oct-tree topology. This implementation is based on space-filling-curve domain decomposition (the Morton order). The current FleCSI version requires the implementation of a driver and a specialization driver. The role of the specialization driver is to provide the data distribution. This feature is not complete in the FleCSI code and we provide it. The next step will be to incorporate it directly from FleCSPH into FleCSI as we reach a good level of performance. The driver then represents the general execution of the resolution without worrying about data locality and communications. As FleCSI is a code under active development, its structure may change in the future, and we keep track of these changes in FleCSPH.

  5. A Hartree-Fock Application Using UPC++ and the New DArray Library

    DOE PAGES

    Ozog, David; Kamil, Amir; Zheng, Yili; ...

    2016-07-21

    The Hartree-Fock (HF) method is the fundamental first step for incorporating quantum mechanics into many-electron simulations of atoms and molecules, and it is an important component of computational chemistry toolkits like NWChem. The GTFock code is an HF implementation that, while it does not have all the features in NWChem, represents crucial algorithmic advances that reduce communication and improve load balance by doing an up-front static partitioning of tasks, followed by work stealing whenever necessary. To enable innovations in algorithms and exploit next generation exascale systems, it is crucial to support quantum chemistry codes using expressive and convenient programming models and runtime systems that are also efficient and scalable. This paper presents an HF implementation similar to GTFock using UPC++, a partitioned global address space model that includes flexible communication, asynchronous remote computation, and a powerful multidimensional array library. UPC++ offers runtime features that are useful for HF such as active messages, a rich calculus for array operations, hardware-supported fetch-and-add, and functions for ensuring asynchronous runtime progress. We present a new distributed array abstraction, DArray, that is convenient for the kinds of random-access array updates and linear algebra operations on block-distributed arrays with irregular data ownership. Finally, we analyze the performance of atomic fetch-and-add operations (relevant for load balancing) and runtime attentiveness, then compare various techniques and optimizations for each. Our optimized implementation of HF using UPC++ and the DArrays library shows up to 20% improvement over GTFock with Global Arrays at scales up to 24,000 cores.

  6. A fundamental study of suction for Laminar Flow Control (LFC)

    NASA Astrophysics Data System (ADS)

    Watmuff, Jonathan H.

    1992-10-01

    This report covers the period forming the first year of the project. The aim is to experimentally investigate the effects of suction as a technique for Laminar Flow Control. Experiments are to be performed which require substantial modifications to be made to the experimental facility. Considerable effort has been spent developing new high-performance constant-temperature hot-wire anemometers for general-purpose use in the Fluid Mechanics Laboratory. Twenty instruments have been delivered. An important feature of the facility is that it is totally automated under computer control. Unprecedentedly large quantities of data can be acquired and the results examined using the visualization tools developed specifically for studying the results of numerical simulations on graphics workstations. The experiment must be run for periods of up to a month at a time since the data is collected on a point-by-point basis. Several techniques were implemented to reduce the experimental run-time by a significant factor. Extra probes have been constructed and modifications have been made to the traverse hardware and to the real-time experimental code to enable multiple probes to be used. This will reduce the experimental run-time by the appropriate factor. Hot-wire calibration drift has been a frustrating problem owing to the large range of ambient temperatures experienced in the laboratory. The solution has been to repeat the calibrations at frequent intervals. However, the calibration process has consumed up to 40 percent of the run-time. A new method of correcting the drift is very nearly finalized, and when implemented it will also lead to a significant reduction in the experimental run-time.

  7. A Hartree-Fock Application Using UPC++ and the New DArray Library

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ozog, David; Kamil, Amir; Zheng, Yili

    The Hartree-Fock (HF) method is the fundamental first step for incorporating quantum mechanics into many-electron simulations of atoms and molecules, and it is an important component of computational chemistry toolkits like NWChem. The GTFock code is an HF implementation that, while it does not have all the features in NWChem, represents crucial algorithmic advances that reduce communication and improve load balance by doing an up-front static partitioning of tasks, followed by work stealing whenever necessary. To enable innovations in algorithms and exploit next generation exascale systems, it is crucial to support quantum chemistry codes using expressive and convenient programming models and runtime systems that are also efficient and scalable. This paper presents an HF implementation similar to GTFock using UPC++, a partitioned global address space model that includes flexible communication, asynchronous remote computation, and a powerful multidimensional array library. UPC++ offers runtime features that are useful for HF such as active messages, a rich calculus for array operations, hardware-supported fetch-and-add, and functions for ensuring asynchronous runtime progress. We present a new distributed array abstraction, DArray, that is convenient for the kinds of random-access array updates and linear algebra operations on block-distributed arrays with irregular data ownership. Finally, we analyze the performance of atomic fetch-and-add operations (relevant for load balancing) and runtime attentiveness, then compare various techniques and optimizations for each. Our optimized implementation of HF using UPC++ and the DArrays library shows up to 20% improvement over GTFock with Global Arrays at scales up to 24,000 cores.

  8. A fundamental study of suction for Laminar Flow Control (LFC)

    NASA Technical Reports Server (NTRS)

    Watmuff, Jonathan H.

    1992-01-01

    This report covers the period forming the first year of the project. The aim is to experimentally investigate the effects of suction as a technique for Laminar Flow Control. Experiments are to be performed which require substantial modifications to be made to the experimental facility. Considerable effort has been spent developing new high-performance constant-temperature hot-wire anemometers for general-purpose use in the Fluid Mechanics Laboratory. Twenty instruments have been delivered. An important feature of the facility is that it is totally automated under computer control. Unprecedentedly large quantities of data can be acquired and the results examined using the visualization tools developed specifically for studying the results of numerical simulations on graphics workstations. The experiment must be run for periods of up to a month at a time since the data is collected on a point-by-point basis. Several techniques were implemented to reduce the experimental run-time by a significant factor. Extra probes have been constructed and modifications have been made to the traverse hardware and to the real-time experimental code to enable multiple probes to be used. This will reduce the experimental run-time by the appropriate factor. Hot-wire calibration drift has been a frustrating problem owing to the large range of ambient temperatures experienced in the laboratory. The solution has been to repeat the calibrations at frequent intervals. However, the calibration process has consumed up to 40 percent of the run-time. A new method of correcting the drift is very nearly finalized, and when implemented it will also lead to a significant reduction in the experimental run-time.

  9. A domain-specific compiler for a parallel multiresolution adaptive numerical simulation environment

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rajbhandari, Samyam; Kim, Jinsung; Krishnamoorthy, Sriram

    This paper describes the design and implementation of a layered domain-specific compiler to support MADNESS---Multiresolution ADaptive Numerical Environment for Scientific Simulation. MADNESS is a high-level software environment for the solution of integral and differential equations in many dimensions, using adaptive and fast harmonic analysis methods with guaranteed precision. MADNESS uses k-d trees to represent spatial functions and implements operators like addition, multiplication, differentiation, and integration on the numerical representation of functions. The MADNESS runtime system provides global namespace support and a task-based execution model including futures. MADNESS is currently deployed on massively parallel supercomputers and has enabled many science advances. Due to the highly irregular and statically unpredictable structure of the k-d trees representing the spatial functions encountered in MADNESS applications, only purely runtime approaches to optimization have previously been implemented in the MADNESS framework. This paper describes a layered domain-specific compiler developed to address some performance bottlenecks in MADNESS. The newly developed static compile-time optimizations, in conjunction with the MADNESS runtime support, enable significant performance improvement for the MADNESS framework.

  10. Mentat/A: Medium grain parallel processing

    NASA Technical Reports Server (NTRS)

    Grimshaw, Andrew S.

    1992-01-01

    The objective of this project is to test the Algorithm to Architecture Mapping Model (ATAMM) firing rules using the Mentat run-time system and the Mentat Programming Language (MPL). A special version of Mentat, Mentat/A (Mentat/ATAMM), was constructed. This required changes to: (1) modify the run-time system to control queue length and to inhibit actor firing until the required data tokens are available and space is available in the input queues of all direct descendent actors; (2) disallow the specification of persistent object classes in the MPL; and (3) permit only decision-free graphs in the MPL. We were successful in implementing the spirit of the plan, although some goals changed as we came to better understand the problem. We report on what we accomplished and the lessons we learned. The Mentat/A run-time system is discussed, and we briefly present the compiler. We present results for three applications and conclude with a summary and some observations. Appendix A contains a list of technical reports and published papers partially supported by the grant. Appendix B contains listings for the three applications.

  11. Optimization and Control of Cyber-Physical Vehicle Systems

    PubMed Central

    Bradley, Justin M.; Atkins, Ella M.

    2015-01-01

    A cyber-physical system (CPS) is composed of tightly-integrated computation, communication and physical elements. Medical devices, buildings, mobile devices, robots, transportation and energy systems can benefit from CPS co-design and optimization techniques. Cyber-physical vehicle systems (CPVSs) are rapidly advancing due to progress in real-time computing, control and artificial intelligence. Multidisciplinary or multi-objective design optimization maximizes CPS efficiency, capability and safety, while online regulation enables the vehicle to be responsive to disturbances, modeling errors and uncertainties. CPVS optimization occurs at design-time and at run-time. This paper surveys the run-time cooperative optimization or co-optimization of cyber and physical systems, which have historically been considered separately. A run-time CPVS is also cooperatively regulated or co-regulated when cyber and physical resources are utilized in a manner that is responsive to both cyber and physical system requirements. This paper surveys research that considers both cyber and physical resources in co-optimization and co-regulation schemes with applications to mobile robotic and vehicle systems. Time-varying sampling patterns, sensor scheduling, anytime control, feedback scheduling, task and motion planning and resource sharing are examined. PMID:26378541

  12. Optimization and Control of Cyber-Physical Vehicle Systems.

    PubMed

    Bradley, Justin M; Atkins, Ella M

    2015-09-11

    A cyber-physical system (CPS) is composed of tightly-integrated computation, communication and physical elements. Medical devices, buildings, mobile devices, robots, transportation and energy systems can benefit from CPS co-design and optimization techniques. Cyber-physical vehicle systems (CPVSs) are rapidly advancing due to progress in real-time computing, control and artificial intelligence. Multidisciplinary or multi-objective design optimization maximizes CPS efficiency, capability and safety, while online regulation enables the vehicle to be responsive to disturbances, modeling errors and uncertainties. CPVS optimization occurs at design-time and at run-time. This paper surveys the run-time cooperative optimization or co-optimization of cyber and physical systems, which have historically been considered separately. A run-time CPVS is also cooperatively regulated or co-regulated when cyber and physical resources are utilized in a manner that is responsive to both cyber and physical system requirements. This paper surveys research that considers both cyber and physical resources in co-optimization and co-regulation schemes with applications to mobile robotic and vehicle systems. Time-varying sampling patterns, sensor scheduling, anytime control, feedback scheduling, task and motion planning and resource sharing are examined.

  13. A Concept for Run-Time Support of the Chapel Language

    NASA Technical Reports Server (NTRS)

    James, Mark

    2006-01-01

    A document presents a concept for run-time implementation of other concepts embodied in the Chapel programming language. (Now undergoing development, Chapel is intended to become a standard language for parallel computing that would surpass older such languages both in computational performance and in the efficiency with which pre-existing code can be reused and new code written.) The aforementioned other concepts are those of distributions, domains, allocations, and access, as defined in a separate document called "A Semantic Framework for Domains and Distributions in Chapel" and linked to a language specification defined in another separate document called "Chapel Specification 0.3." The concept presented in this report is the recognition that a data domain invented for Chapel offers a novel approach to distributing and processing data in a massively parallel environment. The concept is offered as a starting point for development of working descriptions of functions and data structures that would be necessary to implement interfaces to a compiler for transforming the aforementioned other concepts from their representations in Chapel source code to their run-time implementations.

  14. Final Project Report. Scalable fault tolerance runtime technology for petascale computers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Krishnamoorthy, Sriram; Sadayappan, P

    With the massive number of components comprising the forthcoming petascale computer systems, hardware failures will be routinely encountered during execution of large-scale applications. Due to the multidisciplinary, multiresolution, and multiscale nature of scientific problems that drive the demand for high-end systems, applications place increasingly differing demands on the system resources: disk, network, memory, and CPU. In addition to MPI, future applications are expected to use advanced programming models such as those developed under the DARPA HPCS program as well as existing global address space programming models such as Global Arrays, UPC, and Co-Array Fortran. While there has been a considerable amount of work in fault-tolerant MPI, with a number of strategies and extensions for fault tolerance proposed, virtually none of the advanced models proposed for emerging petascale systems is currently fault aware. To achieve fault tolerance, development of underlying runtime and OS technologies able to scale to the petascale level is needed. This project has evaluated a range of runtime techniques for fault tolerance for advanced programming models.

  15. Power-constrained supercomputing

    NASA Astrophysics Data System (ADS)

    Bailey, Peter E.

    As we approach exascale systems, power is turning from an optimization goal to a critical operating constraint. With power bounds imposed by both stakeholders and the limitations of existing infrastructure, achieving practical exascale computing will therefore rely on optimizing performance subject to a power constraint. However, this requirement should not add to the burden of application developers; optimizing the runtime environment given restricted power will primarily be the job of high-performance system software. In this dissertation, we explore this area and develop new techniques that extract maximum performance subject to a particular power constraint. These techniques include a method to find theoretical optimal performance, a runtime system that shifts power in real time to improve performance, and a node-level prediction model for selecting power-efficient operating points. We use a linear programming (LP) formulation to optimize application schedules under various power constraints, where a schedule consists of a DVFS state and number of OpenMP threads for each section of computation between consecutive message passing events. We also provide a more flexible mixed integer-linear (ILP) formulation and show that the resulting schedules closely match schedules from the LP formulation. Across four applications, we use our LP-derived upper bounds to show that current approaches trail optimal, power-constrained performance by up to 41%. This demonstrates limitations of current systems, and our LP formulation provides future optimization approaches with a quantitative optimization target. We also introduce Conductor, a run-time system that intelligently distributes available power to nodes and cores to improve performance. The key techniques used are configuration space exploration and adaptive power balancing. Configuration exploration dynamically selects the optimal thread concurrency level and DVFS state subject to a hardware-enforced power bound. Adaptive power balancing efficiently predicts where critical paths are likely to occur and distributes power to those paths. Greater power, in turn, allows increased thread concurrency levels, CPU frequency/voltage, or both. We describe these techniques in detail and show that, compared to the state-of-the-art technique of using statically predetermined, per-node power caps, Conductor leads to a best-case performance improvement of up to 30%, and an average improvement of 19.1%. At the node level, an accurate power/performance model will aid in selecting the right configuration from a large set of available configurations. We present a novel approach to generate such a model offline using kernel clustering and multivariate linear regression. Our model requires only two iterations to select a configuration, which provides a significant advantage over exhaustive search-based strategies. We apply our model to predict power and performance for different applications using arbitrary configurations, and show that our model, when used with hardware frequency-limiting in a runtime system, selects configurations with significantly higher performance at a given power limit than those chosen by frequency-limiting alone. When applied to a set of 36 computational kernels from a range of applications, our model accurately predicts power and performance; our runtime system based on the model maintains 91% of optimal performance while meeting power constraints 88% of the time. 
When the runtime system violates a power constraint, it exceeds the constraint by only 6% in the average case, while simultaneously achieving 54% more performance than an oracle. Through the combination of the above contributions, we hope to provide guidance and inspiration to research practitioners working on runtime systems for power-constrained environments. We also hope this dissertation will draw attention to the need for software and runtime-controlled power management under power constraints at various levels, from the processor level to the cluster level.
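
    The dissertation's LP formulation ranges over DVFS states and thread counts per program section; the toy below shows only the core idea in a much smaller setting. The configuration table, power cap, and work total are invented numbers: the LP chooses how long to run in each (speed, power) configuration so that all work completes in minimum time while average power stays under the cap.

      import numpy as np
      from scipy.optimize import linprog

      speed = np.array([1.0, 1.6, 2.0])      # work units per second (assumed)
      power = np.array([60.0, 95.0, 140.0])  # watts per configuration (assumed)
      cap, work = 100.0, 1000.0              # power bound (W) and total work

      # variables t_c >= 0: seconds spent in configuration c; minimize sum(t)
      res = linprog(c=np.ones(3),
                    A_ub=[power - cap], b_ub=[0.0],  # sum t_c*(p_c - cap) <= 0
                    A_eq=[speed], b_eq=[work])       # sum t_c*s_c == work
      print(res.x, res.fun)                  # time per configuration, total runtime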

  16. Evolving binary classifiers through parallel computation of multiple fitness cases.

    PubMed

    Cagnoni, Stefano; Bergenti, Federico; Mordonini, Monica; Adorni, Giovanni

    2005-06-01

    This paper describes two versions of a novel approach to developing binary classifiers, based on two evolutionary computation paradigms: cellular programming and genetic programming. Such an approach achieves high computational efficiency both during evolution and at runtime. Evolution speed is optimized by allowing multiple solutions to be computed in parallel. Runtime performance is optimized either explicitly, using parallel computation in the case of cellular programming, or implicitly, by taking advantage of the intrinsic parallelism of bitwise operators on standard sequential architectures in the case of genetic programming. The approach was tested on a digit recognition problem and compared with a reference classifier.
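
    The "intrinsic parallelism of bitwise operators" mentioned above is the sub-machine-code trick of packing one fitness case per bit, so a single bitwise expression evaluates many cases at once. The Python sketch below is a made-up illustration (inputs, target, and candidate classifiers are arbitrary), not the paper's evolved classifiers.

      CASES = 8
      MASK = (1 << CASES) - 1

      # inputs a, b and the desired output, packed one fitness case per bit
      a = 0b10110100
      b = 0b01110010
      target = 0b11000110

      def fitness(expr):
          # number of fitness cases the candidate gets right, in one pass
          out = expr(a, b) & MASK
          return CASES - bin((out ^ target) & MASK).count("1")

      # two candidate Boolean classifiers, each evaluated on all 8 cases at once
      print(fitness(lambda a, b: a & ~b))
      print(fitness(lambda a, b: a ^ b))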

  17. Runtime Analysis of Linear Temporal Logic Specifications

    NASA Technical Reports Server (NTRS)

    Giannakopoulou, Dimitra; Havelund, Klaus

    2001-01-01

    This report presents an approach to checking a running program against its Linear Temporal Logic (LTL) specifications. LTL is a widely used logic for expressing properties of programs viewed as sets of executions. Our approach consists of translating LTL formulae to finite-state automata, which are used as observers of the program behavior. The translation algorithm we propose modifies standard LTL-to-Büchi automata conversion techniques to generate automata that check finite program traces. The algorithm has been implemented in a tool, which has been integrated with the generic JPaX framework for runtime analysis of Java programs.
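
    JPaX generates such observer automata from arbitrary LTL formulae; the Python sketch below hand-codes the observer for a single response property, G(p -> F q), over a finite trace, just to show what the generated checkers do. Event names are invented.

      def check_response(trace):
          # finite-trace observer for G(p -> F q): every 'p' must be
          # followed, somewhere later in the trace, by a 'q'
          pending = False            # an unanswered 'p' obligation
          for event in trace:
              if event == "q":
                  pending = False
              elif event == "p":
                  pending = True
          # on a finite trace, an open obligation at the end is a violation
          return not pending

      print(check_response(["p", "x", "q", "p", "q"]))   # True: property holds
      print(check_response(["p", "x"]))                  # False: q never follows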

  18. An Adaptive Cross-Architecture Combination Method for Graph Traversal

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    You, Yang; Song, Shuaiwen; Kerbyson, Darren J.

    2014-06-18

    Breadth-First Search (BFS) is widely used in many real-world applications including computational biology, social networks, and electronic design automation. The combination method, using both top-down and bottom-up techniques, is the most effective BFS approach. However, current combination methods rely on trial-and-error and exhaustive search to locate the optimal switching point, which may cause significant runtime overhead. To solve this problem, we design an adaptive method based on regression analysis that predicts an optimal switching point for the combination method at runtime, with an overhead of less than 0.1% of the BFS execution time.
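
    For context, a combination BFS expands the frontier top-down while it is small and switches to bottom-up once it grows large. The Python sketch below uses a fixed size threshold alpha as the switching point, whereas the paper's contribution is predicting that point adaptively by regression; the threshold value and graph are invented.

      def hybrid_bfs(adj, source, alpha=0.25):
          # adj: dict mapping each vertex 0..n-1 to its neighbor list
          n = len(adj)
          dist = {source: 0}
          frontier, level = {source}, 0
          while frontier:
              if len(frontier) < alpha * n:   # top-down: scan frontier edges
                  nxt = {w for v in frontier for w in adj[v] if w not in dist}
              else:                           # bottom-up: scan unvisited vertices
                  nxt = {v for v in range(n) if v not in dist
                         and any(u in frontier for u in adj[v])}
              level += 1
              for v in nxt:
                  dist[v] = level
              frontier = nxt
          return dist

      adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
      print(hybrid_bfs(adj, 0))               # {0: 0, 1: 1, 2: 1, 3: 2}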

  19. Distributed memory compiler design for sparse problems

    NASA Technical Reports Server (NTRS)

    Wu, Janet; Saltz, Joel; Berryman, Harry; Hiranandani, Seema

    1991-01-01

    A compiler and runtime support mechanism is described and demonstrated. The methods presented are capable of solving a wide range of sparse and unstructured problems in scientific computing. The compiler takes as input a FORTRAN 77 program enhanced with specifications for distributing data, and the compiler outputs a message passing program that runs on a distributed memory computer. The runtime support for this compiler is a library of primitives designed to efficiently support irregular patterns of distributed array accesses and irregular distributed array partitions. A variety of Intel iPSC/860 performance results obtained through the use of this compiler are presented.

  20. A diagnostic interface for the ICOsahedral Non-hydrostatic (ICON) modelling framework based on the Modular Earth Submodel System (MESSy v2.50)

    NASA Astrophysics Data System (ADS)

    Kern, Bastian; Jöckel, Patrick

    2016-10-01

    Numerical climate and weather models have advanced to finer scales, accompanied by large amounts of output data. The model systems hit the input and output (I/O) bottleneck of modern high-performance computing (HPC) systems. We aim to apply diagnostic methods online during the model simulation instead of applying them as a post-processing step to written output data, to reduce the amount of I/O. To include diagnostic tools in the model system, we implemented a standardised, easy-to-use interface based on the Modular Earth Submodel System (MESSy) in the ICOsahedral Non-hydrostatic (ICON) modelling framework. The integration of the diagnostic interface into the model system is briefly described. Furthermore, we present a prototype implementation of an advanced online diagnostic tool for the aggregation of model data onto a user-defined regular coarse grid. This diagnostic tool will be used to reduce the amount of model output in future simulations. Performance tests of the interface and of two different diagnostic tools show that the interface itself introduces no overhead in the form of additional runtime to the model system. The diagnostic tools, however, have a significant impact on the model system's runtime. This overhead strongly depends on the characteristics and implementation of the diagnostic tool. A diagnostic tool with high inter-process communication introduces a large overhead, whereas the additional runtime of a diagnostic tool without inter-process communication is low. We briefly describe our efforts to reduce the additional runtime from the diagnostic tools, and present a brief analysis of memory consumption. Future work will focus on optimisation of the memory footprint and the I/O operations of the diagnostic interface.
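
    The aggregation step the prototype performs can be pictured with a regular 2-D field (ICON's native grid is icosahedral, so the real tool's mapping is more involved). The following numpy sketch block-averages a fine field onto a coarser grid; the factor-of-4 reduction is an arbitrary example.

      import numpy as np

      def aggregate(field, factor):
          # average non-overlapping factor x factor blocks of a 2-D field
          ny, nx = field.shape
          assert ny % factor == 0 and nx % factor == 0
          return field.reshape(ny // factor, factor,
                               nx // factor, factor).mean(axis=(1, 3))

      fine = np.arange(64, dtype=float).reshape(8, 8)
      print(aggregate(fine, 4))               # 8x8 field reduced to 2x2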

  1. Impact of backwashing procedures on deep bed filtration productivity in drinking water treatment.

    PubMed

    Slavik, Irene; Jehmlich, Alexander; Uhl, Wolfgang

    2013-10-15

    Backwash procedures for deep bed filters were evaluated and compared by means of a new integrated approach based on productivity. For this, different backwash procedures were experimentally evaluated using a pilot plant for direct filtration. A standard backwash mode as applied in practice served as a reference, and effluent turbidity was used as the criterion for filter run termination. The backwash water volumes needed, duration of the filter-to-waste period, time out of operation, total volume discharged and filter run-time were determined and used to calculate average filtration velocity and average productivity. Results for filter run-times, filter backwash volumes, and filter-to-waste volumes showed considerable differences between the backwash procedures. Thus, backwash procedures with additional clear flushing phases were characterised by an increased need for backwash water. However, this additional water consumption could not be compensated by savings during filter ripening. Compared to the reference backwash procedure, filter run-times were longer for both single-media and dual-media filters when air scour and air/water flush were optimised with respect to flow rates and the proportion of air and water. This means that drinking water production time is longer and less water is needed for filter bed cleaning. Also, backwashing with additional clear flushing phases resulted in longer filter run-times before turbidity breakthrough. However, regarding the productivity of the filtration process, it was shown that it was almost the same for all of the backwash procedures investigated in this study. Due to this unexpected finding, the relationships between filter bed cleaning, filter ripening and filtration performance were considered, and important conclusions and new approaches for process optimisation and resource savings were derived.
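
    The abstract does not state how productivity was computed; one plausible formalisation, shown below purely for illustration, is net product volume per unit of total cycle time. All variable names and figures are invented, and the paper's exact definition may differ.

      def productivity(v_produced, v_backwash, v_waste, t_run, t_backwash, t_out):
          # net product volume (m^3) per hour over the full filter cycle
          net_volume = v_produced - v_backwash - v_waste
          total_time = t_run + t_backwash + t_out
          return net_volume / total_time

      # a longer run-time can be largely offset by higher backwash consumption
      print(productivity(2400, 60, 40, t_run=48, t_backwash=0.5, t_out=1.5))
      print(productivity(2600, 110, 50, t_run=52, t_backwash=0.7, t_out=1.8))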

  2. Active Storage with Analytics Capabilities and I/O Runtime System for Petascale Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Choudhary, Alok

    Computational scientists must understand results from experimental, observational and computational simulation generated data to gain insights and perform knowledge discovery. As systems approach the petascale range, problems that were unimaginable a few years ago are within reach. With the increasing volume and complexity of data produced by ultra-scale simulations and high-throughput experiments, understanding the science is largely hampered by the lack of comprehensive tools for I/O, storage, acceleration of data manipulation, analysis, and mining. Scientists require techniques, tools and infrastructure to facilitate better understanding of their data, in particular the ability to effectively perform complex data analysis, statistical analysis and knowledge discovery. The goal of this work is to enable more effective analysis of scientific datasets through the integration of enhancements in the I/O stack, from active storage support at the file system layer to the MPI-IO and high-level I/O library layers. We propose to provide software components to accelerate data analytics, mining, I/O, and knowledge discovery for large-scale scientific applications, thereby increasing the productivity of both scientists and the systems. Our approaches include: 1) designing interfaces in high-level I/O libraries, such as parallel netCDF, for applications to activate data mining operations at the lower I/O layers; 2) enhancing MPI-IO runtime systems to incorporate the functionality developed as part of the runtime system design; 3) developing parallel data mining programs as part of the runtime library for the server-side file system in the PVFS file system; and 4) prototyping an active storage cluster, which will utilize multicore CPUs, GPUs, and FPGAs to carry out the data mining workload.

  3. Highlights of X-Stack ExM Deliverable: MosaStore

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ripeanu, Matei

    2016-07-20

    This brief report highlights the experience gained with MosaStore, an exploratory part of the X-Stack project “ExM: System support for extreme-scale, many-task applications”. The ExM project proposed to use concurrent workflows supported by the Swift language and runtime as an innovative programming model to exploit parallelism in exascale computers. MosaStore aims to support this endeavor by improving storage support for workflow-based applications, more precisely by exploring the gains that can be obtained from co-designing the storage system and the workflow runtime engine. MosaStore has been developed primarily at the University of British Columbia.

  4. Instrumentation of Java Bytecode for Runtime Analysis

    NASA Technical Reports Server (NTRS)

    Goldberg, Allen; Haveland, Klaus

    2003-01-01

    This paper describes JSpy, a system for high-level instrumentation of Java bytecode, and its use with JPaX, our system for runtime analysis of Java programs. JPaX monitors the execution of temporal logic formulas and performs predicative analysis of deadlocks and data races. JSpy's input is an instrumentation specification, which consists of a collection of rules, where a rule is a predicate/action pair. The predicate is a conjunction of syntactic constraints on a Java statement, and the action is a description of logging information to be inserted in the bytecode corresponding to the statement. JSpy is built using JTrek, an instrumentation package at a lower level of abstraction.

  5. A new order-theoretic characterisation of the polytime computable functions

    PubMed Central

    Avanzini, Martin; Eguchi, Naohi; Moser, Georg

    2015-01-01

    We propose a new order-theoretic characterisation of the class of polytime computable functions. To this end we define the small polynomial path order (sPOP* for short). This termination order entails a new syntactic method to analyse the innermost runtime complexity of term rewrite systems fully automatically: for any rewrite system compatible with sPOP* that employs recursion up to depth d, the (innermost) runtime complexity is polynomially bounded of degree d. This bound is tight. Thus we obtain a direct correspondence between a syntactic (and easily verifiable) condition of a program and the asymptotic worst-case complexity of the program. PMID:26412933

  6. Monitoring Object Library Usage and Changes

    NASA Technical Reports Server (NTRS)

    Owen, R. K.; Craw, James M. (Technical Monitor)

    1995-01-01

    The NASA Ames Numerical Aerodynamic Simulation program's Aeronautics Consolidated Supercomputing Facility (NAS/ACSF) serves over 1600 users and has numerous analysts with root access. Several tools have been developed to monitor object library usage and changes. Some of the tools do "noninvasive" monitoring, and other tools implement run-time logging even for object-only libraries. The run-time logging identifies who, when, and what is being used. The benefits are that real usage can be measured, unused libraries can be discontinued, and training and optimization efforts can be focused on those numerical methods that are actually used. An overview of the tools will be given and the results will be discussed.

  7. Onboard Run-Time Goal Selection for Autonomous Operations

    NASA Technical Reports Server (NTRS)

    Rabideau, Gregg; Chien, Steve; McLaren, David

    2010-01-01

    We describe an efficient, online goal selection algorithm for use onboard spacecraft to select goals at runtime. Our focus is on the re-planning that must be performed in a timely manner on the embedded system, where computational resources are limited. In particular, our algorithm generates near-optimal solutions to problems with fully specified goal requests that oversubscribe available resources but have no temporal flexibility. By using a fast, incremental algorithm, goal selection can be postponed in a "just-in-time" fashion, allowing requests to be changed or added at the last minute. This enables shorter response cycles and greater autonomy for the system under control.
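
    The abstract describes selecting among oversubscribed, temporally inflexible goal requests; a crude way to picture this is a greedy value-density heuristic over a single resource budget. The Python sketch below is exactly that toy (goal names, scores, and costs are invented); the flight algorithm is incremental and near-optimal rather than this one-shot greedy pass.

      def select_goals(requests, budget):
          # take goals in order of value per unit resource until budget runs out
          chosen = []
          for goal in sorted(requests, key=lambda g: g["value"] / g["cost"],
                             reverse=True):
              if goal["cost"] <= budget:
                  chosen.append(goal["id"])
                  budget -= goal["cost"]
          return chosen

      requests = [{"id": "img-12", "value": 9, "cost": 5},
                  {"id": "img-07", "value": 4, "cost": 1},
                  {"id": "dump",   "value": 6, "cost": 4}]
      print(select_goals(requests, budget=6))   # ['img-07', 'img-12']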

  8. AEOSS runtime manual for system analysis on Advanced Earth-Orbital Spacecraft Systems

    NASA Technical Reports Server (NTRS)

    Lee, Hwa-Ping

    1990-01-01

    The Advanced Earth-Orbital Spacecraft System (AEOSS) enables users to project the required power, weight, and cost for a generic earth-orbital spacecraft system. These variables are calculated on the component and subsystem levels, and then at the system level. The six subsystems included are electric power; thermal control; structure; auxiliary propulsion; attitude control; and communication, command, and data handling. The costs are computed using statistically determined models that were derived from spacecraft flown in the past and were categorized into classes according to their functions and structural complexity. Selected design and performance analyses for essential components and subsystems are also provided. AEOSS permits a user to enter known values of these parameters, totally or partially, at all levels. All information is of vital importance to project managers of subsystems or of a spacecraft system. AEOSS is a specially tailored software package coded with the relational database program 4th Dimension from Acius, in a Macintosh version. Because of the licensing agreements, two versions of the AEOSS documents were prepared. This version, the AEOSS Runtime Manual, is permitted to be distributed with a finite number of copies of the restrictive 4D Runtime version. It can perform all contained applications without any programming alterations.

  9. Holistic Context-Sensitivity for Run-Time Optimization of Flexible Manufacturing Systems.

    PubMed

    Scholze, Sebastian; Barata, Jose; Stokic, Dragan

    2017-02-24

    Highly flexible manufacturing systems require continuous run-time (self-)optimization of processes with respect to diverse parameters, e.g., efficiency, availability, energy consumption, etc. A promising approach for achieving (self-)optimization in manufacturing systems is a context-sensitivity approach based on data streaming from a large number of sensors and other data sources. Cyber-physical systems play an important role as sources of information for achieving context sensitivity. Cyber-physical systems can be seen as complex intelligent sensors providing the data needed to identify the current context under which the manufacturing system is operating. In this paper, it is demonstrated how context sensitivity can be used to realize a holistic solution for (self-)optimization of discrete flexible manufacturing systems, by making use of cyber-physical systems integrated in manufacturing systems/processes. A generic approach for context sensitivity, based on self-learning algorithms, is proposed, aimed at a variety of manufacturing systems. The new solution encompasses a run-time context extractor and an optimizer. Based on the self-learning module, both the context extractor and the optimizer continuously learn and improve their performance. The solution follows Service-Oriented Architecture principles. The generic solution is developed and then applied to two very different manufacturing processes.

  10. Holistic Context-Sensitivity for Run-Time Optimization of Flexible Manufacturing Systems

    PubMed Central

    Scholze, Sebastian; Barata, Jose; Stokic, Dragan

    2017-01-01

    Highly flexible manufacturing systems require continuous run-time (self-)optimization of processes with respect to diverse parameters, e.g., efficiency, availability, energy consumption, etc. A promising approach for achieving (self-)optimization in manufacturing systems is a context-sensitivity approach based on data streaming from a large number of sensors and other data sources. Cyber-physical systems play an important role as sources of information for achieving context sensitivity. Cyber-physical systems can be seen as complex intelligent sensors providing the data needed to identify the current context under which the manufacturing system is operating. In this paper, it is demonstrated how context sensitivity can be used to realize a holistic solution for (self-)optimization of discrete flexible manufacturing systems, by making use of cyber-physical systems integrated in manufacturing systems/processes. A generic approach for context sensitivity, based on self-learning algorithms, is proposed, aimed at a variety of manufacturing systems. The new solution encompasses a run-time context extractor and an optimizer. Based on the self-learning module, both the context extractor and the optimizer continuously learn and improve their performance. The solution follows Service-Oriented Architecture principles. The generic solution is developed and then applied to two very different manufacturing processes. PMID:28245564

  11. MCR Container Tools

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Haas, Nicholas Q; Gillen, Robert E; Karnowski, Thomas P

    MathWorks' MATLAB is widely used in academia and industry for prototyping, data analysis, data processing, etc. Many users compile their programs using the MATLAB Compiler to run on workstations/computing clusters via the free MATLAB Compiler Runtime (MCR). The MCR facilitates the execution of code calling Application Programming Interface (API) functions from both base MATLAB and MATLAB toolboxes. In a Linux environment, a sizable number of third-party runtime dependencies (i.e., shared libraries) are necessary. Unfortunately, to the MATLAB community's knowledge, these dependencies are not documented, leaving system administrators and/or end-users to find/install the necessary libraries, either by chasing the runtime errors that result when they are missing or by inspecting the header information of the Executable and Linkable Format (ELF) libraries of the MCR to determine which ones are missing from the system. To address these shortcomings, Docker images based on Community Enterprise Operating System (CentOS) 7, a derivative of Red Hat Enterprise Linux (RHEL) 7, containing recent (2015-2017) MCR releases and their dependencies were created. These images, along with a provided sample Docker Compose YAML script, can be used to create a simulated computing cluster where binaries created by the MATLAB Compiler can be executed using a sample Slurm Workload Manager script.

  12. Environment Modeling Using Runtime Values for JPF-Android

    NASA Technical Reports Server (NTRS)

    van der Merwe, Heila; Tkachuk, Oksana; Nel, Seal; van der Merwe, Brink; Visser, Willem

    2015-01-01

    Software applications are developed to be executed in a specific environment. This environment includes external native libraries that add functionality to the application and drivers that fire the application execution. For testing and verification, the environment of an application is abstracted and simplified using models or stubs. Empty stubs, returning default values, are simple to generate automatically, but they do not perform well when the application expects specific return values. Symbolic execution is used to find input parameters for drivers and return values for library stubs, but it struggles to detect the values of complex objects. In this work-in-progress paper, we explore an approach to generate drivers and stubs based on values collected during runtime instead of using default values. Entry points and methods that need to be modeled are instrumented to log their parameters and return values. The instrumented applications are then executed using a driver and instrumented libraries. The values collected during runtime are used to generate driver and stub values on-the-fly that improve coverage during verification by enabling the execution of code that previously crashed or was missed. We are implementing this approach to improve the environment model of JPF-Android, our model checking and analysis tool for Android applications.
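
    The record-then-replay idea can be sketched in a few lines. JPF-Android instruments Java/Android code; the Python below is only an analogy, with all names (get_gps_fix, RECORDED) hypothetical: a decorator logs return values during a real run, and a generated stub replays them during verification, falling back to a default when the log is exhausted.

      import functools

      RECORDED = {}                     # method name -> logged return values

      def record(fn):
          @functools.wraps(fn)
          def wrapper(*args):
              value = fn(*args)         # call the real environment method
              RECORDED.setdefault(fn.__name__, []).append(value)
              return value
          return wrapper

      def make_stub(name, default=None):
          values = iter(RECORDED.get(name, []))
          # replay logged values in order; fall back to a default when exhausted
          return lambda *args: next(values, default)

      @record
      def get_gps_fix():                # stands in for a real sensor call
          return (52.51, 13.40)

      get_gps_fix()                     # recording run
      stub = make_stub("get_gps_fix", default=(0.0, 0.0))
      print(stub())                     # replays (52.51, 13.40)
      print(stub())                     # falls back to (0.0, 0.0)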

  13. Efficiency of analytical and sampling-based uncertainty propagation in intensity-modulated proton therapy

    NASA Astrophysics Data System (ADS)

    Wahl, N.; Hennig, P.; Wieser, H. P.; Bangert, M.

    2017-07-01

    The sensitivity of intensity-modulated proton therapy (IMPT) treatment plans to uncertainties can be quantified and mitigated with robust/min-max and stochastic/probabilistic treatment analysis and optimization techniques. Those methods usually rely on sparse random, importance, or worst-case sampling. Inevitably, this imposes a trade-off between computational speed and accuracy of the uncertainty propagation. Here, we investigate analytical probabilistic modeling (APM) as an alternative for uncertainty propagation and minimization in IMPT that does not rely on scenario sampling. APM propagates probability distributions over range and setup uncertainties via a Gaussian pencil-beam approximation into moments of the probability distributions over the resulting dose in closed form. It supports arbitrary correlation models and allows for efficient incorporation of fractionation effects regarding random and systematic errors. We evaluate the trade-off between run-time and accuracy of APM uncertainty computations on three patient datasets. Results are compared against reference computations facilitating importance and random sampling. Two approximation techniques to accelerate uncertainty propagation and minimization based on probabilistic treatment plan optimization are presented. Runtimes are measured on CPU and GPU platforms, dosimetric accuracy is quantified in comparison to a sampling-based benchmark (5000 random samples). APM accurately propagates range and setup uncertainties into dose uncertainties at competitive run-times (GPU ≤ 5 min). The resulting standard deviation (expectation value) of dose shows average global γ(3%/3 mm) pass rates between 94.2% and 99.9% (98.4% and 100.0%). All investigated importance sampling strategies provided less accuracy at higher run-times considering only a single fraction. Considering fractionation, APM uncertainty propagation and treatment plan optimization was proven to be possible at constant time complexity, while run-times of sampling-based computations are linear in the number of fractions. Using sum sampling within APM, uncertainty propagation can only be accelerated at the cost of reduced accuracy in variance calculations. For probabilistic plan optimization, we were able to approximate the necessary pre-computations within seconds, yielding treatment plans of similar quality as gained from exact uncertainty propagation. APM is suited to enhance the trade-off between speed and accuracy in uncertainty propagation and probabilistic treatment plan optimization, especially in the context of fractionation. This brings fully-fledged APM computations within reach of clinical application.
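
    In generic form, the closed-form moment propagation that APM performs amounts to Gaussian integrals over the dose. Writing the uncertain inputs as delta ~ N(mu, Sigma), the first two moments of the dose d_i are as below; because pencil-beam dose is itself modeled with Gaussians, these integrals reduce to products of Gaussians and evaluate in closed form. This is a schematic rendering, not the paper's full fractionation-aware derivation.

      \mathrm{E}[d_i] = \int d_i(\delta)\,\mathcal{N}(\delta;\mu,\Sigma)\,\mathrm{d}\delta,
      \qquad
      \mathrm{Cov}[d_i, d_k] = \int d_i(\delta)\, d_k(\delta)\,
          \mathcal{N}(\delta;\mu,\Sigma)\,\mathrm{d}\delta
          \;-\; \mathrm{E}[d_i]\,\mathrm{E}[d_k].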

  14. Efficiency of analytical and sampling-based uncertainty propagation in intensity-modulated proton therapy.

    PubMed

    Wahl, N; Hennig, P; Wieser, H P; Bangert, M

    2017-06-26

    The sensitivity of intensity-modulated proton therapy (IMPT) treatment plans to uncertainties can be quantified and mitigated with robust/min-max and stochastic/probabilistic treatment analysis and optimization techniques. Those methods usually rely on sparse random, importance, or worst-case sampling. Inevitably, this imposes a trade-off between computational speed and accuracy of the uncertainty propagation. Here, we investigate analytical probabilistic modeling (APM) as an alternative for uncertainty propagation and minimization in IMPT that does not rely on scenario sampling. APM propagates probability distributions over range and setup uncertainties via a Gaussian pencil-beam approximation into moments of the probability distributions over the resulting dose in closed form. It supports arbitrary correlation models and allows for efficient incorporation of fractionation effects regarding random and systematic errors. We evaluate the trade-off between run-time and accuracy of APM uncertainty computations on three patient datasets. Results are compared against reference computations based on importance and random sampling. Two approximation techniques to accelerate uncertainty propagation and minimization based on probabilistic treatment plan optimization are presented. Runtimes are measured on CPU and GPU platforms, dosimetric accuracy is quantified in comparison to a sampling-based benchmark (5000 random samples). APM accurately propagates range and setup uncertainties into dose uncertainties at competitive run-times (GPU: ≤5 min). The resulting standard deviations (expectation values) of dose show average global γ(3%/3 mm) pass rates between 94.2% and 99.9% (98.4% and 100.0%). All investigated importance sampling strategies provided less accuracy at higher run-times considering only a single fraction. Considering fractionation, APM uncertainty propagation and treatment plan optimization was proven to be possible at constant time complexity, while run-times of sampling-based computations are linear in the number of fractions. Using sum sampling within APM, uncertainty propagation can only be accelerated at the cost of reduced accuracy in variance calculations. For probabilistic plan optimization, we were able to approximate the necessary pre-computations within seconds, yielding treatment plans of similar quality as gained from exact uncertainty propagation. APM is suited to enhance the trade-off between speed and accuracy in uncertainty propagation and probabilistic treatment plan optimization, especially in the context of fractionation. This brings fully-fledged APM computations within reach of clinical application.

  15. Execution time supports for adaptive scientific algorithms on distributed memory machines

    NASA Technical Reports Server (NTRS)

    Berryman, Harry; Saltz, Joel; Scroggs, Jeffrey

    1990-01-01

    Optimizations are considered that are required for efficient execution of code segments that consist of loops over distributed data structures. The PARTI (Parallel Automated Runtime Toolkit at ICASE) execution time primitives are designed to carry out these optimizations and can be used to implement a wide range of scientific algorithms on distributed memory machines. These primitives allow the user to control array mappings in a way that gives an appearance of shared memory. Computations can be based on a global index set. Primitives are used to carry out gather and scatter operations on distributed arrays. Communication patterns are derived at runtime, and the appropriate send and receive messages are automatically generated.
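
    The inspector/executor split behind such primitives can be sketched as follows (a single-process Python illustration; PARTI's actual primitives are library routines for distributed-memory machines, and these names are invented):

        def inspector(global_indices, owner_of, my_rank):
            """Derive the communication pattern: which indices to fetch from which rank."""
            schedule = {}
            for g in global_indices:
                r = owner_of(g)
                if r != my_rank:
                    schedule.setdefault(r, []).append(g)
            return schedule

        def executor_gather(schedule, fetch, local, global_indices, owner_of, my_rank):
            """Execute the schedule: fetch off-processor values, then assemble locally."""
            remote = {}
            for rank, idxs in schedule.items():
                remote.update(fetch(rank, idxs))      # one aggregated message per rank
            return [local[g] if owner_of(g) == my_rank else remote[g]
                    for g in global_indices]

        # Toy usage: two "ranks" own even/odd indices respectively.
        owner_of = lambda g: g % 2
        local = {g: g * 10 for g in range(0, 10, 2)}          # rank 0 data
        remote_store = {g: g * 10 for g in range(1, 10, 2)}   # rank 1 data
        fetch = lambda rank, idxs: {g: remote_store[g] for g in idxs}
        sched = inspector(range(10), owner_of, my_rank=0)
        print(executor_gather(sched, fetch, local, range(10), owner_of, 0))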

  16. Rapid Processing of Radio Interferometer Data for Transient Surveys

    NASA Astrophysics Data System (ADS)

    Bourke, S.; Mooley, K.; Hallinan, G.

    2014-05-01

    We report on a software infrastructure and pipeline developed to process large radio interferometer datasets. The pipeline is implemented using a radical redesign of the AIPS processing model. An infrastructure we have named AIPSlite is used to spawn, at runtime, minimal AIPS environments across a cluster. The pipeline then distributes and processes its data in parallel. The system is entirely free of the traditional AIPS distribution and is self-configuring at runtime. This software has so far been used to process an EVLA Stripe 82 transient survey and the data for the JVLA-COSMOS project, and has been used to process most of the EVLA L-Band data archive, imaging each integration to search for short-duration transients.

  17. The Concert system - Compiler and runtime technology for efficient concurrent object-oriented programming

    NASA Technical Reports Server (NTRS)

    Chien, Andrew A.; Karamcheti, Vijay; Plevyak, John; Sahrawat, Deepak

    1993-01-01

    Concurrent object-oriented languages, particularly fine-grained approaches, reduce the difficulty of large scale concurrent programming by providing modularity through encapsulation while exposing large degrees of concurrency. Despite these programmability advantages, such languages have historically suffered from poor efficiency. This paper describes the Concert project whose goal is to develop portable, efficient implementations of fine-grained concurrent object-oriented languages. Our approach incorporates aggressive program analysis and program transformation with careful information management at every stage from the compiler to the runtime system. The paper discusses the basic elements of the Concert approach along with a description of the potential payoffs. Initial performance results and specific plans for system development are also detailed.

  18. Execution time support for scientific programs on distributed memory machines

    NASA Technical Reports Server (NTRS)

    Berryman, Harry; Saltz, Joel; Scroggs, Jeffrey

    1990-01-01

    Optimizations are considered that are required for efficient execution of code segments that consist of loops over distributed data structures. The PARTI (Parallel Automated Runtime Toolkit at ICASE) execution time primitives are designed to carry out these optimizations and can be used to implement a wide range of scientific algorithms on distributed memory machines. These primitives allow the user to control array mappings in a way that gives an appearance of shared memory. Computations can be based on a global index set. Primitives are used to carry out gather and scatter operations on distributed arrays. Communication patterns are derived at runtime, and the appropriate send and receive messages are automatically generated.

  19. An extension of the OpenModelica compiler for using Modelica models in a discrete event simulation

    DOE PAGES

    Nutaro, James

    2014-11-03

    In this article, a new back-end and run-time system is described for the OpenModelica compiler. This new back-end transforms a Modelica model into a module for the adevs discrete event simulation package, thereby extending adevs to encompass complex, hybrid dynamical systems. The new run-time system, built within the adevs simulation package, supports models with state events and time events that comprise high-index differential-algebraic systems. Finally, although the procedure for effecting this transformation is based on adevs and the Discrete Event System Specification, it can be adapted to any discrete event simulation package.
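
    The essence of such a run-time system, advancing continuous states between discrete events while watching for time events and state events (zero crossings), can be sketched as below; the bouncing-ball model and the explicit Euler integrator are illustrative stand-ins for what generated adevs modules actually contain.

        import heapq

        def simulate(t_end, dt=1e-3):
            t, h, v = 0.0, 1.0, 0.0             # time, height, velocity
            time_events = [0.5, 1.5]             # scheduled time events
            heapq.heapify(time_events)
            while t < t_end:
                t_next = min(t_end, time_events[0]) if time_events else t_end
                while t < t_next:                # integrate continuous dynamics
                    step = min(dt, t_next - t)
                    v -= 9.81 * step
                    h += v * step
                    t += step
                    if h <= 0.0:                 # state event: zero crossing
                        h, v = 0.0, -0.7 * v     # bounce with damping
                if time_events and t >= time_events[0]:
                    heapq.heappop(time_events)   # fire time event (a real model
                                                 # would update discrete state here)
            return h, v

        print(simulate(3.0))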

  20. SERENITY in e-Business and Smart Item Scenarios

    NASA Astrophysics Data System (ADS)

    Benameur, Azzedine; Khoury, Paul El; Seguran, Magali; Sinha, Smriti Kumar

    SERENITY artefacts such as Classes, Patterns, Implementations and Executable Components for Security & Dependability (S&D), in addition to the Serenity Runtime Framework (SRF), are discussed in previous chapters. How these artefacts are integrated with applications in the SERENITY approach is discussed here using two scenarios. The e-Business scenario is a standard loan origination process in a bank. The Smart Item scenario is an ambient intelligence case study where we take advantage of Smart Items to provide an electronic healthcare infrastructure for remote healthcare assistance. In both cases, we detail how the prototype implementations of the scenarios select proper executable components through the Serenity Runtime Framework and then demonstrate how these executable components of the S&D Patterns are deployed.

  1. Implications of Responsive Space on the Flight Software Architecture

    NASA Technical Reports Server (NTRS)

    Wilmot, Jonathan

    2006-01-01

    The Responsive Space initiative has several implications for flight software that need to be addressed not only within the run-time element, but within the development infrastructure and software life-cycle process elements as well. The runtime element must at a minimum support Plug & Play, while the development and process elements need to incorporate methods to quickly generate the needed documentation, code, tests, and all of the artifacts required of flight-quality software. Very rapid response times go even further, and imply little or no new software development, requiring instead the use of only pre-developed and certified software modules that can be integrated and tested through automated methods. These elements have typically been addressed individually, with significant benefits, but it is when they are combined that they can have the greatest impact on Responsive Space. The Flight Software Branch at NASA's Goddard Space Flight Center has been developing the runtime, infrastructure and process elements needed for rapid integration with the Core Flight software System (CFS) architecture. The CFS architecture consists of three main components: the core Flight Executive (cFE), the component catalog, and the Integrated Development Environment (IDE). This paper will discuss the design of the components, how they facilitate rapid integration, and lessons learned as the architecture is utilized for an upcoming spacecraft.

  2. Traleika Glacier X-Stack Extension Final Report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fryman, Joshua

    The XStack Extension Project continued along the direction of the XStack program in exploring the software tools and frameworks to support a task-based community runtime towards the goal of Exascale programming. The momentum built as part of the XStack project, with the development of the task-based Open Community Runtime (OCR) and related tools, was carried through during the XStack Extension with the focus areas of easing application development, improving performance and supporting more features. The infrastructure set up for community-driven open-source development continued to be used towards these areas, with continued co-development of the runtime and applications. A variety of OCR programming environments were studied, as described in the sections on Revolutionary Programming Environments and Applications, to assist with application development on OCR, and we developed the OCR Translator, a ROSE-based source-to-source compiler that parses high-level annotations in an MPI program to generate equivalent OCR code. Figure 2 compares the number of OCR objects needed to generate the 2D stencil workload using the translator against manual approaches based on an SPMD library or native coding. The rate of increase with the translator, as the number of ranks grows, is consistent with the other approaches. This is explored further in the section on the OCR Translator.

  3. Highly accurate fast lung CT registration

    NASA Astrophysics Data System (ADS)

    Rühaak, Jan; Heldmann, Stefan; Kipshagen, Till; Fischer, Bernd

    2013-03-01

    Lung registration in thoracic CT scans has received much attention in the medical imaging community. Possible applications range from follow-up analysis, motion correction for radiation therapy, monitoring of air flow and pulmonary function to lung elasticity analysis. In a clinical environment, runtime is always a critical issue, ruling out quite a few excellent registration approaches. In this paper, a highly efficient variational lung registration method based on minimizing the normalized gradient fields distance measure with curvature regularization is presented. The method ensures diffeomorphic deformations by an additional volume regularization. Supplemental user knowledge, like a segmentation of the lungs, may be incorporated as well. The accuracy of our method was evaluated on 40 test cases from clinical routine. In the EMPIRE10 lung registration challenge, our scheme ranks third out of 28 algorithms with respect to various validation criteria, with an average landmark distance of 0.72 mm. The average runtime is about 1:50 min on a standard PC, making it by far the fastest approach among the top-ranking algorithms. Additionally, the ten publicly available DIR-Lab inhale-exhale scan pairs were registered to subvoxel accuracy at computation times of only 20 seconds. Our method thus combines very attractive runtimes with state-of-the-art accuracy in a unique way.
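
    For reference, the normalized gradient fields (NGF) distance minimized here is commonly written as below (the standard Haber–Modersitzki formulation; the edge parameter ε is chosen per application and this notation is not quoted from the paper):

        D_{\mathrm{NGF}}(R,T)
          = \int_{\Omega} 1 - \bigl\langle n_{\varepsilon}(R,x),\, n_{\varepsilon}(T,x) \bigr\rangle^{2} \, dx,
        \qquad
        n_{\varepsilon}(I,x) = \frac{\nabla I(x)}{\sqrt{\|\nabla I(x)\|^{2} + \varepsilon^{2}}}

    The measure rewards locally aligned intensity gradients independently of their magnitude, which keeps the objective cheap to evaluate and well suited to fast implementations.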

  4. Optimized distributed computing environment for mask data preparation

    NASA Astrophysics Data System (ADS)

    Ahn, Byoung-Sup; Bang, Ju-Mi; Ji, Min-Kyu; Kang, Sun; Jang, Sung-Hoon; Choi, Yo-Han; Ki, Won-Tai; Choi, Seong-Woon; Han, Woo-Sung

    2005-11-01

    As the critical dimension (CD) becomes smaller, various resolution enhancement techniques (RET) are widely adopted. In developing sub-100nm devices, the complexity of optical proximity correction (OPC) is severely increased, and OPC is applied not only to critical layers but to non-critical layers as well. The transformation of designed pattern data by OPC increases data complexity, which imposes runtime overheads on subsequent steps such as mask data preparation (MDP) and collapses the existing design hierarchy. Therefore, many mask shops exploit distributed computing to reduce the runtime of mask data preparation rather than exploit the design hierarchy. Distributed computing uses a cluster of computers connected to a local network system. However, two things limit the benefit of distributed computing in MDP. First, a sequential MDP job that uses the maximum number of available CPUs is not efficient compared to parallel MDP job execution, due to the input data characteristics. Second, the runtime improvement relative to the added cost is insufficient, since the scalability of fracturing tools is limited. In this paper, we will discuss an optimal load-balancing environment that is useful in increasing the uptime of a distributed computing system by assigning an appropriate number of CPUs to each input design. We will also describe distributed processing (DP) parameter optimization to obtain maximum throughput in MDP job processing.

  5. A Cross-Platform Infrastructure for Scalable Runtime Application Performance Analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dongarra, Jack; Moore, Shirley; Miller, Bart; Hollingsworth, Jeffrey

    2005-03-15

    The purpose of this project was to build an extensible cross-platform infrastructure to facilitate the development of accurate and portable performance analysis tools for current and future high performance computing (HPC) architectures. Major accomplishments include tools and techniques for multidimensional performance analysis, as well as improved support for dynamic performance monitoring of multithreaded and multiprocess applications. Previous performance tool development has been limited by the burden of having to re-write a platform-dependent low-level substrate for each architecture/operating system pair in order to obtain the necessary performance data from the system. Manual interpretation of performance data is not scalable for large-scale long-running applications. The infrastructure developed by this project provides a foundation for building portable and scalable performance analysis tools, with the end goal being to provide application developers with the information they need to analyze, understand, and tune the performance of terascale applications on HPC architectures. The backend portion of the infrastructure provides runtime instrumentation capability and access to hardware performance counters, with thread-safety for shared memory environments and a communication substrate to support instrumentation of multiprocess and distributed programs. Front-end interfaces provide tool developers with a well-defined, platform-independent set of calls for requesting performance data. End-user tools have been developed that demonstrate runtime data collection, on-line and off-line analysis of performance data, and multidimensional performance analysis. The infrastructure is based on two underlying performance instrumentation technologies. These technologies are the PAPI cross-platform library interface to hardware performance counters and the cross-platform Dyninst library interface for runtime modification of executable images. The Paradyn and KOJAK projects have made use of this infrastructure to build performance measurement and analysis tools that scale to long-running programs on large parallel and distributed systems and that automate much of the search for performance bottlenecks.

  6. Bayesian Model Selection under Time Constraints

    NASA Astrophysics Data System (ADS)

    Hoege, M.; Nowak, W.; Illman, W. A.

    2017-12-01

    Bayesian model selection (BMS) provides a consistent framework for rating and comparing models in multi-model inference. In cases where models of vastly different complexity compete with each other, we also face vastly different computational runtimes of such models. For instance, time series of a quantity of interest can be simulated by an autoregressive process model that takes less than a second for one run, or by a partial differential equations-based model with runtimes up to several hours or even days. Classical BMS is based on a quantity called Bayesian model evidence (BME). It determines the model weights in the selection process and resembles a trade-off between the bias of a model and its complexity. However, in practice, the runtime of models is another relevant weighting factor for model selection. Hence, we believe that it should be included, leading to an overall trade-off problem between bias, variance and computing effort. We approach this triple trade-off from the viewpoint of our ability to generate realizations of the models under a given computational budget. One way to obtain BME values is through sampling-based integration techniques. We build on the fact that, under time constraints, more expensive models can be sampled far less often than faster models (with affordable sample counts inversely proportional to runtime). The computed evidence in favor of a more expensive model is statistically less significant than the evidence computed in favor of a faster model, since sampling-based strategies are always subject to statistical sampling error. We present a straightforward way to include this imbalance in the model weights that are the basis for model selection. Our approach follows directly from the idea of insufficient significance. It is based on a computationally cheap bootstrapping error estimate of the model evidence and is easy to implement. The approach is illustrated in a small synthetic modeling study.
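
    A minimal sketch of this runtime-aware evidence estimate, assuming a fixed time budget and a bootstrap error bar as described (the two models, their per-run costs, and the likelihood stand-in are all invented):

        import numpy as np

        rng = np.random.default_rng(1)
        budget_seconds = 3600.0
        models = {"fast_ar": 0.5, "slow_pde": 120.0}   # name -> runtime per sample [s]

        def likelihood_sample(name):
            """Stand-in for evaluating the likelihood of one prior draw."""
            return rng.lognormal(mean=-2.0 if name == "fast_ar" else -1.8, sigma=0.5)

        for name, cost in models.items():
            n = max(2, int(budget_seconds / cost))     # affordable sample count
            L = np.array([likelihood_sample(name) for _ in range(n)])
            bme = L.mean()                              # Monte Carlo BME estimate
            boot = [rng.choice(L, size=n, replace=True).mean() for _ in range(500)]
            print(f"{name}: n={n}, BME={bme:.4f} +/- {np.std(boot):.4f}")

    The slow model's much smaller n yields a visibly wider bootstrap error bar, which is the imbalance the proposed weighting accounts for.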

  7. Malware detection and analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chiang, Ken; Lloyd, Levi; Crussell, Jonathan

    Embodiments of the invention describe systems and methods for malicious software detection and analysis. A binary executable comprising obfuscated malware on a host device may be received, and incident data indicating a time when the binary executable was received and identifying processes operating on the host device may be recorded. The binary executable is analyzed via a scalable plurality of execution environments, including one or more non-virtual execution environments and one or more virtual execution environments, to generate runtime data and deobfuscation data attributable to the binary executable. At least some of the runtime data and deobfuscation data attributable to the binary executable is stored in a shared database, while at least some of the incident data is stored in a private, non-shared database.

  8. Generalizing the extensibility of a dynamic geometry software

    NASA Astrophysics Data System (ADS)

    Herceg, Đorđe; Radaković, Davorka; Herceg, Dejana

    2012-09-01

    Plug-and-play visual components in a Dynamic Geometry Software (DGS) enable development of visually attractive, rich and highly interactive dynamic drawings. We are developing SLGeometry, a DGS that contains a custom programming language, a computer algebra system (CAS engine) and a graphics subsystem. The basic extensibility framework of SLGeometry supports dynamic addition of new functions from attribute-annotated classes that implement runtime metadata registration in code. We present a general plug-in framework for dynamically importing arbitrary Silverlight user interface (UI) controls into SLGeometry at runtime. The CAS engine maintains a metadata storage that describes each imported visual component and enables two-way communication between the expressions stored in the engine and the UI controls on the screen.
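
    In Python terms, such attribute-annotated registration resembles a decorator-based registry (the original is C#/Silverlight attribute metadata; everything below is an illustrative analogue, not SLGeometry's API):

        # Hypothetical registry of visual components and their metadata.
        COMPONENT_REGISTRY = {}

        def visual_component(name, **metadata):
            """Class decorator standing in for attribute annotations."""
            def register(cls):
                COMPONENT_REGISTRY[name] = {"class": cls, "metadata": metadata}
                return cls
            return register

        @visual_component("Slider", min_value=0, max_value=100)
        class Slider:
            def __init__(self):
                self.value = 0

        # The engine can look up the component and its metadata at runtime:
        entry = COMPONENT_REGISTRY["Slider"]
        widget = entry["class"]()
        print(entry["metadata"])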

  9. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Collins, Benjamin S.; Hamilton, Steven P.; Jarrett, Michael G.

    This report describes the performance improvements made to the VERA Core Simulator (VERA-CS) during FY2016. The development of the VERA Core Simulator has focused on the capability needed to deplete physical reactors and help solve various problems; this capability required the accurate simulation of many operating cycles of a nuclear power plant. The first section of this report introduces two test problems used to assess the run-time performance of VERA-CS using a source dated February 2016. The next section provides a brief overview of the major modifications made to decrease the computational cost. Following the descriptions of the major improvements, the run-time for each improvement is shown. Conclusions on the work are presented, and further follow-on performance improvements are suggested.

  10. Fiia: A Model-Based Approach to Engineering Collaborative Augmented Reality

    NASA Astrophysics Data System (ADS)

    Wolfe, Christopher; Smith, J. David; Phillips, W. Greg; Graham, T. C. Nicholas

    Augmented reality systems often involve collaboration among groups of people. While there are numerous toolkits that aid the development of such augmented reality groupware systems (e.g., ARToolkit and Groupkit), there remains an enormous gap between the specification of an AR groupware application and its implementation. In this chapter, we present Fiia, a toolkit which simplifies the development of collaborative AR applications. Developers specify the structure of their applications using the Fiia modeling language, which abstracts details of networking and provides high-level support for specifying adapters between the physical and virtual world. The Fiia.Net runtime system then maps this conceptual model to a runtime implementation. We illustrate Fiia via Raptor, an augmented reality application used to help small groups collaboratively prototype video games.

  11. GPU accelerated FDTD solver and its application in MRI.

    PubMed

    Chi, J; Liu, F; Jin, J; Mason, D G; Crozier, S

    2010-01-01

    The finite difference time domain (FDTD) method is a popular technique for computational electromagnetics (CEM). The large computational power often required, however, has been a limiting factor for its applications. In this paper, we will present a graphics processing unit (GPU)-based parallel FDTD solver and its successful application to the investigation of a novel B1 shimming scheme for high-field magnetic resonance imaging (MRI). The optimized shimming scheme exhibits considerably improved transmit B1 profiles. The GPU implementation dramatically shortened the runtime of FDTD simulation of the electromagnetic field compared with its CPU counterpart. The acceleration in runtime has made such investigation possible, and will pave the way for other studies of large-scale computational electromagnetic problems in modern MRI which were previously impractical.
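
    The kernel that benefits from GPU parallelization is the Yee update sweep, sketched here in one dimension (the paper's solver is 3D and GPU-resident; grid size, source and constants below are illustrative): each array element updates independently per time step, which is exactly the data parallelism GPUs exploit.

        import numpy as np

        nx, steps = 400, 1000
        c0, dx = 3e8, 1e-3
        dt = dx / (2 * c0)                    # satisfies the 1D CFL condition
        eps0, mu0 = 8.854e-12, 4e-7 * np.pi
        ez = np.zeros(nx)                     # electric field
        hy = np.zeros(nx)                     # magnetic field

        for t in range(steps):
            hy[:-1] += dt / (mu0 * dx) * (ez[1:] - ez[:-1])      # update H from curl E
            ez[1:-1] += dt / (eps0 * dx) * (hy[1:-1] - hy[:-2])  # update E from curl H
            ez[nx // 2] += np.exp(-((t - 30) / 10) ** 2)         # soft Gaussian source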

  12. Parallel Clustering Algorithm for Large-Scale Biological Data Sets

    PubMed Central

    Wang, Minchao; Zhang, Wu; Ding, Wang; Dai, Dongbo; Zhang, Huiran; Xie, Hao; Chen, Luonan; Guo, Yike; Xie, Jiang

    2014-01-01

    Background Recent explosion of biological data brings a great challenge for the traditional clustering algorithms. With increasing scale of data sets, much larger memory and longer runtime are required for cluster identification problems. The affinity propagation algorithm outperforms many other classical clustering algorithms and is widely applied in biological research. However, its time and space complexity become a great bottleneck when handling large-scale data sets. Moreover, the similarity matrix, whose constructing procedure takes long runtime, is required before running the affinity propagation algorithm, since the algorithm clusters data sets based on the similarities between data pairs. Methods Two types of parallel architectures are proposed in this paper to accelerate the similarity matrix constructing procedure and the affinity propagation algorithm. The memory-shared architecture is used to construct the similarity matrix, and the distributed system is taken for the affinity propagation algorithm, because of its large memory size and great computing capacity. An appropriate way of data partition and reduction is designed in our method, in order to minimize the global communication cost among processes. Results A speedup of 100 is gained with 128 cores. The runtime is reduced from several hours to a few seconds, which indicates that the parallel algorithm is capable of handling large-scale data sets effectively. The parallel affinity propagation also achieves a good performance when clustering large-scale gene data (microarray) and detecting families in large protein superfamilies. PMID:24705246
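
    The memory-shared stage can be sketched as below: the similarity matrix used by affinity propagation (negative squared Euclidean distances) is filled in parallel over row blocks. Data sizes and worker count are illustrative, and the real implementation partitions across distributed processes as well.

        import numpy as np
        from multiprocessing import Pool

        X = np.random.default_rng(2).normal(size=(2000, 16))   # 2000 points, 16 dims

        def row_block(rows):
            """Similarities of a block of rows against all points."""
            diff = X[rows, None, :] - X[None, :, :]
            return rows, -(diff ** 2).sum(axis=2)

        if __name__ == "__main__":
            S = np.empty((len(X), len(X)))
            blocks = np.array_split(np.arange(len(X)), 8)
            with Pool(8) as pool:
                for rows, sim in pool.map(row_block, blocks):
                    S[rows] = sim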

  13. Flow-Centric, Back-in-Time Debugging

    NASA Astrophysics Data System (ADS)

    Lienhard, Adrian; Fierz, Julien; Nierstrasz, Oscar

    Conventional debugging tools present developers with means to explore the run-time context in which an error has occurred. In many cases this is enough to help the developer discover the faulty source code and correct it. However, rather often errors occur due to code that has executed in the past, leaving certain objects in an inconsistent state. The actual run-time error only occurs when these inconsistent objects are used later in the program. So-called back-in-time debuggers help developers step back through earlier states of the program and explore execution contexts not available to conventional debuggers. Nevertheless, even back-in-time debuggers do not help answer the question, “Where did this object come from?” The Object-Flow Virtual Machine, which we have proposed in previous work, tracks the flow of objects to answer precisely such questions, but this VM does not provide dedicated debugging support to explore faulty programs. In this paper we present a novel debugger, called Compass, to navigate between conventional run-time stack-oriented control flow views and object flows. Compass enables a developer to effectively navigate from an object contributing to an error back in time through all the code that has touched the object. We present the design and implementation of Compass, and we demonstrate how flow-centric, back-in-time debugging can be used to effectively locate the source of hard-to-find bugs.

  14. SARANA: language, compiler and run-time system support for spatially aware and resource-aware mobile computing.

    PubMed

    Hari, Pradip; Ko, Kevin; Koukoumidis, Emmanouil; Kremer, Ulrich; Martonosi, Margaret; Ottoni, Desiree; Peh, Li-Shiuan; Zhang, Pei

    2008-10-28

    Increasingly, spatial awareness plays a central role in many distributed and mobile computing applications. Spatially aware applications rely on information about the geographical position of compute devices and their supported services in order to support novel functionality. While many spatial application drivers already exist in mobile and distributed computing, very little systems research has explored how best to program these applications, to express their spatial and temporal constraints, and to allow efficient implementations on highly dynamic real-world platforms. This paper proposes the SARANA system architecture, which includes language and run-time system support for spatially aware and resource-aware applications. SARANA allows users to express spatial regions of interest, as well as trade-offs between quality of result (QoR), latency and cost. The goal is to produce applications that use resources efficiently and that can be run on diverse resource-constrained platforms ranging from laptops to personal digital assistants and to smart phones. SARANA's run-time system manages QoR and cost trade-offs dynamically by tracking resource availability and locations, brokering usage/pricing agreements and migrating programs to nodes accordingly. A resource cost model permeates the SARANA system layers, permitting users to express their resource needs and QoR expectations in units that make sense to them. Although we are still early in the system development, initial versions have been demonstrated on a nine-node system prototype.

  15. Detecting Payload Attacks on Programmable Logic Controllers (PLCs)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yang, Huan

    Programmable logic controllers (PLCs) play critical roles in industrial control systems (ICS). Providing hardware peripherals and firmware support for control programs (i.e., a PLC’s “payload”) written in languages such as ladder logic, PLCs directly receive sensor readings and control ICS physical processes. An attacker with access to PLC development software (e.g., by compromising an engineering workstation) can modify the payload program and cause severe physical damage to the ICS. To protect critical ICS infrastructure, we propose to model the runtime behaviors of legitimate PLC payload programs and use runtime behavior monitoring in PLC firmware to detect payload attacks. By monitoring the I/O access patterns, network access patterns, and payload program timing characteristics, our proposed firmware-level detection mechanism can detect abnormal runtime behaviors of malicious PLC payloads. Using our proof-of-concept implementation, we evaluate the memory and execution time overhead of implementing our proposed method and find that it is feasible to incorporate our method into existing PLC firmware. In addition, our evaluation results show that a wide variety of payload attacks can be effectively detected by our proposed approach. The proposed firmware-level payload attack detection scheme complements existing bump-in-the-wire solutions (e.g., external temporal-logic-based model checkers) in that it can detect payload attacks that violate real-time requirements of ICS operations and does not require any additional apparatus.
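
    A firmware-level monitor of this kind can be sketched as a per-scan-cycle check against a learned behavior model (the whitelist, I/O addresses and timing threshold below are invented for illustration, not taken from the paper):

        # Learned model of the legitimate payload's behavior (illustrative).
        ALLOWED_IO = {("read", "I0.0"), ("read", "I0.1"), ("write", "Q0.0")}
        MAX_SCAN_MS = 10.0

        def check_cycle(io_trace, scan_time_ms):
            """Return a list of alerts for one scan cycle of the payload."""
            alerts = []
            for access in io_trace:
                if access not in ALLOWED_IO:
                    alerts.append(f"unexpected I/O access: {access}")
            if scan_time_ms > MAX_SCAN_MS:
                alerts.append(f"scan cycle too long: {scan_time_ms} ms")
            return alerts

        # A tampered payload writing an unauthorized output and overrunning
        # its timing budget triggers both checks:
        print(check_cycle([("write", "Q0.7"), ("read", "I0.0")], 12.5))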

  16. Empirical Evaluation of Conservative and Optimistic Discrete Event Execution on Cloud and VM Platforms

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yoginath, Srikanth B; Perumalla, Kalyan S

    2013-01-01

    Virtual machine (VM) technologies, especially those offered via Cloud platforms, present new dimensions with respect to performance and cost in executing parallel discrete event simulation (PDES) applications. Due to the introduction of overall cost as a metric, the choice of the highest-end computing configuration is no longer the most economical one. Moreover, runtime dynamics unique to VM platforms introduce new performance characteristics, and the variety of possible VM configurations give rise to a range of choices for hosting a PDES run. Here, an empirical study of these issues is undertaken to guide an understanding of the dynamics, trends and trade-offs in executing PDES on VM/Cloud platforms. Performance results and cost measures are obtained from actual execution of a range of scenarios in two PDES benchmark applications on the Amazon Cloud offerings and on a high-end VM host machine. The data reveals interesting insights into the new VM-PDES dynamics that come into play and also leads to counter-intuitive guidelines with respect to choosing the best and second-best configurations when overall cost of execution is considered. In particular, it is found that choosing the highest-end VM configuration guarantees neither the best runtime nor the least cost. Interestingly, choosing a (suitably scaled) low-end VM configuration provides the least overall cost without adversely affecting the total runtime.

  17. Nonpreemptive run-time scheduling issues on a multitasked, multiprogrammed multiprocessor with dependencies, bidimensional tasks, folding and dynamic graphs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Miller, Allan Ray

    1987-05-01

    Advances in high-speed hardware have mandated studies of software techniques to exploit its parallel capabilities. This thesis examines the effects a run-time scheduler has on a multiprocessor. The model consists of directed, acyclic graphs, generated from serial FORTRAN benchmark programs by the parallel compiler Parafrase. A multitasked, multiprogrammed environment is created. Dependencies are generated by the compiler. Tasks are bidimensional, i.e., they may specify both time and processor requests. Processor requests may be folded into execution time by the scheduler. The graphs may arrive at arbitrary time intervals. The general case is NP-hard, thus a variety of heuristics are examined by a simulator. Multiprogramming demonstrates a greater need for a run-time scheduler than does monoprogramming for a variety of reasons, e.g., greater stress on the processors, a larger number of independent control paths, more variety in the task parameters, etc. The dynamic critical path series of algorithms perform well. Dynamic critical volume did not add much. Unfortunately, dynamic critical path maximizes turnaround time as well as throughput. Two schedulers are presented which balance throughput and turnaround time. The first requires classification of jobs by type; the second requires selection of a ratio value which is dependent upon system parameters. 45 refs., 19 figs., 20 tabs.

  18. Faster search by lackadaisical quantum walk

    NASA Astrophysics Data System (ADS)

    Wong, Thomas G.

    2018-03-01

    In the typical model, a discrete-time coined quantum walk searching the 2D grid for a marked vertex achieves a success probability of O(1/log N) in O(√(N log N)) steps, which with amplitude amplification yields an overall runtime of O(√N log N). We show that making the quantum walk lackadaisical or lazy by adding a self-loop of weight 4/N to each vertex speeds up the search, causing the success probability to reach a constant near 1 in O(√(N log N)) steps, thus yielding an O(√(log N)) improvement over the typical, loopless algorithm. This improved runtime matches the best known quantum algorithms for this search problem. Our results are based on numerical simulations since the algorithm is not an instance of the abstract search algorithm.
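
    The improvement follows from standard amplitude-amplification accounting (a reconstruction of the reasoning, not text from the paper): amplification needs O(1/√p) repetitions of a walk that prepares success probability p.

        % Loopless walk: p = O(1/log N) after O(sqrt(N log N)) steps, hence
        T_{\text{loopless}} = O\bigl(\sqrt{N\log N}\bigr)\cdot O\bigl(\sqrt{\log N}\bigr)
                            = O\bigl(\sqrt{N}\,\log N\bigr).
        % Lackadaisical walk: p = \Theta(1) after O(sqrt(N log N)) steps, hence
        T_{\text{lazy}} = O\bigl(\sqrt{N\log N}\bigr)
                        = T_{\text{loopless}} \big/ O\bigl(\sqrt{\log N}\bigr).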

  19. Ultrafast adiabatic quantum algorithm for the NP-complete exact cover problem

    PubMed Central

    Wang, Hefeng; Wu, Lian-Ao

    2016-01-01

    An adiabatic quantum algorithm may entirely lose quantumness, such as quantum coherence, over its long runtime, and consequently the expected quantum speedup of the algorithm does not materialize. Here we present a general ultrafast adiabatic quantum algorithm. We show that by applying a sequence of fast random or regular signals during evolution, the runtime can be reduced substantially, whereas the advantages of the adiabatic algorithm remain intact. We also propose a randomized Trotter formula and show that the driving Hamiltonian and the proposed sequence of fast signals can be implemented simultaneously. We illustrate the algorithm by solving the NP-complete 3-bit exact cover problem (EC3), where NP stands for nondeterministic polynomial time, and put forward an approach to implementing the problem with trapped ions. PMID:26923834

  20. Reversible Parallel Discrete-Event Execution of Large-scale Epidemic Outbreak Models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Perumalla, Kalyan S; Seal, Sudip K

    2010-01-01

    The spatial scale, runtime speed and behavioral detail of epidemic outbreak simulations together require the use of large-scale parallel processing. In this paper, an optimistic parallel discrete event execution of a reaction-diffusion simulation model of epidemic outbreaks is presented, with an implementation over the µsik simulator. Rollback support is achieved with the development of a novel reversible model that combines reverse computation with a small amount of incremental state saving. Parallel speedup and other runtime performance metrics of the simulation are tested on a small (8,192-core) Blue Gene/P system, while scalability is demonstrated on 65,536 cores of a large Cray XT5 system. Scenarios representing large population sizes (up to several hundred million individuals in the largest case) are exercised.

  1. The Basis System

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dubois, P.F.

    1989-05-16

    This paper discusses the Basis system. Basis is a program development system for scientific programs. It has been developed over the last five years at Lawrence Livermore National Laboratory (LLNL), where it is now used in about twenty major programming efforts. The Basis System includes two major components, a program development system and a run-time package. The run-time package provides the Basis Language interpreter, through which the user does input, output, plotting, and control of the program's subroutines and functions. Variables in the scientific packages are known to this interpreter, so that the user may arbitrarily print, plot, and calculate with any major program variables. Also provided are facilities for dynamic memory management, terminal logs, error recovery, text-file I/O, and the attachment of non-Basis-developed packages.

  2. Real-Time MENTAT programming language and architecture

    NASA Technical Reports Server (NTRS)

    Grimshaw, Andrew S.; Silberman, Ami; Liu, Jane W. S.

    1989-01-01

    Real-time MENTAT, a programming environment designed to simplify the task of programming real-time applications in distributed and parallel environments, is described. It is based on the same data-driven computation model and object-oriented programming paradigm as MENTAT. It provides an easy-to-use mechanism to exploit parallelism, language constructs for the expression and enforcement of timing constraints, and run-time support for scheduling and executing real-time programs. The real-time MENTAT programming language is an extended C++. The extensions are added to facilitate automatic detection of data flow and generation of data flow graphs, to express the timing constraints of individual granules of computation, and to provide scheduling directives for the runtime system. A high-level view of the real-time MENTAT system architecture and programming language constructs is provided.

  3. Organisational Pattern Driven Recovery Mechanisms

    NASA Astrophysics Data System (ADS)

    Giacomo, Valentina Di; Presenza, Domenico; Riccucci, Carlo

    The process of reacting to system failures and security attacks is strongly influenced by its infrastructural, procedural and organisational settings. Analysis of reaction procedures and practices from different domains (Air Traffic Management, Response to Computer Security Incidents, Response to Emergencies, recovery in the Chemical Process Industry) highlights three key requirements for this activity: smooth collaboration and coordination among responders, accurate monitoring and management of resources, and the ability to adapt pre-established reaction plans to the actual context. The SERENITY Reaction Mechanisms (SRM) is the subsystem of the SERENITY Run-time Framework aimed at providing SERENITY-aware AmI settings (i.e., socio-technical systems with highly distributed dynamic services) with functionalities to implement application-specific reaction strategies. The SRM uses SERENITY Organisational S&D Patterns as run-time models to drive these three key functionalities.

  4. Jagged Tiling for Intra-tile Parallelism and Fine-Grain Multithreading

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shrestha, Sunil; Manzano Franco, Joseph B.; Marquez, Andres

    In this paper, we have developed a novel methodology that takes into consideration multithreaded many-core designs to better utilize memory/processing resources and improve memory residence on tileable applications. It takes advantage of polyhedral analysis and transformation in the form of PLUTO, combined with a highly optimized fine-grain tile runtime, to exploit parallelism at all levels. The main contributions of this paper include the introduction of multi-hierarchical tiling techniques that increase intra-tile parallelism, and a data-flow inspired runtime library that allows the expression of parallel tiles with an efficient synchronization registry. Our current implementation shows performance improvements on an Intel Xeon Phi board of up to 32.25% against instances produced by state-of-the-art compiler frameworks for selected stencil applications.

  5. Automated Run-Time Mission and Dialog Generation

    DTIC Science & Technology

    2007-03-01

    Keywords: …Processing, Social Network Analysis, Simulation, Automated Scenario Generation.

  6. Subsampled open-reference clustering creates consistent, comprehensive OTU definitions and scales to billions of sequences.

    PubMed

    Rideout, Jai Ram; He, Yan; Navas-Molina, Jose A; Walters, William A; Ursell, Luke K; Gibbons, Sean M; Chase, John; McDonald, Daniel; Gonzalez, Antonio; Robbins-Pianka, Adam; Clemente, Jose C; Gilbert, Jack A; Huse, Susan M; Zhou, Hong-Wei; Knight, Rob; Caporaso, J Gregory

    2014-01-01

    We present a performance-optimized algorithm, subsampled open-reference OTU picking, for assigning marker gene (e.g., 16S rRNA) sequences generated on next-generation sequencing platforms to operational taxonomic units (OTUs) for microbial community analysis. This algorithm provides benefits over de novo OTU picking (clustering can be performed largely in parallel, reducing runtime) and closed-reference OTU picking (all reads are clustered, not only those that match a reference database sequence with high similarity). Because more of our algorithm can be run in parallel relative to "classic" open-reference OTU picking, it makes open-reference OTU picking tractable on massive amplicon sequence data sets (though on smaller data sets, "classic" open-reference OTU clustering is often faster). We illustrate that here by applying it to the first 15,000 samples sequenced for the Earth Microbiome Project (1.3 billion V4 16S rRNA amplicons). To the best of our knowledge, this is the largest OTU picking run ever performed, and we estimate that our new algorithm runs in less than 1/5 the time that would be required of "classic" open-reference OTU picking. We show that subsampled open-reference OTU picking yields results that are highly correlated with those generated by "classic" open-reference OTU picking through comparisons on three well-studied datasets. An implementation of this algorithm is provided in the popular QIIME software package, which uses uclust for read clustering. All analyses were performed using QIIME's uclust wrappers, though we provide details (aided by the open-source code in our GitHub repository) that will allow implementation of subsampled open-reference OTU picking independently of QIIME (e.g., in a compiled programming language, where runtimes should be further reduced). Our analyses should generalize to other implementations of these OTU picking algorithms. Finally, we present a comparison of parameter settings in QIIME's OTU picking workflows and make recommendations on settings for these free parameters to optimize runtime without reducing the quality of the results. These optimized parameters can vastly decrease the runtime of uclust-based OTU picking in QIIME.
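
    The workflow reduces to a few composable steps, outlined below with a toy greedy clusterer standing in for uclust (the identity function, thresholds and subsampling fraction are illustrative, not QIIME's defaults):

        import difflib, random

        def identity(a, b):
            return difflib.SequenceMatcher(None, a, b).ratio()

        def closed_reference(reads, centroids, thresh=0.97):
            hits, misses = {}, []
            for r in reads:
                best = max(centroids, key=lambda c: identity(r, c), default=None)
                if best is not None and identity(r, best) >= thresh:
                    hits.setdefault(best, []).append(r)
                else:
                    misses.append(r)
            return hits, misses

        def de_novo(reads, thresh=0.97):
            centroids = []
            for r in reads:
                if not any(identity(r, c) >= thresh for c in centroids):
                    centroids.append(r)          # r seeds a new OTU
            return centroids

        def subsampled_open_reference(reads, reference, frac=0.1):
            otus, failures = closed_reference(reads, reference)           # parallelizable
            sample = random.sample(failures, int(frac * len(failures)))   # subsample
            new_seeds = de_novo(sample)                                   # serial, but small
            more, leftover = closed_reference(failures, new_seeds)        # parallelizable
            return {**otus, **more}, new_seeds + de_novo(leftover)

    Because the de novo stage only ever sees the subsample and the final leftovers, the dominant closed-reference stages can run in parallel, which is the source of the speedup described above.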

  7. Convergence Acceleration and Documentation of CFD Codes for Turbomachinery Applications

    NASA Technical Reports Server (NTRS)

    Marquart, Jed E.

    2005-01-01

    The development and analysis of turbomachinery components for industrial and aerospace applications has been greatly enhanced in recent years through the advent of computational fluid dynamics (CFD) codes and techniques. Although the use of this technology has greatly reduced the time required to perform analysis and design, there still remains much room for improvement in the process. In particular, there is a steep learning curve associated with most turbomachinery CFD codes, and the computation times need to be reduced in order to facilitate their integration into standard work processes. Two turbomachinery codes have recently been developed by Dr. Daniel Dorney (MSFC) and Dr. Douglas Sondak (Boston University). These codes are entitled Aardvark (for 2-D and quasi 3-D simulations) and Phantom (for 3-D simulations). The codes utilize the General Equation Set (GES), structured grid methodology, and overset O- and H-grids. The codes have been used with success by Drs. Dorney and Sondak, as well as others within the turbomachinery community, to analyze engine components and other geometries. One of the primary objectives of this study was to establish a set of parametric input values which will enhance convergence rates for steady state simulations, as well as reduce the runtime required for unsteady cases. The goal is to reduce the turnaround time for CFD simulations, thus permitting more design parametrics to be run within a given time period. In addition, other code enhancements to reduce runtimes were investigated and implemented. The other primary goal of the study was to develop enhanced user's manuals for Aardvark and Phantom. These manuals are intended to answer most questions for new users, as well as provide valuable detailed information for the experienced user. The existence of detailed user's manuals will enable new users to become proficient with the codes, as well as reducing the dependency of new users on the code authors. In order to achieve the objectives listed, the following tasks were accomplished: 1) Parametric Study Of Preconditioning Parameters And Other Code Inputs; 2) Code Modifications To Reduce Runtimes; 3) Investigation Of Compiler Options To Reduce Code Runtime; and 4) Development/Enhancement of User's Manuals for Aardvark and Phantom.

  8. Technology for Space Station Evolution: the Data Management System

    NASA Technical Reports Server (NTRS)

    Abbott, L.

    1990-01-01

    Viewgraphs on the data management system (DMS) for the space station evolution are presented. Topics covered include DMS architecture and implementation approach; and an overview of the runtime object database.

  9. MPI Runtime Error Detection with MUST: Advances in Deadlock Detection

    DOE PAGES

    Hilbrich, Tobias; Protze, Joachim; Schulz, Martin; ...

    2013-01-01

    The widely used Message Passing Interface (MPI) is complex and rich. As a result, application developers require automated tools to avoid and to detect MPI programming errors. We present the Marmot Umpire Scalable Tool (MUST) that detects such errors with significantly increased scalability. We present improvements to our graph-based deadlock detection approach for MPI, which cover future MPI extensions. Our enhancements also check complex MPI constructs that no previous graph-based detection approach handled correctly. Finally, we present optimizations for the processing of MPI operations that reduce runtime deadlock detection overheads. Existing approaches often require O(p) analysis time per MPI operation, for p processes. We empirically observe that our improvements lead to sub-linear or better analysis time per operation for a wide range of real world applications.
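
    The class of errors targeted includes unsafe point-to-point patterns like the one below (sketched with mpi4py under the assumption of exactly two ranks; whether it actually hangs depends on MPI buffering, which is precisely why runtime detection is valuable):

        from mpi4py import MPI

        comm = MPI.COMM_WORLD
        rank = comm.Get_rank()
        peer = 1 - rank

        send_buf = bytearray(1 << 22)   # 4 MiB: large enough to defeat eager buffering
        recv_buf = bytearray(1 << 22)

        comm.Send(send_buf, dest=peer)  # both ranks block here -> potential deadlock
        comm.Recv(recv_buf, source=peer)

        # A safe alternative pairs the operations, e.g.:
        # comm.Sendrecv(send_buf, dest=peer, recvbuf=recv_buf, source=peer)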

  10. Experiments with Test Case Generation and Runtime Analysis

    NASA Technical Reports Server (NTRS)

    Artho, Cyrille; Drusinsky, Doron; Goldberg, Allen; Havelund, Klaus; Lowry, Mike; Pasareanu, Corina; Rosu, Grigore; Visser, Willem; Koga, Dennis (Technical Monitor)

    2003-01-01

    Software testing is typically an ad hoc process where human testers manually write many test inputs and expected test results, perhaps automating their execution in a regression suite. This process is cumbersome and costly. This paper reports preliminary results on an approach to further automate this process. The approach consists of combining automated test case generation, based on systematically exploring the program's input domain, with runtime analysis, where execution traces are monitored and verified against temporal logic specifications, or analyzed using advanced algorithms for detecting concurrency errors such as data races and deadlocks. The approach suggests generating specifications dynamically per input instance rather than statically once and for all. The paper describes experiments with variants of this approach in the context of two examples, a planetary rover controller and a spacecraft fault protection system.

  11. A Core Plug and Play Architecture for Reusable Flight Software Systems

    NASA Technical Reports Server (NTRS)

    Wilmot, Jonathan

    2006-01-01

    The Flight Software Branch, at Goddard Space Flight Center (GSFC), has been working on a run-time approach to facilitate a formal software reuse process. The reuse process is designed to enable rapid development and integration of high-quality software systems and to more accurately predict development costs and schedule. Previous reuse practices have been somewhat successful when the same teams are moved from project to project. But this typically requires taking the software system in an all-or-nothing approach where useful components cannot be easily extracted from the whole. As a result, the system is less flexible and scalable with limited applicability to new projects. This paper will focus on the rationale behind, and implementation of the run-time executive. This executive is the core for the component-based flight software commonality and reuse process adopted at Goddard.

  12. Estimating job runtime for CMS analysis jobs

    NASA Astrophysics Data System (ADS)

    Sfiligoi, I.

    2014-06-01

    The basic premise of pilot systems is to create an overlay scheduling system on top of leased resources. And by definition, leases have a limited lifetime, so any job that is scheduled on such resources must finish before the lease is over, or it will be killed and all the computation is wasted. In order to effectively schedule jobs to resources, the pilot system thus requires the expected runtime of the users' jobs. Past studies have shown that relying on user-provided estimates is not a valid strategy, so the system should try to make an estimate by itself. This paper provides a study of the historical data obtained from the Compact Muon Solenoid (CMS) experiment's Analysis Operations submission system. Clear patterns are observed, suggesting that making predictions of an expected job lifetime range is achievable with a high confidence level in this environment.
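
    One simple estimator consistent with this idea predicts a lifetime range from historical quantiles per job class (the job classes, numbers and the 5-95% band below are invented for illustration):

        import numpy as np

        history = {                      # job class -> past runtimes in minutes
            "analysis_skim": [42, 55, 38, 61, 47, 52, 44, 58, 49, 51],
            "ntuple_merge":  [12, 9, 15, 11, 14, 10, 13, 12, 16, 11],
        }

        def predict_range(job_class, low=5, high=95):
            past = np.asarray(history[job_class], dtype=float)
            return np.percentile(past, low), np.percentile(past, high)

        lo, hi = predict_range("analysis_skim")
        print(f"expected runtime: {lo:.0f}-{hi:.0f} min")  # schedule against the lease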

  13. Runtime support for parallelizing data mining algorithms

    NASA Astrophysics Data System (ADS)

    Jin, Ruoming; Agrawal, Gagan

    2002-03-01

    With recent technological advances, shared memory parallel machines have become more scalable, and offer large main memories and high bus bandwidths. They are emerging as good platforms for data warehousing and data mining. In this paper, we focus on shared memory parallelization of data mining algorithms. We have developed a series of techniques for parallelization of data mining algorithms, including full replication, full locking, fixed locking, optimized full locking, and cache-sensitive locking. Unlike previous work on shared memory parallelization of specific data mining algorithms, all of our techniques apply to a large number of common data mining algorithms. In addition, we propose a reduction-object based interface for specifying a data mining algorithm. We show how our runtime system can apply any of the techniques we have developed starting from a common specification of the algorithm.
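
    Two of the named strategies, full locking and optimized full locking, differ only in lock granularity over the reduction object, as the following sketch shows (the histogram reduction is an illustrative stand-in for a mining kernel, and this interface is not the paper's actual API):

        import threading

        class ReductionHistogram:
            def __init__(self, nbuckets, per_bucket_locks):
                self.counts = [0] * nbuckets
                # full locking: the same lock guards every element;
                # optimized full locking: one lock per reduction element.
                self.locks = ([threading.Lock() for _ in range(nbuckets)]
                              if per_bucket_locks else [threading.Lock()] * nbuckets)

            def reduce(self, bucket, value=1):
                with self.locks[bucket % len(self.locks)]:
                    self.counts[bucket] += value

        hist = ReductionHistogram(64, per_bucket_locks=True)
        threads = [threading.Thread(
                       target=lambda: [hist.reduce(i % 64) for i in range(10000)])
                   for _ in range(4)]
        for t in threads: t.start()
        for t in threads: t.join()
        print(sum(hist.counts))   # 40000 updates applied without lost increments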

  14. Halvade-RNA: Parallel variant calling from transcriptomic data using MapReduce.

    PubMed

    Decap, Dries; Reumers, Joke; Herzeel, Charlotte; Costanza, Pascal; Fostier, Jan

    2017-01-01

    Given the current cost-effectiveness of next-generation sequencing, the amount of DNA-seq and RNA-seq data generated is ever increasing. One of the primary objectives of NGS experiments is calling genetic variants. While highly accurate, most variant calling pipelines are not optimized to run efficiently on large data sets. However, as variant calling in genomic data has become common practice, several methods have been proposed to reduce runtime for DNA-seq analysis through the use of parallel computing. Determining the effectively expressed variants from transcriptomics (RNA-seq) data has only recently become possible, and as such does not yet benefit from efficiently parallelized workflows. We introduce Halvade-RNA, a parallel, multi-node RNA-seq variant calling pipeline based on the GATK Best Practices recommendations. Halvade-RNA makes use of the MapReduce programming model to create and manage parallel data streams on which multiple instances of existing tools such as STAR and GATK operate concurrently. Whereas the single-threaded processing of a typical RNA-seq sample requires ∼28h, Halvade-RNA reduces this runtime to ∼2h using a small cluster with two 20-core machines. Even on a single, multi-core workstation, Halvade-RNA can significantly reduce runtime compared to using multi-threading, thus providing for a more cost-effective processing of RNA-seq data. Halvade-RNA is written in Java and uses the Hadoop MapReduce 2.0 API. It supports a wide range of distributions of Hadoop, including Cloudera and Amazon EMR.

  15. BFL: a node and edge betweenness based fast layout algorithm for large scale networks

    PubMed Central

    Hashimoto, Tatsunori B; Nagasaki, Masao; Kojima, Kaname; Miyano, Satoru

    2009-01-01

    Background Network visualization would serve as a useful first step for analysis. However, current graph layout algorithms for biological pathways are insensitive to biologically important information, e.g. subcellular localization, biological node and graph attributes, or/and not available for large scale networks, e.g. more than 10000 elements. Results To overcome these problems, we propose the use of a biologically important graph metric, betweenness, a measure of network flow. This metric is highly correlated with many biological phenomena such as lethality and clusters. We devise a new fast parallel algorithm calculating betweenness to minimize the preprocessing cost. Using this metric, we also invent a node and edge betweenness based fast layout algorithm (BFL). BFL places the high-betweenness nodes at optimal positions and allows the low-betweenness nodes to reach suboptimal positions. Furthermore, BFL reduces the runtime by combining a sequential insertion algorithm with betweenness. For a graph with n nodes, this approach reduces the expected runtime of the algorithm to O(n²) when considering edge crossings, and to O(n log n) when considering only density and edge lengths. Conclusion Our BFL algorithm is compared against fast graph layout algorithms and approaches requiring intensive optimizations. For gene networks, we show that our algorithm is faster than all layout algorithms tested while providing readability on par with intensive optimization algorithms. We achieve a 1.4 second runtime for a graph with 4000 nodes and 12000 edges on a standard desktop computer. PMID:19146673
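
    The placement strategy can be approximated with off-the-shelf tools (a loose networkx-based analogue, not the BFL implementation itself; the circular anchoring of hubs is an illustrative choice):

        import math
        import networkx as nx

        G = nx.gnm_random_graph(200, 600, seed=3)
        bc = nx.betweenness_centrality(G)
        hubs = sorted(bc, key=bc.get, reverse=True)[:10]

        # Pin the high-betweenness nodes to fixed anchor positions on a
        # circle, then let the low-betweenness nodes relax around them.
        anchor = {h: (math.cos(2 * math.pi * i / len(hubs)),
                      math.sin(2 * math.pi * i / len(hubs)))
                  for i, h in enumerate(hubs)}
        pos = nx.spring_layout(G, pos=anchor, fixed=hubs, seed=3)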

  16. Programming and Runtime Support to Blaze FPGA Accelerator Deployment at Datacenter Scale

    PubMed Central

    Huang, Muhuan; Wu, Di; Yu, Cody Hao; Fang, Zhenman; Interlandi, Matteo; Condie, Tyson; Cong, Jason

    2017-01-01

    With the end of CPU core scaling due to dark silicon limitations, customized accelerators on FPGAs have gained increased attention in modern datacenters due to their lower power, high performance and energy efficiency. Evidenced by Microsoft’s FPGA deployment in its Bing search engine and Intel’s $16.7 billion acquisition of Altera, integrating FPGAs into datacenters is considered one of the most promising approaches to sustain future datacenter growth. However, it is quite challenging for existing big data computing systems, like Apache Spark and Hadoop, to access the performance and energy benefits of FPGA accelerators. In this paper we design and implement Blaze to provide programming and runtime support for enabling easy and efficient deployments of FPGA accelerators in datacenters. In particular, Blaze abstracts FPGA accelerators as a service (FaaS) and provides a set of clean programming APIs for big data processing applications to easily utilize those accelerators. Our Blaze runtime implements an FaaS framework to efficiently share FPGA accelerators among multiple heterogeneous threads on a single node, and extends Hadoop YARN with accelerator-centric scheduling to efficiently share them among multiple computing tasks in the cluster. Experimental results using four representative big data applications demonstrate that Blaze greatly reduces the programming efforts to access FPGA accelerators in systems like Apache Spark and YARN, and improves the system throughput by 1.7× to 3× (and energy efficiency by 1.5× to 2.7×) compared to a conventional CPU-only cluster. PMID:28317049

  17. Programming and Runtime Support to Blaze FPGA Accelerator Deployment at Datacenter Scale.

    PubMed

    Huang, Muhuan; Wu, Di; Yu, Cody Hao; Fang, Zhenman; Interlandi, Matteo; Condie, Tyson; Cong, Jason

    2016-10-01

    With the end of CPU core scaling due to dark silicon limitations, customized accelerators on FPGAs have gained increased attention in modern datacenters due to their lower power, high performance and energy efficiency. Evidenced by Microsoft's FPGA deployment in its Bing search engine and Intel's $16.7 billion acquisition of Altera, integrating FPGAs into datacenters is considered one of the most promising approaches to sustain future datacenter growth. However, it is quite challenging for existing big data computing systems, like Apache Spark and Hadoop, to access the performance and energy benefits of FPGA accelerators. In this paper we design and implement Blaze to provide programming and runtime support for enabling easy and efficient deployments of FPGA accelerators in datacenters. In particular, Blaze abstracts FPGA accelerators as a service (FaaS) and provides a set of clean programming APIs for big data processing applications to easily utilize those accelerators. Our Blaze runtime implements an FaaS framework to efficiently share FPGA accelerators among multiple heterogeneous threads on a single node, and extends Hadoop YARN with accelerator-centric scheduling to efficiently share them among multiple computing tasks in the cluster. Experimental results using four representative big data applications demonstrate that Blaze greatly reduces the programming efforts to access FPGA accelerators in systems like Apache Spark and YARN, and improves the system throughput by 1.7× to 3× (and energy efficiency by 1.5× to 2.7×) compared to a conventional CPU-only cluster.
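
    The FaaS idea, one node-local service owning the accelerator while many threads submit tasks through it, can be sketched in a few lines of Python. The fpga_execute function below is a hypothetical stand-in for a real accelerator invocation; Blaze's actual APIs are not reproduced here.

      # Sketch of the FaaS idea in Blaze: a node-local service owns the FPGA
      # and serializes access to it, while worker threads submit tasks through
      # a queue instead of opening the device themselves.
      import threading
      import queue

      def fpga_execute(kernel, data):
          # Placeholder for offloading `kernel` to the FPGA.
          return sum(data) if kernel == "reduce" else data

      class FaaSService:
          def __init__(self):
              self.requests = queue.Queue()
              threading.Thread(target=self._serve, daemon=True).start()

          def _serve(self):
              while True:
                  kernel, data, reply = self.requests.get()
                  reply.put(fpga_execute(kernel, data))   # one task at a time

          def submit(self, kernel, data):
              reply = queue.Queue(maxsize=1)
              self.requests.put((kernel, data, reply))
              return reply.get()                          # block until done

      service = FaaSService()
      results = []
      workers = [threading.Thread(target=lambda i=i: results.append(
                 service.submit("reduce", list(range(i * 10))))) for i in range(4)]
      for w in workers: w.start()
      for w in workers: w.join()
      print(sorted(results))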

  18. Semantic Web Infrastructure Supporting NextFrAMES Modeling Platform

    NASA Astrophysics Data System (ADS)

    Lakhankar, T.; Fekete, B. M.; Vörösmarty, C. J.

    2008-12-01

    Emerging modeling frameworks offer modelers new ways to develop model applications by providing a wide range of software components that handle common modeling tasks such as managing space and time, distributing computational tasks in a parallel processing environment, performing input/output, and providing diagnostic facilities. NextFrAMES, the next-generation update to the Framework for Aquatic Modeling of the Earth System, originally developed at the University of New Hampshire and currently hosted at The City College of New York, takes a step further by hiding most of these services from the modeler behind a platform-agnostic modeling platform that allows scientists to focus on the implementation of scientific concepts, in the form of a new modeling markup language and a minimalist application programming interface that provides the means to implement model processes. At the core of the NextFrAMES modeling platform is a run-time engine that interprets the modeling markup language, loads the module plugins, establishes the model I/O, and executes the model defined by the modeling XML and the accompanying plugins. The current implementation of the run-time engine is designed for single-processor or symmetric multiprocessing (SMP) systems, but future implementations of the run-time engine optimized for different hardware architectures are anticipated. The modeling XML and the accompanying plugins define the model structure and the computational processes in a highly abstract manner, which is not only suitable for the run-time engine but also has the potential to integrate into semantic web infrastructure, where intelligent parsers can extract information about model configurations such as input/output requirements, applicable space and time scales, and underlying modeling processes. The NextFrAMES run-time engine itself is also designed to tap into web-enabled data services directly, so it can be incorporated into complex workflows to implement End-to-End applications from observation to the delivery of highly aggregated information. Our presentation will discuss the web services, ranging from OpenDAP and WaterOneFlow data services to metadata provided through catalog services, that could serve NextFrAMES modeling applications. We will also discuss the support infrastructure needed to streamline the integration of NextFrAMES into an End-to-End application that delivers highly processed information to end users. The End-to-End application will be demonstrated through examples from the State of the Global Water System effort, which builds on data services provided through WMO's Global Terrestrial Network for Hydrology to deliver water-resources information to policy makers for better water management. Key components of this E2E system are promoted as Community of Practice examples for the Global Observing System of Systems; therefore, the State of the Global Water System can be viewed as a test case for the interoperability of the incorporated web service components.

  19. PC graphics generation and management tool for real-time applications

    NASA Technical Reports Server (NTRS)

    Truong, Long V.

    1992-01-01

    A graphics tool was designed and developed for easy generation and management of personal computer graphics. It also provides methods and 'run-time' software for many common artificial intelligence (AI) or expert system (ES) applications.

  20. Distributed memory compiler methods for irregular problems: Data copy reuse and runtime partitioning

    NASA Technical Reports Server (NTRS)

    Das, Raja; Ponnusamy, Ravi; Saltz, Joel; Mavriplis, Dimitri

    1991-01-01

    Outlined here are two methods which we believe will play an important role in any distributed memory compiler able to handle sparse and unstructured problems. We describe how to link runtime partitioners to distributed memory compilers. In our scheme, programmers can implicitly specify how data and loop iterations are to be distributed between processors. This insulates users from having to deal explicitly with potentially complex algorithms that carry out work and data partitioning. We also describe a viable mechanism for tracking and reusing copies of off-processor data. In many programs, several loops access the same off-processor memory locations. As long as it can be verified that the values assigned to off-processor memory locations remain unmodified, we show that we can effectively reuse stored off-processor data. We present experimental data from a 3-D unstructured Euler solver run on iPSC/860 to demonstrate the usefulness of our methods.

  1. Solving large-scale fixed cost integer linear programming models for grid-based location problems with heuristic techniques

    NASA Astrophysics Data System (ADS)

    Noor-E-Alam, Md.; Doucette, John

    2015-08-01

    Grid-based location problems (GBLPs) can be used to solve location problems in business, engineering, resource exploitation, and even in the field of medical sciences. To solve these decision problems, an integer linear programming (ILP) model is designed and developed to provide the optimal solution for GBLPs considering fixed cost criteria. Preliminary results show that the ILP model is efficient in solving small to moderate-sized problems. However, this ILP model becomes intractable in solving large-scale instances. Therefore, a decomposition heuristic is proposed to solve these large-scale GBLPs, which demonstrates significant reduction of solution runtimes. To benchmark the proposed heuristic, results are compared with the exact solution via ILP. The experimental results show that the proposed method significantly outperforms the exact method in runtime with minimal (and in most cases, no) loss of optimality.
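
    A minimal sketch of the decomposition idea, under assumed costs: instead of one intractable ILP over the whole grid, split the grid into tiles, solve each tile exhaustively, and combine the tile solutions. The fixed-cost-plus-Manhattan-distance objective below is an illustrative assumption, not the paper's exact formulation.

      # Decomposition heuristic sketch for a grid-based location problem:
      # partition the grid into tiles, pick one facility site per non-empty
      # tile by exhaustive search, and combine the tile solutions.
      import random

      random.seed(0)
      N = 60                                   # grid is N x N
      demands = [(random.randrange(N), random.randrange(N)) for _ in range(300)]
      FIXED_COST = 50.0

      def tile_cost(site, pts):
          sx, sy = site
          return FIXED_COST + sum(abs(sx - x) + abs(sy - y) for x, y in pts)

      def solve_tile(x0, y0, size):
          """Exhaustively pick the cheapest site inside one small tile."""
          pts = [(x, y) for x, y in demands
                 if x0 <= x < x0 + size and y0 <= y < y0 + size]
          if not pts:
              return None, 0.0
          sites = [(x, y) for x in range(x0, x0 + size)
                          for y in range(y0, y0 + size)]
          best = min(sites, key=lambda s: tile_cost(s, pts))
          return best, tile_cost(best, pts)

      TILE = 15
      total, chosen = 0.0, []
      for x0 in range(0, N, TILE):
          for y0 in range(0, N, TILE):
              site, cost = solve_tile(x0, y0, TILE)
              if site is not None:
                  chosen.append(site)
                  total += cost
      print(len(chosen), "facilities, total cost", round(total, 1))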

  2. Pybus -- A Python Software Bus

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lavrijsen, Wim T.L.P.

    2004-10-14

    A software bus, just like its hardware equivalent, allows for the discovery, installation, configuration, loading, unloading, and run-time replacement of software components, as well as channeling of inter-component communication. Python, a popular open-source programming language, encourages a modular design of software written in it, but it offers little or no component functionality. However, the language and its interpreter provide sufficient hooks to implement a thin, integral layer of component support. This functionality can be presented to the developer in the form of a module, making it very easy to use. This paper describes a Python module, PyBus, with which the concept of a 'software bus' can be realized in Python. It demonstrates, within the context of the ATLAS software framework Athena, how PyBus can be used for the installation and (run-time) configuration of software, not necessarily Python modules, from a Python application in a way that is transparent to the end-user.
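
    A minimal Python sketch of the software-bus concept, not PyBus's actual API: components are loaded (and hot-replaced) by module name at run time and communicate over named channels.

      # Toy "software bus": run-time component loading/replacement plus
      # channeled inter-component communication. Illustrative only.
      import importlib
      from collections import defaultdict

      class SoftwareBus:
          def __init__(self):
              self.components = {}
              self.subscribers = defaultdict(list)

          def load(self, name, module_path):
              """Load (or hot-replace) a component from a module path."""
              module = importlib.import_module(module_path)
              self.components[name] = importlib.reload(module)

          def unload(self, name):
              self.components.pop(name, None)

          def subscribe(self, channel, callback):
              self.subscribers[channel].append(callback)

          def publish(self, channel, message):
              for callback in self.subscribers[channel]:
                  callback(message)

      bus = SoftwareBus()
      bus.load("json_codec", "json")            # any importable module will do
      bus.subscribe("events", lambda msg: print("got:", msg))
      bus.publish("events", bus.components["json_codec"].dumps({"status": "ok"}))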

  3. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Janjusic, Tommy; Kartsaklis, Christos

    Application analysis is facilitated through a number of program profiling tools. The tools vary in their complexity, ease of deployment, design, and profiling detail. Specifically, understanding, analyzing, and optimizing is of particular importance for scientific applications, where minor changes in code paths and data-structure layout can have profound effects. Understanding how intricate data structures are accessed and how a given memory system responds is a complex task. In this paper we describe a trace profiling tool, Glprof, specifically aimed at lessening the burden on the programmer to pin-point heavily involved data structures during an application's run-time, and to understand data-structure run-time usage. Moreover, we showcase the tool's modularity using additional cache simulation components. We elaborate on the tool's design and features. Finally, we demonstrate the application of our tool in the context of Spec benchmarks using the Glprof profiler and two concurrently running cache simulators, PPC440 and AMD Interlagos.

  4. R2U2: Monitoring and Diagnosis of Security Threats for Unmanned Aerial Systems

    NASA Technical Reports Server (NTRS)

    Schumann, Johann; Moosbrugger, Patrick; Rozier, Kristin Y.

    2015-01-01

    We present R2U2, a novel framework for runtime monitoring of security properties and diagnosing of security threats on-board Unmanned Aerial Systems (UAS). R2U2, implemented in FPGA hardware, is a real-time, REALIZABLE, RESPONSIVE, UNOBTRUSIVE Unit for security threat detection. R2U2 is designed to continuously monitor inputs from the GPS and the ground control station, sensor readings, actuator outputs, and flight software status. By simultaneously monitoring and performing statistical reasoning, attack patterns and post-attack discrepancies in the UAS behavior can be detected. R2U2 uses runtime observer pairs for linear and metric temporal logics for property monitoring and Bayesian networks for diagnosis of security threats. We discuss the design and implementation that now enables R2U2 to handle security threats and present simulation results of several attack scenarios on the NASA DragonEye UAS.
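
    The observer idea can be illustrated with a small past-time monitor over a stream of status records. The property checked below ("whenever GPS is lost, the failsafe engages within 3 steps") and the record fields are assumptions for illustration; R2U2 evaluates such observers in FPGA hardware rather than software.

      # Toy runtime monitor for a bounded-response property over a trace:
      # whenever gps_ok goes false, failsafe must become true within
      # `deadline` steps, otherwise the property is violated.
      def monitor(trace, deadline=3):
          pending = None                     # step at which GPS loss was seen
          for step, state in enumerate(trace):
              if not state["gps_ok"] and pending is None:
                  pending = step
              if pending is not None:
                  if state["failsafe"]:
                      pending = None         # obligation discharged
                  elif step - pending >= deadline:
                      return False, step     # violation detected at this step
          return True, None

      trace = [
          {"gps_ok": True,  "failsafe": False},
          {"gps_ok": False, "failsafe": False},   # GPS lost here
          {"gps_ok": False, "failsafe": False},
          {"gps_ok": False, "failsafe": True},    # failsafe within deadline
          {"gps_ok": True,  "failsafe": False},
      ]
      ok, at = monitor(trace)
      print("property holds" if ok else f"violated at step {at}")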

  5. Methodology for fast detection of false sharing in threaded scientific codes

    DOEpatents

    Chung, I-Hsin; Cong, Guojing; Murata, Hiroki; Negishi, Yasushi; Wen, Hui-Fang

    2014-11-25

    A profiling tool identifies a code region with a false sharing potential. A static analysis tool classifies variables and arrays in the identified code region. A mapping detection library correlates memory access instructions in the identified code region with variables and arrays in the identified code region while a processor is running the identified code region. The mapping detection library identifies one or more instructions at risk, in the identified code region, which are subject to an analysis by a false sharing detection library. A false sharing detection library performs a run-time analysis of the one or more instructions at risk while the processor is re-running the identified code region. The false sharing detection library determines, based on the performed run-time analysis, whether two different portions of the cache memory line are accessed by the generated binary code.
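
    The run-time analysis step can be sketched as a pure trace analysis: flag cache lines that two threads touch at different offsets with at least one write. The 64-byte line size and the trace below are illustrative assumptions.

      # Detect the false-sharing pattern in a trace of
      # (thread, address, is_write) accesses: multiple threads touching
      # different offsets of the same cache line, with at least one write.
      from collections import defaultdict

      LINE = 64

      def find_false_sharing(accesses):
          by_line = defaultdict(set)          # line -> {(thread, offset, is_write)}
          for thread, addr, is_write in accesses:
              by_line[addr // LINE].add((thread, addr % LINE, is_write))
          suspects = []
          for line, touches in by_line.items():
              threads = {t for t, _, _ in touches}
              offsets = {o for _, o, _ in touches}
              writes = any(w for _, _, w in touches)
              if len(threads) > 1 and len(offsets) > 1 and writes:
                  suspects.append(hex(line * LINE))
          return suspects

      trace = [
          (0, 0x1000, True),   # thread 0 writes offset 0 of one line
          (1, 0x1008, True),   # thread 1 writes offset 8 of the same line
          (0, 0x2000, False),  # harmless read-only sharing on another line
          (1, 0x2000, False),
      ]
      print("lines with false-sharing potential:", find_false_sharing(trace))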

  6. Particle Laden Turbulence in a Radiation Environment Using a Portable High Performance Solver Based on the Legion Runtime System

    NASA Astrophysics Data System (ADS)

    Torres, Hilario; Iaccarino, Gianluca

    2017-11-01

    Soleil-X is a multi-physics solver being developed at Stanford University as a part of the Predictive Science Academic Alliance Program II. Our goal is to conduct high-fidelity simulations of particle-laden turbulent flows in a radiation environment for solar energy receiver applications, as well as to demonstrate our readiness to effectively utilize next-generation Exascale machines. The novel aspect of Soleil-X is that it is built upon the Legion runtime system to enable easy portability to different parallel distributed heterogeneous architectures while also being written entirely in high-level/high-productivity languages (Ebb and Regent). An overview of the Soleil-X software architecture will be given. Results from coupled fluid flow, Lagrangian point-particle tracking, and thermal radiation simulations will be presented. Performance diagnostic tools and metrics corresponding to the same cases will also be discussed. US Department of Energy, National Nuclear Security Administration.

  7. FLASH Interface; a GUI for managing runtime parameters in FLASH simulations

    NASA Astrophysics Data System (ADS)

    Walker, Christopher; Tzeferacos, Petros; Weide, Klaus; Lamb, Donald; Flocke, Norbert; Feister, Scott

    2017-10-01

    We present FLASH Interface, a novel graphical user interface (GUI) for managing runtime parameters in simulations performed with the FLASH code. FLASH Interface supports full text search of available parameters; provides descriptions of each parameter's role and function; allows for the filtering of parameters based on categories; performs input validation; and maintains all comments and non-parameter information already present in existing parameter files. The GUI can be used to edit existing parameter files or generate new ones. FLASH Interface is open source and was implemented with the Electron framework, making it available on Mac OSX, Windows, and Linux operating systems. The new interface lowers the entry barrier for new FLASH users and provides an easy-to-use tool for experienced FLASH simulators. U.S. Department of Energy (DOE), NNSA ASC/Alliances Center for Astrophysical Thermonuclear Flashes, U.S. DOE NNSA ASC through the Argonne Institute for Computing in Science, U.S. National Science Foundation.

  8. A Mediator-Based Approach to Resolving Interface Heterogeneity of Web Services

    NASA Astrophysics Data System (ADS)

    Leitner, Philipp; Rosenberg, Florian; Michlmayr, Anton; Huber, Andreas; Dustdar, Schahram

    In theory, service-oriented architectures are based on the idea of increasing flexibility in the selection of internal and external business partners using loosely coupled services. However, in practice this flexibility is limited by the fact that partners need not only to provide the same service, but to do so via virtually the same interface in order to actually be easily interchangeable. Invocation-level mediation may be used to overcome this issue: by using mediation, interface differences can be resolved transparently at runtime. In this chapter we discuss the basic ideas of mediation, with a focus on interface-level mediation. We show how interface mediation is integrated into our dynamic Web service invocation framework DAIOS, and present three different mediation strategies: one based on structural message similarity, one based on semantically annotated WSDL, and one embedded into the VRESCo SOA runtime, a larger research project with explicit support for service mediation.
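
    A toy Python sketch of invocation-level mediation: two providers expose the same service under different parameter names and result shapes, and a mediator translates a canonical request and response at call time. The hard-coded mappings are an assumption; DAIOS derives them from structural or semantic matching.

      # Invocation-level mediation sketch: one client-facing interface,
      # two providers with incompatible interfaces, mediated per call.
      def provider_a(payload):               # expects {"zip": ...}
          return {"temp_f": 68.0, "zip": payload["zip"]}

      def provider_b(payload):               # expects {"postal_code": ...}
          return {"celsius": 20.0}

      MEDIATION = {
          provider_a: {"request": {"postal_code": "zip"},
                       "response": lambda r: {"temp_c": (r["temp_f"] - 32) / 1.8}},
          provider_b: {"request": {"postal_code": "postal_code"},
                       "response": lambda r: {"temp_c": r["celsius"]}},
      }

      def invoke(provider, canonical_request):
          rules = MEDIATION[provider]
          translated = {rules["request"][k]: v for k, v in canonical_request.items()}
          return rules["response"](provider(translated))

      for p in (provider_a, provider_b):     # same client code, both interfaces
          print(invoke(p, {"postal_code": "10115"}))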

  9. Case Study: Mobile Photovoltaic System at Bechler Meadows Ranger Station, Yellowstone National Park (Brochure)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    None, None

    The mobile PV/generator hybrid system deployed at Bechler Meadows provides a number of advantages. It reduces on-site air emissions from the generator. Batteries allow the generator to operate only at its rated power, reducing run-time and fuel consumption. Energy provided by the solar array reduces fuel consumption and run-time of the generator. The generator is off for most hours, providing peace and quiet at the site. Maintenance trips from Mammoth Hot Springs to the remote site are reduced. The frequency of intrusive fuel deliveries to the pristine site is reduced. And the system gives rangers a chance to interpret Green Park values to the visiting public. As an added bonus, the system provides all these benefits at a lower cost than the base case of using only a propane-fueled generator, reducing life-cycle cost by about 26%.

  10. Case Study: Mobile Photovoltaic System at Bechler Meadows Ranger Station, Yellowstone National Park

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Andy Walker

    The mobile PV/generator hybrid system deployed at Bechler Meadows provides a number of advantages. It reduces on-site air emissions from the generator. Batteries allow the generator to operate only at its rated power, reducing run-time and fuel consumption. Energy provided by the solar array reduces fuel consumption and run-time of the generator. The generator is off for most hours, providing peace and quiet at the site. Maintenance trips from Mammoth Hot Springs to the remote site are reduced. The frequency of intrusive fuel deliveries to the pristine site is reduced. And the system gives rangers a chance to interpret Green Park values to the visiting public. As an added bonus, the system provides all these benefits at a lower cost than the base case of using only a propane-fueled generator, reducing life-cycle cost by about 26%.

  11. Declarative language design for interactive visualization.

    PubMed

    Heer, Jeffrey; Bostock, Michael

    2010-01-01

    We investigate the design of declarative, domain-specific languages for constructing interactive visualizations. By separating specification from execution, declarative languages can simplify development, enable unobtrusive optimization, and support retargeting across platforms. We describe the design of the Protovis specification language and its implementation within an object-oriented, statically-typed programming language (Java). We demonstrate how to support rich visualizations without requiring a toolkit-specific data model and extend Protovis to enable declarative specification of animated transitions. To support cross-platform deployment, we introduce rendering and event-handling infrastructures decoupled from the runtime platform, letting designers retarget visualization specifications (e.g., from desktop to mobile phone) with reduced effort. We also explore optimizations such as runtime compilation of visualization specifications, parallelized execution, and hardware-accelerated rendering. We present benchmark studies measuring the performance gains provided by these optimizations and compare performance to existing Java-based visualization tools, demonstrating scalability improvements exceeding an order of magnitude.

  12. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Luszczek, Piotr R; Tomov, Stanimire Z; Dongarra, Jack J

    We present an efficient and scalable programming model for the development of linear algebra in heterogeneous multi-coprocessor environments. The model incorporates some of the current best design and implementation practices for the heterogeneous acceleration of dense linear algebra (DLA). Examples are given as the basis for solving linear systems' algorithms: the LU, QR, and Cholesky factorizations. To generate the extreme level of parallelism needed for the efficient use of coprocessors, algorithms of interest are redesigned and then split into well-chosen computational tasks. The task execution is scheduled over the computational components of a hybrid system of multi-core CPUs and coprocessors using a light-weight runtime system. The use of lightweight runtime systems keeps scheduling overhead low, while enabling the expression of parallelism through otherwise sequential code. This simplifies the development efforts and allows the exploration of the unique strengths of the various hardware components.

  13. Shape prior modeling using sparse representation and online dictionary learning.

    PubMed

    Zhang, Shaoting; Zhan, Yiqiang; Zhou, Yan; Uzunbas, Mustafa; Metaxas, Dimitris N

    2012-01-01

    The recently proposed sparse shape composition (SSC) opens a new avenue for shape prior modeling. Instead of assuming any parametric model of shape statistics, SSC incorporates shape priors on-the-fly by approximating a shape instance (usually derived from appearance cues) with a sparse combination of shapes in a training repository. Theoretically, one can increase the modeling capability of SSC by including as many training shapes as possible in the repository. However, this strategy confronts two limitations in practice. First, since SSC involves an iterative sparse optimization at run-time, the more shape instances contained in the repository, the less run-time efficiency SSC has. Therefore, a compact and informative shape dictionary is preferred to a large shape repository. Second, in medical imaging applications, training shapes seldom come in one batch. It is very time-consuming and sometimes infeasible to reconstruct the shape dictionary every time new training shapes appear. In this paper, we propose an online learning method to address these two limitations. Our method starts by constructing an initial shape dictionary using the K-SVD algorithm. When new training shapes come, instead of reconstructing the dictionary from the ground up, we update the existing one using a block-coordinate descent approach. Using the dynamically updated dictionary, sparse shape composition can be gracefully scaled up to model shape priors from a large number of training shapes without sacrificing run-time efficiency. Our method is validated on lung localization in X-ray and cardiac segmentation in MRI time series. Compared to the original SSC, it shows comparable performance while being significantly more efficient.
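
    The online update can be sketched with NumPy: keep sufficient statistics over the codes seen so far and refresh each dictionary column by block-coordinate descent as new shapes arrive, rather than re-learning from scratch. The sizes and the plain least-squares coding step (standing in for sparse coding) are illustrative assumptions.

      # Online dictionary update sketch: maintain A = sum(c c^T) and
      # B = sum(x c^T) over codes seen so far, and refresh each dictionary
      # column by block-coordinate descent when a new training shape arrives.
      import numpy as np

      rng = np.random.default_rng(0)
      dim, k = 40, 8                          # shape vector length, dictionary size
      D = rng.standard_normal((dim, k))
      D /= np.linalg.norm(D, axis=0)
      A = np.zeros((k, k))
      B = np.zeros((dim, k))

      def update_with(x, n_sweeps=2):
          """Fold one new training shape x into the dictionary."""
          c = np.linalg.lstsq(D, x, rcond=None)[0]     # stand-in for sparse coding
          A[:] += np.outer(c, c)
          B[:] += np.outer(x, c)
          for _ in range(n_sweeps):                    # block-coordinate descent
              for j in range(k):
                  if A[j, j] < 1e-12:
                      continue
                  u = D[:, j] + (B[:, j] - D @ A[:, j]) / A[j, j]
                  D[:, j] = u / max(np.linalg.norm(u), 1.0)

      for _ in range(200):                             # stream of new shapes
          update_with(rng.standard_normal(dim))
      print("dictionary column norms:", np.round(np.linalg.norm(D, axis=0), 3))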

  14. Dynamic analysis methods for detecting anomalies in asynchronously interacting systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kumar, Akshat; Solis, John Hector; Matschke, Benjamin

    2014-01-01

    Detecting modifications to digital system designs, whether malicious or benign, is problematic due to the complexity of the systems being analyzed. Moreover, static analysis techniques and tools can only be used during the initial design and implementation phases to verify safety and liveness properties. It is computationally intractable to guarantee that any previously verified properties still hold after a system, or even a single component, has been produced by a third-party manufacturer. In this paper we explore new approaches for creating a robust system design by investigating highly structured computational models that simplify verification and analysis. Our approach avoids the need to fully reconstruct the implemented system by incorporating a small verification component that dynamically detects deviations from the design specification at run-time. The first approach encodes information extracted from the original system design algebraically into a verification component. During run-time this component randomly queries the implementation for trace information and verifies that no design-level properties have been violated. If any deviation is detected, then a pre-specified fail-safe or notification behavior is triggered. Our second approach utilizes a partitioning methodology to view liveness and safety properties as a distributed decision task and the implementation as a proposed protocol that solves this task. Thus the problem of verifying safety and liveness properties is translated to that of verifying that the implementation solves the associated decision task. We build upon results from distributed systems and algebraic topology to construct a learning mechanism for verifying safety and liveness properties from samples of run-time executions.

  15. Decaf: Decoupled Dataflows for In Situ High-Performance Workflows

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dreher, M.; Peterka, T.

    Decaf is a dataflow system for the parallel communication of coupled tasks in an HPC workflow. The dataflow can perform arbitrary data transformations, ranging from simply forwarding data to complex data redistribution. Decaf does this by allowing the user to allocate resources and execute custom code in the dataflow. All communication through the dataflow is efficient parallel message passing over MPI. The runtime for calling tasks is entirely message-driven; Decaf executes a task when all messages for the task have been received. Such a message-driven runtime allows cyclic task dependencies in the workflow graph, for example, to enact computational steering based on the result of downstream tasks. Decaf includes a simple Python API for describing the workflow graph. This allows Decaf to stand alone as a complete workflow system, but Decaf can also be used as the dataflow layer by one or more other workflow systems to form a heterogeneous task-based computing environment. In one experiment, we couple a molecular dynamics code with a visualization tool using the FlowVR and Damaris workflow systems and Decaf for the dataflow. In another experiment, we test the coupling of a cosmology code with Voronoi tessellation and density estimation codes using MPI for the simulation, the DIY programming model for the two analysis codes, and Decaf for the dataflow. Such workflows consisting of heterogeneous software infrastructures exist because components are developed separately with different programming models and runtimes, and this is the first time that such heterogeneous coupling of diverse components has been demonstrated in situ on HPC systems.
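
    The message-driven rule, fire a task only when all of its input links have delivered, is easy to sketch in Python. The graph API below is illustrative and much simpler than Decaf's; note that nothing prevents a task from sending back upstream, which is what permits cyclic, steering-style graphs.

      # Message-driven dataflow sketch: each task fires only when messages
      # on all of its input links have arrived. Task functions are toy
      # stand-ins that return a list of (target, link, payload) messages.
      from collections import defaultdict

      class Dataflow:
          def __init__(self):
              self.inputs = {}                   # task -> required input links
              self.funcs = {}
              self.mailbox = defaultdict(dict)   # task -> {link: payload}

          def add_task(self, name, func, inputs):
              self.funcs[name], self.inputs[name] = func, set(inputs)

          def send(self, task, link, payload):
              self.mailbox[task][link] = payload
              if set(self.mailbox[task]) >= self.inputs[task]:   # all arrived
                  msgs = self.mailbox.pop(task)
                  for target, out_link, value in self.funcs[task](msgs) or []:
                      self.send(target, out_link, value)

      flow = Dataflow()
      flow.add_task("sim", lambda m: [("viz", "frame", m["start"] * 2)],
                    inputs=["start"])
      flow.add_task("viz", lambda m: print("rendering frame", m["frame"]) or [],
                    inputs=["frame"])
      flow.send("sim", "start", 21)              # prints: rendering frame 42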

  16. A History-based Estimation for LHCb job requirements

    NASA Astrophysics Data System (ADS)

    Rauschmayr, Nathalie

    2015-12-01

    The main goal of a Workload Management System (WMS) is to find and allocate resources for the given tasks. The more and better job information the WMS receives, the easier it will be to accomplish its task, which directly translates into higher utilization of resources. Traditionally, the information associated with each job, like expected runtime, is defined beforehand by the Production Manager in the best case, and by fixed arbitrary values by default. In the case of LHCb's Workload Management System, no mechanisms are provided to automate the estimation of job requirements. As a result, much more CPU time is normally requested than actually needed. Particularly in the context of multicore jobs this presents a major problem, since single- and multicore jobs must share the same resources. Consequently, grid sites need to rely on estimations given by the VOs in order not to decrease the utilization of their worker nodes when making multicore job slots available. The main reason for moving to multicore jobs is the reduction of the overall memory footprint; therefore, it also needs to be studied how the memory consumption of jobs can be estimated. A detailed workload analysis of past LHCb jobs is presented. It includes a study of job features and their correlation with runtime and memory consumption. Based on these features, a supervised learning algorithm is developed using history-based prediction. The aim is to learn over time how jobs' runtime and memory evolve under changes in experiment conditions and software versions. It will be shown that estimation can be notably improved if experiment conditions are taken into account.
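
    A minimal sketch of history-based estimation: fit runtime from simple job features on past jobs and pad the prediction before requesting resources. The features, the synthetic history, and the linear least-squares model are illustrative assumptions; the paper studies richer features and learners.

      # History-based runtime estimator sketch: fit a linear model on
      # (synthetic) past-job features, then predict the next job's runtime.
      import numpy as np

      rng = np.random.default_rng(1)
      n = 500
      events = rng.integers(1_000, 100_000, n)       # past-job feature history
      version = rng.integers(0, 5, n)
      is_sim = rng.integers(0, 2, n)
      runtime = (0.002 * events + 30 * version + 600 * is_sim
                 + rng.normal(0, 20, n))             # synthetic observed runtimes

      X = np.column_stack([events, version, is_sim, np.ones(n)])
      coef, *_ = np.linalg.lstsq(X, runtime, rcond=None)

      new_job = np.array([50_000, 4, 1, 1])          # upcoming job's features
      estimate = float(new_job @ coef)
      print(f"requested CPU time: {1.2 * estimate:.0f}s (estimate + 20% margin)")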

  17. Petascale Simulation Initiative Tech Base: FY2007 Final Report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    May, J; Chen, R; Jefferson, D

    The Petascale Simulation Initiative began as an LDRD project in the middle of Fiscal Year 2004. The goal of the project was to develop techniques to allow large-scale scientific simulation applications to better exploit the massive parallelism that will come with computers running at petaflops per second. One of the major products of this work was the design and prototype implementation of a programming model and a runtime system that lets applications extend data-parallel applications to use task parallelism. By adopting task parallelism, applications can use processing resources more flexibly, exploit multiple forms of parallelism, and support more sophisticated multiscale and multiphysics models. Our programming model was originally called the Symponents Architecture but is now known as Cooperative Parallelism, and the runtime software that supports it is called Coop. (However, we sometimes refer to the programming model as Coop for brevity.) We have documented the programming model and runtime system in a submitted conference paper [1]. This report focuses on the specific accomplishments of the Cooperative Parallelism project (as we now call it) under Tech Base funding in FY2007. Development and implementation of the model under LDRD funding alone proceeded to the point of demonstrating a large-scale materials modeling application using Coop on more than 1300 processors by the end of FY2006. Beginning in FY2007, the project received funding from both LDRD and the Computation Directorate Tech Base program. Later in the year, after the three-year term of the LDRD funding ended, the ASC program supported the project with additional funds. The goal of the Tech Base effort was to bring Coop from a prototype to a production-ready system that a variety of LLNL users could work with. Specifically, the major tasks that we planned for the project were: (1) port SARS [the former name of the Coop runtime system] to another LLNL platform, probably Thunder or Peloton (depending on when Peloton becomes available); (2) improve SARS's robustness and ease-of-use, and develop user documentation; and (3) work with LLNL code teams to help them determine how Symponents could benefit their applications. The original funding request was $296,000 for the year, and we eventually received $252,000. The remainder of this report describes our efforts and accomplishments for each of the goals listed above.

  18. Evaluation of bus transit reliability in the District of Columbia.

    DOT National Transportation Integrated Search

    2013-11-01

    Several performance metrics can be used to assess the reliability of a transit system. These include on-time arrivals, travel-time adherence, run-time adherence, and customer satisfaction, among others. On-time arrival at bus stops is one of the pe...

  19. Static Verification for Code Contracts

    NASA Astrophysics Data System (ADS)

    Fähndrich, Manuel

    The Code Contracts project [3] at Microsoft Research enables programmers on the .NET platform to author specifications in existing languages such as C# and VisualBasic. To take advantage of these specifications, we provide tools for documentation generation, runtime contract checking, and static contract verification.

  20. Dynamic Assembly, Assessment, Assurance, and Adaptation via Heterogeneous Software Connectors

    DTIC Science & Technology

    2004-10-01

    Multi-Versioning Connectors (MVC): Representative of runtime monitoring gauges are multi-versioning gauges, which monitor and analyze different versions of... multiple versions of the same component must be merged by the connector before they are forwarded to their target components. The multiversioning ...

  1. Authoritative Authoring: Software That Makes Multimedia Happen.

    ERIC Educational Resources Information Center

    Florio, Chris; Murie, Michael

    1996-01-01

    Compares seven mid- to high-end multimedia authoring software systems that combine graphics, sound, animation, video, and text for Windows and Macintosh platforms. A run-time project was created with each program using video, animation, graphics, sound, formatted text, hypertext, and buttons. (LRW)

  2. Updates to In-Line Calculation of Photolysis Rates

    EPA Science Inventory

    How photolysis rates are calculated affects ozone and aerosol concentrations predicted by the CMAQ model and the model's run-time. The standard configuration of CMAQ uses the inline option that calculates photolysis rates by solving the radiative transfer equation for the needed ...

  3. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Marquez, Andres; Manzano Franco, Joseph B.; Song, Shuaiwen

    With Exascale performance and its challenges in mind, one ubiquitous concern among architects is energy efficiency. Petascale systems projected to Exascale systems are unsustainable at current power consumption rates. One major contributor to system-wide power consumption is the number of memory operations leading to data movement and management techniques applied by the runtime system. To address this problem, we present the concept of the Architected Composite Data Types (ACDT) framework. The framework is made aware of data composites, assigning them a specific layout, transformations, and operators. Data manipulation overhead is amortized over a larger number of elements, and program performance and power efficiency can be significantly improved. We developed the fundamentals of an ACDT framework on a massively multithreaded adaptive runtime system geared towards Exascale clusters. Showcasing the capability of ACDT, we exercised the framework with two representative processing kernels, Matrix Vector Multiply and Cholesky Decomposition, applied to sparse matrices. As transformation modules, we applied optimized compress/decompress engines and configured invariant operators for maximum energy/performance efficiency. Additionally, we explored two different approaches based on transformation opaqueness in relation to the application. Under the first approach, the application is agnostic to compression and decompression activity. This approach entails minimal changes to the original application code, but leaves out potential application-specific optimizations. The second approach exposes the decompression process to the application, thereby exposing optimization opportunities that can only be exploited with application knowledge. The experimental results show that the two approaches have their strengths in HW and SW respectively, where the SW approach can yield performance and power improvements that are an order of magnitude better than ACDT-oblivious, hand-optimized implementations. We consider the ACDT runtime framework an important component of compute nodes that will lead towards power-efficient Exascale clusters.

  4. A Case for Application Oblivious Energy-Efficient MPI Runtime

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Venkatesh, Akshay; Vishnu, Abhinav; Hamidouche, Khaled

    Power has become the major impediment in designing large-scale high-end systems. Message Passing Interface (MPI) is the de facto communication interface used as the back-end for designing applications, programming models, and runtimes for these systems. Slack, the time spent by an MPI process in a single MPI call, provides a potential for energy and power savings if an appropriate power reduction technique such as core idling or Dynamic Voltage and Frequency Scaling (DVFS) can be applied without perturbing the application's execution time. Existing techniques that exploit slack for power savings assume that application behavior repeats across iterations/executions. However, an increasing use of adaptive, data-dependent workloads combined with system factors (OS noise, congestion) makes this assumption invalid. This paper proposes and implements Energy Aware MPI (EAM), an application-oblivious energy-efficient MPI runtime. EAM uses a combination of communication models of common MPI primitives (point-to-point, collective, progress, blocking/non-blocking) and an online observation of slack for maximizing energy efficiency. Each power lever incurs time overhead, which must be amortized over slack to minimize degradation. When predicted communication time exceeds a lever overhead, the lever is used as soon as possible, to maximize energy efficiency. When mis-prediction occurs, the lever(s) are used automatically at specific intervals for amortization. We implement EAM using MVAPICH2 and evaluate it on ten applications using up to 4096 processes. Our performance evaluation on an InfiniBand cluster indicates that EAM can reduce energy consumption by 5-41% in comparison to the default approach, with negligible (less than 4% in all cases) performance loss.
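
    The slack rule at the heart of such a runtime can be sketched as a small decision function: before blocking in a communication call, apply the most aggressive power lever whose switching overhead is amortized by the predicted slack. The lever table and thresholds below are illustrative assumptions, not EAM's tuned values.

      # Slack-exploiting lever selection sketch: pick the most aggressive
      # power lever whose switch overhead fits within the predicted slack.
      LEVERS = [                  # (name, switch overhead in s, relative power)
          ("idle-core", 0.000_5, 0.30),
          ("low-freq",  0.000_1, 0.60),
          ("nominal",   0.0,     1.00),
      ]

      def pick_lever(predicted_slack_s):
          """Choose the first lever whose overhead amortizes over the slack."""
          for name, overhead, power in LEVERS:
              if predicted_slack_s > 2 * overhead:   # amortize switch both ways
                  return name, power
          return "nominal", 1.0

      for slack in (0.002, 0.000_15, 0.000_01):      # predicted waits per MPI call
          lever, power = pick_lever(slack)
          print(f"slack {slack * 1e3:.2f} ms -> {lever} ({power:.0%} power)")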

  5. Library API for Z-Order Memory Layout

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bethel, E. Wes

    This library provides a simple-to-use API for implementing an alternative to the traditional row-major-order in-memory layout, one based on a Morton-order space-filling curve (SFC), specifically, a Z-order variant of the Morton-order curve. The library enables programmers, after a simple initialization step, to convert a multidimensional array from row-major to Z-order layout, then use a single, generic API call to access data at any arbitrary (i,j,k) location within the array, whether it be stored in row-major or Z-order format. The motivation for using an SFC in-memory layout is improved spatial locality, which results in increased use of local high-speed cache memory. The basic idea is that with row-major layouts, a data access to some location that is nearby in index space is likely far away in physical memory, resulting in poor spatial locality and slow runtime. On the other hand, with an SFC-based layout, accesses that are nearby in index space are much more likely to also be nearby in physical memory, resulting in much better spatial locality and better runtime performance. Numerous studies over the years have shown that significant runtime performance gains are realized by using an SFC-based memory layout compared to a row-major layout, sometimes by as much as 50%, which result from the better use of the memory and cache hierarchy attendant with an SFC-based layout (see, for example, [Beth2012]). This library implementation is intended for use with codes that work with structured, array-based data in 2 or 3 dimensions. It is not appropriate for use with unstructured or point-based data.
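
    The index mapping itself is compact enough to sketch: interleaving the bits of i and j yields the element's Z-order position, versus i * ncols + j for row-major. This is a generic Morton-code sketch, not the library's API.

      # Z-order (Morton) indexing sketch for a 2-D array: interleave the bits
      # of i and j to get the element's position in the Z-order buffer.
      def part1by1(n):
          """Spread the low 16 bits of n so one zero bit separates each bit."""
          n &= 0xFFFF
          n = (n | (n << 8)) & 0x00FF00FF
          n = (n | (n << 4)) & 0x0F0F0F0F
          n = (n | (n << 2)) & 0x33333333
          n = (n | (n << 1)) & 0x55555555
          return n

      def morton(i, j):
          return (part1by1(i) << 1) | part1by1(j)

      def row_major(i, j, ncols):
          return i * ncols + j

      for i, j in [(0, 0), (0, 1), (1, 0), (1, 1), (2, 3)]:
          print((i, j), "row-major:", row_major(i, j, 1024), "z-order:", morton(i, j))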

  6. Interactive Scripting for Analysis and Visualization of Arbitrarily Large, Disparately Located Climate Data Ensembles Using a Progressive Runtime Server

    NASA Astrophysics Data System (ADS)

    Christensen, C.; Summa, B.; Scorzelli, G.; Lee, J. W.; Venkat, A.; Bremer, P. T.; Pascucci, V.

    2017-12-01

    Massive datasets are becoming more common due to increasingly detailed simulations and higher-resolution acquisition devices. Yet accessing and processing these huge data collections for scientific analysis is still a significant challenge. Solutions that rely on extensive data transfers are increasingly untenable and often impossible due to lack of sufficient storage at the client side as well as insufficient bandwidth to conduct such large transfers, which in some cases could entail petabytes of data. Large-scale remote computing resources can be useful, but utilizing such systems typically entails some form of offline batch processing with long delays, data replications, and substantial cost for any mistakes. Both types of workflows can severely limit the flexible exploration and rapid evaluation of new hypotheses that are crucial to the scientific process and thereby impede scientific discovery. In order to facilitate interactivity in both analysis and visualization of these massive data ensembles, we introduce a dynamic runtime system suitable for progressive computation and interactive visualization of arbitrarily large, disparately located spatiotemporal datasets. Our system includes an embedded domain-specific language (EDSL) that allows users to express a wide range of data analysis operations in a simple and abstract manner. The underlying runtime system transparently resolves issues such as remote data access and resampling while at the same time maintaining interactivity through progressive and interruptible processing. Computations involving large amounts of data can be performed remotely in an incremental fashion that dramatically reduces data movement, while the client receives updates progressively, thereby remaining robust to fluctuating network latency or limited bandwidth. This system facilitates interactive, incremental analysis and visualization of massive remote datasets up to petabytes in size. Our system is now available for general use in the community through both Docker and Anaconda.

  7. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sadayappan, Ponnuswamy

    Exascale computing systems will provide a thousand-fold increase in parallelism and a proportional increase in failure rate relative to today's machines. Systems software for exascale machines must provide the infrastructure to support existing applications while simultaneously enabling efficient execution of new programming models that naturally express dynamic, adaptive, irregular computation; coupled simulations; and massive data analysis in a highly unreliable hardware environment with billions of threads of execution. We propose a new approach to the data and work distribution model provided by system software, based on the unifying formalism of an abstract file system. The proposed hierarchical data model provides simple, familiar visibility and access to data structures through the file system hierarchy, while providing fault tolerance through selective redundancy. The hierarchical task model features work queues whose form and organization are represented as file system objects. Data and work are both first-class entities. By exposing the relationships between data and work to the runtime system, information is available to optimize execution time and provide fault tolerance. The data distribution scheme provides replication (where desirable and possible) for fault tolerance and efficiency, and it is hierarchical to make it possible to take advantage of locality. The user, tools, and applications, including legacy applications, can interface with the data, work queues, and one another through the abstract file model. This runtime environment will provide multiple interfaces to support traditional Message Passing Interface applications, languages developed under DARPA's High Productivity Computing Systems program, as well as other, experimental programming models. We will validate our runtime system with pilot codes on existing platforms and will use simulation to validate for exascale-class platforms. In this final report, we summarize research results from the work done at the Ohio State University towards the larger goals of the project listed above.

  8. Generation of large scale urban environments to support advanced sensor and seeker simulation

    NASA Astrophysics Data System (ADS)

    Giuliani, Joseph; Hershey, Daniel; McKeown, David, Jr.; Willis, Carla; Van, Tan

    2009-05-01

    One of the key aspects of the design of a next-generation weapon system is the need to operate in cluttered and complex urban environments. Simulation systems rely on accurate representation of these environments and require automated software tools to construct the underlying 3D geometry and associated spectral and material properties, which are then formatted for various objective seeker simulation systems. Under an Air Force Small Business Innovative Research (SBIR) contract, we have developed an automated process to generate 3D urban environments with user-defined properties. These environments can be composed from a wide variety of source materials, including vector source data, pre-existing 3D models, and digital elevation models, and rapidly organized into a geo-specific visual simulation database. This intermediate representation can be easily inspected in the visible spectrum for content and organization and interactively queried for accuracy. Once the database contains the required contents, it can be exported into specific synthetic scene generation runtime formats, preserving the relationship between geometry and material properties. To date, an exporter for the Irma simulation system developed and maintained by AFRL/Eglin has been created, and a second exporter to the Real Time Composite Hardbody and Missile Plume (CHAMP) simulation system for real-time use is currently being developed. This process supports significantly more complex target environments than previous approaches to database generation. In this paper we describe the capabilities for content creation for advanced seeker processing algorithm simulation and sensor stimulation, including the overall database compilation process and sample databases produced and exported for the Irma runtime system. We also discuss the addition of object dynamics and viewer dynamics within the visual simulation into the Irma runtime environment.

  9. High-throughput Analysis of Large Microscopy Image Datasets on CPU-GPU Cluster Platforms

    PubMed Central

    Teodoro, George; Pan, Tony; Kurc, Tahsin M.; Kong, Jun; Cooper, Lee A. D.; Podhorszki, Norbert; Klasky, Scott; Saltz, Joel H.

    2014-01-01

    Analysis of large pathology image datasets offers significant opportunities for the investigation of disease morphology, but the resource requirements of analysis pipelines limit the scale of such studies. Motivated by a brain cancer study, we propose and evaluate a parallel image analysis application pipeline for high throughput computation of large datasets of high resolution pathology tissue images on distributed CPU-GPU platforms. To achieve efficient execution on these hybrid systems, we have built runtime support that allows us to express the cancer image analysis application as a hierarchical data processing pipeline. The application is implemented as a coarse-grain pipeline of stages, where each stage may be further partitioned into another pipeline of fine-grain operations. The fine-grain operations are efficiently managed and scheduled for computation on CPUs and GPUs using performance aware scheduling techniques along with several optimizations, including architecture aware process placement, data locality conscious task assignment, data prefetching, and asynchronous data copy. These optimizations are employed to maximize the utilization of the aggregate computing power of CPUs and GPUs and minimize data copy overheads. Our experimental evaluation shows that the cooperative use of CPUs and GPUs achieves significant improvements on top of GPU-only versions (up to 1.6×) and that the execution of the application as a set of fine-grain operations provides more opportunities for runtime optimizations and attains better performance than coarser-grain, monolithic implementations used in other works. An implementation of the cancer image analysis pipeline using the runtime support was able to process an image dataset consisting of 36,848 4Kx4K-pixel image tiles (about 1.8TB uncompressed) in less than 4 minutes (150 tiles/second) on 100 nodes of a state-of-the-art hybrid cluster system. PMID:25419546

  10. A Formal Methodology to Design and Deploy Dependable Wireless Sensor Networks

    PubMed Central

    Testa, Alessandro; Cinque, Marcello; Coronato, Antonio; Augusto, Juan Carlos

    2016-01-01

    Wireless Sensor Networks (WSNs) are being increasingly adopted in critical applications, where verifying the correct operation of sensor nodes is a major concern. Undesired events may undermine the mission of the WSNs. Hence, their effects need to be properly assessed before deployment, to obtain a good level of expected performance, and during operation, in order to avoid dangerous unexpected results. In this paper, we propose a methodology that aims at assessing and improving the dependability level of WSNs by means of an event-based formal verification technique. The methodology includes a process to guide designers towards the realization of a dependable WSN and a tool (“ADVISES”) to simplify its adoption. The tool is applicable to homogeneous WSNs with static routing topologies. It allows the automatic generation of formal specifications used to check correctness properties and evaluate dependability metrics at design time and at runtime for WSNs where an acceptable percentage of faults can be defined. At runtime, we can check the behavior of the WSN according to the results obtained at design time, and we can detect sudden and unexpected failures, in order to trigger recovery procedures. The effectiveness of the methodology is shown in the context of two case studies, as proof-of-concept, aiming to illustrate how the tool is helpful to drive design choices and to check the correctness properties of the WSN at runtime. Although the method scales up to very large WSNs, the applicability of the methodology may be compromised by the state space explosion of the reasoning model, which must be faced by partitioning large topologies into sub-topologies. PMID:28025568

  11. Optimized Hypervisor Scheduler for Parallel Discrete Event Simulations on Virtual Machine Platforms

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yoginath, Srikanth B; Perumalla, Kalyan S

    2013-01-01

    With the advent of virtual machine (VM)-based platforms for parallel computing, it is now possible to execute parallel discrete event simulations (PDES) over multiple virtual machines, in contrast to executing in native mode directly over hardware as has traditionally been done over the past decades. While mature VM-based parallel systems now offer new, compelling benefits such as serviceability, dynamic reconfigurability, and overall cost effectiveness, the runtime performance of parallel applications can be significantly affected. In particular, most VM-based platforms are optimized for general workloads, but PDES execution exhibits unique dynamics significantly different from other workloads. Here we first present results from experiments that highlight the gross deterioration of the runtime performance of VM-based PDES simulations when executed using traditional VM schedulers, quantitatively showing the bad scaling properties of the scheduler as the number of VMs is increased. The mismatch is fundamental in nature in the sense that any fairness-based VM scheduler implementation would exhibit this mismatch with PDES runs. We also present a new scheduler optimized specifically for PDES applications, and describe its design and implementation. Experimental results obtained from running PDES benchmarks (PHOLD and vehicular traffic simulations) over VMs show over an order of magnitude improvement in the run time of the PDES-optimized scheduler relative to the regular VM scheduler, with over a 20× reduction in the run time of simulations using up to 64 VMs. The observations and results are timely in the context of emerging systems such as cloud platforms and VM-based high performance computing installations, highlighting to the community the need for PDES-specific support, and the feasibility of significantly reducing the runtime overhead for scalable PDES on VM platforms.

  12. The Metropolis Monte Carlo method with CUDA enabled Graphic Processing Units

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hall, Clifford; Ji, Weixiao

    2014-02-01

    We present a CPU–GPU system for runtime acceleration of large molecular simulations using GPU computation and memory swaps. The memory architecture of the GPU can be used both as a container for simulation data stored on the graphics card and as a floating-point code target, providing an effective means for the manipulation of atomistic or molecular data on the GPU. To fully take advantage of this mechanism, efficient GPU realizations of algorithms used to perform atomistic and molecular simulations are essential. Our system implements a versatile molecular engine, including inter-molecule interactions and orientational variables, for performing the Metropolis Monte Carlo (MMC) algorithm, which is one type of Markov chain Monte Carlo. By combining memory objects with floating-point code fragments we have implemented an MMC parallel engine that entirely avoids the communication time of molecular data at runtime. Our runtime acceleration system is a forerunner of a new class of CPU–GPU algorithms exploiting memory concepts combined with threading for avoiding bus bandwidth and communication. The testbed molecular system used here is a condensed-phase system of oligopyrrole chains. A benchmark shows a size-scaling speedup of 60 for systems with 210,000 pyrrole monomers. Our implementation can easily be combined with MPI to connect several CPU–GPU duets in parallel. Highlights: • We parallelize the Metropolis Monte Carlo (MMC) algorithm on one CPU–GPU duet. • The Adaptive Tempering Monte Carlo employs MMC and profits from this CPU–GPU implementation. • Our benchmark shows a size scaling-up speedup of 62 for systems with 225,000 particles. • The testbed involves a polymeric system of oligopyrroles in the condensed phase. • The CPU–GPU parallelization includes dipole–dipole and Mie–Jones classic potentials.
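
    The acceptance step that the MMC engine parallelizes is short enough to sketch in plain Python: propose a local move and accept it with probability min(1, exp(-dE/kT)). The harmonic-chain energy below is an illustrative stand-in for the molecular force field; the paper's contribution is keeping this loop's data resident on the GPU.

      # Metropolis Monte Carlo sketch: propose a local perturbation, accept
      # with probability min(1, exp(-dE / kT)), otherwise restore the state.
      import math
      import random

      random.seed(0)
      kT = 1.0
      x = [0.0] * 50                       # toy 1-D "chain" of particle positions

      def energy(pos):
          return sum((pos[i + 1] - pos[i]) ** 2 for i in range(len(pos) - 1))

      E = energy(x)
      accepted = 0
      for step in range(20_000):
          i = random.randrange(len(x))
          old = x[i]
          x[i] += random.uniform(-0.5, 0.5)        # local trial move
          dE = energy(x) - E
          if dE <= 0 or random.random() < math.exp(-dE / kT):
              E += dE                              # accept the move
              accepted += 1
          else:
              x[i] = old                           # reject, restore state

      print(f"acceptance ratio: {accepted / 20_000:.2f}")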

  13. Analog Input Data Acquisition Software

    NASA Technical Reports Server (NTRS)

    Arens, Ellen

    2009-01-01

    DAQ Master Software allows users to easily set up a system to monitor up to five analog input channels and save the data after acquisition. This program was written in LabVIEW 8.0, and requires the LabVIEW runtime engine 8.0 to run the executable.

  14. Runtime Simulation for Post-Disaster Data Fusion Visualization

    DTIC Science & Technology

    2006-10-01

    Center for Multisource Information Fusion (CMIF), The State University of New York at Buffalo, Buffalo, NY 14260 USA.

  15. A Type System For Certified Runtime Type Analysis

    DTIC Science & Technology

    2002-12-01

    1999 ACM SIGPLAN International Conf. on Functional Programming (ICFP'99), pages 183–196. ACM Press, September 1999. [Min97] Yasuhiko Minamide. Full... lifting of type parameters. Technical report, RIMS, Kyoto University, 1997. [MMH96] Yasuhiko Minamide, Greg Morrisett, and Robert Harper. Typed

  16. Planning And Reasoning For A Telerobot

    NASA Technical Reports Server (NTRS)

    Peters, Stephen F.; Mittman, David S.; Collins, Carol E.; O'Meara Callahan, Jacquelyn S.; Rokey, Mark J.

    1992-01-01

    Document discusses research and development of Telerobot Interactive Planning System (TIPS). Goal in development of TIPS is to enable it to accept instructions from operator, then command run-time controller to execute the operations that carry out those instructions. Challenges in transferring technology from testbed to operational system are also discussed.

  17. Efficient processing of two-dimensional arrays with C or C++

    USGS Publications Warehouse

    Donato, David I.

    2017-07-20

    Because fast and efficient serial processing of raster-graphic images and other two-dimensional arrays is a requirement in land-change modeling and other applications, the effects of 10 factors on the runtimes for processing two-dimensional arrays with C and C++ are evaluated in a comparative factorial study. This study’s factors include the choice among three C or C++ source-code techniques for array processing; the choice of Microsoft Windows 7 or a Linux operating system; the choice of 4-byte or 8-byte array elements and indexes; and the choice of 32-bit or 64-bit memory addressing. This study demonstrates how programmer choices can reduce runtimes by 75 percent or more, even after compiler optimizations. Ten points of practical advice for faster processing of two-dimensional arrays are offered to C and C++ programmers. Further study and the development of a C and C++ software test suite are recommended. Key words: array processing, C, C++, compiler, computational speed, land-change modeling, raster-graphic image, two-dimensional array, software efficiency
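
    One of the dominant factors in such studies is memory-access order. The cache-locality effect the study measures in C and C++ can be demonstrated in any language with row-major arrays; the following sketch (Python with NumPy, purely illustrative) compares row-wise and column-wise traversal of a C-ordered array:

        import timeit
        import numpy as np

        a = np.zeros((2000, 2000))   # row-major (C-order) layout by default

        # Row-wise traversal touches memory contiguously; column-wise strides.
        row_wise = lambda: sum(float(a[i, :].sum()) for i in range(a.shape[0]))
        col_wise = lambda: sum(float(a[:, j].sum()) for j in range(a.shape[1]))

        print("row-wise:", timeit.timeit(row_wise, number=3), "s")
        print("col-wise:", timeit.timeit(col_wise, number=3), "s")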

  18. Jeagle: a JAVA Runtime Verification Tool

    NASA Technical Reports Server (NTRS)

    DAmorim, Marcelo; Havelund, Klaus

    2005-01-01

    We introduce the temporal logic Jeagle and its supporting tool for runtime verification of Java programs. A monitor for a Jeagle formula checks if a finite trace of program events satisfies the formula. Jeagle is a programming-oriented extension of the powerful rule-based Eagle logic that has been shown to be capable of defining and implementing a range of finite trace monitoring logics, including future and past time temporal logic, real-time and metric temporal logics, interval logics, forms of quantified temporal logics, and so on. Monitoring is achieved on a state-by-state basis, avoiding any need to store the input trace. Jeagle extends Eagle with constructs for capturing parameterized program events such as method calls and method returns. Parameters can be the objects that methods are called upon, arguments to methods, and return values. Jeagle allows one to refer to these in formulas. The tool performs automated program instrumentation using AspectJ. We show the transformational semantics of Jeagle.
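
    The state-by-state monitoring style described above can be sketched in a few lines (illustrative Python, not Jeagle's actual syntax or semantics): the monitor keeps only a summary of the past, so the trace never needs to be stored.

        class PrecededByMonitor:
            """Tiny runtime monitor for the past-time property
            'every `use` event is preceded by an `open` event',
            evaluated state by state, so the trace is never stored."""
            def __init__(self):
                self.opened = False
                self.violations = 0

            def step(self, event):
                if event == "open":
                    self.opened = True
                elif event == "close":
                    self.opened = False
                elif event == "use" and not self.opened:
                    self.violations += 1   # property violated at this state

        m = PrecededByMonitor()
        for e in ["open", "use", "close", "use"]:   # the last 'use' violates
            m.step(e)
        print("violations:", m.violations)          # -> 1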

  19. Behavior-aware cache hierarchy optimization for low-power multi-core embedded systems

    NASA Astrophysics Data System (ADS)

    Zhao, Huatao; Luo, Xiao; Zhu, Chen; Watanabe, Takahiro; Zhu, Tianbo

    2017-07-01

    In modern embedded systems, the increasing number of cores requires efficient cache hierarchies to ensure data throughput, but such cache hierarchies are restricted by their tumid size and interference accesses, which lead to both performance degradation and wasted energy. In this paper, we first propose a behavior-aware cache hierarchy (BACH) which can optimally allocate multi-level cache resources to many cores, greatly improving the efficiency of the cache hierarchy and resulting in low energy consumption. BACH takes full advantage of the explored application behaviors and runtime cache resource demands as the basis for cache allocation, so that the cache hierarchy can be optimally configured to meet the runtime demand. BACH was implemented on the GEM5 simulator. The experimental results show that the energy consumption of a three-level cache hierarchy can be reduced by 5.29% up to 27.94% compared with other key approaches, while the performance of the multi-core system even improves slightly once hardware overhead is taken into account.

  20. Block-Parallel Data Analysis with DIY2

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Morozov, Dmitriy; Peterka, Tom

    DIY2 is a programming model and runtime for block-parallel analytics on distributed-memory machines. Its main abstraction is block-structured data parallelism: data are decomposed into blocks; blocks are assigned to processing elements (processes or threads); computation is described as iterations over these blocks, and communication between blocks is defined by reusable patterns. By expressing computation in this general form, the DIY2 runtime is free to optimize the movement of blocks between slow and fast memories (disk and flash vs. DRAM) and to concurrently execute blocks residing in memory with multiple threads. This enables the same program to execute in-core, out-of-core, serial, parallel, single-threaded, multithreaded, or combinations thereof. This paper describes the implementation of the main features of the DIY2 programming model and optimizations to improve performance. DIY2 is evaluated on benchmark test cases to establish baseline performance for several common patterns and on larger complete analysis codes running on large-scale HPC machines.
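
    The block abstraction can be sketched generically (illustrative Python, not DIY2's actual C++ API): decompose the domain into blocks, assign blocks to processing elements, and express the computation as an iteration over blocks.

        from multiprocessing.pool import ThreadPool

        def decompose(n_points, n_blocks):
            """Split [0, n_points) into contiguous blocks."""
            step = (n_points + n_blocks - 1) // n_blocks
            return [range(i, min(i + step, n_points))
                    for i in range(0, n_points, step)]

        def work(block):
            return sum(x * x for x in block)   # per-block computation

        blocks = decompose(1_000_000, 16)
        with ThreadPool(4) as pool:            # 16 blocks over 4 threads
            partials = pool.map(work, blocks)
        print(sum(partials))                   # reduce the per-block results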

  1. Cloudweaver: Adaptive and Data-Driven Workload Manager for Generic Clouds

    NASA Astrophysics Data System (ADS)

    Li, Rui; Chen, Lei; Li, Wen-Syan

    Cloud computing denotes the latest trend in application development for parallel computing on massive data volumes. It relies on clouds of servers to handle tasks that used to be managed by an individual server. With cloud computing, software vendors can provide business intelligence and data analytic services for internet-scale data sets. Many open source projects, such as Hadoop, offer various software components that are essential for building a cloud infrastructure. Current Hadoop (and many other systems) requires users to configure cloud infrastructures via programs and APIs, and such configuration is fixed for the duration of the run. In this chapter, we propose a workload manager (WLM), called CloudWeaver, which provides automated configuration of a cloud infrastructure for runtime execution. The workload management is data-driven and can adapt to the dynamic nature of operator throughput during different execution phases. CloudWeaver works for a single job as well as for a workload consisting of multiple jobs running concurrently, and aims at maximum throughput using a minimum set of processors.

  2. PPC750 Performance Monitor

    NASA Technical Reports Server (NTRS)

    Meyer, Donald; Uchenik, Igor

    2007-01-01

    The PPC750 Performance Monitor (Perfmon) is a computer program that helps the user to assess the performance characteristics of application programs running under the Wind River VxWorks real-time operating system on a PPC750 computer. Perfmon generates a user-friendly interface and collects performance data by use of the performance registers provided by the PPC750 architecture. It processes and presents run-time statistics on a per-task basis over a repeating time interval (typically, several seconds or minutes) specified by the user. When the Perfmon software module is loaded with the user's software modules, it is available for use through Perfmon commands, without any modification of the user's code and at negligible performance penalty. Per-task run-time performance data made available by Perfmon include percentage time, number of instructions executed per unit time, dispatch ratio, stack high-water mark, and level-1 instruction and data cache miss rates. The performance data are written to a file specified by the user or to the serial port of the computer.

  3. Multitasking runtime systems for the Cedar Multiprocessor

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Guzzi, M.D.

    1986-07-01

    The programming of a MIMD machine is more complex than for SISD and SIMD machines. The multiple computational resources of the machine must be made available to the programming language compiler and to the programmer so that multitasking programs may be written. This thesis will explore the additional complexity of programming a MIMD machine, the Cedar Multiprocessor specifically, and the multitasking runtime system necessary to provide multitasking resources to the user. First, the problem will be well defined: the Cedar machine, its operating system, the programming language, and multitasking concepts will be described. Second, a solution to the problem, called macrotasking, will be proposed. This solution provides multitasking facilities to the programmer at a very coarse level with many visible machine dependencies. Third, an alternate solution, called microtasking, will be proposed. This solution provides multitasking facilities of a much finer grain. This solution does not depend so rigidly on the specific architecture of the machine. Finally, the two solutions will be compared for effectiveness. 12 refs., 16 figs.

  4. A performance comparison of the IBM RS/6000 and the Astronautics ZS-1

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Smith, W.M.; Abraham, S.G.; Davidson, E.S.

    1991-01-01

    Concurrent uniprocessor architectures, of which vector and superscalar are two examples, are designed to capitalize on fine-grain parallelism. The authors have developed a performance evaluation method for comparing and improving these architectures, and in this article they present the methodology and a detailed case study of two machines. The runtime of many programs is dominated by time spent in loop constructs - for example, Fortran Do-loops. Loops generally comprise two logical processes: The access process generates addresses for memory operations while the execute process operates on floating-point data. Memory access patterns typically can be generated independently of the data in the execute process. This independence allows the access process to slip ahead, thereby hiding memory latency. The IBM 360/91 was designed in 1967 to achieve slip dynamically, at runtime. One CPU unit executes integer operations while another handles floating-point operations. Other machines, including the VAX 9000 and the IBM RS/6000, use a similar approach.

  5. Certification Strategies using Run-Time Safety Assurance for Part 23 Autopilot Systems

    NASA Technical Reports Server (NTRS)

    Hook, Loyd R.; Clark, Matthew; Sizoo, David; Skoog, Mark A.; Brady, James

    2016-01-01

    Part 23 aircraft operation, and in particular general aviation, is relatively unsafe when compared to other common forms of vehicle travel. Currently, there exist technologies that could improve the safety statistics for these aircraft; however, the high burden and cost of performing the requisite safety-critical certification processes for these systems limits their proliferation. For this reason, many entities, including the Federal Aviation Administration, NASA, and the US Air Force, are considering new options for certification of technologies that will improve aircraft safety. Of particular interest are low-cost autopilot systems for general aviation aircraft, as these systems have the potential to positively and significantly affect safety statistics. This paper proposes new systems and techniques, leveraging run-time verification, for the assurance of general aviation autopilot systems, which would be used to supplement the current certification process and provide a viable path for near-term, low-cost implementation. In addition, discussions on preliminary experimentation and on building the assurance case for a system based on these principles are provided.
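
    The run-time assurance idea can be sketched as a simple switching architecture (hypothetical Python sketch; the safety envelope, gains and controllers are invented for illustration): a monitor checks a safety envelope at each step and reverts from the unverified controller to a verified recovery controller when the envelope is violated.

        SAFE_BAND = (100.0, 500.0)   # assumed safe altitude envelope, in meters

        def advanced_controller(alt, target):
            return 0.8 * (target - alt)          # aggressive, unverified

        def recovery_controller(alt):
            mid = sum(SAFE_BAND) / 2
            return 0.2 * (mid - alt)             # conservative, verified

        def rta_step(alt, target):
            # Run-time monitor: use the advanced controller only inside the
            # safety envelope; otherwise revert to the verified controller.
            if SAFE_BAND[0] < alt < SAFE_BAND[1]:
                return advanced_controller(alt, target)
            return recovery_controller(alt)      # monitor triggered: revert

        alt = 480.0
        for _ in range(10):
            alt += rta_step(alt, target=550.0)   # target lies outside the band
            print(round(alt, 1))                 # recovery keeps pulling back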

  6. Reversible polymorphism-aware phylogenetic models and their application to tree inference.

    PubMed

    Schrempf, Dominik; Minh, Bui Quang; De Maio, Nicola; von Haeseler, Arndt; Kosiol, Carolin

    2016-10-21

    We present a reversible Polymorphism-Aware Phylogenetic Model (revPoMo) for species tree estimation from genome-wide data. revPoMo enables the reconstruction of large-scale species trees for many within-species samples. It expands the alphabet of DNA substitution models to include polymorphic states, thereby naturally accounting for incomplete lineage sorting. We implemented revPoMo in the maximum likelihood software IQ-TREE. A simulation study and an application to great ape data show that the runtimes of our approach and standard substitution models are comparable, but that revPoMo has much better accuracy in estimating trees, divergence times and mutation rates. The advantage of revPoMo is that an increase of sample size per species improves estimations but does not increase runtime. Therefore, revPoMo is a valuable tool with several applications, from speciation dating to species tree reconstruction. Copyright © 2016 The Authors. Published by Elsevier Ltd. All rights reserved.

  7. Self-Powered Multiparameter Health Sensor.

    PubMed

    Tobola, Andreas; Leutheuser, Heike; Pollak, Markus; Spies, Peter; Hofmann, Christian; Weigand, Christian; Eskofier, Bjoern M; Fischer, Georg

    2018-01-01

    Wearable health sensors are about to change our health system. While several technological improvements have been presented to enhance performance and energy efficiency, battery runtime is still a critical concern for the practical use of wearable biomedical sensor systems. The runtime limitation is directly related to the battery size, which is another concern regarding practicality and customer acceptance. We introduced ULPSEK (Ultra-Low-Power Sensor Evaluation Kit) for the evaluation of biomedical sensors and monitoring applications (http://ulpsek.com). ULPSEK includes a multiparameter sensor measuring and processing electrocardiogram, respiration, motion, body temperature, and photoplethysmography. Instead of a battery, ULPSEK is powered using an efficient body heat harvester. The harvester produced 171 μW on average, which was sufficient to power the sensor below 25 °C ambient temperature. We present design issues regarding the power supply and the power distribution network of the ULPSEK sensor platform. Due to the security aspect of self-powered health sensors, we suggest a hybrid solution consisting of a battery charged by a harvester.

  8. Porting DubaiSat-2 Flight Software to RTEMS: A Feasibility Study

    NASA Astrophysics Data System (ADS)

    Khoory, Mohammed; Al Shamsi, Zakareyya; Al Midfa, Ibrahim

    2015-09-01

    This paper details the process taken by EIAST to study RTEMS as a potential real-time operating system for future space missions. The direction was to attempt to run the DubaiSat-2 flight software under RTEMS 4.10.2 with as little modification to the original source as possible. The implementation used a “translation layer” to translate system calls used by the DS-2 flight software into RTEMS system calls. The RTEMS RTL project was integrated to satisfy the run-time loading requirement, and some differences in the filesystem were encountered and worked around. The implementation was tested for performance and stability, and comparisons were made. The conclusion is that RTEMS provides an adequate base for future space missions, with certain advantages over other RTOSs, including cost, a smaller executable size, and control over the source. Drawbacks include the slow speed of loading tasks during runtime and some filesystem integrity issues during unexpected reboots.

  9. SPANR planning and scheduling

    NASA Astrophysics Data System (ADS)

    Freund, Richard F.; Braun, Tracy D.; Kussow, Matthew; Godfrey, Michael; Koyama, Terry

    2001-07-01

    SPANR (Schedule, Plan, Assess Networked Resources) is (i) a pre-run, off-line planning and (ii) a runtime, just-in-time scheduling mechanism. It is designed primarily to support commercial applications in that it optimizes throughput rather than individual jobs (unless they have highest priority). Thus it is a tool for a commercial production manager to maximize total work. First, the SPANR Planner is presented, showing the ability to do predictive 'what-if' planning. It can answer such questions as: (i) what is the overall effect of acquiring new hardware, or (ii) what would be the effect of a different scheduler? The ability of the SPANR Planner to formulate tree-trimming strategies in advance is useful in several commercial applications, such as electronic design or pharmaceutical simulations. The SPANR Planner is demonstrated using a variety of benchmarks. The SPANR Runtime Scheduler (RS) is briefly presented. The SPANR RS can provide benefit for several commercial applications, such as airframe design and financial applications. Finally, a design is shown whereby SPANR can provide scheduling advice to most resource management systems.

  10. Electrophoretic study of enzymes from cereal aphid populations : 4. Detection of hidden genetic variation within populations of the grain aphid Sitobion avenae (F.) (Hemiptera: Aphididae).

    PubMed

    Loxdale, H D; Rhodes, J A; Fox, J S

    1985-07-01

    A study of variation in three peptidases (PEP-3 to -5) in a parthenogenetic S. avenae field population at Rothamsted using serial one-dimensional polyacrylamide gel electrophoresis (involving changes of gel concentration and electrophoretic run-time) increased the overall number of "allozymes" (mobility variants) detected from 10 under standard conditions (6% gels, 2 h run-time) to 22, as well as revealing putative heterozygous banding patterns under some test conditions. However, an examination of another enzyme, 6-phosphogluconate dehydrogenase (6-PGD) in a sample collected at Rothamsted the following year failed, using a combination of serial methods (changes of gel concentration) and isoelectric focusing, to increase the total number of 6-PGD bands separated (seven, none of which appeared to be allelic in origin). Nevertheless, some major bands were split into several bands, whilst other infrequent bands were either gained or lost. The findings are briefly discussed.

  11. Computation of indirect nuclear spin-spin couplings with reduced complexity in pure and hybrid density functional approximations.

    PubMed

    Luenser, Arne; Kussmann, Jörg; Ochsenfeld, Christian

    2016-09-28

    We present a (sub)linear-scaling algorithm to determine indirect nuclear spin-spin coupling constants at the Hartree-Fock and Kohn-Sham density functional levels of theory. Employing efficient integral algorithms and sparse algebra routines, an overall (sub)linear scaling behavior can be obtained for systems with a non-vanishing HOMO-LUMO gap. Calculations on systems with over 1,000 atoms and 20,000 basis functions illustrate the performance and accuracy of our reference implementation. Specifically, we demonstrate that linear algebra dominates the runtime of conventional algorithms for 10,000 basis functions and above. Attainable speedups of our method exceed 6× in total runtime and 10× in the linear algebra steps for the tested systems. Furthermore, a convergence study of spin-spin couplings of an aminopyrazole peptide upon inclusion of the water environment is presented: using the new method it is shown that large solvent spheres are necessary to converge spin-spin coupling values.

  12. Large-scale three-dimensional phase-field simulations for phase coarsening at ultrahigh volume fraction on high-performance architectures

    NASA Astrophysics Data System (ADS)

    Yan, Hui; Wang, K. G.; Jones, Jim E.

    2016-06-01

    A parallel algorithm for large-scale three-dimensional phase-field simulations of phase coarsening is developed and implemented on high-performance architectures. From the large-scale simulations, a new coarsening kinetics is found in the ultrahigh-volume-fraction regime. The parallel implementation is capable of harnessing the greater computing power available from high-performance architectures. The parallelized code enables an increase in the three-dimensional simulation system size up to a 512³ grid cube. Through the parallelized code, practical runtimes can be achieved for three-dimensional large-scale simulations, and the statistical significance of the results from these high-resolution parallel simulations is greatly improved over that obtainable from serial simulations. A detailed performance analysis of speed-up and scalability is presented, showing good scalability which improves with increasing problem size. In addition, a model for the prediction of runtime is developed, which shows good agreement with the actual run times from numerical tests.

  13. Estimation Accuracy on Execution Time of Run-Time Tasks in a Heterogeneous Distributed Environment.

    PubMed

    Liu, Qi; Cai, Weidong; Jin, Dandan; Shen, Jian; Fu, Zhangjie; Liu, Xiaodong; Linge, Nigel

    2016-08-30

    Distributed computing has achieved tremendous development since cloud computing was proposed in 2006, and has played a vital role in promoting the rapid growth of data collecting and analysis models, e.g., Internet of Things, Cyber-Physical Systems, Big Data Analytics, etc. Hadoop has become a data convergence platform for sensor networks. As one of the core components, MapReduce facilitates allocating, processing and mining of collected large-scale data, where speculative execution strategies help solve straggler problems. However, there is still no efficient solution for accurate estimation of the execution time of run-time tasks, which can affect task allocation and distribution in MapReduce. In this paper, task execution data have been collected and employed for the estimation. A two-phase regression (TPR) method is proposed to predict the finishing time of each task accurately. Detailed data of each task have drawn interest, with a detailed analysis report being made. According to the results, the prediction accuracy of concurrent tasks' execution time can be improved, in particular for some regular jobs.
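
    As a rough illustration of the two-phase idea (a toy sketch, not the paper's TPR model or its features), one can fit two linear segments to task-progress samples, choosing the breakpoint that minimizes total squared error, and then extrapolate the second phase to predict the finishing time:

        import numpy as np

        def two_phase_fit(t, progress):
            best = None
            for k in range(2, len(t) - 2):                 # candidate breakpoints
                c1 = np.polyfit(t[:k], progress[:k], 1)
                c2 = np.polyfit(t[k:], progress[k:], 1)
                err = (np.sum((np.polyval(c1, t[:k]) - progress[:k]) ** 2) +
                       np.sum((np.polyval(c2, t[k:]) - progress[k:]) ** 2))
                if best is None or err < best[0]:
                    best = (err, c2)
            slope, intercept = best[1]
            return (1.0 - intercept) / slope               # time when progress hits 100%

        # Synthetic task: fast first phase, then a slower second phase.
        t = np.arange(10.0)
        progress = np.concatenate([0.08 * t[:5], 0.4 + 0.04 * (t[5:] - 5)])
        print("predicted finish:", two_phase_fit(t, progress))   # -> 20.0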

  14. Real-time motion compensated patient positioning and non-rigid deformation estimation using 4-D shape priors.

    PubMed

    Wasza, Jakob; Bauer, Sebastian; Hornegger, Joachim

    2012-01-01

    Over the last years, range imaging (RI) techniques have been proposed for patient positioning and respiration analysis in motion compensation. Yet, current RI-based approaches for patient positioning employ rigid-body transformations, thus neglecting free-form deformations induced by respiratory motion. Furthermore, RI-based respiration analysis relies on non-rigid registration techniques with run-times of several seconds. In this paper we propose a real-time framework based on RI to perform respiratory-motion-compensated positioning and non-rigid surface deformation estimation in a joint manner. The core of our method is a set of pre-procedurally obtained 4-D shape priors that drive the intra-procedural alignment of the patient to the reference state, simultaneously yielding a rigid-body table transformation and a free-form deformation accounting for respiratory motion. We show that our method outperforms conventional alignment strategies by a factor of 3.0 and 2.3 in rotation and translation accuracy, respectively. Using a GPU-based implementation, we achieve run-times of 40 ms.

  15. Continuous piecewise-linear, reduced-order electrochemical model for lithium-ion batteries in real-time applications

    NASA Astrophysics Data System (ADS)

    Farag, Mohammed; Fleckenstein, Matthias; Habibi, Saeid

    2017-02-01

    Model-order reduction and minimization of the CPU run-time while maintaining the model accuracy are critical requirements for real-time implementation of lithium-ion electrochemical battery models. In this paper, an isothermal, continuous, piecewise-linear, electrode-average model is developed by using an optimal knot placement technique. The proposed model reduces the univariate nonlinear function describing the dependence of the electrode's open-circuit potential on the state of charge to continuous piecewise-linear regions. The parameterization experiments were chosen to provide a trade-off between extensive experimental characterization techniques and purely identifying all parameters using optimization techniques. The model is then parameterized in each continuous, piecewise-linear region. Applying the proposed technique cuts down the CPU run-time by around 20%, compared to the reduced-order, electrode-average model. Finally, model validation against real-time driving profiles (FTP-72, WLTP) demonstrates the ability of the model to predict the cell voltage accurately, with less than 2% error.
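
    The core reduction step can be sketched as follows (illustrative Python; the OCV curve and knot locations are invented, and the paper's optimal knot-placement technique is not reproduced): the nonlinear OCV(SOC) function is replaced by a continuous piecewise-linear interpolant over a small set of knots.

        import numpy as np

        def ocv(soc):                       # hypothetical nonlinear OCV curve [V]
            return 3.0 + 0.7 * soc + 0.15 * np.tanh(8 * (soc - 0.1))

        knots = np.array([0.0, 0.05, 0.15, 0.4, 0.7, 1.0])   # assumed knot placement
        table = ocv(knots)                  # tabulate OCV at the knots once

        def ocv_pwl(soc):
            # np.interp evaluates the continuous piecewise-linear interpolant
            return np.interp(soc, knots, table)

        soc = np.linspace(0, 1, 201)
        print("max abs error [V]:", np.abs(ocv(soc) - ocv_pwl(soc)).max())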

  16. On the Run-Time Optimization of the Boolean Logic of a Program.

    ERIC Educational Resources Information Center

    Cadolino, C.; Guazzo, M.

    1982-01-01

    Considers problem of optimal scheduling of Boolean expression (each Boolean variable represents binary outcome of program module) on single-processor system. Optimization discussed consists of finding operand arrangement that minimizes average execution costs representing consumption of resources (elapsed time, main memory, number of…

  17. Airlift Operation Modeling Using Discrete Event Simulation (DES)

    DTIC Science & Technology

    2009-12-01

    Java... 2. Simkit... JRE Java Runtime Environment; JVM Java Virtual Machine; lbs Pounds; LAM Load Allocation Mode; LRM Landing Spot Reassignment Mode; LEGO Listener Event... SOFTWARE DEVELOPMENT ENVIRONMENT: The following are the software tools and development environment used for constructing the models. 1. Java

  18. A run-time control architecture for the JPL telerobot

    NASA Technical Reports Server (NTRS)

    Balaram, J.; Lokshin, A.; Kreutz, K.; Beahan, J.

    1987-01-01

    An architecture for implementing the process-level decision making for a hierarchically structured telerobot currently being implemented at the Jet Propulsion Laboratory (JPL) is described. Constraints on the architecture design, architecture partitioning concepts, and a detailed description of the existing and proposed implementations are provided.

  19. Lost in Interaction in IMS Learning Design Runtime Environments

    ERIC Educational Resources Information Center

    Derntl, Michael; Neumann, Susanne; Oberhuemer, Petra

    2014-01-01

    Educators are exploiting the advantages of advanced web-based collaboration technologies and massive online interactions. Interactions between learners and human or nonhuman resources therefore play an increasingly important pedagogical role, and the way these interactions are expressed in the user interface of virtual learning environments is…

  20. HAL/S-FC compiler system functional specification

    NASA Technical Reports Server (NTRS)

    1974-01-01

    Compiler organization is discussed, including overall compiler structure, internal data transfer, compiler development, and code optimization. The user, system, and SDL interfaces are described, along with compiler system requirements. The run-time software support package, as well as the restrictions and dependencies of the HAL/S-FC system, are also considered.

  1. Issues Involved in Developing Ada Real-Time Systems

    DTIC Science & Technology

    1989-02-15

    expensive modifications to the compiler or Ada runtime system to fit a particular application. Whether we can solve the problems of programming real-time systems in... lock in solutions to problems that are not yet well understood in standards as rigorous as the Ada language. Moreover, real-time systems typically have

  2. Channels: Runtime System Infrastructure for Security-typed Languages

    DTIC Science & Technology

    2008-10-01

    Milan, Italy, September 2005. Springer-Verlag. [2] D. E. Bell and L. J. LaPadula. Secure computer system: Unified exposition and Multics... Proceedings of the USENIX Annual Technical Conference, Santa Clara, CA, USA, June 2007. To appear. [9] S. Kamara, S. Fahmy, E. Schultz, F. Kerschbaum, and

  3. TENEX SAIL

    NASA Technical Reports Server (NTRS)

    Smith, R.

    1975-01-01

    SAIL, a high level ALGOL language for the PDP-10, is extended to operate under the TENEX time sharing system without executing DEC system calls. A large set of TENEX-oriented runtime routines is added to allow complete access to TENEX. The emphasis is on compatibility of programs across time sharing systems and integrity of the language.

  4. Notional Machines and Introductory Programming Education

    ERIC Educational Resources Information Center

    Sorva, Juha

    2013-01-01

    This article brings together, summarizes, and comments on several threads of research that have contributed to our understanding of the challenges that novice programmers face when learning about the runtime dynamics of programs and the role of the computer in program execution. More specifically, the review covers the literature on programming…

  5. Detecting Runtime Anomalies in AJAX Applications through Trace Analysis

    DTIC Science & Technology

    2011-08-10

    statements by adding the instrumentation to the GWT UI classes, leaving the user code untouched. Some content management frameworks such as Drupal [12... "Google web toolkit." http://code.google.com/webtoolkit/. [12] "Form generation – Drupal API." http://api.drupal.org/api/group/form_api/6.

  6. Mobile Authoring of Open Educational Resources as Reusable Learning Objects

    ERIC Educational Resources Information Center

    Kinshuk; Jesse, Ryan

    2013-01-01

    E-learning technologies have allowed authoring and playback of standardized reusable learning objects (RLO) for several years. Effective mobile learning requires similar functionality at both design time and runtime. Mobile devices can play RLO using applications like SMILE, mobile access to a learning management system (LMS), or other systems…

  7. On polynomial selection for the general number field sieve

    NASA Astrophysics Data System (ADS)

    Kleinjung, Thorsten

    2006-12-01

    The general number field sieve (GNFS) is the asymptotically fastest algorithm for factoring large integers. Its runtime depends on a good choice of a polynomial pair. In this article we present an improvement of the polynomial selection method of Montgomery and Murphy which has been used in recent GNFS records.

  8. Use of a Modern Polymerization Pilot-Plant for Undergraduate Control Projects.

    ERIC Educational Resources Information Center

    Mendoza-Bustos, S. A.; And Others

    1991-01-01

    Described is a project where students gain experience in handling large volumes of hazardous materials, process start up and shut down, equipment failures, operational variations, scaling up, equipment cleaning, and run-time scheduling while working in a modern pilot plant. Included are the system design, experimental procedures, and results. (KR)

  9. Three-Dimensional Near Infrared Imaging of Pathophysiological Changes Within the Breast

    DTIC Science & Technology

    2008-03-01

    StO2: Oxygenation saturation (in %); H2O: Water content (in %); a: Scattering amplitude; b: Scattering power. Typically in these cases of noisy... estimated from Fig. 2(a) for the NN/NM ratio involved. The deviation in run-time that occurs in practice is likely due to the cost of memory management

  10. The Challenge of Content Creation to Facilitate Personalized E-Learning Experiences

    ERIC Educational Resources Information Center

    Turker, Ali; Gorgun, Ilhami; Conlan, Owen

    2006-01-01

    The runtime creation of pedagogically coherent learning content for an individual learner's needs and preferences is a considerable challenge. By selecting and combining appropriate learning assets into a new learning object such needs and preferences may be accounted for. However, to assure coherence, these objects should be consumed within…

  11. Remote mission specialist - A study in real-time, adaptive planning

    NASA Technical Reports Server (NTRS)

    Rokey, Mark J.

    1990-01-01

    A high-level planning architecture for robotic operations is presented. The remote mission specialist integrates high-level directives with low-level primitives executable by a run-time controller for command of autonomous servicing activities. The planner has been designed to address such issues as adaptive plan generation, real-time performance, and operator intervention.

  12. An Ontology and a Software Framework for Competency Modeling and Management

    ERIC Educational Resources Information Center

    Paquette, Gilbert

    2007-01-01

    The importance given to competency management is well justified. Acquiring new competencies is the central goal of any education or knowledge management process. Thus, it must be embedded in any software framework as an instructional engineering tool, to inform the runtime environment of the knowledge that is processed by actors, and their…

  13. AADL and Model-based Engineering

    DTIC Science & Technology

    2014-10-20

    and MBE, Feiler, Oct 20, 2014, © 2014 Carnegie Mellon University. We Rely on Software for Safe Aircraft Operation: Embedded software systems... Developer, Compute Platform, Runtime Architecture, Application Software, Embedded SW System Engineer, Data Stream Characteristics, Latency... confusion. Hardware Engineer: Why do system-level failures still occur despite fault tolerance techniques being deployed in systems? Embedded software

  14. Regulatory Conformance Checking: Logic and Logical Form

    ERIC Educational Resources Information Center

    Dinesh, Nikhil

    2010-01-01

    We consider the problem of checking whether an organization conforms to a body of regulation. Conformance is studied in a runtime verification setting. The regulation is translated to a logic, from which we synthesize monitors. The monitors are evaluated as the state of an organization evolves over time, raising an alarm if a violation is…

  15. Raising the Degree of Service-Orientation of a SOA-based Software System: A Case Study

    DTIC Science & Technology

    2009-12-01

    protocols, as well as executable processes that can be compiled into runtime scripts” [2]. The Business Process Modeling Notation (BPMN) provides a... Notation (BPMN) 1.2. Jan. 2009. URL: http://www.omg.org/spec/BPMN/1.2/ [25] .NET Framework Developer Center. .NET Remoting Overview. 2003. URL: http

  16. Evaluating COCA--What Do Teachers Think?

    ERIC Educational Resources Information Center

    Major, Nigel

    COCA, which consists of both authoring tools and a runtime shell, is a system intended to provide teachers with genuine access to intelligent tutoring system (ITS) technology and to give them control over domain material and teaching strategies. To evaluate the effectiveness of COCA, 10 subjects (five university teachers and five school teachers)…

  17. NAEFS Products

    Science.gov Websites

    North American Ensemble Forecast System (NAEFS) Products. Updated: 02/27/2014. Information about the NAEFS models: CC is the model cycle runtime (i.e. 00, 06, 12, 18).

  18. Distributed Memory Compiler Methods for Irregular Problems - Data Copy Reuse and Runtime Partitioning

    DTIC Science & Technology

    1991-09-01

    addition, support for Saltz was provided by NSF from NSF Grant ASC-8819374. ... 1. Introduction. Over the past few years, we have developed methods needed to... network. In Third Conf. on Hypercube Concurrent Computers and Applications, pages 241–278, 1988. [17] G. Fox, S. Hiranandani, K. Kennedy, C. Koelbel

  19. Crosscutting Runtime Adaptations of LD Execution

    ERIC Educational Resources Information Center

    Zarraonandia, Telmo; Dodero, Juan Manuel; Fernandez, Camino

    2006-01-01

    In this paper, the authors describe a mechanism for the introduction of small variations in the original learning design process defined in a particular Unit of Learning (UoL). The objective is to increase the UoL reusability by offering the designers an alternative to introduce slight variations on the original design instead of creating a new…

  20. Temporal Decompostion of a Distribution System Quasi-Static Time-Series Simulation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mather, Barry A; Hunsberger, Randolph J

    This paper documents the first phase of an investigation into reducing the runtimes of complex OpenDSS models through parallelization. As the method seems promising, future work will quantify, and further mitigate, errors arising from this process. In this initial report, we demonstrate how, through the use of temporal decomposition, the run time of a complex distribution-system-level quasi-static time-series simulation can be reduced roughly in proportion to the level of parallelization. Using this method, the monolithic model runtime of 51 hours was reduced to a minimum of about 90 minutes. As expected, this comes at the expense of control and voltage errors at the time-slice boundaries. All evaluations were performed using a real distribution circuit model with the addition of 50 PV systems, representing a mock complex PV impact study. We are able to reduce the induced transition errors through the addition of controls initialization, though small errors persist. The time savings with parallelization are so significant that we feel additional investigation to reduce the control errors is warranted.
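
    A minimal sketch of the temporal-decomposition idea (generic Python; `simulate` is a stand-in for an OpenDSS quasi-static solve loop, and controller state at slice boundaries is deliberately ignored here, which is exactly the source of the boundary errors discussed above):

        from concurrent.futures import ProcessPoolExecutor

        def simulate(start, stop):
            # Stand-in for one power-flow solve per time step.
            voltages = []
            for t in range(start, stop):
                voltages.append(1.0 + 0.01 * ((t % 96) - 48) / 48)
            return voltages

        def decomposed_run(n_steps, n_slices):
            # Split the time axis into contiguous slices and solve in parallel.
            step = n_steps // n_slices
            spans = [(i * step, (i + 1) * step) for i in range(n_slices)]
            with ProcessPoolExecutor(max_workers=n_slices) as ex:
                parts = ex.map(simulate, *zip(*spans))
            return [v for part in parts for v in part]

        if __name__ == "__main__":
            series = decomposed_run(n_steps=35040, n_slices=8)  # 1 year at 15 min
            print(len(series))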

  1. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Barbara Chapman

    OpenMP was not well recognized at the beginning of the project, around year 2003, because of its limited use in DoE production applications and the immature hardware support for an efficient implementation. Yet in recent years, it has been gradually adopted both in HPC applications, mostly in the form of MPI+OpenMP hybrid code, and in mid-scale desktop applications for scientific and experimental studies. We have observed this trend and worked diligently to improve our OpenMP compiler and runtimes, as well as to work with the OpenMP standard organization to make sure OpenMP evolves in a direction close to DoE missions. In the Center for Programming Models for Scalable Parallel Computing project, the HPCTools team at the University of Houston (UH), directed by Dr. Barbara Chapman, has been working with project partners, external collaborators and hardware vendors to increase the scalability and applicability of OpenMP for multi-core (and future manycore) platforms and for distributed memory systems by exploring different programming models, language extensions, compiler optimizations, as well as runtime library support.

  2. Adaptive Impact-Driven Detection of Silent Data Corruption for HPC Applications

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Di, Sheng; Cappello, Franck

    For exascale HPC applications, silent data corruption (SDC) is one of the most dangerous problems because there is no indication that errors occurred during execution. We propose an adaptive impact-driven method that can detect SDCs dynamically. The key contributions are threefold. (1) We carefully characterize 18 real-world HPC applications and discuss the runtime data features, as well as the impact of SDCs on their execution results. (2) We propose an impact-driven detection model that does not blindly improve the prediction accuracy, but instead detects only influential SDCs to guarantee user-acceptable execution results. (3) Our solution can adapt to dynamic prediction errors based on local runtime data and can automatically tune detection ranges to guarantee low false alarms. Experiments show that our detector can detect 80-99.99% of SDCs with a false alarm rate of less than 1% of iterations for most cases. The memory cost and detection overhead are reduced to 15% and 6.3%, respectively, for a large majority of applications.
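
    A toy sketch of the impact-driven detection idea under simple assumptions (illustrative Python, not the paper's models): predict each new value from recent history and flag only deviations large enough to matter, with the detection range adapted from recent local prediction errors to keep false alarms low.

        from collections import deque

        class AdaptiveSDCDetector:
            def __init__(self, window=16, impact_factor=4.0):
                self.errors = deque(maxlen=window)   # recent |prediction errors|
                self.prev = None
                self.impact_factor = impact_factor

            def check(self, value):
                if self.prev is None:
                    self.prev = value
                    return False
                err = abs(value - self.prev)         # last-value prediction error
                # Adaptive range: a multiple of the mean recent normal error.
                mean_err = sum(self.errors) / len(self.errors) if self.errors else None
                is_sdc = mean_err is not None and err > self.impact_factor * mean_err
                if not is_sdc:
                    self.errors.append(err)          # only learn from normal steps
                    self.prev = value
                return is_sdc

        d = AdaptiveSDCDetector()
        data = [1.0, 1.1, 1.2, 1.3, 9.9, 1.4]        # 9.9 emulates a bit flip
        print([d.check(x) for x in data])            # flags only the 9.9 sample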

  3. Advanced Software V&V for Civil Aviation and Autonomy

    NASA Technical Reports Server (NTRS)

    Brat, Guillaume P.

    2017-01-01

    With the advances in high-performance computing platforms (e.g., advanced graphics processing units or multi-core processors), computationally-intensive software techniques such as the ones used in artificial intelligence or formal methods have provided us with an opportunity to further increase safety in the aviation industry. Some of these techniques have facilitated building in safety at design time, as in aircraft engines or software verification and validation, and others can introduce safety benefits during operations as long as we adapt our processes. In this talk, I will present how NASA is taking advantage of these new software techniques to build in safety at design time through advanced software verification and validation, which can be applied earlier and earlier in the design life cycle and thus also help reduce the cost of aviation assurance. I will then show how run-time techniques (such as runtime assurance or data analytics) offer us a chance to catch even more complex problems, even in the face of changing and unpredictable environments. These new techniques will be extremely useful as our aviation systems become more complex and more autonomous.

  4. An ant colony optimization based feature selection for web page classification.

    PubMed

    Saraç, Esra; Özel, Selma Ayşe

    2014-01-01

    The increased popularity of the web has caused the inclusion of a huge amount of information on the web, and as a result of this explosive information growth, automated web page classification systems are needed to improve search engines' performance. Web pages have a large number of features such as HTML/XML tags, URLs, hyperlinks, and text contents that should be considered during an automated classification process. The aim of this study is to reduce the number of features to be used in order to improve the runtime and accuracy of the classification of web pages. In this study, we used an ant colony optimization (ACO) algorithm to select the best features, and then we applied the well-known C4.5, naive Bayes, and k-nearest neighbor classifiers to assign class labels to web pages. We used the WebKB and Conference datasets in our experiments, and we showed that using ACO for feature selection improves both the accuracy and runtime performance of classification. We also showed that the proposed ACO-based algorithm can select better features with respect to the well-known information gain and chi-square feature selection methods.
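
    A drastically simplified sketch of the approach (illustrative Python, not the paper's ACO formulation): each ant samples a feature subset with probability driven by per-feature pheromone, subsets are scored by an evaluator (in practice a classifier's validation accuracy), and pheromone evaporates and is reinforced on the best subset found.

        import random

        def aco_select(n_features, evaluate, ants=10, iters=20, rho=0.1):
            pheromone = [1.0] * n_features
            best_subset, best_score = None, float("-inf")
            for _ in range(iters):
                for _ in range(ants):
                    # include feature i with probability rising in its pheromone
                    subset = [i for i in range(n_features)
                              if random.random() < pheromone[i] / (1.0 + pheromone[i])]
                    score = evaluate(subset)
                    if score > best_score:
                        best_subset, best_score = subset, score
                pheromone = [(1 - rho) * p for p in pheromone]   # evaporation
                for i in best_subset:
                    pheromone[i] += rho                          # reinforcement
            return best_subset, best_score

        # Toy evaluator: features 0-2 are informative, the rest only add cost.
        random.seed(1)
        good = {0, 1, 2}
        score = lambda s: len(good & set(s)) - 0.2 * len(set(s) - good)
        print(aco_select(10, score))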

  5. BLESS 2: accurate, memory-efficient and fast error correction method.

    PubMed

    Heo, Yun; Ramachandran, Anand; Hwu, Wen-Mei; Ma, Jian; Chen, Deming

    2016-08-01

    The most important features of error correction tools for sequencing data are accuracy, memory efficiency and fast runtime. The previous version of BLESS was highly memory-efficient and accurate, but it was too slow to handle reads from large genomes. We have developed a new version of BLESS to improve runtime and accuracy while maintaining a small memory usage. The new version, called BLESS 2, has an error correction algorithm that is more accurate than BLESS, and the algorithm has been parallelized using hybrid MPI and OpenMP programming. BLESS 2 was compared with five top-performing tools, and it was found to be the fastest when it was executed on two computing nodes using MPI, with each node containing twelve cores. Also, BLESS 2 showed at least 11% higher gain while retaining the memory efficiency of the previous version for large genomes. Freely available at https://sourceforge.net/projects/bless-ec. Contact: dchen@illinois.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  6. Programmable computing with a single magnetoresistive element

    NASA Astrophysics Data System (ADS)

    Ney, A.; Pampuch, C.; Koch, R.; Ploog, K. H.

    2003-10-01

    The development of transistor-based integrated circuits for modern computing is a story of great success. However, the proved concept for enhancing computational power by continuous miniaturization is approaching its fundamental limits. Alternative approaches consider logic elements that are reconfigurable at run-time to overcome the rigid architecture of the present hardware systems. Implementation of parallel algorithms on such 'chameleon' processors has the potential to yield a dramatic increase of computational speed, competitive with that of supercomputers. Owing to their functional flexibility, 'chameleon' processors can be readily optimized with respect to any computer application. In conventional microprocessors, information must be transferred to a memory to prevent it from getting lost, because electrically processed information is volatile. Therefore the computational performance can be improved if the logic gate is additionally capable of storing the output. Here we describe a simple hardware concept for a programmable logic element that is based on a single magnetic random access memory (MRAM) cell. It combines the inherent advantage of a non-volatile output with flexible functionality which can be selected at run-time to operate as an AND, OR, NAND or NOR gate.
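
    The behavior of such a logic element is easy to mimic in software (a toy sketch only; the paper realizes this in a single MRAM cell, with genuinely non-volatile output): the gate function is selected at run time, and the output is stored alongside it.

        # Gate functions selectable at run time, as in the MRAM concept.
        GATES = {
            "AND":  lambda a, b: a & b,
            "OR":   lambda a, b: a | b,
            "NAND": lambda a, b: 1 - (a & b),
            "NOR":  lambda a, b: 1 - (a | b),
        }

        class ProgrammableCell:
            def __init__(self):
                self.state = 0                      # emulated non-volatile output bit

            def configure(self, gate_name):         # run-time reconfiguration
                self.fn = GATES[gate_name]

            def apply(self, a, b):
                self.state = self.fn(a, b)          # the output is also stored
                return self.state

        cell = ProgrammableCell()
        cell.configure("NAND")
        print(cell.apply(1, 1))   # -> 0
        cell.configure("OR")      # same "hardware", new function at run time
        print(cell.apply(1, 0))   # -> 1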

  7. Estimation Accuracy on Execution Time of Run-Time Tasks in a Heterogeneous Distributed Environment

    PubMed Central

    Liu, Qi; Cai, Weidong; Jin, Dandan; Shen, Jian; Fu, Zhangjie; Liu, Xiaodong; Linge, Nigel

    2016-01-01

    Distributed computing has achieved tremendous development since cloud computing was proposed in 2006, and has played a vital role in promoting the rapid growth of data collecting and analysis models, e.g., Internet of Things, Cyber-Physical Systems, Big Data Analytics, etc. Hadoop has become a data convergence platform for sensor networks. As one of the core components, MapReduce facilitates allocating, processing and mining of collected large-scale data, where speculative execution strategies help solve straggler problems. However, there is still no efficient solution for accurate estimation of the execution time of run-time tasks, which can affect task allocation and distribution in MapReduce. In this paper, task execution data have been collected and employed for the estimation. A two-phase regression (TPR) method is proposed to predict the finishing time of each task accurately. Detailed data of each task have drawn interest, with a detailed analysis report being made. According to the results, the prediction accuracy of concurrent tasks’ execution time can be improved, in particular for some regular jobs. PMID:27589753

  8. Sailfish: A flexible multi-GPU implementation of the lattice Boltzmann method

    NASA Astrophysics Data System (ADS)

    Januszewski, M.; Kostur, M.

    2014-09-01

    We present Sailfish, an open source fluid simulation package implementing the lattice Boltzmann method (LBM) on modern Graphics Processing Units (GPUs) using CUDA/OpenCL. We take a novel approach to GPU code implementation and use run-time code generation techniques and a high level programming language (Python) to achieve state of the art performance, while allowing easy experimentation with different LBM models and tuning for various types of hardware. We discuss the general design principles of the code, scaling to multiple GPUs in a distributed environment, as well as the GPU implementation and optimization of many different LBM models, both single component (BGK, MRT, ELBM) and multicomponent (Shan-Chen, free energy). The paper also presents results of performance benchmarks spanning the last three NVIDIA GPU generations (Tesla, Fermi, Kepler), which we hope will be useful for researchers working with this type of hardware and similar codes.
    Catalogue identifier: AETA_v1_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AETA_v1_0.html
    Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
    Licensing provisions: GNU Lesser General Public License, version 3
    No. of lines in distributed program, including test data, etc.: 225864
    No. of bytes in distributed program, including test data, etc.: 46861049
    Distribution format: tar.gz
    Programming language: Python, CUDA C, OpenCL
    Computer: Any with an OpenCL or CUDA-compliant GPU
    Operating system: No limits (tested on Linux and Mac OS X)
    RAM: Hundreds of megabytes to tens of gigabytes for typical cases
    Classification: 12, 6.5
    External routines: PyCUDA/PyOpenCL, Numpy, Mako, ZeroMQ (for multi-GPU simulations), scipy, sympy
    Nature of problem: GPU-accelerated simulation of single- and multi-component fluid flows.
    Solution method: A wide range of relaxation models (LBGK, MRT, regularized LB, ELBM, Shan-Chen, free energy, free surface) and boundary conditions within the lattice Boltzmann method framework. Simulations can be run in single or double precision using one or more GPUs.
    Restrictions: The lattice Boltzmann method works for low Mach number flows only.
    Unusual features: The actual numerical calculations run exclusively on GPUs. The numerical code is built dynamically at run-time in CUDA C or OpenCL, using templates and symbolic formulas. The high-level control of the simulation is maintained by a Python process.
    Additional comments: The distribution file for this program is over 45 Mbytes and therefore is not delivered directly when Download or Email is requested. Instead a html file giving details of how the program can be obtained is sent.
    Running time: Problem-dependent, typically minutes (for small cases or short simulations) to hours (large cases or long simulations).
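
    The run-time code generation technique mentioned under "Unusual features" can be sketched as follows (a minimal illustration using Python's standard string templating, not Sailfish's actual Mako templates; compiling the generated source would require PyCUDA's SourceModule, which is omitted here):

        from string import Template

        KERNEL_TEMPLATE = Template("""
        extern "C" __global__ void relax(float *f, int n) {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < n) {
                f[i] = f[i] - ${omega}f * (f[i] - ${feq});   // BGK-style relaxation
            }
        }
        """)

        def build_kernel(omega, feq_expr):
            # Numeric constants and symbolic expressions are baked into the
            # CUDA C source text at run time.
            return KERNEL_TEMPLATE.substitute(omega=omega, feq=feq_expr)

        src = build_kernel(omega=1.7, feq_expr="0.5f * (f[i] + 1.0f)")
        print(src)   # with PyCUDA: mod = SourceModule(src); mod.get_function("relax")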

  9. Differential correlation for sequencing data.

    PubMed

    Siska, Charlotte; Kechris, Katerina

    2017-01-19

    Several methods have been developed to identify differential correlation (DC) between pairs of molecular features from -omics studies. Most DC methods have only been tested with microarrays and other platforms producing continuous and Gaussian-like data. Sequencing data is in the form of counts, often modeled with a negative binomial distribution making it difficult to apply standard correlation metrics. We have developed an R package for identifying DC called Discordant which uses mixture models for correlations between features and the Expectation Maximization (EM) algorithm for fitting parameters of the mixture model. Several correlation metrics for sequencing data are provided and tested using simulations. Other extensions in the Discordant package include additional modeling for different types of differential correlation, and faster implementation, using a subsampling routine to reduce run-time and address the assumption of independence between molecular feature pairs. With simulations and breast cancer miRNA-Seq and RNA-Seq data, we find that Spearman's correlation has the best performance among the tested correlation methods for identifying differential correlation. Application of Spearman's correlation in the Discordant method demonstrated the most power in ROC curves and sensitivity/specificity plots, and improved ability to identify experimentally validated breast cancer miRNA. We also considered including additional types of differential correlation, which showed a slight reduction in power due to the additional parameters that need to be estimated, but more versatility in applications. Finally, subsampling within the EM algorithm considerably decreased run-time with negligible effect on performance. A new method and R package called Discordant is presented for identifying differential correlation with sequencing data. Based on comparisons with different correlation metrics, this study suggests Spearman's correlation is appropriate for sequencing data, but other correlation metrics are available to the user depending on the application and data type. The Discordant method can also be extended to investigate additional DC types and subsampling with the EM algorithm is now available for reduced run-time. These extensions to the R package make Discordant more robust and versatile for multiple -omics studies.
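
    As a small illustration of differential correlation with Spearman's metric (a sketch only; Discordant's mixture-model and EM machinery are not reproduced), one can compare the correlation of a feature pair between two conditions via Fisher's z-transform:

        import numpy as np
        from scipy.stats import spearmanr

        def differential_correlation(x1, y1, x2, y2):
            r1 = spearmanr(x1, y1)[0]
            r2 = spearmanr(x2, y2)[0]
            z1, z2 = np.arctanh(r1), np.arctanh(r2)          # Fisher transform
            se = np.sqrt(1 / (len(x1) - 3) + 1 / (len(x2) - 3))
            return r1, r2, (z1 - z2) / se                    # approximate z-score

        rng = np.random.default_rng(0)
        n = 100
        x1 = rng.normal(size=n); y1 = x1 + 0.3 * rng.normal(size=n)   # correlated
        x2 = rng.normal(size=n); y2 = rng.normal(size=n)              # independent
        print(differential_correlation(x1, y1, x2, y2))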

  10. Provenance for Runtime Workflow Steering and Validation in Computational Seismology

    NASA Astrophysics Data System (ADS)

    Spinuso, A.; Krischer, L.; Krause, A.; Filgueira, R.; Magnoni, F.; Muraleedharan, V.; David, M.

    2014-12-01

    Provenance systems may be offered by modern workflow engines to collect metadata about data transformations at runtime. If combined with effective visualisation and monitoring interfaces, these provenance recordings can speed up the validation process of an experiment, suggesting interactive or automated interventions with immediate effects on the lifecycle of a workflow run. For instance, in the field of computational seismology, if we consider research applications performing long-lasting cross-correlation analysis and high-resolution simulations, the immediate notification of logical errors and rapid access to intermediate results can produce reactions which foster more efficient progress of the research. These applications are often executed in secured and sophisticated HPC and HTC infrastructures, highlighting the need for a comprehensive framework that facilitates the extraction of fine-grained provenance and the development of provenance-aware components, leveraging the scalability characteristics of the adopted workflow engines, whose enactment can be mapped to different technologies (MPI, Storm clusters, etc.). This work looks at the adoption of W3C-PROV concepts and data model within a user-driven processing and validation framework for seismic data, supporting also computational and data management steering. Validation needs to balance automation with user intervention, considering the scientist as part of the archiving process. Therefore, the provenance data is enriched with community-specific metadata vocabularies and control messages, making an experiment reproducible and its description consistent with the community's understanding. Moreover, it can contain user-defined terms and annotations. The current implementation of the system is supported by the EU-funded VERCE project (http://verce.eu). Besides the provenance generation mechanisms, it provides a prototype browser-based user interface and a web API built on top of a NoSQL storage technology, experimenting with ways to ensure rapid and flexible access to the lineage traces. It supports users with the visualisation of graphical products and offers combined operations to access and download the data, which may be selectively stored at runtime into dedicated data archives.

  11. Cybathlon experiences of the Graz BCI racing team Mirage91 in the brain-computer interface discipline.

    PubMed

    Statthaler, Karina; Schwarz, Andreas; Steyrl, David; Kobler, Reinmar; Höller, Maria Katharina; Brandstetter, Julia; Hehenberger, Lea; Bigga, Marvin; Müller-Putz, Gernot

    2017-12-28

    In this work, we share our experiences made at the world's first CYBATHLON, an event organized by the Eidgenössische Technische Hochschule Zürich (ETH Zürich), which took place in Zurich in October 2016. It is a championship for severely motor-impaired people using assistive prototype devices to compete against each other. Our team, the Graz BCI Racing Team MIRAGE91 from Graz University of Technology, participated in the discipline "Brain-Computer Interface Race". A brain-computer interface (BCI) is a device facilitating control of applications via the user's thoughts. Prominent applications include assistive technology such as wheelchairs, neuroprostheses or communication devices. In the CYBATHLON BCI Race, pilots compete in a BCI-controlled computer game. We report on setting up our team, the customization of the BCI to our pilot, including long-term training, and the final BCI system. Furthermore, we describe our CYBATHLON participation and analyze our result. We found that our pilot was compliant over the whole time and that we could significantly reduce the average runtime between start and finish from initially 178 s to 143 s. After the release of the final championship specifications with a shorter track length, the average runtime converged to 120 s. We successfully participated in the qualification race at CYBATHLON 2016, but performed notably worse than during training, with a runtime of 196 s. We speculate that shifts in the features, due to non-stationarities in the electroencephalogram (EEG), but also arousal, are possible reasons for the unexpected result. Potential counteracting measures are discussed. The CYBATHLON 2016 was a great opportunity for our student team. We consolidated our theoretical knowledge and turned it into practice, allowing our pilot to play a computer game. However, further research is required to make BCI technology invariant to non-task-related changes of the EEG.

  12. Using the Multiplicative Schwarz Alternating Algorithm (MSAA) for Solving the Large Linear System of Equations Related to Global Gravity Field Recovery up to Degree and Order 120

    NASA Astrophysics Data System (ADS)

    Safari, A.; Sharifi, M. A.; Amjadiparvar, B.

    2010-05-01

    The GRACE mission has substantiated the low-low satellite-to-satellite tracking (LL-SST) concept. The LL-SST configuration can be combined with the previously realized high-low SST concept of the CHAMP mission to provide a much higher accuracy. The line-of-sight (LOS) acceleration difference between the GRACE satellite pair is the most commonly used observable for mapping the global gravity field of the Earth in terms of spherical harmonic coefficients. In this paper, mathematical formulae for LOS acceleration difference observations have been derived and the corresponding linear system of equations has been set up for spherical harmonics up to degree and order 120. The total number of unknowns is 14,641. Such a linear system can be solved with iterative or direct solvers. However, the runtime of direct methods, or of iterative solvers without a suitable preconditioner, increases tremendously, which is why a more sophisticated method is needed to solve linear systems with a large number of unknowns. The multiplicative variant of the Schwarz alternating algorithm is a domain decomposition method that splits the normal matrix of the system into several smaller overlapping submatrices. In each iteration step, the multiplicative variant of the Schwarz alternating algorithm successively solves the linear systems with the matrices obtained from the splitting. It reduces both runtime and memory requirements drastically. In this paper we propose the Multiplicative Schwarz Alternating Algorithm (MSAA) for solving the large linear system of gravity field recovery. The proposed algorithm has been tested on International Association of Geodesy (IAG)-simulated data of the GRACE mission. The achieved results indicate the validity and efficiency of the proposed algorithm in solving the linear system of equations, from both accuracy and runtime points of view. Keywords: Gravity field recovery, Multiplicative Schwarz Alternating Algorithm, Low-Low Satellite-to-Satellite Tracking
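
    The following minimal Python sketch illustrates a multiplicative Schwarz iteration on a generic symmetric positive-definite system with two overlapping blocks; the size, overlap, and sweep count are arbitrary illustrative choices, not those used for the 14,641-unknown gravity-field system:

    ```python
    # Illustrative multiplicative Schwarz iteration on a generic SPD system Ax = b.
    import numpy as np

    def multiplicative_schwarz(A, b, blocks, sweeps=50):
        """Each sweep visits the overlapping index blocks in turn, solving the
        local subsystem exactly and updating the global iterate immediately."""
        x = np.zeros_like(b)
        for _ in range(sweeps):
            for idx in blocks:
                r = b - A @ x                          # current global residual
                x[idx] += np.linalg.solve(A[np.ix_(idx, idx)], r[idx])
        return x

    n = 40
    rng = np.random.default_rng(1)
    A = np.eye(n) * n + rng.standard_normal((n, n)) * 0.1
    A = (A + A.T) / 2                                  # symmetric, diagonally dominant
    b = rng.standard_normal(n)
    blocks = [np.arange(0, 25), np.arange(15, 40)]     # two overlapping subdomains
    x = multiplicative_schwarz(A, b, blocks)
    print(np.linalg.norm(A @ x - b))                   # residual shrinks with sweeps
    ```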

  13. Enforcement of entailment constraints in distributed service-based business processes.

    PubMed

    Hummer, Waldemar; Gaubatz, Patrick; Strembeck, Mark; Zdun, Uwe; Dustdar, Schahram

    2013-11-01

    A distributed business process is executed in a distributed computing environment. The service-oriented architecture (SOA) paradigm is a popular option for the integration of software services and the execution of distributed business processes. Entailment constraints, such as mutual exclusion and binding constraints, are important means to control process execution. Mutually exclusive tasks result from the division of powerful rights and responsibilities to prevent fraud and abuse. In contrast, binding constraints define that a subject who performed one task must also perform the corresponding bound task(s). We aim to provide a model-driven approach for the specification and enforcement of task-based entailment constraints in distributed service-based business processes. Based on a generic metamodel, we define a domain-specific language (DSL) that maps the different modeling-level artifacts to the implementation level. The DSL integrates elements from role-based access control (RBAC) with the tasks that are performed in a business process. Process definitions are annotated using the DSL, and our software platform uses automated model transformations to produce executable WS-BPEL specifications that enforce the entailment constraints. We evaluate the impact of constraint enforcement on runtime performance for five selected service-based processes from the existing literature. Our evaluation demonstrates that the approach correctly enforces task-based entailment constraints at runtime. The performance experiments illustrate that the runtime enforcement operates with an overhead that scales well up to the order of several tens of thousands of logged invocations. Using our DSL annotations, the user-defined process definition remains declarative and free of security enforcement code. Our approach decouples the concerns of (non-technical) domain experts from the technical details of entailment constraint enforcement. The developed framework integrates seamlessly with WS-BPEL and the Web services technology stack. Our prototype implementation shows the feasibility of the approach, and the evaluation points to future work and further performance optimizations.
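
    To make the two constraint types concrete, here is a minimal, hypothetical runtime monitor (not the authors' WS-BPEL/RBAC platform) that rejects task executions violating mutual-exclusion or binding constraints; task and subject names are invented:

    ```python
    # Toy enforcement of task-based entailment constraints at runtime.
    class EntailmentMonitor:
        def __init__(self, mutex_pairs, binding_pairs):
            self.mutex = set(mutex_pairs)       # (t1, t2): must NOT share a subject
            self.binding = set(binding_pairs)   # (t1, t2): MUST share a subject
            self.performed = {}                 # task -> subject who executed it

        def check(self, task, subject):
            for t, s in self.performed.items():
                if ((task, t) in self.mutex or (t, task) in self.mutex) and s == subject:
                    raise PermissionError(f"{subject} already did {t}: mutual exclusion")
                if ((task, t) in self.binding or (t, task) in self.binding) and s != subject:
                    raise PermissionError(f"{task} is bound to {t}, performed by {s}")
            self.performed[task] = subject

    m = EntailmentMonitor(mutex_pairs=[("approve", "request")],
                          binding_pairs=[("sign", "countersign")])
    m.check("request", "alice")
    m.check("approve", "bob")        # ok: a different subject approves
    m.check("sign", "carol")
    m.check("countersign", "carol")  # ok: binding requires the same subject
    ```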

  14. A performance model for GPUs with caches

    DOE PAGES

    Dao, Thanh Tuan; Kim, Jungwon; Seo, Sangmin; ...

    2014-06-24

    To exploit the abundant computational power of the world's fastest supercomputers, an even workload distribution to the typically heterogeneous compute devices is necessary. While relatively accurate performance models exist for conventional CPUs, accurate performance estimation models for modern GPUs do not exist. This paper presents two accurate models for modern GPUs: a sampling-based linear model, and a model based on machine-learning (ML) techniques which improves the accuracy of the linear model and is applicable to modern GPUs with and without caches. We first construct the sampling-based linear model to predict the runtime of an arbitrary OpenCL kernel. Based on an analysis of NVIDIA GPUs' scheduling policies we determine the earliest sampling points that allow an accurate estimation. The linear model cannot capture well the significant effects that memory coalescing or caching as implemented in modern GPUs have on performance. We therefore propose a model based on ML techniques that takes several compiler-generated statistics about the kernel as well as the GPU's hardware performance counters as additional inputs to obtain a more accurate runtime performance estimation for modern GPUs. We demonstrate the effectiveness and broad applicability of the model by applying it to three different NVIDIA GPU architectures and one AMD GPU architecture. On an extensive set of OpenCL benchmarks, on average, the proposed model estimates the runtime performance with less than 7 percent error for a second-generation GTX 280 with no on-chip caches and less than 5 percent for the Fermi-based GTX 580 with hardware caches. On the Kepler-based GTX 680, the linear model has an error of less than 10 percent. On an AMD GPU architecture, the Radeon HD 6970, the model estimates the runtime with an error of 8 percent. As a result, the proposed technique outperforms existing models by a factor of 5 to 6 in terms of accuracy.
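
    To illustrate the flavor of the sampling-based linear model (not the paper's exact procedure), one can time a kernel at a few sampled work sizes and extrapolate by least squares; the sizes and timings below are made-up placeholders:

    ```python
    # Sampling-based linear runtime model: fit runtime = a*n + b and extrapolate.
    import numpy as np

    sample_sizes = np.array([1_000, 2_000, 4_000, 8_000], dtype=float)  # sampled work sizes
    sample_times = np.array([0.9, 1.7, 3.4, 6.6])                       # measured runtimes (ms)

    A = np.vstack([sample_sizes, np.ones_like(sample_sizes)]).T
    (a, b), *_ = np.linalg.lstsq(A, sample_times, rcond=None)           # least-squares fit
    print(f"predicted runtime at n=1e6: {a * 1e6 + b:.1f} ms")
    ```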

  15. A Context-Aware Self-Adaptive Fractal Based Generalized Pedagogical Agent Framework for Mobile Learning

    ERIC Educational Resources Information Center

    Boulehouache, Soufiane; Maamri, Ramdane; Sahnoun, Zaidi

    2015-01-01

    The Pedagogical Agents (PAs) for Mobile Learning (m-learning) must be able not only to adapt the teaching to the learner knowledge level and profile but also to ensure the pedagogical efficiency within unpredictable changing runtime contexts. Therefore, to deal with this issue, this paper proposes a Context-aware Self-Adaptive Fractal Component…

  16. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jain, Atul K.

    The overall objectives of this DOE-funded project are to address scientific and computational challenges in climate modeling by expanding our understanding of the biogeophysical-biogeochemical processes and their interactions in the northern high latitudes (NHLs) using an earth system modeling (ESM) approach, and by adopting an adaptive parallel runtime system in an ESM to achieve efficient and scalable climate simulations through improved load-balancing algorithms.

  17. A Hybrid Constraint Representation and Reasoning Framework

    NASA Technical Reports Server (NTRS)

    Golden, Keith; Pang, Wan-Lin

    2003-01-01

    This paper introduces JNET, a novel constraint representation and reasoning framework that supports procedural constraints and constraint attachments, providing a flexible way of integrating the constraint reasoner with a run-time software environment. Attachments in JNET are constraints over arbitrary Java objects, which are defined using Java code, at runtime, with no changes to the JNET source code.
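
    To convey the idea of procedural constraint attachments registered at runtime, here is an illustrative Python sketch; JNET itself is a Java framework, and none of the names below reflect its actual API:

    ```python
    # Hypothetical sketch: constraints attached at runtime as plain callables
    # over arbitrary objects, with no change to the framework's source code.
    class ConstraintNet:
        def __init__(self):
            self.constraints = []

        def attach(self, fn, *vars):
            self.constraints.append((fn, vars))   # fn is a procedural attachment

        def satisfied(self, env):
            return all(fn(*(env[v] for v in vars)) for fn, vars in self.constraints)

    net = ConstraintNet()
    net.attach(lambda x, y: x < y, "x", "y")
    net.attach(lambda name: name.endswith(".jpg"), "filename")
    print(net.satisfied({"x": 1, "y": 2, "filename": "img.jpg"}))  # True
    ```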

  18. A Review of Generic Program Visualization Systems for Introductory Programming Education

    ERIC Educational Resources Information Center

    Sorva, Juha; Karavirta, Ville; Malmi, Lauri

    2013-01-01

    This article is a survey of program visualization systems intended for teaching beginners about the runtime behavior of computer programs. Our focus is on generic systems that are capable of illustrating many kinds of programs and behaviors. We inclusively describe such systems from the last three decades and review findings from their empirical…

  19. Prototyping distributed simulation networks

    NASA Technical Reports Server (NTRS)

    Doubleday, Dennis L.

    1990-01-01

    Durra is a declarative language designed to support application-level programming. The use of Durra is illustrated to describe a simple distributed application: a simulation of a collection of networked vehicle simulators. It is shown how the language is used to describe the application, its components and structure, and how the runtime executive provides for the execution of the application.

  20. Lighting Control Systems

    DTIC Science & Technology

    2004-02-26

    Presentation excerpts (slides); the recoverable points are: a cost-benefit rule of thumb for the Powerlink lighting control system (it becomes more cost-effective beyond 16 controlled…); trends toward web-enabled control and management software, a higher level of integration between building systems, and an increase in new features, functions, and benefits; and a growing focus on reducing run-time via scheduling, sensing, and switching, with payback driven by direct energy cost (with demand) plus additional maintenance benefits and shorter payback periods.

  1. Ckmeans.1d.dp: Optimal k-means Clustering in One Dimension by Dynamic Programming.

    PubMed

    Wang, Haizhou; Song, Mingzhou

    2011-12-01

    The heuristic k-means algorithm, widely used for cluster analysis, does not guarantee optimality. We developed a dynamic programming algorithm for optimal one-dimensional clustering. The algorithm is implemented as an R package called Ckmeans.1d.dp. We demonstrate its advantage in optimality and runtime over the standard iterative k-means algorithm.
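
    The key property such DP algorithms exploit — in one dimension, optimal clusters are contiguous runs of the sorted data — can be sketched as follows. This is an illustrative O(k·n²) Python version with prefix-sum SSE queries, not the optimized implementation shipped in the R package:

    ```python
    import numpy as np

    def ckmeans_1d(x, k):
        """Optimal 1-D k-means by dynamic programming over sorted data."""
        x = np.sort(np.asarray(x, dtype=float))
        n = len(x)
        s1 = np.concatenate(([0.0], np.cumsum(x)))       # prefix sums
        s2 = np.concatenate(([0.0], np.cumsum(x * x)))   # prefix sums of squares

        def sse(i, j):  # within-cluster sum of squares of x[i..j], inclusive
            m = j - i + 1
            seg = s1[j + 1] - s1[i]
            return (s2[j + 1] - s2[i]) - seg * seg / m

        INF = float("inf")
        D = np.full((k + 1, n + 1), INF)    # D[m][i]: best cost, first i points, m clusters
        B = np.zeros((k + 1, n + 1), int)   # start index of the last cluster
        D[0][0] = 0.0
        for m in range(1, k + 1):
            for i in range(m, n + 1):
                for j in range(m, i + 1):   # last cluster is x[j-1 .. i-1]
                    c = D[m - 1][j - 1] + sse(j - 1, i - 1)
                    if c < D[m][i]:
                        D[m][i], B[m][i] = c, j - 1
        bounds, i = [], n                   # backtrack cluster boundaries
        for m in range(k, 0, -1):
            j = B[m][i]
            bounds.append((j, i - 1))
            i = j
        return D[k][n], bounds[::-1]

    print(ckmeans_1d([1, 2, 3, 10, 11, 12, 30], 3))  # cost 4.0, three contiguous runs
    ```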

  2. Final report: Compiled MPI. Cost-Effective Exascale Application Development

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gropp, William Douglas

    2015-12-21

    This is the final report on Compiled MPI: Cost-Effective Exascale Application Development, and it summarizes the results of the project. The project investigated runtime environments that improve the performance of MPI (Message-Passing Interface) programs; work at Illinois in the last period of the project looked at optimizing data accesses expressed with MPI datatypes.

  3. Runtime Systems for Extreme Scale Platforms

    DTIC Science & Technology

    2013-12-01


  4. Existing Whole-House Solutions Case Study: Build San Antonio Green, San Antonio, Texas

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    none,

    2013-06-01

    PNNL, FSEC, and CalcsPlus provided technical assistance to Build San Antonio Green on three deep energy retrofits. For this gut rehab, they replaced the old roof with a steeper roof, replaced drywall while adding insulation, and added new HVAC, sealed ducts, transfer grilles, outside-air run-time ventilation, new lighting, and a new water heater.

  5. Sensor-Free or Sensor-Full: A Comparison of Data Modalities in Multi-Channel Affect Detection

    ERIC Educational Resources Information Center

    Paquette, Luc; Rowe, Jonathan; Baker, Ryan; Mott, Bradford; Lester, James; DeFalco, Jeanine; Brawner, Keith; Sottilare, Robert; Georgoulas, Vasiliki

    2016-01-01

    Computational models that automatically detect learners' affective states are powerful tools for investigating the interplay of affect and learning. Over the past decade, affect detectors--which recognize learners' affective states at run-time using behavior logs and sensor data--have advanced substantially across a range of K-12 and postsecondary…

  6. NearFar: A computer program for nearside farside decomposition of heavy-ion elastic scattering amplitude

    NASA Astrophysics Data System (ADS)

    Cha, Moon Hoe

    2007-02-01

    The NearFar program is a package for carrying out an interactive nearside-farside decomposition of heavy-ion elastic scattering amplitudes. The program is implemented in Java to perform numerical operations on the nearside and farside angular distributions. It contains a graphical display interface for the numerical results. A test run has been applied to elastic O16+Si28 scattering at E=1503 MeV.
    Program summary:
    Title of program: NearFar
    Catalogue identifier: ADYP_v1_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADYP_v1_0
    Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland
    Licensing provisions: none
    Computers: designed for any machine capable of running Java; developed on PC Pentium 4
    Operating systems under which the program has been tested: Microsoft Windows XP (Home Edition)
    Programming language used: Java
    Number of bits in a word: 64
    Memory required to execute with typical data: case dependent
    No. of lines in distributed program, including test data, etc.: 3484
    No. of bytes in distributed program, including test data, etc.: 142 051
    Distribution format: tar.gz
    Other software required: a Java runtime interpreter, or the Java Development Kit, version 5.0
    Nature of physical problem: interactive nearside-farside decomposition of heavy-ion elastic scattering amplitudes.
    Method of solution: the user must supply an external data file or PPSM parameters from which theoretical values of the quantities to be decomposed are calculated.
    Typical running time: problem dependent; in a test run, about 35 s on a 2.40 GHz Intel P4 machine.

  7. Cucheb: A GPU implementation of the filtered Lanczos procedure

    NASA Astrophysics Data System (ADS)

    Aurentz, Jared L.; Kalantzis, Vassilis; Saad, Yousef

    2017-11-01

    This paper describes the software package Cucheb, a GPU implementation of the filtered Lanczos procedure for the solution of large sparse symmetric eigenvalue problems. The filtered Lanczos procedure uses a carefully chosen polynomial spectral transformation to accelerate convergence of the Lanczos method when computing eigenvalues within a desired interval. This method has proven particularly effective for eigenvalue problems that arise in electronic structure calculations and density functional theory. We compare our implementation against an equivalent CPU implementation and show that using the GPU can reduce the computation time by more than a factor of 10.
    Program summary:
    Program title: Cucheb
    Program files doi: http://dx.doi.org/10.17632/rjr9tzchmh.1
    Licensing provisions: MIT
    Programming language: CUDA C/C++
    Nature of problem: electronic structure calculations require the computation of all eigenvalue-eigenvector pairs of a symmetric matrix that lie inside a user-defined real interval.
    Solution method: to compute all the eigenvalues within a given interval, a polynomial spectral transformation is constructed that maps the desired eigenvalues of the original matrix to the exterior of the spectrum of the transformed matrix. The Lanczos method is then used to compute the desired eigenvectors of the transformed matrix, which are then used to recover the desired eigenvalues of the original matrix. The bulk of the operations are executed in parallel using a graphics processing unit (GPU).
    Runtime: variable, depending on the number of eigenvalues sought and the size and sparsity of the matrix.
    Additional comments: Cucheb is compatible with CUDA Toolkit v7.0 or greater.

  8. Static Extraction and Conformance Analysis of Hierarchical Runtime Architectural Structure

    DTIC Science & Technology

    2010-05-14

    Thesis excerpts; the recoverable fragments are a component table mapping architectural components to Java classes (e.g., CustomerManager → cryptodb.test.CustomerManager, a.k.a. the "crypto consumer"; CustomerManager.Receipts; …) and Figure 7.29, "CryptoDB: Level-0 OOG with String objects," whose domain labels include PROVIDERS, PLAIN, KEYID, KEYMANAGEMENT, KEYSTORAGE, and CRYPTO. To better understand this communication, different domains were declared for plain-text (PLAIN), encrypted (CRYPTO), alias identifier (ALIASID), and key…

  9. Integrated Network Decompositions and Dynamic Programming for Graph Optimization (INDDGO)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    The INDDGO software package offers a set of tools for finding exact solutions to graph optimization problems via tree decompositions and dynamic programming algorithms. Currently the framework offers serial and parallel (distributed memory) algorithms for finding tree decompositions and solving the maximum weighted independent set problem. The parallel dynamic programming algorithm is implemented on top of the MADNESS task-based runtime.

  10. Using reconfigurable hardware to accelerate multiple sequence alignment with ClustalW.

    PubMed

    Oliver, Tim; Schmidt, Bertil; Nathan, Darran; Clemens, Ralf; Maskell, Douglas

    2005-08-15

    Aligning hundreds of sequences using progressive alignment tools such as ClustalW requires several hours on state-of-the-art workstations. We present a new approach to compute multiple sequence alignments in far shorter time using reconfigurable hardware. This results in an implementation of ClustalW with significant runtime savings on a standard off-the-shelf FPGA.

  11. Improving Security in Software Acquisition and Runtime Integration With Data Retention Specifications

    DTIC Science & Technology

    2016-04-30


  12. Language Abstractions for Software-Defined Networks

    DTIC Science & Technology

    2012-01-01


  13. Core Flight System (cFS) a Low Cost Solution for SmallSats

    NASA Technical Reports Server (NTRS)

    McComas, David; Strege, Susanne; Wilmot, Jonathan

    2015-01-01

    The cFS is a flight software (FSW) product line that uses a layered architecture and compile-time configuration parameters, which make it portable and scalable for a wide range of platforms. The software layers that define the application run-time environment are now under a NASA-wide configuration control board, with the goal of sustaining an open-source application ecosystem.

  14. Real-time Scheduling for GPUS with Applications in Advanced Automotive Systems

    DTIC Science & Technology

    2015-01-01

    Thesis excerpts; recoverable fragments include a figure reference ("3.7 Architecture of GPU tasklet scheduling infrastructure"), a remark that the CPU-GPU throughput disparity is even greater for mobile CPUs such as those designed by ARM (e.g., the ARM Cortex-A15 series processor), and a description of a stub library that replaces the GPGPU runtime within each virtual machine and communicates API calls to a GPGPU back-end user-space daemon.

  15. An Interface Transformation Strategy for AF-IPPS

    DTIC Science & Technology

    2012-12-01

    Report excerpts; the recoverable content describes using Representational State Transfer (REST) and Java Enterprise Edition (Java EE) to implement a reusable "translation service" for SOAP and REST protocols (XML and …), built from best-of-breed open source software. The product baseline includes: Java (language compiler and runtime); JBoss Application Server (applications, messaging, translation; a Java EE application server); and Ruby on Rails (applications; Ruby web …).

  16. Runtime Assurance Framework Development for Highly Adaptive Flight Control Systems

    DTIC Science & Technology

    2015-12-01

    Report excerpts; recoverable fragments: the demonstration platform, performing a surveillance mission, consisted of runtime assurance (RTA) systems for inner-loop control, outer-loop guidance, and ownship flight…; for the inner loop, the concept of employing multiple transition controllers in the reversionary control system was studied; for all feedback levels… (Table-of-contents residue indicates a chapter on RTA protection applied to inner-loop control systems, including a general description of a morphing wing.)

  17. An Incremental Life-cycle Assurance Strategy for Critical System Certification

    DTIC Science & Technology

    2014-11-04

    Presentation excerpts; recoverable points: embedded software systems introduce a new class of problems not addressed by traditional system modeling and analysis; data-stream characteristics such as latency jitter affect control behavior; and the briefing asks why system-level failures still occur despite fault-tolerance techniques being deployed, pointing to the embedded software system as a major source of…

  18. Integrated Environment for Development and Assurance

    DTIC Science & Technology

    2015-01-26


  19. Architecting Service-Oriented Systems

    DTIC Science & Technology

    2011-08-01

    Excerpts: service orientation is an approach to software systems development that has become a popular way to implement distributed, loosely coupled systems… The later binding is deferred, the more flexibility service providers and service consumers have to develop their software systems independently… An Enterprise Service Bus (ESB) is a software pattern that can be part of a SOA infrastructure and acts as an intermediary…

  20. An Ant Colony Optimization Based Feature Selection for Web Page Classification

    PubMed Central

    2014-01-01

    The increased popularity of the web has brought a huge amount of information onto it, and as a result of this explosive information growth, automated web page classification systems are needed to improve search engines' performance. Web pages have a large number of features, such as HTML/XML tags, URLs, hyperlinks, and text contents, that should be considered during an automated classification process. The aim of this study is to reduce the number of features used, in order to improve the runtime and accuracy of web page classification. In this study, we used an ant colony optimization (ACO) algorithm to select the best features, and then we applied the well-known C4.5, naive Bayes, and k-nearest-neighbor classifiers to assign class labels to web pages. We used the WebKB and Conference datasets in our experiments, and we showed that using ACO for feature selection improves both accuracy and runtime performance of classification. We also showed that the proposed ACO-based algorithm can select better features than the well-known information gain and chi-square feature selection methods. PMID:25136678
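
    A toy version of ACO-driven feature selection (an illustrative sketch, not the paper's exact pheromone or heuristic rules) might look like this, with the scorer standing in for cross-validated classifier accuracy:

    ```python
    import numpy as np

    def aco_feature_selection(score, n_features, n_ants=20, n_iters=30,
                              subset_size=5, rho=0.1, seed=0):
        """Ants sample feature subsets with probability proportional to pheromone;
        good subsets reinforce their features; pheromone evaporates each round."""
        rng = np.random.default_rng(seed)
        pheromone = np.ones(n_features)
        best_subset, best_score = None, -np.inf
        for _ in range(n_iters):
            trails = []
            for _ in range(n_ants):
                p = pheromone / pheromone.sum()
                subset = rng.choice(n_features, size=subset_size, replace=False, p=p)
                s = score(subset)
                trails.append((s, subset))
                if s > best_score:
                    best_score, best_subset = s, subset
            pheromone *= (1.0 - rho)              # evaporation
            for s, subset in trails:
                pheromone[subset] += max(s, 0.0)  # deposit proportional to quality
        return best_subset, best_score

    # Dummy scorer standing in for, e.g., cross-validated classifier accuracy:
    best, val = aco_feature_selection(lambda sub: 1.0 / (1.0 + sub.sum()), n_features=20)
    print(sorted(best), val)
    ```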

  1. A Component-Based Approach for Securing Indoor Home Care Applications

    PubMed Central

    Estévez, Elisabet

    2017-01-01

    eHealth systems have adopted recent advances in sensing technologies together with advances in information and communication technologies (ICT) in order to provide people-centered services that improve the quality of life of an increasingly elderly population. As these eHealth services are founded on the acquisition and processing of sensitive data (e.g., personal details, diagnosis, treatments and medical history), any security threat would damage the public's confidence in them. This paper proposes a solution for the design and runtime management of indoor eHealth applications with security requirements. The proposal allows application definitions customized to patient particularities, including the early detection of health deterioration and suitable reactions (events), as well as security needs. At runtime, security support is twofold: a secured component-based platform supervises application execution and provides event management, whilst the security of communications among application components is also guaranteed. Additionally, the proposed event management scheme adopts the fog computing paradigm to enable local storage and processing of event-related data, thus saving communication bandwidth when communicating with the cloud. As a proof of concept, this proposal has been validated through the monitoring of the health status of diabetic patients at a nursing home. PMID:29278370

  2. A portable pattern-based design technology co-optimization flow to reduce optical proximity correction run-time

    NASA Astrophysics Data System (ADS)

    Chen, Yi-Chieh; Li, Tsung-Han; Lin, Hung-Yu; Chen, Kao-Tun; Wu, Chun-Sheng; Lai, Ya-Chieh; Hurat, Philippe

    2018-03-01

    As processes improve and integrated circuit (IC) design complexity increases, the failure rate caused by optical effects grows in semiconductor manufacturing. To enhance chip quality, optical proximity correction (OPC) plays an indispensable role in the manufacturing industry. However, OPC, which includes model creation, correction, simulation, and verification, is a bottleneck on the path from design to manufacture, due to its multiple iterations and the advanced mathematical description of physical behavior. This paper therefore presents a pattern-based design technology co-optimization (PB-DTCO) flow that cooperates with OPC to find patterns that will negatively affect yield and fix them automatically in advance, reducing the runtime of the OPC operation. The PB-DTCO flow can generate plenty of test patterns for model creation and yield improvement, classify candidate patterns systematically, and quickly build up banks of matched pattern/optimization pairs. Those banks can be used for hotspot fixing and layout optimization, and can be referenced for the next technology node. The combination of the PB-DTCO flow with OPC therefore not only reduces time-to-market but is also flexible and easily adapted to diverse OPC flows.

  3. Fault-Tolerant and Elastic Streaming MapReduce with Decentralized Coordination

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kumbhare, Alok; Frincu, Marc; Simmhan, Yogesh

    2015-06-29

    The MapReduce programming model, due to its simplicity and scalability, has become an essential tool for processing large data volumes in distributed environments. Recent Stream Processing Systems (SPS) extend this model to provide low-latency analysis of high-velocity continuous data streams. However, integrating MapReduce with streaming poses challenges: first, runtime variations in data characteristics such as data rates and key distribution cause resource overload, which in turn leads to fluctuations in the Quality of Service (QoS); and second, stateful reducers, whose state depends on the complete tuple history, necessitate efficient fault-recovery mechanisms to maintain the desired QoS in the presence of resource failures. We propose an integrated streaming MapReduce architecture leveraging the concept of consistent hashing to support runtime elasticity, along with locality-aware data and state replication, to provide efficient load balancing with low-overhead fault tolerance and parallel fault recovery from multiple simultaneous failures. Our evaluation on a private cloud shows up to 2.8x improvement in peak throughput compared to the Apache Storm SPS, and a low recovery latency of 700-1500 ms from multiple failures.
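
    To illustrate the consistent-hashing idea that underpins the runtime elasticity described above (a generic sketch, not the paper's implementation), the following ring maps tuple keys to reducer workers so that adding a worker relocates only a small fraction of keys; node names are hypothetical:

    ```python
    import bisect
    import hashlib

    class ConsistentHashRing:
        """Minimal consistent-hash ring with virtual nodes."""
        def __init__(self, nodes=(), vnodes=64):
            self.vnodes = vnodes
            self._ring = []                  # sorted list of (hash, node)
            for n in nodes:
                self.add(n)

        @staticmethod
        def _hash(key):
            return int(hashlib.md5(key.encode()).hexdigest(), 16)

        def add(self, node):
            for i in range(self.vnodes):     # virtual nodes smooth the load
                bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

        def remove(self, node):
            self._ring = [(h, n) for h, n in self._ring if n != node]

        def lookup(self, key):
            h = self._hash(key)
            idx = bisect.bisect(self._ring, (h, "")) % len(self._ring)
            return self._ring[idx][1]        # first node clockwise from the key

    ring = ConsistentHashRing(["worker-1", "worker-2", "worker-3"])
    print(ring.lookup("tuple-key-42"))       # key -> reducer assignment
    ring.add("worker-4")                     # elastic scale-out moves only ~1/4 of keys
    ```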

  4. MSAProbs-MPI: parallel multiple sequence aligner for distributed-memory systems.

    PubMed

    González-Domínguez, Jorge; Liu, Yongchao; Touriño, Juan; Schmidt, Bertil

    2016-12-15

    MSAProbs is a state-of-the-art protein multiple sequence alignment tool based on hidden Markov models. It can achieve high alignment accuracy at the expense of relatively long runtimes for large-scale input datasets. In this work we present MSAProbs-MPI, a distributed-memory parallel version of the multithreaded MSAProbs tool that is able to reduce runtimes by exploiting the compute capabilities of common multicore CPU clusters. Our performance evaluation on a cluster with 32 nodes (each containing two Intel Haswell processors) shows reductions in execution time of over one order of magnitude for typical input datasets. Furthermore, MSAProbs-MPI using eight nodes is faster than the GPU-accelerated QuickProbs running on a Tesla K20. Another strong point is that MSAProbs-MPI can deal with large datasets for which MSAProbs and QuickProbs might fail due to time and memory constraints, respectively. Source code in C++ and MPI running on Linux systems, as well as a reference manual, are available at http://msaprobs.sourceforge.net. Contact: jgonzalezd@udc.es. Supplementary data are available at Bioinformatics online.

  5. Run-time implementation issues for real-time embedded Ada

    NASA Technical Reports Server (NTRS)

    Maule, Ruth A.

    1986-01-01

    A motivating factor in the development of Ada as the Department of Defense standard language was the high cost of embedded system software development. It was with embedded system requirements in mind that many of the features of the language were incorporated. Yet it is the designers of embedded systems who seem to comprise the majority of the Ada community dissatisfied with the language. There are a variety of reasons for this dissatisfaction, but many seem to be related in some way to the Ada run-time support system. The areas in which implementation inconsistencies were found to have the greatest impact on performance, from the standpoint of real-time systems, are presented. In particular, a large part of the duties of the tasking supervisor is subject to the design decisions of the implementer, including scheduling, rendezvous, delay processing, and task activation and termination. Some of the more general issues presented include time and space efficiency, generic expansions, memory management, pragmas, and tracing features. As validated compilers become available for bare computer targets, it is important for a designer to be aware that, at least for many real-time issues, all validated Ada compilers are not created equal.

  6. A scalable approach to solving dense linear algebra problems on hybrid CPU-GPU systems

    DOE PAGES

    Song, Fengguang; Dongarra, Jack

    2014-10-01

    Aiming to fully exploit the computing power of all CPUs and all graphics processing units (GPUs) on hybrid CPU-GPU systems to solve dense linear algebra problems, in this paper we design a class of heterogeneous tile algorithms to maximize the degree of parallelism, to minimize the communication volume, and to accommodate the heterogeneity between CPUs and GPUs. The new heterogeneous tile algorithms are executed upon our decentralized dynamic scheduling runtime system, which schedules a task graph dynamically and transfers data between compute nodes automatically. The runtime system uses a new distributed task assignment protocol to solve data dependencies between tasks without any coordination between processing units. By overlapping computation and communication through dynamic scheduling, we are able to attain scalable performance for the double-precision Cholesky factorization and QR factorization. Finally, our approach demonstrates a performance comparable to Intel MKL on shared-memory multicore systems and better performance than both vendor (e.g., Intel MKL) and open source libraries (e.g., StarPU) in the following three environments: heterogeneous clusters with GPUs, conventional clusters without GPUs, and shared-memory systems with multiple GPUs.

  7. Thread scheduling for GPU-based OPC simulation on multi-thread

    NASA Astrophysics Data System (ADS)

    Lee, Heejun; Kim, Sangwook; Hong, Jisuk; Lee, Sooryong; Han, Hwansoo

    2018-03-01

    As semiconductor product development based on feature shrinkage continues, the accuracy and difficulty required of model-based optical proximity correction (MBOPC) increase. OPC simulation time, the most time-consuming part of MBOPC, is rapidly increasing due to the high pattern density of layouts and complex OPC models. To reduce OPC simulation time, we apply graphics processing units (GPUs) to MBOPC, because the OPC process parallelizes well. We address issues that typically arise during GPU-based OPC simulation in a multi-threaded system, such as out-of-memory errors and GPU idle time. To overcome these problems, we propose a thread-scheduling method that manages OPC jobs in multiple threads in such a way that simulation jobs from multiple threads are alternately executed on the GPU while correction jobs are executed at the same time on the CPU cores. We observed that peak GPU memory usage decreases by up to 35%, and MBOPC runtime also decreases by 4%. In cases where out-of-memory issues occur in a multi-threaded environment, the thread scheduler improves MBOPC runtime by up to 23%.
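
    A minimal sketch of the scheduling idea — one thread owns the GPU and serializes simulation batches from many OPC worker threads, while corrections proceed on CPU cores — assuming stand-in functions for the simulation and correction steps (none of this reflects the authors' actual code):

    ```python
    import queue
    import threading

    def simulate_on_gpu(tile):          # stand-in for the real GPU simulation kernel
        return f"sim({tile})"

    def correct_on_cpu(tile, sim):      # stand-in for the CPU-side mask correction
        print(f"corrected {tile} using {sim}")

    gpu_jobs = queue.Queue(maxsize=8)   # bounded queue caps in-flight GPU memory

    def gpu_executor():
        # A single thread owns the GPU: simulation batches from all OPC worker
        # threads are serialized here, avoiding out-of-memory from concurrent jobs.
        while True:
            item = gpu_jobs.get()
            if item is None:
                break
            tile, reply = item
            reply.put(simulate_on_gpu(tile))

    def opc_worker(tiles):
        reply = queue.Queue()
        for tile in tiles:
            gpu_jobs.put((tile, reply))
            sim = reply.get()           # wait for this tile's simulation result
            correct_on_cpu(tile, sim)   # correction overlaps other threads' GPU use

    gpu = threading.Thread(target=gpu_executor)
    gpu.start()
    workers = [threading.Thread(target=opc_worker, args=([f"t{i}a", f"t{i}b"],))
               for i in range(3)]
    for w in workers: w.start()
    for w in workers: w.join()
    gpu_jobs.put(None)                  # shut down the GPU executor
    gpu.join()
    ```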

  8. Runtime verification of embedded real-time systems.

    PubMed

    Reinbacher, Thomas; Függer, Matthias; Brauer, Jörg

    We present a runtime verification framework that allows on-line monitoring of past-time Metric Temporal Logic (ptMTL) specifications in a discrete-time setting. We design observer algorithms for the time-bounded modalities of ptMTL, which take advantage of the highly parallel nature of hardware designs. The algorithms can be translated into efficient hardware blocks, which are designed for reconfigurability and thus facilitate applications of the framework in both the prototyping and the post-deployment phase of embedded real-time systems. We provide formal correctness proofs for all presented observer algorithms and analyze their time and space complexity. For example, for the most general operator considered, the time-bounded Since operator, we obtain a time complexity that is doubly logarithmic both in the point in time at which the operator is executed and in the operator's time bounds. This result is promising with respect to a self-contained, non-interfering monitoring approach that evaluates real-time specifications in parallel with the system-under-test. We implement our framework on a Field Programmable Gate Array platform and use extensive simulation and logic synthesis runs to assess the benefits of the approach in terms of resource usage and operating frequency.
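
    The flavor of such observers can be conveyed with the textbook recursion for the unbounded past-time Since operator (a minimal software sketch; the paper's contribution concerns time-bounded operators realized as parallel hardware blocks):

    ```python
    def since_observer():
        """Discrete-time observer for the unbounded past-time 'Since' operator.

        holds(t) = psi(t) or (phi(t) and holds(t-1)) -- the standard recursion;
        time-bounded variants additionally track timestamps.
        """
        holds = False

        def step(phi, psi):
            nonlocal holds
            holds = psi or (phi and holds)
            return holds

        return step

    obs = since_observer()
    trace = [(True, False), (True, True), (True, False), (False, False)]
    print([obs(phi, psi) for phi, psi in trace])  # [False, True, True, False]
    ```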

  9. Prototyping Tool for Web-Based Multiuser Online Role-Playing Game

    NASA Astrophysics Data System (ADS)

    Okamoto, Shusuke; Kamada, Masaru; Yonekura, Tatsuhiro

    This letter proposes a prototyping tool for Web-based Multiuser Online Role-Playing Games (MORPG). The design goal is to make this tool simple and powerful. The tool comprises a GUI editor, a translator, and a runtime environment. The GUI editor is used to edit state-transition diagrams, each of which defines the behavior of a fictional character. The state-transition diagrams are translated into C program code, which plays the role of a game engine in the RPG system. The runtime environment includes PHP, JavaScript with Ajax, and HTML, so the prototype system can be played in a standard Web browser such as Firefox, Safari, or IE. On a click or key press by a player, the Web browser notifies the Web server, which reflects the consequence on the screens that other players are viewing. Prospective users of this tool include programming novices and schoolchildren: no knowledge of any specific programming language is required to create state-transition diagrams, whose structure is not only suitable for defining character behavior but also intuitive enough for novices to understand. Users can therefore easily create a Web-based MORPG system with the tool.

  10. NEQAIR96,Nonequilibrium and Equilibrium Radiative Transport and Spectra Program: User's Manual

    NASA Technical Reports Server (NTRS)

    Whiting, Ellis E.; Park, Chul; Liu, Yen; Arnold, James O.; Paterson, John A.

    1996-01-01

    This document is the User's Manual for a new version of the NEQAIR computer program, NEQAIR96. The program is a line-by-line and line-of-sight code: it calculates the emission and absorption spectra for atoms and diatomic molecules and the transport of radiation through a nonuniform gas mixture to a surface. The program has been rewritten to make it easy to use, run faster, and include many run-time options that tailor a calculation to the user's requirements. The accuracy and capability have also been improved by including the rotational Hamiltonian matrix formalism for calculating rotational energy levels and Hoenl-London factors for dipole- and spin-allowed singlet, doublet, triplet, and quartet transitions. Three sample cases are included to help the user become familiar with the steps taken to produce a spectrum. A new user interface uses check locations to select run-time options and to enter selected run data, making NEQAIR96 easier to use than older versions of the code. Its ease of use and the speed of its algorithms make NEQAIR96 a valuable educational code as well as a practical spectroscopic prediction and diagnostic code.

  11. Generalized concurrence in boson sampling.

    PubMed

    Chin, Seungbeom; Huh, Joonsuk

    2018-04-17

    A fundamental question in linear optical quantum computing is to understand the origin of quantum supremacy in the physical system. The multimode linear optical transition amplitudes are calculated through the permanents of transition operator matrices, which is a hard problem for classical simulation (the boson sampling (BS) problem). We can understand this problem by considering a quantum measure that directly determines the runtime for computing the transition amplitudes. In this paper, we suggest a quantum measure named the "Fock state concurrence sum" C_S, which is the summation over all members of the "generalized Fock state concurrence" (a measure analogous to the generalized concurrences of entanglement and coherence). By introducing generalized algorithms for computing the transition amplitudes of Fock state boson sampling with an arbitrary number of photons per mode, we show that the minimal classical runtime for all known algorithms depends directly on C_S. We can therefore state that the Fock state concurrence sum C_S behaves as a collective measure that controls the computational complexity of Fock state BS. We expect that our observation on the role of the Fock state concurrence in the generalized algorithm for permanents will provide a unified viewpoint for interpreting the quantum computing power of linear optics.
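
    Since the transition amplitudes reduce to matrix permanents, the classical runtime hinges on permanent evaluation. As a point of reference (not the paper's generalized algorithm), Ryser's inclusion-exclusion formula computes a permanent in O(2^n poly(n)) time:

    ```python
    from itertools import combinations
    import numpy as np

    def permanent_ryser(A):
        """Permanent via Ryser's inclusion-exclusion formula."""
        A = np.asarray(A)
        n = A.shape[0]
        total = 0.0
        for r in range(1, n + 1):
            for cols in combinations(range(n), r):
                # product over rows of the row sums restricted to this column subset
                total += (-1) ** r * np.prod(A[:, cols].sum(axis=1))
        return (-1) ** n * total

    print(permanent_ryser(np.eye(2)))         # 1.0
    print(permanent_ryser([[1, 1], [1, 1]]))  # 2.0
    ```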

  12. Strategies for global optimization in photonics design.

    PubMed

    Vukovic, Ana; Sewell, Phillip; Benson, Trevor M

    2010-10-01

    This paper reports on two important issues that arise in the context of the global optimization of photonic components where large problem spaces must be investigated. The first is the implementation of a fast simulation method and associated matrix solver for assessing particular designs and the second, the strategies that a designer can adopt to control the size of the problem design space to reduce runtimes without compromising the convergence of the global optimization tool. For this study an analytical simulation method based on Mie scattering and a fast matrix solver exploiting the fast multipole method are combined with genetic algorithms (GAs). The impact of the approximations of the simulation method on the accuracy and runtime of individual design assessments and the consequent effects on the GA are also examined. An investigation of optimization strategies for controlling the design space size is conducted on two illustrative examples, namely, 60° and 90° waveguide bends based on photonic microstructures, and their effectiveness is analyzed in terms of a GA's ability to converge to the best solution within an acceptable timeframe. Finally, the paper describes some particular optimized solutions found in the course of this work.

  13. Revisiting Parallel Cyclic Reduction and Parallel Prefix-Based Algorithms for Block Tridiagonal System of Equations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Seal, Sudip K; Perumalla, Kalyan S; Hirshman, Steven Paul

    2013-01-01

    Simulations that require solutions of block tridiagonal systems of equations rely on fast parallel solvers for runtime efficiency. Leading parallel solvers that are highly effective for general systems of equations, dense or sparse, are limited in scalability when applied to block tridiagonal systems. This paper presents scalability results as well as detailed analyses of two parallel solvers that exploit the special structure of block tridiagonal matrices to deliver superior performance, often by orders of magnitude. A rigorous analysis of their relative parallel runtimes is shown to reveal the existence of a critical block size that separates the parameter space spanned by the number of block rows, the block size, and the processor count into distinct regions that favor one or the other of the two solvers. Dependence of this critical block size on the above parameters as well as on machine-specific constants is established. These formal insights are supported by empirical results on up to 2,048 cores of a Cray XT4 system. To the best of our knowledge, this is the highest reported scalability for parallel block tridiagonal solvers to date.
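
    For reference, the sequential block-Thomas recurrence below shows the structure both parallel solvers exploit (an illustrative baseline in Python; the paper's solvers are parallel cyclic-reduction and prefix-based algorithms, which this sketch does not implement):

    ```python
    import numpy as np

    def block_thomas(A, B, C, d):
        """Sequential block-Thomas solve of a block tridiagonal system:
        A[i] sub-diagonal blocks (A[0] unused), B[i] main diagonal,
        C[i] super-diagonal (C[-1] unused), d[i] right-hand-side blocks."""
        n = len(B)
        Bp, dp = [B[0]], [d[0]]
        for i in range(1, n):                      # forward elimination
            m = A[i] @ np.linalg.inv(Bp[i - 1])
            Bp.append(B[i] - m @ C[i - 1])
            dp.append(d[i] - m @ dp[i - 1])
        x = [None] * n                             # back substitution
        x[-1] = np.linalg.solve(Bp[-1], dp[-1])
        for i in range(n - 2, -1, -1):
            x[i] = np.linalg.solve(Bp[i], dp[i] - C[i] @ x[i + 1])
        return np.concatenate(x)

    # Tiny 3-block-row example with 2x2 blocks:
    rng = np.random.default_rng(0)
    B_ = [np.eye(2) * 4 for _ in range(3)]
    A_ = [None] + [rng.standard_normal((2, 2)) * 0.1 for _ in range(2)]
    C_ = [rng.standard_normal((2, 2)) * 0.1 for _ in range(2)] + [None]
    d_ = [rng.standard_normal(2) for _ in range(3)]
    print(block_thomas(A_, B_, C_, d_))
    ```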

  14. A Component-Based Approach for Securing Indoor Home Care Applications.

    PubMed

    Agirre, Aitor; Armentia, Aintzane; Estévez, Elisabet; Marcos, Marga

    2017-12-26

    eHealth systems have adopted recent advances in sensing technologies together with advances in information and communication technologies (ICT) in order to provide people-centered services that improve the quality of life of an increasingly elderly population. As these eHealth services are founded on the acquisition and processing of sensitive data (e.g., personal details, diagnosis, treatments and medical history), any security threat would damage the public's confidence in them. This paper proposes a solution for the design and runtime management of indoor eHealth applications with security requirements. The proposal allows application definitions customized to patient particularities, including the early detection of health deterioration and suitable reactions (events), as well as security needs. At runtime, security support is twofold: a secured component-based platform supervises application execution and provides event management, whilst the security of communications among application components is also guaranteed. Additionally, the proposed event management scheme adopts the fog computing paradigm to enable local storage and processing of event-related data, thus saving communication bandwidth when communicating with the cloud. As a proof of concept, this proposal has been validated through the monitoring of the health status of diabetic patients at a nursing home.

  15. Reliability prediction of ontology-based service compositions using Petri net and time series models.

    PubMed

    Li, Jia; Xia, Yunni; Luo, Xin

    2014-01-01

    OWL-S, one of the most important Semantic Web service ontologies proposed to date, provides a core ontological framework and guidelines for describing the properties and capabilities of web services in an unambiguous, computer-interpretable form. Predicting the reliability of composite service processes specified in OWL-S allows service users to decide whether a process meets their quantitative quality requirements. In this study, we consider the runtime quality of services to be fluctuating and introduce a dynamic framework to predict the runtime reliability of services specified in OWL-S, employing a non-Markovian stochastic Petri net (NMSPN) and a time series model. The framework includes the following steps: obtaining the historical response-time series of the individual service components; fitting these series with an autoregressive moving-average (ARMA) model and predicting the future firing rates of the service components; mapping the OWL-S process onto an NMSPN model; and employing the predicted firing rates as input to the NMSPN and calculating the normal completion probability as the reliability estimate. In the case study, a comparison between the static model and our approach based on experimental data is presented, showing that our approach achieves higher prediction accuracy.
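
    The response-time forecasting step can be illustrated with a generic ARMA fit (a sketch using the statsmodels package on synthetic data, not the authors' code); the reciprocal of each predicted response time serves as a predicted firing rate:

    ```python
    # Fit an ARMA(2,1) model to a service's historical response times and
    # forecast the next values; reciprocals give predicted firing rates.
    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA

    rng = np.random.default_rng(0)
    response_times = 0.2 + 0.05 * rng.standard_normal(200)  # synthetic history (seconds)

    model = ARIMA(response_times, order=(2, 0, 1))  # ARMA(p=2, q=1); d=0 means no differencing
    fit = model.fit()
    forecast = fit.forecast(steps=5)                # predicted future response times
    firing_rates = 1.0 / forecast                   # NMSPN transition firing rates
    print(firing_rates)
    ```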

  16. A Matter of Time: Faster Percolator Analysis via Efficient SVM Learning for Large-Scale Proteomics.

    PubMed

    Halloran, John T; Rocke, David M

    2018-05-04

    Percolator is an important tool for greatly improving the results of a database search and subsequent downstream analysis. Using support vector machines (SVMs), Percolator recalibrates peptide-spectrum matches based on the learned decision boundary between targets and decoys. To improve analysis time for large-scale data sets, we update Percolator's SVM learning engine through software and algorithmic optimizations rather than heuristic approaches that necessitate careful study of their impact on learned parameters across different search settings and data sets. We show that by optimizing Percolator's original learning algorithm, L2-SVM-MFN, large-scale SVM learning requires only about a third of the original runtime. Furthermore, we show that by employing the widely used Trust Region Newton (TRON) algorithm instead of L2-SVM-MFN, large-scale Percolator SVM learning is reduced to only about a fifth of the original runtime. Importantly, these speedups only affect the speed at which Percolator converges to a global solution and do not alter recalibration performance. The upgraded versions of both L2-SVM-MFN and TRON are optimized within the Percolator codebase for multithreaded and single-thread use and are available under the Apache license at bitbucket.org/jthalloran/percolator_upgrade.

  17. Support for User Interfaces for Distributed Systems

    NASA Technical Reports Server (NTRS)

    Eychaner, Glenn; Niessner, Albert

    2005-01-01

    An extensible Java(TM) software framework supports the construction and operation of graphical user interfaces (GUIs) for distributed computing systems, typified by ground control systems that send commands to, and receive telemetric data from, spacecraft. Heretofore, such GUIs have been custom built for each new system at considerable expense. In contrast, the present framework affords generic capabilities that can be shared by different distributed systems. Dynamic class loading, reflection, and other run-time capabilities of the Java language and the JavaBeans component architecture enable the creation of a GUI for each new distributed computing system with a minimum of custom effort. By use of this framework, GUI components in control panels and menus can send commands to a particular distributed system with a minimum of system-specific code. The framework receives, decodes, processes, and displays telemetry data; custom telemetry data handling can be added for a particular system. The framework supports saving and later restoring users' configurations of control panels and telemetry displays with a minimum of effort in writing system-specific code. GUIs constructed within this framework can be deployed in any operating system with a Java run-time environment, without recompilation or code changes.

  18. Transformation as a Design Process and Runtime Architecture for High Integrity Software

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bespalko, S.J.; Winter, V.L.

    1999-04-05

    We have discussed two aspects of creating high integrity software that greatly benefit from the availability of transformation technology, which in this case is manifested by the requirement for a sophisticated backtracking parser. First, because of the potential for correctly manipulating programs via small changes, an automated non-procedural transformation system can be a valuable tool for constructing high assurance software. Second, modeling the process of translating data into information as a (perhaps context-dependent) grammar leads to an efficient, compact implementation. From a practical perspective, the transformation process should begin in the domain language in which a problem is initially expressed. Thus, in order for a transformation system to be practical, it must be flexible with respect to domain-specific languages. We have argued that transformation applied to specification results in a highly reliable system. We have also attempted to briefly demonstrate that transformation technology applied to the runtime environment will result in a safe and secure system. We thus believe that sophisticated multi-lookahead backtracking parsing technology is central to being in a position to demonstrate the existence of high integrity software (HIS).

  19. Source and listener directivity for interactive wave-based sound propagation.

    PubMed

    Mehra, Ravish; Antani, Lakulish; Kim, Sujeong; Manocha, Dinesh

    2014-04-01

    We present an approach to model dynamic, data-driven source and listener directivity for interactive wave-based sound propagation in virtual environments and computer games. Our directional source representation is expressed as a linear combination of elementary spherical harmonic (SH) sources. In the preprocessing stage, we precompute and encode the propagated sound fields due to each SH source. At runtime, we perform the SH decomposition of the varying source directivity interactively and compute the total sound field at the listener position as a weighted sum of precomputed SH sound fields. We propose a novel plane-wave decomposition approach based on higher-order derivatives of the sound field that enables dynamic HRTF-based listener directivity at runtime. We provide a generic framework to incorporate our source and listener directivity in any offline or online frequency-domain wave-based sound propagation algorithm. We have integrated our sound propagation system in Valve's Source game engine and use it to demonstrate realistic acoustic effects such as sound amplification, diffraction low-passing, scattering, localization, externalization, and spatial sound, generated by wave-based propagation of directional sources and listener in complex scenarios. We also present results from our preliminary user study.
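
    The runtime step described above — express the current directivity in an SH basis and form a weighted sum of precomputed per-SH-source fields — reduces to a small matrix-vector product; the shapes and values below are illustrative placeholders, not data from the paper:

    ```python
    # Sketch of the runtime directivity step: total field at each listener cell
    # is a weighted sum of fields precomputed per spherical-harmonic (SH) source.
    import numpy as np

    n_sh = 4      # SH basis up to order 1: (l, m) = (0,0), (1,-1), (1,0), (1,1)
    n_cells = 3   # listener grid cells with precomputed responses

    # Offline: one precomputed (complex) pressure per SH source per listener cell.
    rng = np.random.default_rng(0)
    precomputed_fields = (rng.standard_normal((n_sh, n_cells))
                          + 1j * rng.standard_normal((n_sh, n_cells)))

    # Online: SH coefficients of the current (possibly rotating) source directivity.
    sh_weights = np.array([1.0, 0.2, 0.5, -0.1])

    total_field = sh_weights @ precomputed_fields   # weighted sum, one value per cell
    print(total_field)
    ```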

  20. Improved HDRG decoders for qudit and non-Abelian quantum error correction

    NASA Astrophysics Data System (ADS)

    Hutter, Adrian; Loss, Daniel; Wootton, James R.

    2015-03-01

    Hard-decision renormalization group (HDRG) decoders are an important class of decoding algorithms for topological quantum error correction. Due to their versatility, they have been used to decode systems with fractal logical operators, color codes, qudit topological codes, and non-Abelian systems. In this work, we develop a method of performing HDRG decoding which combines strengths of existing decoders and further improves upon them. In particular, we increase the minimal number of errors necessary for a logical error in a system of linear size L from Θ(L^{2/3}) to Ω(L^{1-ε}) for any ε > 0. We apply our algorithm to decoding D(Z_d) quantum double models and a non-Abelian anyon model with Fibonacci-like fusion rules, and show that it indeed significantly outperforms previous HDRG decoders. Furthermore, we provide the first study of continuous error correction with imperfect syndrome measurements for the D(Z_d) quantum double models. The parallelized runtime of our algorithm is poly(log L) for the perfect measurement case. In the continuous case with imperfect syndrome measurements, the averaged runtime is O(1) for Abelian systems, while continuous error correction for non-Abelian anyons stays an open problem.

  1. Dynamic Load Balancing for Grid Partitioning on a SP-2 Multiprocessor: A Framework

    NASA Technical Reports Server (NTRS)

    Sohn, Andrew; Simon, Horst; Lasinski, T. A. (Technical Monitor)

    1994-01-01

    Computational requirements of full-scale computational fluid dynamics (CFD) change as computation progresses on a parallel machine. The change in computational intensity causes workload imbalance across processors, which in turn requires a large amount of data movement at runtime. If parallel CFD is to be successful on a parallel or massively parallel machine, balancing of the runtime load is indispensable. Here a framework for dynamic load balancing for CFD applications, called Jove, is presented. One processor is designated as the decision maker, Jove, while the others are assigned to computational fluid dynamics. Processors running CFD send flags to Jove after a predetermined number of iterations to initiate load balancing. Jove starts working on load balancing while the other processors continue working with the current data and load distribution. Jove goes through several steps to decide whether the new distribution should be adopted, including preliminary evaluation, partitioning, processor reassignment, cost evaluation, and decision. Jove running on a single IBM SP2 node has been completely implemented. Preliminary experimental results show that the Jove approach to dynamic load balancing can be effective for full-scale grid partitioning on the target machine, the IBM SP2.

  2. Dynamic Load Balancing For Grid Partitioning on a SP-2 Multiprocessor: A Framework

    NASA Technical Reports Server (NTRS)

    Sohn, Andrew; Simon, Horst; Lasinski, T. A. (Technical Monitor)

    1994-01-01

    Computational requirements of full-scale computational fluid dynamics (CFD) change as computation progresses on a parallel machine. The change in computational intensity causes workload imbalance across processors, which in turn requires a large amount of data movement at runtime. If parallel CFD is to be successful on a parallel or massively parallel machine, balancing of the runtime load is indispensable. Here a framework for dynamic load balancing for CFD applications, called Jove, is presented. One processor is designated as the decision maker, Jove, while the others are assigned to computational fluid dynamics. Processors running CFD send flags to Jove after a predetermined number of iterations to initiate load balancing. Jove starts working on load balancing while the other processors continue working with the current data and load distribution. Jove goes through several steps to decide whether the new distribution should be adopted, including preliminary evaluation, partitioning, processor reassignment, cost evaluation, and decision. Jove running on a single IBM SP2 node has been completely implemented. Preliminary experimental results show that the Jove approach to dynamic load balancing can be effective for full-scale grid partitioning on the target machine, the IBM SP2.

  3. Discrete Event Modeling and Massively Parallel Execution of Epidemic Outbreak Phenomena

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Perumalla, Kalyan S; Seal, Sudip K

    2011-01-01

    In complex phenomena such as epidemiological outbreaks, the intensity of inherent feedback effects and the significant role of transients in the dynamics make simulation the only effective method for proactive, reactive, or post-facto analysis. The spatial scale, runtime speed, and behavioral detail needed in detailed simulations of epidemic outbreaks make it necessary to use large-scale parallel processing. Here, an optimistic parallel execution of a new discrete event formulation of a reaction-diffusion simulation model of epidemic propagation is presented, dramatically increasing the fidelity and speed with which epidemiological simulations can be performed. Rollback support needed during optimistic parallel execution is achieved by combining reverse computation with a small amount of incremental state saving. Parallel speedup of over 5,500 and other runtime performance metrics of the system are observed with weak-scaling execution on a small (8,192-core) Blue Gene/P system, while scalability with a weak-scaling speedup of over 10,000 is demonstrated on 65,536 cores of a large Cray XT5 system. Scenarios representing large population sizes, exceeding several hundreds of millions of individuals in the largest cases, are successfully exercised to verify model scalability.
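
    The rollback mechanism named here, reverse computation, is illustrated by the toy below: each event carries its own inverse, so a straggler event is handled by undoing processed events newest-first instead of restoring a checkpoint. State, event types, and names are hypothetical, and real optimistic engines additionally manage virtual time, anti-messages, and GVT.

    ```python
    # Toy rollback via reverse computation: every processed event keeps an
    # undo closure; rollback pops and applies them in reverse order.

    state = {"infected": 10}
    processed = []            # stack of (timestamp, undo) pairs

    def infect(n):
        state["infected"] += n
        return lambda: state.__setitem__("infected", state["infected"] - n)

    def process(ts, n):
        processed.append((ts, infect(n)))

    def rollback_to(ts):
        """Undo, newest first, every event stamped after ts."""
        while processed and processed[-1][0] > ts:
            _, undo = processed.pop()
            undo()

    process(5, 3); process(9, 4)     # optimistic (possibly premature) work
    print(state)                     # {'infected': 17}
    rollback_to(4)                   # straggler arrives with timestamp 4
    print(state)                     # {'infected': 10}; events re-executed later
    ```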

  4. A scalable approach to solving dense linear algebra problems on hybrid CPU-GPU systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Song, Fengguang; Dongarra, Jack

    Aiming to fully exploit the computing power of all CPUs and all graphics processing units (GPUs) on hybrid CPU-GPU systems to solve dense linear algebra problems, in this paper we design a class of heterogeneous tile algorithms to maximize the degree of parallelism, to minimize the communication volume, and to accommodate the heterogeneity between CPUs and GPUs. The new heterogeneous tile algorithms are executed upon our decentralized dynamic scheduling runtime system, which schedules a task graph dynamically and transfers data between compute nodes automatically. The runtime system uses a new distributed task assignment protocol to solve data dependencies between tasks without any coordination between processing units. By overlapping computation and communication through dynamic scheduling, we are able to attain scalable performance for the double-precision Cholesky factorization and QR factorization. Finally, our approach demonstrates a performance comparable to Intel MKL on shared-memory multicore systems and better performance than both vendor (e.g., Intel MKL) and open source libraries (e.g., StarPU) in the following three environments: heterogeneous clusters with GPUs, conventional clusters without GPUs, and shared-memory systems with multiple GPUs.

  5. Dynamic Distribution and Layouting of Model-Based User Interfaces in Smart Environments

    NASA Astrophysics Data System (ADS)

    Roscher, Dirk; Lehmann, Grzegorz; Schwartze, Veit; Blumendorf, Marco; Albayrak, Sahin

    Developments in computer technology in the last decade have changed the ways computers are used. The emerging smart environments make it possible to build ubiquitous applications that assist users during their everyday life, at any time, in any context. But the variety of contexts-of-use (user, platform and environment) makes the development of such ubiquitous applications for smart environments, and especially their user interfaces, a challenging and time-consuming task. We propose a model-based approach, which allows adapting the user interface at runtime to numerous (also unknown) contexts-of-use. Based on a user interface modelling language defining the fundamentals and constraints of the user interface, a runtime architecture exploits the description to adapt the user interface to the current context-of-use. The architecture provides automatic distribution and layout algorithms for adapting applications even to contexts unforeseen at design time. Designers do not specify predefined adaptations for each specific situation, but adaptation constraints and guidelines. Furthermore, users are provided with a meta user interface to influence the adaptations according to their needs. A smart home energy management system serves as a running example to illustrate the approach.

  6. Parallel high-precision orbit propagation using the modified Picard-Chebyshev method

    NASA Astrophysics Data System (ADS)

    Koblick, Darin C.

    2012-03-01

    The modified Picard-Chebyshev method, when run in parallel, is thought to be more accurate and faster than the most efficient sequential numerical integration techniques when applied to orbit propagation problems. Previous experiments have shown that the modified Picard-Chebyshev method can achieve up to a one-order-of-magnitude speedup over the 12th-order Runge-Kutta-Nystrom method. For this study, the accuracy and computational time of the modified Picard-Chebyshev method, using the Java Astrodynamics Toolkit high-precision force model, are evaluated to assess its runtime performance. Simulation results of the modified Picard-Chebyshev method, implemented in MATLAB and the MATLAB Parallel Computing Toolbox, are compared against the most efficient first- and second-order Ordinary Differential Equation (ODE) solvers. A total of six processors were used to assess the runtime performance of the modified Picard-Chebyshev method. It was found that for all orbit propagation test cases where the gravity model was simulated to be of higher degree and order (above 225, to increase computational overhead), the modified Picard-Chebyshev method was faster, by as much as a factor of two, than the other ODE solvers tested.
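
    The underlying fixed-point iteration is compact enough to sketch. The version below solves dx/dt = f(t, x) on Chebyshev nodes with no force model and no parallelism; the node and iteration counts are arbitrary choices.

    ```python
    # Bare-bones Picard-Chebyshev iteration: x(t) = x0 + int_{t0}^{t} f ds,
    # with f fitted by a Chebyshev series on the nodes at every sweep.
    import numpy as np
    from numpy.polynomial import chebyshev as C

    def picard_chebyshev(f, x0, t0, t1, n=32, iters=40):
        x0 = np.asarray(x0, float)
        tau = np.cos(np.pi * np.arange(n + 1) / n)    # Chebyshev nodes in [-1, 1]
        t = 0.5 * (t1 - t0) * tau + 0.5 * (t1 + t0)   # mapped to [t0, t1]
        x = np.tile(x0, (n + 1, 1))                   # initial guess: constant
        for _ in range(iters):
            F = np.array([f(ti, xi) for ti, xi in zip(t, x)])
            coef = C.chebfit(tau, F, n)               # componentwise fit of f
            icoef = C.chebint(coef) * (0.5 * (t1 - t0))   # d t / d tau factor
            # Picard update, with the integral anchored at tau = -1 (t = t0).
            x = x0 + (C.chebval(tau, icoef) - C.chebval(-1.0, icoef)[:, None]).T
        return t, x

    # Example: harmonic oscillator x'' = -x as a first-order system.
    f = lambda t, x: np.array([x[1], -x[0]])
    t, x = picard_chebyshev(f, [1.0, 0.0], 0.0, 2.0)
    print(np.max(np.abs(x[:, 0] - np.cos(t))))        # tiny residual
    ```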

  7. Behavioral and Temporal Pattern Detection Within Financial Data With Hidden Information

    DTIC Science & Technology

    2012-02-01

    probabilistic pattern detector to monitor the pattern. Subject terms: runtime verification, hidden data, hidden Markov models, formal specifications...sequences in many other fields besides financial systems [L, TV, LC, LZ]. Rather, the technique suggested in this paper is positioned as a hybrid...operation of the pattern detector. Section 7 describes the operation of the probabilistic pattern-matching monitor, and section 8 describes three

  8. ART/Ada design project, phase 1. Task 1 report: Overall design

    NASA Technical Reports Server (NTRS)

    Allen, Bradley P.

    1988-01-01

    The design methodology for the ART/Ada project is introduced, and the selected design for ART/Ada is described in detail. The following topics are included: object-oriented design, reusable software, documentation techniques, impact of Ada, design approach, and differences between ART-IM 1.5 and the ART/Ada 1.0 prototype. The Ada generator and the ART/Ada runtime system are also discussed.

  9. Compiler and Runtime Support for Programming in Adaptive Parallel Environments

    DTIC Science & Technology

    1998-10-15

    no other job is waiting for resources, and use a smaller number of processors when other jobs need resources. Setia et al. [15, 20] have shown that such...[15] Vijay K. Naik, Sanjeev Setia, and Mark Squillante. Performance analysis of job scheduling policies in parallel supercomputing environments. In...on networks of heterogeneous workstations. Technical Report CSE-94-012, Oregon Graduate Institute of Science and Technology, 1994. [20] Sanjeev Setia

  10. A Case Study in Software Adaptation

    DTIC Science & Technology

    2002-01-01

    A Case Study in Software Adaptation. Giuseppe Valetto, Telecom Italia Lab, Turin, Italy...configuration of the service; monitoring of database connectivity from within the service; monitoring of crashes and shutdowns of IM servers; monitoring of...of the IM server all share a relational database and a common runtime state repository, which make up the backend tier, and allow replicas to

  11. A PLUG-AND-PLAY ARCHITECTURE FOR PROBABILISTIC PROGRAMMING

    DTIC Science & Technology

    2017-04-01

    programs that use discrete numerical distributions, but even then, the space of possible outcomes may be uncountable (as a solution can be infinite...also identify conditions guaranteeing that all possible outcomes are finite (and then the probability space is discrete). 2.2.2 The PlogiQL...and not determined at runtime. Nevertheless, the PRAiSE team plans to extend their solution to support numerical (continuous or discrete

  12. Traveler Phase 1A Joint Review

    NASA Technical Reports Server (NTRS)

    St. John, Clint; Scofield, Jan; Skoog, Mark; Flock, Alex; Williams, Ethan; Guirguis, Luke; Loudon, Kevin; Sutherland, Jeffrey; Lehmann, Richard; Garland, Michael

    2017-01-01

    The briefing contains the preliminary findings and suggestions for improvement of methods used in the development and evaluation of a multi-monitor runtime assurance architecture for autonomous flight vehicles. Initial system design, implementation, verification, and flight testing have been conducted. As yet, detailed data review is incomplete, and flight testing has been limited to initial monitor force fights. Detailed monitor flight evaluations have yet to be performed.

  13. RELIABILITY, AVAILABILITY, AND SERVICEABILITY FOR PETASCALE HIGH-END COMPUTING AND BEYOND

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chokchai "Box" Leangsuksun

    2011-05-31

    Our project is a multi-institutional research effort that adopts the interplay of reliability, availability, and serviceability (RAS) aspects for solving resilience issues in high-end scientific computing on the next generation of supercomputers. Results lie in the following tracks: failure prediction in large-scale HPC; investigation of reliability issues and mitigation techniques, including in GPGPU-based HPC systems; and HPC resilience runtime and tools.

  14. Tool Integration Framework for Bio-Informatics

    DTIC Science & Technology

    2007-04-01

    Java NetBeans [11] based Integrated Development Environment (IDE) for developing modules and packaging computational tools. The framework is extremely...integrate an Eclipse front-end for Desktop Integration. Eclipse was chosen over Netbeans owing to a higher acceptance, better infrastructure...5.0. This version of Dashboard ran with NetBeans IDE 3.6 requiring Java Runtime 1.4 on a machine with Windows XP. The toolchain is executed by

  15. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Werner, Mike

    Why this utility? After years of upgrading the Java Runtime Environment (JRE) or the Java Software Development Kit (JDK/SDK), a Windows computer becomes littered with so many old versions that the machine may become a security risk due to exploits targeted at those older versions. This utility helps mitigate those vulnerabilities by searching for, and removing, versions 1.3.x through 1.7.x of the Java JRE and/or JDK/SDK.

  16. Rule-Based Runtime Verification

    NASA Technical Reports Server (NTRS)

    Barringer, Howard; Goldberg, Allen; Havelund, Klaus; Sen, Koushik

    2003-01-01

    We present a rule-based framework for defining and implementing finite trace monitoring logics, including future and past time temporal logic, extended regular expressions, real-time logics, interval logics, forms of quantified temporal logics, and so on. Our logic, EAGLE, is implemented as a Java library and involves novel techniques for rule definition, manipulation and execution. Monitoring is done on a state-by-state basis, without storing the execution trace.
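
    EAGLE itself is a Java library; as a language-neutral illustration of state-by-state monitoring without storing the trace, the sketch below checks the past-time property "every release is preceded by a matching acquire" while keeping only a constant-size summary of the past. The class and event names are invented.

    ```python
    # State-by-state trace monitor: each event updates a small summary state,
    # so the verdict is maintained without retaining the execution trace.

    class PastTimeMonitor:
        def __init__(self):
            self.held = 0            # summary of the past: acquire depth
            self.violated = False

        def step(self, event):
            """Consume one trace event and return the current verdict."""
            if event == "acquire":
                self.held += 1
            elif event == "release":
                if self.held == 0:
                    self.violated = True   # release with no matching acquire
                else:
                    self.held -= 1
            return not self.violated

    m = PastTimeMonitor()
    for e in ["acquire", "release", "release"]:
        print(e, "->", "ok" if m.step(e) else "violation")
    ```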

  17. Generic and Automated Runtime Program Repair

    DTIC Science & Technology

    2012-09-01

    Public Release; Distribution Unlimited. Introduction: Software bugs are ubiquitous, and fixing them remains a difficult, time-consuming, and manual

  18. Real-Time Ada Problem Study

    DTIC Science & Technology

    1989-03-24

    lack of intimate knowledge of how the runtime links to the compiler-generated code. Furthermore, the runtime must meet a rigorous set of tests to ensure...projects, and is not provided. Along with the library, a set of tests should be provided to verify the accuracy of the library after changes have been

  19. Detecting Heap-Spraying Code Injection Attacks in Malicious Web Pages Using Runtime Execution

    NASA Astrophysics Data System (ADS)

    Choi, Younghan; Kim, Hyoungchun; Lee, Donghoon

    The growing use of web services is increasing web browser attacks exponentially. Most attacks use a technique called heap spraying because of its high success rate. Heap spraying executes a malicious code without indicating the exact address of the code by copying it into many heap objects. For this reason, the attack has a high potential to succeed once the vulnerability is exploited. Attackers have recently favored this technique because JavaScript makes it easy to allocate the heap memory area. This paper proposes a novel technique that detects heap spraying attacks by executing a heap object in a real environment, irrespective of the version and patch status of the web browser. This runtime execution is used to detect various forms of heap spraying attacks, such as encoding and polymorphism. Heap objects are executed after being filtered on the basis of patterns of heap spraying attacks, in order to reduce the overhead of the runtime execution; the patterns are based on analysis of how a web browser accesses benign web sites. The heap objects are executed forcibly by setting the instruction register to their addresses after they are loaded into memory. Thus, we can execute the malicious code without having to consider the version and patch status of the browser. An object is considered to contain a malicious code if the execution reaches a call instruction and the instruction accesses the API of system libraries, such as kernel32.dll and ws_32.dll. To change registers and monitor execution flow, we used a debugger engine. A prototype, named HERAD (HEap spRAying Detector), is implemented and evaluated. In experiments, HERAD detects various forms of exploit code that emulation cannot detect, and some heap spraying attacks that NOZZLE cannot detect. Although it has an execution overhead, HERAD produces a low number of false alarms. The processing time of several minutes is negligible because our research focuses on detecting heap spraying. This research can be applied to existing systems that collect malicious codes, such as honeypots.
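
    The pre-filtering step can be approximated cheaply before any runtime execution is attempted. The sketch below flags buffers dominated by one repeated unit, a common spray signature; the chunk size and threshold are invented placeholders rather than HERAD's actual patterns.

    ```python
    # Cheap spray-likeness filter: a heap object whose contents are mostly
    # one repeated 4-byte chunk (a sled-like pattern) is worth executing.
    from collections import Counter

    def looks_sprayed(buf: bytes, unit: int = 4, ratio: float = 0.8) -> bool:
        """True if a single `unit`-byte chunk dominates the buffer."""
        if len(buf) < unit * 16:
            return False
        chunks = [buf[i:i + unit] for i in range(0, len(buf) - unit + 1, unit)]
        _, count = Counter(chunks).most_common(1)[0]
        return count / len(chunks) >= ratio

    sled = b"\x0c\x0c\x0c\x0c" * 4096 + b"\x90" * 64   # classic 0x0c0c0c0c spray
    benign = bytes(range(256)) * 64
    print(looks_sprayed(sled), looks_sprayed(benign))  # True False
    ```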

  20. List-mode PET image reconstruction for motion correction using the Intel XEON PHI co-processor

    NASA Astrophysics Data System (ADS)

    Ryder, W. J.; Angelis, G. I.; Bashar, R.; Gillam, J. E.; Fulton, R.; Meikle, S.

    2014-03-01

    List-mode image reconstruction with motion correction is computationally expensive, as it requires projection of hundreds of millions of rays through a 3D array. To decrease reconstruction time it is possible to use symmetric multiprocessing computers or graphics processing units. The former can have high financial costs, while the latter can require refactoring of algorithms. The Xeon Phi is a new co-processor card with a Many Integrated Core architecture that can run 4 multiple-instruction, multiple-data threads per core, with each thread having a 512-bit single-instruction, multiple-data vector register. Thus, it is possible to run in the region of 220 threads simultaneously. The aim of this study was to investigate whether the Xeon Phi co-processor card is a viable alternative to an x86 Linux server for accelerating list-mode PET image reconstruction for motion correction. An existing list-mode image reconstruction algorithm with motion correction was ported to run on the Xeon Phi co-processor, with the multi-threading implemented using pthreads. There were no differences between images reconstructed using the Phi co-processor card and images reconstructed using the same algorithm run on a Linux server. However, it was found that the reconstruction runtimes were 3 times greater for the Phi than for the server. A new version of the image reconstruction algorithm was developed in C++ using OpenMP for multi-threading, and the Phi runtimes decreased to 1.67 times that of the host Linux server. Data transfer from the host to the co-processor card was found to be a rate-limiting step; this needs to be carefully considered in order to maximize runtime speeds. When comparing the purchase price of a Linux workstation with a Xeon Phi co-processor card against a top-of-the-range Linux server, the former is a cost-effective computation resource for list-mode image reconstruction. A multi-Phi workstation could be a viable alternative to cluster computers, at a lower cost, for medical imaging applications.

  1. HARE: Final Report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mckie, Jim

    2012-01-09

    This report documents the results of work done over a 6 year period under the FAST-OS programs. The first effort, called Right-Weight Kernels (RWK), was concerned with improving measurements of OS noise so it could be treated quantitatively, and with evaluating the use of two operating systems, Linux and Plan 9, on HPC systems and determining how these operating systems needed to be extended or changed for HPC while still retaining their general-purpose nature. The second program, HARE, explored the creation of alternative runtime models, building on RWK. All of the HARE work was done on Plan 9. The HARE researchers were mindful of the very good Linux and LWK work being done at other labs and saw no need to recreate it. Even given this limited funding, the two efforts had outsized impact: they helped Cray decide to use Linux, instead of a custom kernel, and provided the tools needed to make Linux perform well; created a successor operating system to Plan 9, NIX, which has been taken in by Bell Labs for further development; created a standard system measurement tool, Fixed Time Quantum (FTQ), which is widely used for measuring operating systems' impact on applications; spurred the use of the 9p protocol in several organizations, including IBM; built software in use at many companies, including IBM, Cray, and Google; spurred the creation of alternative runtimes for use on HPC systems; and demonstrated that, with proper modifications, a general-purpose operating system can provide communications up to 3 times as effective as user-level libraries. Open source was a key part of this work. The code developed for this project is in wide use and available at many places; the core Blue Gene code is available at https://bitbucket.org/ericvh/hare. We describe details of these impacts in the following sections. The rest of this report is organized as follows: first, we describe commercial impact; next, we describe the FTQ benchmark and its impact in more detail; operating systems and runtime research follows; we discuss infrastructure software; and we close with a description of the new NIX operating system, future work, and conclusions.

  2. New Approaches For Asteroid Spin State and Shape Modeling From Delay-Doppler Radar Images

    NASA Astrophysics Data System (ADS)

    Raissi, Chedy; Lamee, Mehdi; Mosiane, Olorato; Vassallo, Corinne; Busch, Michael W.; Greenberg, Adam; Benner, Lance A. M.; Naidu, Shantanu P.; Duong, Nicholas

    2016-10-01

    Delay-Doppler radar imaging is a powerful technique to characterize the trajectories, shapes, and spin states of near-Earth asteroids, and has yielded detailed models of dozens of objects. Reconstructing objects' shapes and spins from delay-Doppler data is a computationally intensive inversion problem. Since the 1990s, delay-Doppler data has been analyzed using the SHAPE software. SHAPE performs sequential single-parameter fitting, and requires considerable computer runtime and human intervention (Hudson 1993, Magri et al. 2007). Recently, multiple-parameter fitting algorithms have been shown to invert delay-Doppler datasets more efficiently (Greenberg & Margot 2015), decreasing runtime while improving accuracy. However, extensive human oversight of the shape modeling process is still required. We have explored two new techniques to better automate delay-Doppler shape modeling: Bayesian optimization and a machine-learning neural network. One of the most time-intensive steps of the shape modeling process is to perform a grid search to constrain the target's spin state. We have implemented a Bayesian optimization routine that uses SHAPE to autonomously search the space of spin-state parameters. To test the efficacy of this technique, we compared it to results with human-guided SHAPE for asteroids 1992 UY4, 2000 RS11, and 2008 EV5. Bayesian optimization yielded similar spin state constraints with a factor of 3 less computer runtime. The shape modeling process could be further accelerated using a deep neural network to replace iterative fitting. We have implemented a neural network with a variational autoencoder (VAE), using a subset of known asteroid shapes and a large set of synthetic radar images as inputs to train the network. Conditioning the VAE in this manner allows the user to give the network a set of radar images and get a 3D shape model as an output. Additional development will be required to train a network to reliably render shapes from delay-Doppler images. This work was supported by NASA Ames, NVIDIA, Autodesk and the SETI Institute as part of the NASA Frontier Development Lab program.

  3. g_contacts: Fast contact search in bio-molecular ensemble data

    NASA Astrophysics Data System (ADS)

    Blau, Christian; Grubmuller, Helmut

    2013-12-01

    Short-range interatomic interactions govern many bio-molecular processes. Therefore, identifying close interaction partners in ensemble data is an essential task in structural biology and computational biophysics. A contact search can be cast as a typical range search problem for which efficient algorithms have been developed. However, none of those has yet been adapted to the context of macromolecular ensembles, particularly in a molecular dynamics (MD) framework. Here a set-decomposition algorithm is implemented which detects all contacting atoms or residues in at most O(N log N) run-time, in contrast to the O(N²) complexity of a brute-force approach. Catalogue identifier: AEQA_v1_0. Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEQA_v1_0.html. Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland. Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html. No. of lines in distributed program, including test data, etc.: 8945. No. of bytes in distributed program, including test data, etc.: 981604. Distribution format: tar.gz. Programming language: C99. Computer: PC. Operating system: Linux. RAM: ≈ size of input frame. Classification: 3, 4.14. External routines: Gromacs 4.6 [1]. Nature of problem: finding atoms or residues that are closer to one another than a given cut-off. Solution method: excluding distant atoms from distance calculations by decomposing the given set of atoms into disjoint subsets. Running time: ≤ O(N log N). References: [1] S. Pronk, S. Pall, R. Schulz, P. Larsson, P. Bjelkmar, R. Apostolov, M. R. Shirts, J. C. Smith, P. M. Kasson, D. van der Spoel, B. Hess and Erik Lindahl, Gromacs 4.5: a high-throughput and highly parallel open source molecular simulation toolkit, Bioinformatics 29 (7) (2013).
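
    A minimal cell-list search conveys how distant atoms are excluded outright, avoiding the brute-force O(N²) loop (an illustrative stand-in, not the g_contacts set-decomposition algorithm itself):

    ```python
    # Cell-list contact search: bin atoms into cubic cells of edge `cutoff`,
    # then test only atoms in the same or neighboring cells.
    import itertools
    from collections import defaultdict

    def contacts(coords, cutoff):
        cells = defaultdict(list)
        for i, (x, y, z) in enumerate(coords):
            cells[(int(x // cutoff), int(y // cutoff), int(z // cutoff))].append(i)
        pairs, c2 = [], cutoff * cutoff
        for cell, members in cells.items():
            # Scan this cell and its 26 neighbors; i < j reports each pair once.
            for d in itertools.product((-1, 0, 1), repeat=3):
                nb = (cell[0] + d[0], cell[1] + d[1], cell[2] + d[2])
                for i in members:
                    for j in cells.get(nb, ()):
                        if i < j and sum((a - b) ** 2 for a, b in
                                         zip(coords[i], coords[j])) <= c2:
                            pairs.append((i, j))
        return pairs

    print(contacts([(0.0, 0.0, 0.0), (0.3, 0.0, 0.0), (5.0, 5.0, 5.0)], 0.5))
    # [(0, 1)] -- only the first two atoms lie within the cut-off
    ```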

  4. Enabling Diverse Software Stacks on Supercomputers using High Performance Virtual Clusters.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Younge, Andrew J.; Pedretti, Kevin; Grant, Ryan

    While large-scale simulations have been the hallmark of the High Performance Computing (HPC) community for decades, Large Scale Data Analytics (LSDA) workloads are gaining attention within the scientific community not only as a processing component to large HPC simulations, but also as standalone scientific tools for knowledge discovery. With the path towards Exascale, new HPC runtime systems are also emerging in a way that differs from classical distributed computing models. However, system software for such capabilities on the latest extreme-scale DOE supercomputers needs to be enhanced to more appropriately support these types of emerging software ecosystems. In this paper, we propose the use of Virtual Clusters on advanced supercomputing resources to enable systems to support not only HPC workloads, but also emerging big data stacks. Specifically, we have deployed the KVM hypervisor within Cray's Compute Node Linux on an XC-series supercomputer testbed. We also use libvirt and QEMU to manage and provision VMs directly on compute nodes, leveraging Ethernet-over-Aries network emulation. To our knowledge, this is the first known use of KVM on a true MPP supercomputer. We investigate the overhead of our solution using HPC benchmarks, evaluating both single-node performance and weak scaling of a 32-node virtual cluster. Overall, we find single-node performance of our solution using KVM on a Cray is very efficient, with near-native performance. However, overhead increases by up to 20% as virtual cluster size increases, due to limitations of the Ethernet-over-Aries bridged network. Furthermore, we deploy Apache Spark with large data analysis workloads in a Virtual Cluster, effectively demonstrating how diverse software ecosystems can be supported by High Performance Virtual Clusters.

  5. FOSS Tools for Research Data Management

    NASA Astrophysics Data System (ADS)

    Stender, Vivien; Jankowski, Cedric; Hammitzsch, Martin; Wächter, Joachim

    2017-04-01

    Established initiatives and organizations, e.g. the Initiative for Scientific Cyberinfrastructures (NSF, 2007) or the European Strategy Forum on Research Infrastructures (ESFRI, 2008), promote and foster the development of sustainable research infrastructures. These infrastructures aim at the provision of services supporting scientists to search, visualize and access data, to collaborate and exchange information, as well as to publish data and other results. In this regard, Research Data Management (RDM) gains importance and thus requires support by appropriate tools integrated in these infrastructures. Different projects provide arbitrary solutions to manage research data. However, within two projects - SUMARIO for land and water management and TERENO for environmental monitoring - solutions to manage research data have been developed based on Free and Open Source Software (FOSS) components. The resulting framework provides essential components for harvesting, storing and documenting research data, as well as for discovering, visualizing and downloading these data on the basis of standardized services, stimulated considerably by the enhanced data management approaches of Spatial Data Infrastructures (SDI). In order to fully exploit the potential of these developments for enhancing data management in the Geosciences, the publication of software components, e.g. via GitHub, is not sufficient. We will use our experience to move these solutions into the cloud, e.g. as PaaS or SaaS offerings. Our contribution will present data management solutions for the Geosciences developed in two projects. A sort of construction kit of FOSS components builds the backbone for the assembly and implementation of project-specific platforms. Furthermore, an approach is presented to stimulate the reuse of FOSS RDM solutions with cloud concepts. In further projects, specific RDM platforms can then be set up much faster, customized to individual needs, and tools can be added at run-time.

  6. Innovative Active Networking Services

    DTIC Science & Technology

    2004-03-01

    implementation of the ML programming language and runtime system. OCaml offers a programming environment that can be formally analyzed; 3. University...language such as Java or OCaml. A typical PLANet (PLAN Active network) node would look as in Figure 1...language. Hence we will be discussing it alone. 2.1.2 OCaml: OCaml provides several of the design goals required for a service-level language. Some of

  7. Real-time Cooperative Behavior for Tactical Mobile Robot Teams

    DTIC Science & Technology

    2001-02-01

    control of multirobot missions. In particular he used videogame scenarios to develop these skills, which might account for the intuition that those...to develop the following innovative research results for tactical mobile robot teams: 1. A suite of new fault-tolerant reactive behaviors, 2. A...depicts the overall system architecture developed for this effort. It contains 3 major subsystems: Executive, Premission, and Runtime. The executive

  8. Run-Time Support for Rapid Prototyping

    DTIC Science & Technology

    1988-12-01

    prototyping. One such system is the Computer-Aided Prototyping System (CAPS). It combines rapid prototyping with automatic program generation. Some of the...a design database, and a design management system [Ref. 3: p. 66]. By using both rapid prototyping and automatic program generation, CAPS will be...Most prototyping systems perform these functions. CAPS is different in that it combines rapid prototyping with a variant of automatic program

  9. PROOF BY GAMES

    DTIC Science & Technology

    2016-03-01

    calculated dependency graph, which is used by the game logic to populate the game interface with various “clues”. The math runtime module is shown in...variable dependency data described in Section 3.3. In the game narrative, the “energy signature” table (see Figure 25) delivers this...introduction of data dependency clues (see “energy signatures” above), which replaced a presentation of the FSAs that was included in the Phase One game. We

  10. Data-Adaptable Modeling and Optimization for Runtime Adaptable Systems

    DTIC Science & Technology

    2016-06-08

    execution scenarios; e. enables model-guided optimization algorithms that outperform the state of the art; f. understands the overhead of system...the Data-Adaptable System Model (DASM), that facilitates design by enabling the designer to: 1) specify both an application's task flow as well as...systems. The MILAN [3] framework specializes in the design, simulation, and synthesis of System on Chip (SoC) applications using model-based techniques

  11. An Overview of ARL’s Multimodal Signatures Database and Web Interface

    DTIC Science & Technology

    2007-12-01

    ActiveX components, which hindered distribution due to license agreements and run-time license software needed to use such components. g. Proprietary...Overview: The database consists of multimodal signature data files in the HDF5 format. Generally, each signature file contains all the ancillary...only contains information in the database, Web interface, and signature files that is releasable to the public. The Web interface consists of static

  12. Countering Insider Threats - Handling Insider Threats Using Dynamic, Run-Time Forensics

    DTIC Science & Technology

    2007-10-01

    able to handle the security policy requirements of a large organization containing many decentralized and diverse users, while being easily managed... contained in the TIF folder. Searching for any text string and sorting is supported also. The cache index file of Internet Explorer is not changed... containing thousands of malware software signatures. Separate datasets can be created for various classifications of malware such as encryption software

  13. Tailoring Configuration to User’s Tasks under Uncertainty

    DTIC Science & Technology

    2008-04-28

    CARISMA is the problem being solved. CARISMA applies microeconomics and game theory to make runtime decisions about allocating scarce resources among...scarce resources, these applications are running on behalf of one user. Thus, our problem has no game-theoretic aspects. 2.2 Task Oriented...prediction tool [15] is based on the RPS tool and allows prediction of bandwidth online. There is additional evidence (see, for example [49

  14. Using Application-Domain Knowledge in the Runtime Support of Multi-Experiment Computational Studies

    DTIC Science & Technology

    2009-01-01

  15. Assessing the Potential Value of Semantic Web Technologies in Support of Military Operations

    DTIC Science & Technology

    2003-09-01

    Teleconference). Deitel, P. J. (2002). Java, How to Program, Fourth Edition. Upper Saddle River, New Jersey: Prentice-Hall, Inc. Description Logics...how clients connect with each other to form an impromptu community. Jini™ lets programs use services in a network without knowing anything about the...another runtime program (execution engine) to determine how the computer should do it. Declarative programming is very different from the traditional

  16. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bonachea, Dan; Hargrove, P.

    GASNet is a language-independent, low-level networking layer that provides network-independent, high-performance communication primitives tailored for implementing parallel global address space SPMD languages and libraries such as UPC, UPC++, Co-Array Fortran, Legion, Chapel, and many others. The interface is primarily intended as a compilation target and for use by runtime library writers (as opposed to end users), and the primary goals are high performance, interface portability, and expressiveness. GASNet stands for "Global-Address Space Networking".

  17. Runtime Speculative Software-Only Fault Tolerance

    DTIC Science & Technology

    2012-06-01

    reliability of RSFT, an in-depth analysis of its window of vulnerability is also discussed and measured via simulated fault injection. The performance...propagation of faults through the entire program. For optimal performance, these techniques have to use heroic alias analysis to find the minimum set of...affect program output. No program source code or alias analysis is needed to analyze the fault propagation ahead of time. 2.3 Limitations of Existing

  18. Funnel Libraries for Real-Time Robust Feedback Motion Planning

    DTIC Science & Technology

    2016-07-21

    motion plans for a robot that are guaranteed to succeed despite uncertainty in the environment, parametric model uncertainty, and disturbances...resulting funnel library is then used to sequentially compose motion plans at runtime while ensuring the safety of the robot. A major advantage of...the work presented here is that by explicitly taking into account the effect of uncertainty, the robot can evaluate motion plans based on how vulnerable

  19. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Stevens, K; Huang, T; Buttler, D

    We present the C-Cat Wordnet package, an open source library for using and modifying Wordnet. The package includes four key features: an API for modifying Synsets; implementations of standard similarity metrics; implementations of well-known Word Sense Disambiguation algorithms; and an implementation of the Castanet algorithm. The library is easily extendible and usable in many runtime environments. We demonstrate its use on two standard Word Sense Disambiguation tasks and apply the Castanet algorithm to a corpus.

  20. Final Project Report, DynAX Innovations in Programming Models, Compilers and Runtime Systems for Dynamic Adaptive Event Driven Execution Models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gao, Guang; Meister, Benoit; Padua, David

    2015-12-31

    This report contains a summary of the work done for the DynAX Project from 9/1/2012 to 8/31/2015. Much of the information presented is discussed in further detail in other STI annual and quarterly reports. Additionally, an NCE report was submitted covering work done from the period of 9/1/2015 to 12/31/2015.

  1. Note on the ideal frame formulation

    NASA Astrophysics Data System (ADS)

    Lara, Martin

    2017-09-01

    An implementation of the ideal frame formulation of perturbed Keplerian motion is presented which only requires the integration of a differential system of dimension 7, contrary to the 8 variables traditionally integrated with this approach. The new formulation is based on the integration of a scaled version of the Eulerian set of redundant parameters and slightly improves runtime performance with respect to the 8-dimensional case while retaining comparable accuracy.

  2. A Dynamic Security Framework for Ambient Intelligent Systems: A Smart-Home Based eHealth Application

    NASA Astrophysics Data System (ADS)

    Compagna, Luca; El Khoury, Paul; Massacci, Fabio; Saidane, Ayda

    Providing context-dependent security services is an important challenge for ambient intelligent systems. The complexity and the unbounded nature of such systems make it difficult, even for the most experienced and knowledgeable security engineers, to foresee all possible situations and interactions when developing the system. In order to solve this problem, context-based self-diagnosis and reconfiguration at runtime should be provided.

  3. An Investigation of Run-Time Operations in a Heterogeneous Desktop Grid Environment: The Texas Tech University Desktop Grid Case Study

    ERIC Educational Resources Information Center

    Perez, Jerry F.

    2013-01-01

    The goal of the dissertation study was to evaluate the existing DG scheduling algorithm. The evaluation was developed through previously explored simulated analyses of DGs, performed by researchers in the field of DG scheduling optimization, with the aim of improving the current RT framework of the DG at TTU. The author analyzed the RT of an actual DG, thereby…

  4. Developing the Systems Engineering Experience Accelerator (SEEA) Prototype and Roadmap

    DTIC Science & Technology

    2013-12-31

    information to be automatically presented without comment. 2.2.2 New Features and Capabilities: A number of new multiplayer capabilities were...2.4.1 Overview: The EA game engine has two components: the runtime engine and the tools suite. The tools suite includes the Experience Development...the Learner. Figure 6: Experience Accelerator Logical Block Diagram. The EARTE is a multiuser architecture for internet gaming. It has light

  5. D-VASim: an interactive virtual laboratory environment for the simulation and analysis of genetic circuits.

    PubMed

    Baig, Hasan; Madsen, Jan

    2017-01-15

    Simulation and behavioral analysis of genetic circuits is a standard approach of functional verification prior to their physical implementation. Many software tools have been developed to perform in silico analysis for this purpose, but none of them allows users to interact with the model during runtime. Runtime interaction gives the user a feeling of being in the lab performing a real-world experiment. In this work, we present a user-friendly software tool named D-VASim (Dynamic Virtual Analyzer and Simulator), which provides a virtual laboratory environment to simulate and analyze the behavior of genetic logic circuit models represented in SBML (Systems Biology Markup Language). Hence, SBML models developed in other software environments can be analyzed and simulated in D-VASim. D-VASim offers deterministic as well as stochastic simulation, and differs from other software tools by being able to extract and validate the Boolean logic from the SBML model. D-VASim is also capable of analyzing the threshold value and propagation delay of a genetic circuit model. D-VASim is available for Windows and Mac OS and can be downloaded from bda.compute.dtu.dk/downloads/.

  6. On the Impact of Execution Models: A Case Study in Computational Chemistry

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chavarría-Miranda, Daniel; Halappanavar, Mahantesh; Krishnamoorthy, Sriram

    2015-05-25

    Efficient utilization of high-performance computing (HPC) platforms is an important and complex problem. Execution models, abstract descriptions of the dynamic runtime behavior of the execution stack, have significant impact on the utilization of HPC systems. Using a computational chemistry kernel as a case study and a wide variety of execution models combined with load balancing techniques, we explore the impact of execution models on the utilization of an HPC system. We demonstrate a 50 percent improvement in performance by using work stealing relative to a more traditional static scheduling approach. We also use a novel semi-matching technique for load balancing that has comparable performance to a traditional hypergraph-based partitioning implementation, which is computationally expensive. Using this study, we found that execution model design choices and assumptions can limit critical optimizations such as global, dynamic load balancing and finding the correct balance between available work units and different system and runtime overheads. With the emergence of multi- and many-core architectures and the consequent growth in the complexity of HPC platforms, we believe that these lessons will be beneficial to researchers tuning diverse applications on modern HPC platforms, especially on emerging dynamic platforms with energy-induced performance variability.

  7. The probabilistic convolution tree: efficient exact Bayesian inference for faster LC-MS/MS protein inference.

    PubMed

    Serang, Oliver

    2014-01-01

    Exact Bayesian inference can sometimes be performed efficiently for special cases where a function has commutative and associative symmetry of its inputs (called "causal independence"). For this reason, it is desirable to exploit such symmetry on big data sets. Here we present a method to exploit a general form of this symmetry on probabilistic adder nodes by transforming those probabilistic adder nodes into a probabilistic convolution tree with which dynamic programming computes exact probabilities. A substantial speedup is demonstrated using an illustration example that can arise when identifying splice forms with bottom-up mass spectrometry-based proteomics. On this example, even state-of-the-art exact inference algorithms require a runtime more than exponential in the number of splice forms considered. By using the probabilistic convolution tree, we reduce the runtime to O(k log²(k)) and the space to O(k log(k)), where k is the number of variables joined by an additive or cardinal operator. This approach, which can also be used with junction tree inference, is applicable to graphs with arbitrary dependency on counting variables or cardinalities and can be used on diverse problems and fields like forward error correcting codes, elemental decomposition, and spectral demixing. The approach also trivially generalizes to multiple dimensions.
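
    The central data structure is easy to sketch: the distribution of a sum of k independent count variables is built by convolving the k input distributions pairwise in a balanced tree rather than folding them in one at a time. The minimal version below omits the downward (posterior) pass and the per-node FFT convolutions of the full method.

    ```python
    # Balanced pairwise convolution of k probability vectors.
    import numpy as np

    def convolution_tree(dists):
        """dists: list of 1-D probability vectors; returns P(X1 + ... + Xk)."""
        layer = [np.asarray(d, float) for d in dists]
        while len(layer) > 1:
            nxt = [np.convolve(layer[i], layer[i + 1])
                   for i in range(0, len(layer) - 1, 2)]
            if len(layer) % 2:          # odd element is carried up unchanged
                nxt.append(layer[-1])
            layer = nxt
        return layer[0]

    # Example: sum of four biased coins, each with P(1) = 0.3.
    coin = [0.7, 0.3]
    print(convolution_tree([coin] * 4))   # binomial(4, 0.3) probabilities
    ```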

  8. The Probabilistic Convolution Tree: Efficient Exact Bayesian Inference for Faster LC-MS/MS Protein Inference

    PubMed Central

    Serang, Oliver

    2014-01-01

    Exact Bayesian inference can sometimes be performed efficiently for special cases where a function has commutative and associative symmetry of its inputs (called “causal independence”). For this reason, it is desirable to exploit such symmetry on big data sets. Here we present a method to exploit a general form of this symmetry on probabilistic adder nodes by transforming those probabilistic adder nodes into a probabilistic convolution tree with which dynamic programming computes exact probabilities. A substantial speedup is demonstrated using an illustration example that can arise when identifying splice forms with bottom-up mass spectrometry-based proteomics. On this example, even state-of-the-art exact inference algorithms require a runtime more than exponential in the number of splice forms considered. By using the probabilistic convolution tree, we reduce the runtime to O(k log²(k)) and the space to O(k log(k)), where k is the number of variables joined by an additive or cardinal operator. This approach, which can also be used with junction tree inference, is applicable to graphs with arbitrary dependency on counting variables or cardinalities and can be used on diverse problems and fields like forward error correcting codes, elemental decomposition, and spectral demixing. The approach also trivially generalizes to multiple dimensions. PMID:24626234

  9. An Energy-Aware Runtime Management of Multi-Core Sensory Swarms.

    PubMed

    Kim, Sungchan; Yang, Hoeseok

    2017-08-24

    In sensory swarms, minimizing energy consumption under performance constraint is one of the key objectives. One possible approach to this problem is to monitor application workload that is subject to change at runtime, and to adjust system configuration adaptively to satisfy the performance goal. As today's sensory swarms are usually implemented using multi-core processors with adjustable clock frequency, we propose to monitor the CPU workload periodically and adjust the task-to-core allocation or clock frequency in an energy-efficient way in response to the workload variations. In doing so, we present an online heuristic that determines the most energy-efficient adjustment that satisfies the performance requirement. The proposed method is based on a simple yet effective energy model that is built upon performance prediction using IPC (instructions per cycle) measured online and a power equation derived empirically. The use of IPC accounts for memory intensities of a given workload, enabling the accurate prediction of execution time. Hence, the model allows us to rapidly and accurately estimate the effect of the two control knobs, clock frequency adjustment and core allocation. The experiments show that the proposed technique delivers considerable energy saving of up to 45% compared to the state-of-the-art multi-core energy management technique.
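
    The adjustment step reduces to a small search over the two knobs. The sketch below uses an invented power/time model (the real method relies on an empirically derived power equation and IPC measured online; all constants and formulas here are placeholders):

    ```python
    # Enumerate (cores, frequency) settings; keep the lowest-energy feasible one.

    FREQS = [0.8, 1.2, 1.6, 2.0]            # GHz
    CORES = [1, 2, 3, 4]

    def predict_time(instr, ipc, cores, freq, par_eff=0.85):
        eff = cores * par_eff ** (cores - 1)      # diminishing parallel returns
        return instr / (ipc * eff * freq * 1e9)   # seconds

    def predict_power(cores, freq, a=0.6, b=0.4):
        return cores * (a + b * freq ** 2)        # watts, empirical-style fit

    def best_config(instr, ipc, deadline):
        feasible = []
        for c in CORES:
            for f in FREQS:
                t = predict_time(instr, ipc, c, f)
                if t <= deadline:                 # performance constraint
                    feasible.append((predict_power(c, f) * t, c, f))
        return min(feasible) if feasible else None

    # IPC measured online captures the memory intensity of the workload.
    energy, cores, freq = best_config(instr=2e10, ipc=1.4, deadline=5.0)
    print(f"{cores} cores @ {freq} GHz, {energy:.2f} J")
    ```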

  10. An Energy-Aware Runtime Management of Multi-Core Sensory Swarms

    PubMed Central

    Kim, Sungchan

    2017-01-01

    In sensory swarms, minimizing energy consumption under performance constraint is one of the key objectives. One possible approach to this problem is to monitor application workload that is subject to change at runtime, and to adjust system configuration adaptively to satisfy the performance goal. As today’s sensory swarms are usually implemented using multi-core processors with adjustable clock frequency, we propose to monitor the CPU workload periodically and adjust the task-to-core allocation or clock frequency in an energy-efficient way in response to the workload variations. In doing so, we present an online heuristic that determines the most energy-efficient adjustment that satisfies the performance requirement. The proposed method is based on a simple yet effective energy model that is built upon performance prediction using IPC (instructions per cycle) measured online and a power equation derived empirically. The use of IPC accounts for memory intensities of a given workload, enabling the accurate prediction of execution time. Hence, the model allows us to rapidly and accurately estimate the effect of the two control knobs, clock frequency adjustment and core allocation. The experiments show that the proposed technique delivers considerable energy saving of up to 45% compared to the state-of-the-art multi-core energy management technique. PMID:28837094

  11. Reducing I/O variability using dynamic I/O path characterization in petascale storage systems

    DOE PAGES

    Son, Seung Woo; Sehrish, Saba; Liao, Wei-keng; ...

    2016-11-01

    In petascale systems with a million CPU cores, scalable and consistent I/O performance is becoming increasingly difficult to sustain, mainly because of I/O variability. This I/O variability is caused by concurrently running processes/jobs competing for I/O, or by a RAID rebuild when a disk drive fails. We present a mechanism that stripes across a selected subset of I/O nodes with the lightest workload at runtime to achieve the highest I/O bandwidth available in the system. In this paper, we propose a probing mechanism to enable application-level dynamic file striping to mitigate I/O variability. We also implement the proposed mechanism in the high-level I/O library that enables memory-to-file data layout transformation and allows transparent file partitioning using subfiling. Subfiling is a technique that partitions data into a set of files of smaller size and manages file access to them, allowing the data to be treated as a single, normal file by users. Here, we demonstrate that our bandwidth probing mechanism can successfully identify temporally slower I/O nodes without noticeable runtime overhead. Experimental results on NERSC's systems also show that our approach isolates I/O variability effectively on shared systems and improves overall collective I/O performance with less variation.
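
    The probing idea reduces to a simple pattern: time a small write through each I/O node, then stripe over the fastest subset. The sketch below uses placeholder names and a simulated write path, not the paper's library interface.

    ```python
    # Probe each I/O node with a small timed write; stripe over the fastest.
    import time

    def probe_bandwidth(write_fn, node, probe_bytes=4 << 20):
        """Time one small write through `node`; return MB/s."""
        start = time.perf_counter()
        write_fn(node, b"\0" * probe_bytes)
        return (probe_bytes / 2**20) / (time.perf_counter() - start)

    def pick_stripe_nodes(nodes, write_fn, stripe_width):
        speeds = {n: probe_bandwidth(write_fn, n) for n in nodes}
        ranked = sorted(nodes, key=speeds.get, reverse=True)
        return ranked[:stripe_width]   # lightest-loaded (fastest) subset

    # Stand-in for a real I/O path: node 2 is busy, hence slower.
    def fake_write(node, data):
        time.sleep(0.002 if node != 2 else 0.01)

    print(pick_stripe_nodes([0, 1, 2, 3], fake_write, stripe_width=2))
    ```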

  12. Software Engineering Support of the Third Round of Scientific Grand Challenge Investigations: An Earth Modeling System Software Framework Strawman Design that Integrates Cactus and UCLA/UCB Distributed Data Broker

    NASA Technical Reports Server (NTRS)

    Talbot, Bryan; Zhou, Shu-Jia; Higgins, Glenn

    2002-01-01

    One of the most significant challenges in large-scale climate modeling, as well as in high-performance computing in other scientific fields, is that of effectively integrating many software models from multiple contributors. A software framework facilitates the integration task, both in the development and runtime stages of the simulation. Effective software frameworks reduce the programming burden for the investigators, freeing them to focus more on the science and less on the parallel communication implementation, while maintaining high performance across numerous supercomputer and workstation architectures. This document proposes a strawman framework design for the climate community based on the integration of Cactus, from the relativistic physics community, and the UCLA/UCB Distributed Data Broker (DDB) from the climate community. This design is the result of an extensive survey of climate models and frameworks in the climate community as well as frameworks from many other scientific communities. The design addresses fundamental development and runtime needs using Cactus, a framework with interfaces for FORTRAN and C-based languages, and high-performance model communication needs using DDB. This document also specifically explores object-oriented design issues in the context of climate modeling as well as climate modeling issues in terms of object-oriented design.

  13. Reconfigurable Autonomy for Future Planetary Rovers

    NASA Astrophysics Data System (ADS)

    Burroughes, Guy

    Extra-terrestrial planetary rover systems are uniquely remote, operating under constraints on communication, environmental uncertainty, and limited physical resources, and requiring a high level of fault tolerance and resistance to hardware degradation. This thesis presents a novel self-reconfiguring autonomous software architecture designed to meet the needs of extraterrestrial planetary environments. At runtime it can safely reconfigure low-level control systems, high-level decisional autonomy systems, and the managed software architecture. The architecture can perform automatic verification and validation of self-reconfiguration at run-time, and enables a system to be self-optimising, self-protecting, and self-healing. A novel self-monitoring system, which is non-invasive, efficient, tunable, and autonomously deploying, is also presented. The architecture was validated through the use-case of a highly autonomous extra-terrestrial planetary exploration rover. Three major forms of reconfiguration were demonstrated and tested: first, high-level adjustment of system internal architecture and goal; second, software module modification; and third, low-level alteration of hardware control in response to degradation of hardware and environmental change. The architecture was demonstrated to be robust and effective in a Mars sample return mission use-case, testing the operational aspects of a novel, reconfigurable guidance, navigation, and control system for a planetary rover, all operating in concert through a scenario that required reconfiguration of all elements of the system.

  14. Comfort in High-Performance Homes in a Hot-Humid Climate

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Poerschke, A.; Beach, R.

    2016-01-22

    "9IBACOS monitored 37 homes during the late summer and early fall of 2014 in a hot and humid climate to better understand indoor comfort conditions. These homes were constructed in the last several years by four home builders that offered a comfort and performance guarantee for the homes. The homes were located in one of four cities: Tampa, Florida; Orlando, Florida; Houston, Texas; and San Antonio, Texas. Temperature and humidity data were collected from the thermostat and each room of the house using small, battery-powered data loggers. To understand system runtime and its impact on comfort, supply air temperature alsomore » was measured on a 1-minute interval. Overall, the group of homes only exceeded a room-to-room temperature difference of 6 degrees F for 5% of the time. For 80% of the time, the rooms in each house were within 4 degrees F of each other. Additionally, the impact of system runtime on comfort is discussed. Finally, measurements made at the thermostat were used to better understand the occupant operation of each cooling system's thermostat setpoint. Builders were questioned on their perceived impact of offering a comfort and performance guarantee. Their feedback, which generally indicates a positive perception, has been summarized in the report.« less

  15. Efficient Maximum Likelihood Estimation for Pedigree Data with the Sum-Product Algorithm.

    PubMed

    Engelhardt, Alexander; Rieger, Anna; Tresch, Achim; Mansmann, Ulrich

    2016-01-01

    We analyze data sets consisting of pedigrees with age at onset of colorectal cancer (CRC) as phenotype. The occurrence of familial clusters of CRC suggests the existence of a latent, inheritable risk factor. We aimed to compute the probability of a family possessing this risk factor as well as the hazard rate increase for these risk factor carriers. Due to the inheritability of this risk factor, the estimation necessitates a costly marginalization of the likelihood. We propose an improved EM algorithm by applying factor graphs and the sum-product algorithm in the E-step. This reduces the computational complexity from exponential to linear in the number of family members. Our algorithm is as precise as a direct likelihood maximization in a simulation study and a real family study on CRC risk. For 250 simulated families of size 19 and 21, the runtime of our algorithm is faster by a factor of 4 and 29, respectively. On the largest family (23 members) in the real data, our algorithm is 6 times faster. We introduce a flexible and runtime-efficient tool for statistical inference in biomedical event data with latent variables that opens the door for advanced analyses of pedigree data.
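
    The sum-product step can be shown on a miniature pedigree: on a tree, the sum over all 2^n latent carrier configurations reduces to an upward message pass that is linear in the number of family members. All probabilities below are invented placeholders, not estimates from this study.

    ```python
    # Upward sum-product pass over a pedigree tree with one binary latent
    # variable (risk-factor carrier status) per member.

    PRIOR = 0.05                    # P(founder is a carrier), assumed
    TRANS = {1: 0.5, 0: 0.01}       # P(child carrier | parent status), assumed

    def upward(node, children, obs_lik):
        """Return [m0, m1] = P(subtree data | node status 0 / 1)."""
        m = list(obs_lik[node])     # P(own onset data | status), length 2
        for child in children.get(node, ()):
            cm = upward(child, children, obs_lik)
            for s in (0, 1):        # marginalize the child's latent status
                t = TRANS[s]
                m[s] *= (1 - t) * cm[0] + t * cm[1]
        return m

    # Founder 0 with children 1 and 2; obs_lik[i][s] is the likelihood of
    # member i's observed age-at-onset data given carrier status s.
    children = {0: [1, 2]}
    obs_lik = {0: (0.9, 0.4), 1: (0.8, 0.3), 2: (0.95, 0.5)}
    m = upward(0, children, obs_lik)
    print((1 - PRIOR) * m[0] + PRIOR * m[1])   # marginal likelihood P(data)
    ```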

  16. Vertical Object Layout and Compression for Fixed Heaps

    NASA Astrophysics Data System (ADS)

    Titzer, Ben L.; Palsberg, Jens

    Research into embedded sensor networks has placed increased focus on the problem of developing reliable and flexible software for microcontroller-class devices. Languages such as nesC [10] and Virgil [20] have brought higher-level programming idioms to this lowest layer of software, thereby adding expressiveness. Both languages are marked by the absence of dynamic memory allocation, which removes the need for a runtime system to manage memory. While nesC offers code modules with statically allocated fields, arrays and structs, Virgil allows the application to allocate and initialize arbitrary objects during compilation, producing a fixed object heap for runtime. This paper explores techniques for compressing fixed object heaps with the goal of reducing the RAM footprint of a program. We explore table-based compression and introduce a novel form of object layout called vertical object layout. We provide experimental results that measure the impact on RAM size, code size, and execution time for a set of Virgil programs. Our results show that compressed vertical layout has better execution time and code size than table-based compression while achieving more than 20% heap reduction on 6 of 12 benchmark programs and 2-17% heap reduction on the remaining 6. We also present a formalization of vertical object layout and prove tight relationships between three styles of object layout.
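
    The contrast between the two layouts is easy to demonstrate: vertical layout stores one array per field, indexed by object number, so constant or low-entropy columns compress well. The toy below is a sketch of the idea, not Virgil's implementation.

    ```python
    # Vertical (per-field) layout plus simple table-based compression.

    objects = [{"kind": 3, "flags": 0, "value": v} for v in (10, 20, 10, 30)]

    # Vertical layout: one column per field instead of one record per object.
    vertical = {f: [o[f] for o in objects] for f in ("kind", "flags", "value")}
    print(vertical)       # 'kind' and 'flags' become constant columns

    def compress_column(col):
        """Table-based compression: unique values plus a small index array."""
        table = sorted(set(col))
        index = [table.index(x) for x in col]
        return table, index

    table, index = compress_column(vertical["value"])
    print(table, index)   # [10, 20, 30] [0, 1, 0, 2]
    assert [table[i] for i in index] == vertical["value"]
    ```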

  17. Enhancing knowledge discovery from cancer genomics data with Galaxy

    PubMed Central

    Albuquerque, Marco A.; Grande, Bruno M.; Ritch, Elie J.; Pararajalingam, Prasath; Jessa, Selin; Krzywinski, Martin; Grewal, Jasleen K.; Shah, Sohrab P.; Boutros, Paul C.

    2017-01-01

    The field of cancer genomics has demonstrated the power of massively parallel sequencing techniques to inform on the genes and specific alterations that drive tumor onset and progression. Although large comprehensive sequence data sets continue to be made increasingly available, data analysis remains an ongoing challenge, particularly for laboratories lacking dedicated resources and bioinformatics expertise. To address this, we have produced a collection of Galaxy tools that represent many popular algorithms for detecting somatic genetic alterations from cancer genome and exome data. We developed new methods for parallelization of these tools within Galaxy to accelerate runtime and have demonstrated their usability and summarized their runtimes on multiple cloud service providers. Some tools represent extensions or refinement of existing toolkits to yield visualizations suited to cohort-wide cancer genomic analysis. For example, we present Oncocircos and Oncoprintplus, which generate data-rich summaries of exome-derived somatic mutation. Workflows that integrate these to achieve data integration and visualizations are demonstrated on a cohort of 96 diffuse large B-cell lymphomas and enabled the discovery of multiple candidate lymphoma-related genes. Our toolkit is available from our GitHub repository as Galaxy tool and dependency definitions and has been deployed using virtualization on multiple platforms including Docker. PMID:28327945

  18. Lightweight Adaptation of Classifiers to Users and Contexts: Trends of the Emerging Domain

    PubMed Central

    Vildjiounaite, Elena; Gimel'farb, Georgy; Kyllönen, Vesa; Peltola, Johannes

    2015-01-01

    Intelligent computer applications need to adapt their behaviour to contexts and users, but conventional classifier adaptation methods require long data collection and/or training times. Therefore classifier adaptation is often performed as follows: at design time application developers define typical usage contexts and provide reasoning models for each of these contexts, and then at runtime an appropriate model is selected from available ones. Typically, definition of usage contexts and reasoning models heavily relies on domain knowledge. However, in practice many applications are used in so diverse situations that no developer can predict them all and collect for each situation adequate training and test databases. Such applications have to adapt to a new user or unknown context at runtime just from interaction with the user, preferably in fairly lightweight ways, that is, requiring limited user effort to collect training data and limited time of performing the adaptation. This paper analyses adaptation trends in several emerging domains and outlines promising ideas, proposed for making multimodal classifiers user-specific and context-specific without significant user efforts, detailed domain knowledge, and/or complete retraining of the classifiers. Based on this analysis, this paper identifies important application characteristics and presents guidelines to consider these characteristics in adaptation design. PMID:26473165

  19. Performance Analysis, Modeling and Scaling of HPC Applications and Tools

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bhatele, Abhinav

    2016-01-13

    Efficient use of supercomputers at DOE centers is vital for maximizing system throughput, minimizing energy costs and enabling science breakthroughs faster. This requires complementary efforts along several directions to optimize the performance of scientific simulation codes and the underlying runtimes and software stacks. This in turn requires providing scalable performance analysis tools and modeling techniques that can provide feedback to physicists and computer scientists developing the simulation codes and runtimes respectively. The PAMS project is using time allocations on supercomputers at ALCF, NERSC and OLCF to further the goals described above by performing research along the following fronts: 1. Scaling Study of HPC applications; 2. Evaluation of Programming Models; 3. Hardening of Performance Tools; 4. Performance Modeling of Irregular Codes; and 5. Statistical Analysis of Historical Performance Data. We are a team of computer and computational scientists funded by both DOE/NNSA and DOE/ASCR programs such as ECRP, XStack (Traleika Glacier, PIPER), ExaOSR (ARGO), SDMAV II (MONA) and PSAAP II (XPACC). This allocation will enable us to study big data issues when analyzing performance on leadership computing class systems and to assist the HPC community in making the most effective use of these resources.

  20. Enhancing knowledge discovery from cancer genomics data with Galaxy.

    PubMed

    Albuquerque, Marco A; Grande, Bruno M; Ritch, Elie J; Pararajalingam, Prasath; Jessa, Selin; Krzywinski, Martin; Grewal, Jasleen K; Shah, Sohrab P; Boutros, Paul C; Morin, Ryan D

    2017-05-01

    The field of cancer genomics has demonstrated the power of massively parallel sequencing techniques to inform on the genes and specific alterations that drive tumor onset and progression. Although large comprehensive sequence data sets continue to be made increasingly available, data analysis remains an ongoing challenge, particularly for laboratories lacking dedicated resources and bioinformatics expertise. To address this, we have produced a collection of Galaxy tools that represent many popular algorithms for detecting somatic genetic alterations from cancer genome and exome data. We developed new methods for parallelization of these tools within Galaxy to accelerate runtime and have demonstrated their usability and summarized their runtimes on multiple cloud service providers. Some tools represent extensions or refinement of existing toolkits to yield visualizations suited to cohort-wide cancer genomic analysis. For example, we present Oncocircos and Oncoprintplus, which generate data-rich summaries of exome-derived somatic mutation. Workflows that integrate these to achieve data integration and visualizations are demonstrated on a cohort of 96 diffuse large B-cell lymphomas and enabled the discovery of multiple candidate lymphoma-related genes. Our toolkit is available from our GitHub repository as Galaxy tool and dependency definitions and has been deployed using virtualization on multiple platforms including Docker. © The Author 2017. Published by Oxford University Press.

  1. Guaranteeing Isochronous Control of Networked Motion Control Systems Using Phase Offset Adjustment

    PubMed Central

    Kim, Ikhwan; Kim, Taehyoun

    2015-01-01

    Guaranteeing isochronous transfer of control commands is an essential function for networked motion control systems. The adoption of real-time Ethernet (RTE) technologies may be profitable in guaranteeing deterministic transfer of control messages. However, unpredictable behavior of software in the motion controller often results in unexpectedly large deviation in control message transmission intervals, and thus leads to imprecise motion. This paper presents a simple and efficient heuristic to guarantee end-to-end isochronous control with very small jitter. The key idea of our approach is to adjust the phase offset of the control message transmission time in the motion controller by investigating the behavior of the motion control task. In realizing the idea, we performed a pre-runtime analysis to determine a safe and reliable phase offset and applied the phase offset to the runtime code of the motion controller by customizing an open-source based integrated development environment (IDE). We also constructed an EtherCAT-based motion control system testbed and performed extensive experiments on the testbed to verify the effectiveness of our approach. The experimental results show that our heuristic is highly effective even for a low-end embedded controller implemented with open-source software components under various configurations of control period and number of motor drives. PMID:26076407
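
    A toy sketch of the pre-runtime analysis step (numbers invented; not the paper's tool chain): the phase offset is chosen from the worst observed completion time of the motion-control task plus a safety margin, so that the control message is always released after the task has finished.

      measured_finish_us = [412, 388, 431, 405, 419]    # profiled task runs (microseconds)
      safety_margin_us = 20
      cycle_us = 1000                                   # 1 ms control period

      phase_offset_us = max(measured_finish_us) + safety_margin_us
      assert phase_offset_us < cycle_us                 # offset must fit in the cycle
      print(phase_offset_us)                            # -> 451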

  2. An Overview of the Runtime Verification Tool Java PathExplorer

    NASA Technical Reports Server (NTRS)

    Havelund, Klaus; Rosu, Grigore; Clancy, Daniel (Technical Monitor)

    2002-01-01

    We present an overview of the Java PathExplorer runtime verification tool, in short referred to as JPAX. JPAX can monitor the execution of a Java program and check that it conforms with a set of user provided properties formulated in temporal logic. JPAX can in addition analyze the program for concurrency errors such as deadlocks and data races. The concurrency analysis requires no user provided specification. The tool facilitates automated instrumentation of a program's bytecode, which when executed will emit an event stream, the execution trace, to an observer. The observer dispatches the incoming event stream to a set of observer processes, each performing a specialized analysis, such as the temporal logic verification, the deadlock analysis and the data race analysis. Temporal logic specifications can be formulated by the user in the Maude rewriting logic, where Maude is a high-speed rewriting system for equational logic, but here extended with executable temporal logic. The Maude rewriting engine is then activated as an event driven monitoring process. Alternatively, temporal specifications can be translated into efficient automata, which check the event stream. JPAX can be used during program testing to gain increased information about program executions, and can potentially furthermore be applied during operation to survey safety critical systems.
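
    A toy monitor in the spirit of the automaton-based checking described above (not JPAX itself): it consumes an emitted event stream and checks the temporal property "every request is eventually followed by a response".

      def monitor(trace):
          pending = 0                      # open requests awaiting a response
          for event in trace:
              if event == "request":
                  pending += 1
              elif event == "response" and pending:
                  pending -= 1
          return pending == 0              # holds iff no request is left open

      assert monitor(["request", "work", "response"])
      assert not monitor(["request", "request", "response"])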

  3. Survey of Verification and Validation Techniques for Small Satellite Software Development

    NASA Technical Reports Server (NTRS)

    Jacklin, Stephen A.

    2015-01-01

    The purpose of this paper is to provide an overview of the current trends and practices in small-satellite software verification and validation. This document is not intended to promote a specific software assurance method. Rather, it seeks to present an unbiased survey of software assurance methods used to verify and validate small satellite software and to make mention of the benefits and value of each approach. These methods include simulation and testing, verification and validation with model-based design, formal methods, and fault-tolerant software design with run-time monitoring. Although the literature reveals that simulation and testing has by far the longest legacy, model-based design methods are proving to be useful for software verification and validation. Some work in formal methods, though not widely used for any satellites, may offer new ways to improve small satellite software verification and validation. These methods need to be further advanced to deal with the state explosion problem and to make them more usable by small-satellite software engineers to be regularly applied to software verification. Last, it is explained how run-time monitoring, combined with fault-tolerant software design methods, provides an important means to detect and correct software errors that escape the verification process or those errors that are produced after launch through the effects of ionizing radiation.

  4. Reliability Prediction of Ontology-Based Service Compositions Using Petri Net and Time Series Models

    PubMed Central

    Li, Jia; Xia, Yunni; Luo, Xin

    2014-01-01

    OWL-S, one of the most important Semantic Web service ontologies proposed to date, provides a core ontological framework and guidelines for describing the properties and capabilities of web services in an unambiguous, computer-interpretable form. Predicting the reliability of composite service processes specified in OWL-S allows service users to decide whether the process meets the quantitative quality requirement. In this study, we consider the runtime quality of services to be fluctuating and introduce a dynamic framework to predict the runtime reliability of services specified in OWL-S, employing the non-Markovian stochastic Petri net (NMSPN) and the time series model. The framework includes the following steps: obtaining the historical response time series of individual service components; fitting these series with an autoregressive moving-average model (ARMA for short) and predicting the future firing rates of service components; mapping the OWL-S process into an NMSPN model; employing the predicted firing rates as the model input of the NMSPN and calculating the normal completion probability as the reliability estimate. In the case study, a comparison between the static model and our approach based on experimental data is presented and it is shown that our approach achieves higher prediction accuracy. PMID:24688429
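
    A sketch of the framework's time-series step (assumes the statsmodels package; the NMSPN mapping itself is not shown): fit an ARMA model to a service's historical response times and forecast near-future values, whose reciprocals could serve as predicted firing rates.

      import numpy as np
      from statsmodels.tsa.arima.model import ARIMA

      rng = np.random.default_rng(0)
      response_times = 0.2 + 0.05 * rng.standard_normal(200)   # synthetic history (s)

      fitted = ARIMA(response_times, order=(1, 0, 1)).fit()    # ARMA(1,1): d = 0
      next_times = fitted.forecast(steps=5)                    # predicted response times
      firing_rates = 1.0 / next_times                          # rate estimates for the NMSPN
      print(firing_rates)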

  5. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hansen, Timothy M.; Palmintier, Bryan; Suryanarayanan, Siddharth

    As more Smart Grid technologies (e.g., distributed photovoltaic, spatially distributed electric vehicle charging) are integrated into distribution grids, static distribution simulations are no longer sufficient for performing modeling and analysis. GridLAB-D is an agent-based distribution system simulation environment that allows fine-grained end-user models, including geospatial and network topology detail. A problem exists in that, without outside intervention, once the GridLAB-D simulation begins execution, it will run to completion without allowing the real-time interaction of Smart Grid controls, such as home energy management systems and aggregator control. We address this lack of runtime interaction by designing a flexible communication interface, Bus.py (pronounced bus-dot-pie), that uses Python to pass messages between one or more GridLAB-D instances and a Smart Grid simulator. This work describes the design and implementation of Bus.py, discusses its usefulness in terms of some Smart Grid scenarios, and provides an example of an aggregator-based residential demand response system interacting with GridLAB-D through Bus.py. The small scale example demonstrates the validity of the interface and shows that an aggregator using said interface is able to control residential loads in GridLAB-D during runtime to cause a reduction in the peak load on the distribution system in (a) peak reduction and (b) time-of-use pricing cases.
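
    A schematic sketch of the kind of runtime bridge Bus.py provides (names and message format are hypothetical, not the actual Bus.py API): the simulator publishes its load each step and blocks until the aggregator answers with a control message.

      import queue, threading

      to_aggregator, to_simulator = queue.Queue(), queue.Queue()

      def simulator_step(t, load_kw):
          to_aggregator.put({"time": t, "feeder_load_kw": load_kw})
          control = to_simulator.get()                 # block until the aggregator answers
          return load_kw * (1.0 - control["curtail"])  # apply demand response

      def aggregator(peak_target_kw):
          msg = to_aggregator.get()
          curtail = 0.2 if msg["feeder_load_kw"] > peak_target_kw else 0.0
          to_simulator.put({"curtail": curtail})

      # One synchronization round: the aggregator trims a load above its peak target.
      threading.Thread(target=aggregator, args=(500.0,)).start()
      print(simulator_step(0, 600.0))                  # -> 480.0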

  6. Clustering Millions of Faces by Identity.

    PubMed

    Otto, Charles; Wang, Dayong; Jain, Anil K

    2018-02-01

    Given a large collection of unlabeled face images, we address the problem of clustering faces into an unknown number of identities. This problem is of interest in social media, law enforcement, and other applications, where the number of faces can be on the order of hundreds of millions, while the number of identities (clusters) can range from a few thousand to millions. To address the challenges of run-time complexity and cluster quality, we present an approximate Rank-Order clustering algorithm that performs better than popular clustering algorithms (k-Means and Spectral). Our experiments include clustering up to 123 million face images into over 10 million clusters. Clustering results are analyzed in terms of external (known face labels) and internal (unknown face labels) quality measures, and run-time. Our algorithm achieves an F-measure of 0.87 on the LFW benchmark (13K faces of 5,749 individuals), which drops to 0.27 on the largest dataset considered (13K faces in LFW + 123M distractor images). Additionally, we show that frames in the YouTube benchmark can be clustered with an F-measure of 0.71. An internal per-cluster quality measure is developed to rank individual clusters for manual exploration of high quality clusters that are compact and isolated.
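
    A simplified sketch of a rank-order style dissimilarity (illustrative only; the paper's approximate variant differs in detail): two faces whose top-k nearest-neighbor lists share members near the top receive a small distance.

      def rank_order_distance(neighbors_a, neighbors_b, k):
          pos_a = {f: r for r, f in enumerate(neighbors_a)}
          pos_b = {f: r for r, f in enumerate(neighbors_b)}
          d_ab = sum(pos_b.get(f, k) for f in neighbors_a)  # absent neighbors cost k
          d_ba = sum(pos_a.get(f, k) for f in neighbors_b)
          return (d_ab + d_ba) / (2.0 * k)                  # symmetric, normalized

      # Faces 0 and 1 share most top neighbors, so the distance is small.
      print(rank_order_distance([1, 5, 9, 4], [0, 5, 9, 7], k=4))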

  7. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Williams, Samuel; Patterson, David; Oliker, Leonid

    This article consists of a collection of slides from the authors' conference presentation. The Roofline model is a visually intuitive figure for kernel analysis and optimization. We believe undergraduates will find it useful in assessing performance and scalability limitations. It is easily extended to other architectural paradigms and to other metrics: performance (sort, graphics, crypto, ...) and bandwidth (L2, PCIe, ...). Furthermore, performance counters could be used to generate a runtime-specific roofline that would greatly aid optimization.

  8. Specifications for Managed Strings, Second Edition

    DTIC Science & Technology

    2010-05-01

    ...const char *cstr, const size_t maxsize, const char *charset); ... Runtime-Constraints: s shall not be a null pointer... The strcreate_m function creates a managed string, referenced by s, given a conventional string cstr (which may be null or empty). maxsize specifies the... characters to those in the null-terminated byte string cstr (which may be empty). If charset is a null pointer, no restricted character set is defined. If...

  9. Ada 9X Project Report: Ada 9X Revision Issues. Release 1

    DTIC Science & Technology

    1990-04-01

    ...interrupts in Ada. Users are using specialized run-time executives which promote semaphores, monitors, etc., as well as interrupt support... The focus here is on two specific problems: 1. lack of time-out on operations; 2. no efficient way to program a shared-variable monitor for the... operation. Issue [3 - Remote Operations for Real-Time Systems]: The real-time implementation standards should define various remote...

  10. Detecting Malicious Tweets in Twitter Using Runtime Monitoring With Hidden Information

    DTIC Science & Technology

    2016-06-01

    ...text mining using Twitter streaming API and python [Online]. Available: http://adilmoujahid.com/posts/2014/07/twitter-analytics/ [22] M. Singh, B... sites with 645,750,000 registered users [3] and has open source public tweets for data mining. 2. Malicious Users and Tweets: In the modern world... want to data mine in Twitter, and presents the natural language assertions and corresponding rule patterns. It then describes the steps performed using...

  11. UAV Swarm Tactics: An Agent-Based Simulation and Markov Process Analysis

    DTIC Science & Technology

    2013-06-01

    CRN Common Random Numbers; CSV Comma Separated Values; DoE Design of Experiment; GLM Generalized Linear Model; HVT High Value Target; JAR Java ARchive; JMF Java Media Framework; JRE Java Runtime Environment; Mason Multi-Agent Simulator Of Networks; MOE Measure Of Effectiveness; MOP Measures Of Performance... with every set several times, and to write a CSV file with the results. Rather than scripting the agent behavior deterministically, the agents should...

  12. GENASIS Basics: Object-oriented utilitarian functionality for large-scale physics simulations (Version 2)

    NASA Astrophysics Data System (ADS)

    Cardall, Christian Y.; Budiardja, Reuben D.

    2017-05-01

    GenASiS Basics provides Fortran 2003 classes furnishing extensible object-oriented utilitarian functionality for large-scale physics simulations on distributed memory supercomputers. This functionality includes physical units and constants; display to the screen or standard output device; message passing; I/O to disk; and runtime parameter management and usage statistics. This revision (Version 2 of Basics) makes mostly minor additions to functionality and includes some simplifying name changes.

  13. Languages for Software-Defined Networks

    DTIC Science & Technology

    2013-02-01

    Languages for Software-Defined Networks. Nate Foster, Michael J. Freedman, Arjun Guha, Rob Harrison, Naga Praveen Katta, Christopher Monsanto... 2012. [12] "The Frenetic project." http://www.frenetic-lang.org/, Sept. 2012. [13] N. Foster, R. Harrison, M. J. Freedman, C. Monsanto, J. Rexford... performance networks," in ACM SIGCOMM, pp. 254-265, Aug. 2011. [15] C. Monsanto, N. Foster, R. Harrison, and D. Walker, "A compiler and run-time system for...

  14. A Development Testbed for ALPS-Based Systems

    DTIC Science & Technology

    1988-10-01

    ...allotted to the application because of size or power constraints). Given an underlying support ALPS architecture such as the d-ALPS architecture, a... resource on which it is assigned at runtime. A second representation problem is that most graph analysis algorithms treat either graphs with weighted links... subtask) associated with it but is treated like other links. In d-ALPS, as a priority precedence link, it would cause the binding of a processor; as a...

  15. Application of Detailed Chemical Kinetics to Combustion Instability Modeling

    DTIC Science & Technology

    2016-01-04

    Harvazinski, Matt; Talley, Doug; Sankaran, Venke. Air Force Research Laboratory, Edwards AFB, CA. Distribution unlimited. Prior Work - Kinetics Used: simulations with (1) 3D real geometry, (2) unsteady flow, (3) long run-times, and (4) coupled physics.

  16. High Performance Databases For Scientific Applications

    NASA Technical Reports Server (NTRS)

    French, James C.; Grimshaw, Andrew S.

    1997-01-01

    The goal for this task is to develop an Extensible File System (ELFS). ELFS attacks the following problems: 1. providing high-bandwidth performance architectures; 2. reducing the cognitive burden faced by applications programmers when they attempt to optimize; and 3. seamlessly managing the proliferation of data formats and architectural differences. The ELFS solution consists of language and run-time system support that permits the specification of a hierarchy of file classes.

  17. An Exploratory Analysis of Projected Navy Officer Inventory Strength Using Data Farming

    DTIC Science & Technology

    2016-09-01

    ...model's run-time. 3. Base Case: In addition to the experimental design, this study includes a base case scenario to serve as a baseline for comparison... SWO Operating Strength Deviation (Base Case): One objective of this study is to determine the risk in operating strength deviation presented by... Answers to Research Questions; Recommendations for Future Studies...

  18. GOTCHA

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Poliakoff, David; Legendre, Matt

    2017-03-29

    GOTCHA is a runtime API for intercepting function calls between shared libraries. It is intended to be used by HPC tools (i.e., performance analysis tools like Open|SpeedShop, HPCToolkit, TAU, etc.). These tools can use GOTCHA to intercept interesting functions, such as MPI functions, and collect performance metrics about those functions. We intend for this to be open-source software that gets adopted by other open-source tools that are used at LLNL.
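
    A high-level Python analogy of what GOTCHA does for shared-library calls in C (this is not the GOTCHA API): interpose a wrapper around an interesting function and record timing metrics for a tool.

      import functools, time

      call_stats = {}                          # function name -> (total seconds, calls)

      def intercept(fn):
          @functools.wraps(fn)
          def wrapper(*args, **kwargs):
              start = time.perf_counter()
              try:
                  return fn(*args, **kwargs)
              finally:
                  total, count = call_stats.get(fn.__name__, (0.0, 0))
                  call_stats[fn.__name__] = (total + time.perf_counter() - start,
                                             count + 1)
          return wrapper

      @intercept
      def mpi_send_like(payload):              # stand-in for an intercepted MPI call
          time.sleep(0.001)

      mpi_send_like(b"data")
      print(call_stats)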

  19. Global Deployment Analysis System Algorithm Description (With Updates)

    DTIC Science & Technology

    1998-09-01

    Global Deployment Analysis System Algorithm Description (with Updates). By Noetics, Inc., for the U.S. Army Concepts Analysis Agency. Contract... This Algorithm Description for the Global Deployment Analysis System (GDAS) was prepared by Noetics... support for Paradox Runtime will be provided by the GDAS developers, CAA and Noetics Inc., and not by Borland International. GDAS for Windows has...

  20. The Impact on Quality of Service When Using Security-Enabling Filters to Provide for the Security of Run-Time Virtual Environments

    DTIC Science & Technology

    2002-09-01

    ...Secure Multicast... i. Message Digests and Message Authentication Codes (MACs)... that is, the needs of the VE will determine what the design will look like (e.g., reliable vs. unreliable data communications). In general, there... [Molva00] and [Abdalla00]. i. Message Digests and Message Authentication Codes (MACs): Message digests and MACs are used for data integrity verification...

  1. Simple and rapid method for the detection of Filobasidiella neoformans in a probiotic dairy product by using loop-mediated isothermal amplification.

    PubMed

    Ishikawa, Hiroshi; Kasahara, Kohei; Sato, Sumie; Shimakawa, Yasuhisa; Watanabe, Koichi

    2014-05-16

    Yeast contamination is a serious problem in the food industry and a major cause of food spoilage. Several yeasts, such as Filobasidiella neoformans, which cause cryptococcosis in humans, are also opportunistic pathogens, so a simple and rapid method for monitoring yeast contamination in food is essential. Here, we developed a simple and rapid method that utilizes loop-mediated isothermal amplification (LAMP) for the detection of F. neoformans. A set of five specific LAMP primers was designed that targeted the 5.8S-26S rDNA internal transcribed spacer 2 region of F. neoformans, and the primer set's specificity was confirmed. In a pure culture of F. neoformans, the LAMP assay had a lower sensitivity threshold of 10^2 cells/mL at a runtime of 60 min. In a probiotic dairy product artificially contaminated with F. neoformans, the LAMP assay also had a lower sensitivity threshold of 10^2 cells/mL, which was comparable to the sensitivity of a quantitative PCR (qPCR) assay. We also developed a simple two-step method for the extraction of DNA from a probiotic dairy product that can be performed within 15 min. This method involves initial protease treatment of the test sample at 45°C for 3 min followed by boiling at 100°C for 5 min under alkaline conditions. In a probiotic dairy product artificially contaminated with F. neoformans, analysis by means of our novel DNA extraction method followed by LAMP with our specific primer set had a lower sensitivity threshold of 10^3 cells/mL at a runtime of 60 min. In contrast, use of our novel method of DNA extraction followed by qPCR assay had a lower sensitivity threshold of only 10^5 cells/mL at a runtime of 3 to 4 h. Therefore, unlike the PCR assay, our LAMP assay can be used to quickly evaluate yeast contamination and is sensitive even for crude samples containing bacteria or background impurities. Our study provides a powerful tool for the primary screening of large numbers of food samples for yeast contamination. Copyright © 2014 Elsevier B.V. All rights reserved.

  2. Compiled MPI: Cost-Effective Exascale Applications Development

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bronevetsky, G; Quinlan, D; Lumsdaine, A

    2012-04-10

    The complexity of petascale and exascale machines makes it increasingly difficult to develop applications that can take advantage of them. Future systems are expected to feature billion-way parallelism, complex heterogeneous compute nodes and poor availability of memory (Peter Kogge, 2008). This new challenge for application development is motivating a significant amount of research and development on new programming models and runtime systems designed to simplify large-scale application development. Unfortunately, DoE has significant multi-decadal investment in a large family of mission-critical scientific applications. Scaling these applications to exascale machines will require a significant investment that will dwarf the costs of hardware procurement. A key reason for the difficulty in transitioning today's applications to exascale hardware is their reliance on explicit programming techniques, such as the Message Passing Interface (MPI) programming model to enable parallelism. MPI provides a portable and high performance message-passing system that enables scalable performance on a wide variety of platforms. However, it also forces developers to lock the details of parallelization together with application logic, making it very difficult to adapt the application to significant changes in the underlying system. Further, MPI's explicit interface makes it difficult to separate the application's synchronization and communication structure, reducing the amount of support that can be provided by compiler and run-time tools. This is in contrast to the recent research on more implicit parallel programming models such as Chapel, OpenMP and OpenCL, which promise to provide significantly more flexibility at the cost of reimplementing significant portions of the application. We are developing CoMPI, a novel compiler-driven approach to enable existing MPI applications to scale to exascale systems with minimal modifications that can be made incrementally over the application's lifetime. It includes: (1) A new set of source code annotations, inserted either manually or automatically, that will clarify the application's use of MPI to the compiler infrastructure, enabling greater accuracy where needed; (2) A compiler transformation framework that leverages these annotations to transform the original MPI source code to improve its performance and scalability; (3) Novel MPI runtime implementation techniques that will provide a rich set of functionality extensions to be used by applications that have been transformed by our compiler; and (4) A novel compiler analysis that leverages simple user annotations to automatically extract the application's communication structure and synthesize most complex code annotations.

  3. A comparison of accuracy and computational feasibility of two record linkage algorithms in retrieving vital status information from HIV/AIDS patients registered in Brazilian public databases.

    PubMed

    de Paula, Adelzon Assis; Pires, Denise Franqueira; Filho, Pedro Alves; de Lemos, Kátia Regina Valente; Barçante, Eduardo; Pacheco, Antonio Guilherme

    2018-06-01

    While cross-referencing information from people living with HIV/AIDS (PLWHA) to the official mortality database is a critical step in monitoring the HIV/AIDS epidemic in Brazil, the accuracy of the linkage routine may compromise the validity of the final database, yielding biased epidemiological estimates. We compared the accuracy and the total runtime of two linkage algorithms applied to retrieve vital status information from PLWHA in Brazilian public databases. Nominally identified records from PLWHA were obtained from three distinct government databases. Linkage routines included an algorithm in Python language (PLA) and Reclink software (RlS), a probabilistic software largely utilized in Brazil. Records from PLWHA known to be alive were added to those from patients reported as deceased. Data were then searched in the mortality system. Scenarios where 5% and 50% of patients had actually died were simulated, considering both complete cases and 20% missing maternal names. When complete information was available, both algorithms had comparable accuracies. In the scenario of 20% missing maternal names, PLA and RlS had sensitivities of 94.5% and 94.6% (p > 0.5), respectively; after manual reviewing, PLA sensitivity increased to 98.4% (96.6-100.0), exceeding that of RlS (p < 0.01). PLA had a higher positive predictive value in the 5% death proportion. Manual reviewing was intrinsically required by RlS for up to 14% of registers for people actually dead, whereas the corresponding proportion ranged from 1.5% to 2% for PLA. The lack of manual inspection did not alter PLA sensitivity when complete information was available. When incomplete data were available, PLA sensitivity increased from 94.5% to 98.4%, thus exceeding that presented by RlS (94.6%, p < 0.05). RlS required considerably less processing time than PLA. Both linkage algorithms presented interchangeable accuracies in retrieving vital status data from PLWHA. RlS had a considerably shorter runtime but intrinsically required laborious manual review of a sizeable proportion of the matched registries. On the other hand, PLA required considerably more runtime but spared manual review at no expense of accuracy. Copyright © 2018 Elsevier B.V. All rights reserved.
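
    A toy illustration of the linkage task compared in the study (neither PLA nor Reclink is reproduced here; fields and weights are invented): score candidate record pairs on name, maternal name, and birth date, tolerating a missing maternal name as in the 20%-missing scenario.

      def score(rec, cand):
          s = 2.0 if rec["name"] == cand["name"] else 0.0
          if rec["mother"] and cand["mother"]:             # maternal name may be missing
              s += 2.0 if rec["mother"] == cand["mother"] else -1.0
          s += 1.0 if rec["birth"] == cand["birth"] else -1.0
          return s

      patient = {"name": "MARIA SILVA", "mother": None, "birth": "1970-03-02"}
      deaths = [
          {"name": "MARIA SILVA", "mother": "ANA SILVA", "birth": "1970-03-02"},
          {"name": "MARIA SOUZA", "mother": "ANA SOUZA", "birth": "1971-01-15"},
      ]
      # Pairs above the threshold become candidate matches for manual review.
      print([d for d in deaths if score(patient, d) >= 3.0])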

  4. Performance Comparison of the Digital Neuromorphic Hardware SpiNNaker and the Neural Network Simulation Software NEST for a Full-Scale Cortical Microcircuit Model

    PubMed Central

    van Albada, Sacha J.; Rowley, Andrew G.; Senk, Johanna; Hopkins, Michael; Schmidt, Maximilian; Stokes, Alan B.; Lester, David R.; Diesmann, Markus; Furber, Steve B.

    2018-01-01

    The digital neuromorphic hardware SpiNNaker has been developed with the aim of enabling large-scale neural network simulations in real time and with low power consumption. Real-time performance is achieved with 1 ms integration time steps, and thus applies to neural networks for which faster time scales of the dynamics can be neglected. By slowing down the simulation, shorter integration time steps and hence faster time scales, which are often biologically relevant, can be incorporated. We here describe the first full-scale simulations of a cortical microcircuit with biological time scales on SpiNNaker. Since about half the synapses onto the neurons arise within the microcircuit, larger cortical circuits have only moderately more synapses per neuron. Therefore, the full-scale microcircuit paves the way for simulating cortical circuits of arbitrary size. With approximately 80,000 neurons and 0.3 billion synapses, this model is the largest simulated on SpiNNaker to date. The scale-up is enabled by recent developments in the SpiNNaker software stack that allow simulations to be spread across multiple boards. Comparison with simulations using the NEST software on a high-performance cluster shows that both simulators can reach a similar accuracy, despite the fixed-point arithmetic of SpiNNaker, demonstrating the usability of SpiNNaker for computational neuroscience applications with biological time scales and large network size. The runtime and power consumption are also assessed for both simulators on the example of the cortical microcircuit model. To obtain an accuracy similar to that of NEST with 0.1 ms time steps, SpiNNaker requires a slowdown factor of around 20 compared to real time. The runtime for NEST saturates around 3 times real time using hybrid parallelization with MPI and multi-threading. However, achieving this runtime comes at the cost of increased power and energy consumption. The lowest total energy consumption for NEST is reached at around 144 parallel threads and 4.6 times slowdown. At this setting, NEST and SpiNNaker have a comparable energy consumption per synaptic event. Our results widen the application domain of SpiNNaker and help guide its development, showing that further optimizations such as synapse-centric network representation are necessary to enable real-time simulation of large biological neural networks. PMID:29875620

  5. Efficient voxel navigation for proton therapy dose calculation in TOPAS and Geant4

    NASA Astrophysics Data System (ADS)

    Schümann, J.; Paganetti, H.; Shin, J.; Faddegon, B.; Perl, J.

    2012-06-01

    A key task within all Monte Carlo particle transport codes is ‘navigation’, the calculation to determine at each particle step what volume the particle may be leaving and what volume the particle may be entering. Navigation should be optimized to the specific geometry at hand. For patient dose calculation, this geometry generally involves voxelized computed tomography (CT) data. We investigated the efficiency of navigation algorithms on currently available voxel geometry parameterizations in the Monte Carlo simulation package Geant4: G4VPVParameterisation, G4VNestedParameterisation and G4PhantomParameterisation, the last with and without boundary skipping, a method where neighboring voxels with the same Hounsfield unit are combined into one larger voxel. A fourth parameterization approach (MGHParameterization), developed in-house before the latter two parameterizations became available in Geant4, was also included in this study. All simulations were performed using TOPAS, a tool for particle simulations layered on top of Geant4. Runtime comparisons were made on three distinct patient CT data sets: a head and neck, a liver and a prostate patient. We included an additional version of these three patients where all voxels, including the air voxels outside of the patient, were uniformly set to water in the runtime study. The G4VPVParameterisation offers two optimization options. One option has a 60-150 times slower simulation speed. The other is comparable in speed but requires 15-19 times more memory compared to the other parameterizations. We found the average CPU time used for the simulation relative to G4VNestedParameterisation to be 1.014 for G4PhantomParameterisation without boundary skipping and 1.015 for MGHParameterization. The average runtime ratio for G4PhantomParameterisation with and without boundary skipping for our heterogeneous data was equal to 0.97:1. The calculated dose distributions agreed with the reference distribution for all but the G4PhantomParameterisation with boundary skipping for the head and neck patient. The maximum memory usage ranged from 0.8 to 1.8 GB depending on the CT volume independent of parameterizations, except for the 15-19 times greater memory usage with the G4VPVParameterisation when using the option with a higher simulation speed. The G4VNestedParameterisation was selected as the preferred choice for the patient geometries and treatment plans studied.
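
    An illustrative sketch of the boundary-skipping idea (not Geant4 code): neighboring voxels along a row that share the same Hounsfield unit are merged into one larger step, so the navigator crosses fewer boundaries.

      def merge_equal_voxels(row_hu):
          runs = []                            # (hounsfield_unit, voxel_count) pairs
          for hu in row_hu:
              if runs and runs[-1][0] == hu:
                  runs[-1][1] += 1
              else:
                  runs.append([hu, 1])
          return runs

      print(merge_equal_voxels([0, 0, 0, 40, 40, -1000, -1000, -1000]))
      # -> [[0, 3], [40, 2], [-1000, 3]]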

  6. Efficient methods for implementation of multi-level nonrigid mass-preserving image registration on GPUs and multi-threaded CPUs.

    PubMed

    Ellingwood, Nathan D; Yin, Youbing; Smith, Matthew; Lin, Ching-Long

    2016-04-01

    Faster and more accurate methods for registration of images are important for research involved in conducting population-based studies that utilize medical imaging, as well as improvements for use in clinical applications. We present a novel computation- and memory-efficient multi-level method on graphics processing units (GPU) for performing registration of two computed tomography (CT) volumetric lung images. We developed a computation- and memory-efficient Diffeomorphic Multi-level B-Spline Transform Composite (DMTC) method to implement nonrigid mass-preserving registration of two CT lung images on GPU. The framework consists of a hierarchy of B-Spline control grids of increasing resolution. A similarity criterion known as the sum of squared tissue volume difference (SSTVD) was adopted to preserve lung tissue mass. The use of SSTVD consists of the calculation of the tissue volume, the Jacobian, and their derivatives, which makes its implementation on GPU challenging due to memory constraints. The use of the DMTC method enabled reduced computation and memory storage of variables with minimal communication between GPU and Central Processing Unit (CPU) due to ability to pre-compute values. The method was assessed on six healthy human subjects. Resultant GPU-generated displacement fields were compared against the previously validated CPU counterpart fields, showing good agreement with an average normalized root mean square error (nRMS) of 0.044±0.015. Runtime and performance speedup are compared between single-threaded CPU, multi-threaded CPU, and GPU algorithms. Best performance speedup occurs at the highest resolution in the GPU implementation for the SSTVD cost and cost gradient computations, with a speedup of 112 times that of the single-threaded CPU version and 11 times over the twelve-threaded version when considering average time per iteration using an Nvidia Tesla K20X GPU. The proposed GPU-based DMTC method outperforms its multi-threaded CPU version in terms of runtime. Total registration time was reduced to 2.9 min with the GPU version, compared to 12.8 min with the twelve-threaded CPU version and 112.5 min with a single-threaded CPU. Furthermore, the GPU implementation discussed in this work can be adapted for use of other cost functions that require calculation of the first derivatives. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

  7. Coverage Maximization Using Dynamic Taint Tracing

    DTIC Science & Technology

    2007-03-28

    ...we do not have source code are handled, incompletely, via models of taint transfer. We use a little language to specify how taint transfers across a... Implementation and Runtime Issues: The taint graph instrumentation is a 2K-line OCaml module extending CIL and is supported by 5K lines of... modern scripting languages such as Ruby have taint modes that work similarly; however, all propagate taint at the variable rather than the byte level and...

  8. Reduced SWAP-C VICTORY Services Execution and Performance Evaluation

    DTIC Science & Technology

    2012-08-01

    UBT, Inc., 3250 W Big Beaver Rd, Suite 329, Troy, MI 48084... Symposium, August 14-16, Troy, Michigan. ABSTRACT: Executing multiple VICTORY data services, and reading multiple VICTORY-compliant sensors at the same time, resulted in the following performance measurements for the system: 0.64 Amps / 3.15 Watts power consumption at run-time; roughly 0.77% system...

  9. A Hybrid Constraint Representation and Reasoning Framework

    NASA Technical Reports Server (NTRS)

    Golden, Keith; Pang, Wanlin

    2004-01-01

    In this paper, we introduce JNET, a novel constraint representation and reasoning framework that supports procedural constraints and constraint attachments, providing a flexible way of integrating the constraint system with a runtime software environment and improving its applicability. We describe how JNET is applied to a real-world problem - NASA's Earth-science data processing domain, and demonstrate how JNET can be extended, without any knowledge of how it is implemented, to meet the growing demands of real-world applications.

  10. NPS-NRL-Rice-UIUC Collaboration on Navy Atmosphere-Ocean Coupled Models on Many-Core Computer Architectures Annual Report

    DTIC Science & Technology

    2015-09-30

    DISTRIBUTION STATEMENT A: Distribution approved for public release; distribution is unlimited. NPS-NRL-Rice-UIUC Collaboration on Navy Atmosphere-Ocean Coupled Models on Many-Core Computer Architectures... portability. There is still a gap in the OCCA support for Fortran programmers who do not have accelerator experience. Activities at Rice/Virginia Tech are... for automated data movement and for kernel optimization using source code analysis and run-time detective work. In this quarter the Rice/Virginia...

  11. An Approach for Detecting Malicious Emails Using Runtime Monitoring with Hidden Data

    DTIC Science & Technology

    2016-09-01

    ...demonstrating that a system meets the user's true requirements, often called 'building the right system' [14]. To select a validation and verification... requirements [18]. For example, we give a generalization of how natural language can be ambiguous: "No restaurants will allow smoking inside." Here, "no"... can qualify the rest of the sentence, meaning that there is no restaurant that will allow smoking inside. On the other hand, it can qualify only...

  12. Runtime Support for Type-Safe Dynamic Java Classes

    DTIC Science & Technology

    2000-01-01

    Section 4.3. For each dynamic class C, we create a proxy class, Cproxy, and an implementation class, Cimp. In order to wrap method calls, Cproxy... wrapper method (W) and a reference to the associated method body (M). W explicitly invokes M, which points to the corresponding method body in Cimp... When C's implementation Cimp is switched, M is updated to point to the corresponding method object in the new Cimp. Cproxy also contains a reference...

  13. Formal Specifications for an Electrical Power Grid System Stability and Reliability

    DTIC Science & Technology

    2015-09-01

    ...analyze the power grid system requirements and express the critical runtime behavior using first-order logic. First, we identify observable... Verification System, and type systems, to name a few [5]. Theorem proving's specification dimension is dependent on the expressive power of the formal...

  14. The preliminary SOL (Sizing and Optimization Language) reference manual

    NASA Technical Reports Server (NTRS)

    Lucas, Stephen H.; Scotti, Stephen J.

    1989-01-01

    The Sizing and Optimization Language (SOL), a high-level special-purpose computer language, has been developed to expedite the application of numerical optimization to design problems and to make the process less error-prone. This document is a reference manual for those wishing to write SOL programs. SOL is presently available for DEC VAX/VMS systems. A SOL package is available which includes the SOL compiler and runtime library routines. An overview of SOL appears in NASA TM 100565.

  15. Deferred Compilation: The Automation of Run-Time Code Generation

    DTIC Science & Technology

    1993-12-01

    ...can be amortized over many late computations [CPW93]. For example, in a standard ML implementation of a network communications system, Biagioni... with global variables and abstract data types. Science of Computer Programming, 16(2):151-195, September 1991. [BHL93] Edoardo Biagioni, Robert Harper, and Peter Lee. Standard ML signatures for a protocol stack. Technical...

  16. Proposal to Develop Enhancements and Extensions of Formal Models for Risk Assessment In Software Projects

    DTIC Science & Technology

    2002-09-01

    ...seconds per minute that the runtime environment was up and running. Defect Categories: the labels of the 5 defect categories. Cosmetic Defects: the name that corresponds to QSM's cosmetic defects. Cosmetic defects can be described as deferred, such as errors in format of displays or... 2002. [Fent00] Fenton, N. E. and Neil, M. Software Metrics: Roadmap. Proceedings of the Conference on the Future of Software Engineering, 2000, pp...

  17. Shark: SQL and Rich Analytics at Scale

    DTIC Science & Technology

    2012-11-26

    ...learning programs up to 100× faster than Hadoop. Unlike previous systems, Shark shows that it is possible to achieve these speedups while retaining a... allows Shark to run SQL queries up to 100× faster than Apache Hive, and machine learning programs up to 100× faster than Hadoop... so using a runtime that is optimized for such workloads and a programming model that is designed to express machine learning algorithms.

  18. Massively Parallel Dantzig-Wolfe Decomposition Applied to Traffic Flow Scheduling

    NASA Technical Reports Server (NTRS)

    Rios, Joseph Lucio; Ross, Kevin

    2009-01-01

    Optimal scheduling of air traffic over the entire National Airspace System is a computationally difficult task. To speed computation, Dantzig-Wolfe decomposition is applied to a known linear integer programming approach for assigning delays to flights. The optimization model is proven to have the block-angular structure necessary for Dantzig-Wolfe decomposition. The subproblems for this decomposition are solved in parallel via independent computation threads. Experimental evidence suggests that as the number of subproblems/threads increases (and their respective sizes decrease), the solution quality, convergence, and runtime improve. A demonstration of this is provided by using one flight per subproblem, which is the finest possible decomposition. This results in thousands of subproblems and associated computation threads. This massively parallel approach is compared to one with few threads and to standard (non-decomposed) approaches in terms of solution quality and runtime. Since this method generally provides a non-integral (relaxed) solution to the original optimization problem, two heuristics are developed to generate an integral solution. Dantzig-Wolfe followed by these heuristics can provide a near-optimal (sometimes optimal) solution to the original problem hundreds of times faster than standard (non-decomposed) approaches. In addition, when massive decomposition is employed, the solution is shown to be more likely integral, which obviates the need for an integerization step. These results indicate that nationwide, real-time, high fidelity, optimal traffic flow scheduling is achievable for (at least) 3 hour planning horizons.
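
    A schematic sketch of the decomposed pricing step (toy data and cost model, not the paper's formulation): each flight is its own subproblem, evaluated concurrently across worker threads, and the master then collects the proposals.

      from concurrent.futures import ThreadPoolExecutor

      flights = [{"id": i, "delay_options": [0, 5, 10, 15]} for i in range(1000)]
      dual_price = 0.7                 # stand-in for the master's current duals

      def solve_subproblem(flight):
          # Pick the delay with the best reduced cost under the duals.
          reduced_cost = lambda d: d - dual_price * (15 - d)
          return flight["id"], min(flight["delay_options"], key=reduced_cost)

      with ThreadPoolExecutor() as pool:   # subproblems run concurrently
          proposals = dict(pool.map(solve_subproblem, flights))

      print(proposals[0])                  # proposed delay for flight 0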

  19. From Provenance Standards and Tools to Queries and Actionable Provenance

    NASA Astrophysics Data System (ADS)

    Ludaescher, B.

    2017-12-01

    The W3C PROV standard provides a minimal core for sharing retrospective provenance information for scientific workflows and scripts. PROV extensions such as DataONE's ProvONE model are necessary for linking runtime observables in retrospective provenance records with conceptual-level prospective provenance information, i.e., workflow (or dataflow) graphs. Runtime provenance recorders, such as DataONE's RunManager for R, or noWorkflow for Python capture retrospective provenance automatically. YesWorkflow (YW) is a toolkit that allows researchers to declare high-level prospective provenance models of scripts via simple inline comments (YW-annotations), revealing the computational modules and dataflow dependencies in the script. By combining and linking both forms of provenance, important queries and use cases can be supported that neither provenance model can afford on its own. We present existing and emerging provenance tools developed for the DataONE and SKOPE (Synthesizing Knowledge of Past Environments) projects. We show how the different tools can be used individually and in combination to model, capture, share, query, and visualize provenance information. We also present challenges and opportunities for making provenance information more immediately actionable for the researchers who create it in the first place. We argue that such a shift towards "provenance-for-self" is necessary to accelerate the creation, sharing, and use of provenance in support of transparent, reproducible computational and data science.
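
    A short example of YesWorkflow-style inline annotations in a Python script (the script itself is made up; the tag vocabulary follows YW's documented @begin/@in/@out/@end comments): the comments declare the prospective dataflow without changing the code's behavior.

      # @begin clean_and_summarize
      # @in raw_records
      # @out summary

      raw_records = ["a,1", "b,2", "bad row", "c,3"]

      # @begin clean
      # @in raw_records
      # @out cleaned
      cleaned = [r for r in raw_records if "," in r]
      # @end clean

      # @begin summarize
      # @in cleaned
      # @out summary
      summary = f"{len(cleaned)} clean records"
      # @end summarize

      # @end clean_and_summarize
      print(summary)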

  20. A Model-Driven Co-Design Framework for Fusing Control and Scheduling Viewpoints.

    PubMed

    Sundharam, Sakthivel Manikandan; Navet, Nicolas; Altmeyer, Sebastian; Havet, Lionel

    2018-02-20

    Model-Driven Engineering (MDE) is widely applied in the industry to develop new software functions and integrate them into the existing run-time environment of a Cyber-Physical System (CPS). The design of a software component involves designers from various viewpoints such as control theory, software engineering, safety, etc. In practice, while a designer from one discipline focuses on the core aspects of his field (for instance, a control engineer concentrates on designing a stable controller), he neglects, or gives less weight to, the other engineering aspects (for instance, real-time software engineering or energy efficiency). This may cause some of the functional and non-functional requirements not to be met satisfactorily. In this work, we present a co-design framework based on a timing tolerance contract to address such design gaps between control and real-time software engineering. The framework consists of three steps: controller design, verified by jitter margin analysis along with co-simulation; software design, verified by a novel schedulability analysis; and run-time verification by monitoring the execution of the models on target. This framework builds on CPAL (Cyber-Physical Action Language), an MDE design environment based on model-interpretation, which enforces a timing-realistic behavior in simulation through timing and scheduling annotations. The application of our framework is exemplified in the design of an automotive cruise control system.

  1. Load Index Metrics for an Optimized Management of Web Services: A Systematic Evaluation

    PubMed Central

    Souza, Paulo S. L.; Santana, Regina H. C.; Santana, Marcos J.; Zaluska, Ed; Faical, Bruno S.; Estrella, Julio C.

    2013-01-01

    The lack of precision to predict service performance through load indices may lead to wrong decisions regarding the use of web services, compromising service performance and raising platform cost unnecessarily. This paper presents experimental studies to qualify the behaviour of load indices in the web service context. The experiments consider three services that generate controlled and significant server demands, four levels of workload for each service and six distinct execution scenarios. The evaluation considers three relevant perspectives: the capability for representing recent workloads, the capability for predicting near-future performance and finally stability. Eight different load indices were analysed, including the JMX Average Time index (proposed in this paper) specifically designed to address the limitations of the other indices. A systematic approach is applied to evaluate the different load indices, considering a multiple linear regression model based on the stepwise-AIC method. The results show that the load indices studied represent the workload to some extent; however, in contrast to expectations, most of them do not exhibit a coherent correlation with service performance and this can result in stability problems. The JMX Average Time index is an exception, showing a stable behaviour which is tightly-coupled to the service runtime for all executions. Load indices are used to predict the service runtime and therefore their inappropriate use can lead to decisions that will impact negatively on both service performance and execution cost. PMID:23874776
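
    A sketch of the evaluation idea on synthetic data (assumes the statsmodels package; index names are invented): regress observed service runtime on a candidate load index and use AIC to judge how much predictive information the index carries, in the spirit of the paper's stepwise-AIC comparison.

      import numpy as np
      import statsmodels.api as sm

      rng = np.random.default_rng(1)
      n = 300
      jmx_avg_time = rng.uniform(0, 1, n)              # informative index (synthetic)
      cpu_queue    = rng.uniform(0, 1, n)              # unrelated index (synthetic)
      runtime = 2.0 + 3.0 * jmx_avg_time + 0.1 * rng.standard_normal(n)

      for name, x in [("jmx_avg_time", jmx_avg_time), ("cpu_queue", cpu_queue)]:
          model = sm.OLS(runtime, sm.add_constant(x)).fit()
          print(name, "AIC:", round(model.aic, 1))     # lower AIC = more informative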

  2. Compliance monitoring in business processes: Functionalities, application, and tool-support.

    PubMed

    Ly, Linh Thao; Maggi, Fabrizio Maria; Montali, Marco; Rinderle-Ma, Stefanie; van der Aalst, Wil M P

    2015-12-01

    In recent years, monitoring the compliance of business processes with relevant regulations, constraints, and rules during runtime has evolved as a major concern in literature and practice. Monitoring not only refers to continuously observing possible compliance violations, but also includes the ability to provide fine-grained feedback and to predict possible compliance violations in the future. The body of literature on business process compliance is large and approaches specifically addressing process monitoring are hard to identify. Moreover, proper means for the systematic comparison of these approaches are missing. Hence, it is unclear which approaches are suitable for particular scenarios. The goal of this paper is to define a framework for Compliance Monitoring Functionalities (CMF) that enables the systematic comparison of existing and new approaches for monitoring compliance rules over business processes during runtime. To define the scope of the framework, at first, related areas are identified and discussed. The CMFs are harvested based on a systematic literature review and five selected case studies. The appropriateness of the selection of CMFs is demonstrated in two ways: (a) a systematic comparison with pattern-based compliance approaches and (b) a classification of existing compliance monitoring approaches using the CMFs. Moreover, the application of the CMFs is showcased using three existing tools that are applied to two realistic data sets. Overall, the CMF framework provides powerful means to position existing and future compliance monitoring approaches.

  3. Simultaneous quantification of fentanyl, sufentanil, cefazolin, doxapram and keto-doxapram in plasma using liquid chromatography - tandem mass spectrometry.

    PubMed

    Flint, Robert B; Bahmany, Soma; van der Nagel, Bart C H; Koch, Birgit C P

    2018-05-16

    A simple and specific UPLC-MS/MS method was developed and validated for simultaneous quantification of fentanyl, sufentanil, cefazolin, doxapram and its active metabolite keto-doxapram. The internal standard was fentanyl-d5 for all analytes. Chromatographic separation was achieved with a reversed phase Acquity UPLC HSS T3 column with a run-time of only 5.0 minutes per injected sample. Gradient elution was performed with a mobile phase consisting of ammonium acetate, formic acid in Milli-Q ultrapure water or in methanol with a total flow rate of 0.4 mL minute⁻¹. A plasma volume of only 50 μL was required to achieve both adequate accuracy and precision. Calibration curves of all 5 analytes were linear. All analytes were stable for at least 48 hours in the autosampler. The method was validated according to US Food and Drug Administration guidelines. This method allows quantification of fentanyl, sufentanil, cefazolin, doxapram and keto-doxapram, which serves purposes for research, as well as therapeutic drug monitoring, if applicable. The strength of this method is the combination of a small sample volume, a short run-time, a deuterated internal standard, an easy sample preparation method and the ability to simultaneously quantify all analytes in one run. This article is protected by copyright. All rights reserved.

  4. Finding Mount Everest and handling voids.

    PubMed

    Storch, Tobias

    2011-01-01

    Evolutionary algorithms (EAs) are randomized search heuristics that solve problems successfully in many cases. Their behavior is often described in terms of strategies to find a high location on Earth's surface. Unfortunately, many digital elevation models describing it contain void elements: elements not assigned an elevation. Therefore, we design and analyze simple EAs with different strategies to handle such partially defined functions. They are experimentally investigated on a dataset describing the elevation of Earth's surface. The largest value found by an EA within a certain runtime is measured, and the median over a few runs is computed and compared for the different EAs. For this dataset, the distribution of void elements seems to be neither random nor adversarial; they are so-called semirandomly distributed. To deepen our understanding of the behavior of the different EAs, they are theoretically considered on well-known pseudo-Boolean functions transferred to partially defined ones. These modifications are also performed in a semirandom way. The typical runtime until an optimum is found by an EA is analyzed, namely bounded from above and below, and compared for the different EAs. We find that for the random model it is a good strategy to assume that a void element has a worse function value than all previous elements, whereas for the adversary model it is a good strategy to assume that a void element has the best function value of all previous elements.
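
    To make the two void-handling strategies concrete, here is a minimal (1+1) EA sketch in which a void element (f returns None) is valued either below everything seen so far (the strategy the abstract recommends for the random model) or at the best value seen so far (the strategy for the adversary model). The mutation scheme and bookkeeping are generic textbook choices, not the authors' exact algorithms.

        import random

        def one_plus_one_ea(f, n, steps, pessimistic=True):
            """(1+1) EA on a partially defined pseudo-Boolean function f.

            f returns None on a void element. A void is valued below every
            value seen so far (pessimistic) or at the best value seen so far.
            """
            best_seen, worst_seen = 0.0, 0.0

            def value(bits):
                nonlocal best_seen, worst_seen
                v = f(bits)
                if v is None:                                # void element
                    return worst_seen - 1 if pessimistic else best_seen
                best_seen = max(best_seen, v)
                worst_seen = min(worst_seen, v)
                return v

            x = [random.randint(0, 1) for _ in range(n)]
            fx = value(x)
            for _ in range(steps):
                y = [b ^ (random.random() < 1.0 / n) for b in x]  # flip bits w.p. 1/n
                fy = value(y)
                if fy >= fx:                                 # accept if not worse
                    x, fx = y, fy
            return x, fx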

  5. Two-phase computerized planning of cryosurgery using bubble-packing and force-field analogy.

    PubMed

    Tanaka, Daigo; Shimada, Kenji; Rabin, Yoed

    2006-02-01

    Cryosurgery is the destruction of undesired tissues by freezing, as in prostate cryosurgery, for example. Minimally invasive cryosurgery is currently performed by means of an array of cryoprobes, each in the shape of a long hypodermic needle. The optimal arrangement of the cryoprobes, which is known to have a dramatic effect on the quality of the cryoprocedure, remains an art held by the cryosurgeon, based on the cryosurgeon's experience and "rules of thumb." An automated computerized technique for cryosurgery planning is the subject matter of the current paper, in an effort to improve the quality of cryosurgery. A two-phase optimization method is proposed for this purpose, based on two previous and independent developments by this research team. Phase I is based on a bubble-packing method, previously used as an efficient method for finite element meshing. Phase II is based on a force-field analogy method, which has proven to be robust at the expense of a typically long runtime. As a proof-of-concept, results are demonstrated on a two-dimensional case of a prostate cross section. The major contribution of this study is to affirm that in many instances cryosurgery planning can be performed without extremely expensive simulations of bioheat transfer, achieved in Phase I. This new method of planning has proven to reduce planning runtime from hours to minutes, making automated planning practical in a clinical time frame.

  6. Ultra-high performance size-exclusion chromatography in polar solvents.

    PubMed

    Vancoillie, Gertjan; Vergaelen, Maarten; Hoogenboom, Richard

    2016-12-23

    Size-exclusion chromatography (SEC) is amongst the most widely used polymer characterization methods in both academic and industrial polymer research, allowing the determination of molecular weight and distribution parameters, i.e. the dispersity (Ɖ), of unknown polymers. Its many advantages, including accuracy, reproducibility and low sample consumption, have contributed to the worldwide success of this analytical technique. The current generation of SEC systems has a stationary phase mostly containing highly porous styrene-divinylbenzene particles, allowing a size-based separation of various polymers in solution but limiting the flow rate and solvent compatibility. Recently, sub-2 μm ethylene-bridged hybrid (BEH) packing materials have become available for SEC analysis. These packing materials can not only withstand much higher pressures, up to 15,000 psi, but also show high spatial stability towards different solvents. Combining these BEH columns with ultra-high performance LC (UHPLC) technology opens up UHP-SEC analysis, showing strongly reduced runtimes and unprecedented solvent compatibility. In this work, this novel characterization technique was compared to conventional SEC using both highly viscous and highly polar solvents as eluent, namely N,N-dimethylacetamide (DMAc), N,N-dimethylformamide (DMF) and methanol, focusing on the suitability of the BEH columns for analysis of highly functional polymers. The results show a functional group compatibility comparable with conventional SEC, with remarkably short runtimes and enhanced resolution in methanol. Copyright © 2016 Elsevier B.V. All rights reserved.

  7. Compliance monitoring in business processes: Functionalities, application, and tool-support

    PubMed Central

    Ly, Linh Thao; Maggi, Fabrizio Maria; Montali, Marco; Rinderle-Ma, Stefanie; van der Aalst, Wil M.P.

    2015-01-01

    In recent years, monitoring the compliance of business processes with relevant regulations, constraints, and rules during runtime has evolved into a major concern in literature and practice. Monitoring not only refers to continuously observing possible compliance violations, but also includes the ability to provide fine-grained feedback and to predict possible compliance violations in the future. The body of literature on business process compliance is large, and approaches specifically addressing process monitoring are hard to identify. Moreover, proper means for the systematic comparison of these approaches are missing. Hence, it is unclear which approaches are suitable for particular scenarios. The goal of this paper is to define a framework for Compliance Monitoring Functionalities (CMF) that enables the systematic comparison of existing and new approaches for monitoring compliance rules over business processes during runtime. To define the scope of the framework, related areas are first identified and discussed. The CMFs are harvested based on a systematic literature review and five selected case studies. The appropriateness of the selection of CMFs is demonstrated in two ways: (a) a systematic comparison with pattern-based compliance approaches and (b) a classification of existing compliance monitoring approaches using the CMFs. Moreover, the application of the CMFs is showcased using three existing tools that are applied to two realistic data sets. Overall, the CMF framework provides powerful means to position existing and future compliance monitoring approaches. PMID:26635430

  8. A Model-Driven Co-Design Framework for Fusing Control and Scheduling Viewpoints

    PubMed Central

    Navet, Nicolas; Havet, Lionel

    2018-01-01

    Model-Driven Engineering (MDE) is widely applied in industry to develop new software functions and integrate them into the existing run-time environment of a Cyber-Physical System (CPS). The design of a software component involves designers from various viewpoints, such as control theory, software engineering and safety. In practice, while a designer from one discipline focuses on the core aspects of his field (for instance, a control engineer concentrates on designing a stable controller), he tends to neglect or underweight the other engineering aspects (for instance, real-time software engineering or energy efficiency). This may cause some of the functional and non-functional requirements not to be met satisfactorily. In this work, we present a co-design framework based on timing tolerance contracts to address such design gaps between control and real-time software engineering. The framework consists of three steps: controller design, verified by jitter margin analysis along with co-simulation; software design, verified by a novel schedulability analysis; and run-time verification, by monitoring the execution of the models on target. This framework builds on CPAL (Cyber-Physical Action Language), an MDE design environment based on model interpretation, which enforces timing-realistic behavior in simulation through timing and scheduling annotations. The application of our framework is exemplified in the design of an automotive cruise control system. PMID:29461489

  9. Simulating muscular thin films using thermal contraction capabilities in finite element analysis tools.

    PubMed

    Webster, Victoria A; Nieto, Santiago G; Grosberg, Anna; Akkus, Ozan; Chiel, Hillel J; Quinn, Roger D

    2016-10-01

    In this study, new techniques for approximating the contractile properties of cells in biohybrid devices using Finite Element Analysis (FEA) have been investigated. Many current techniques for modeling biohybrid devices use individual cell forces to simulate the cellular contraction. However, such techniques result in long simulation runtimes. In this study we investigated the effect of the use of thermal contraction on simulation runtime. The thermal contraction model was significantly faster than models using individual cell forces, making it beneficial for rapidly designing or optimizing devices. Three techniques, Stoney's Approximation, a Modified Stoney's Approximation, and a Thermostat Model, were explored for calibrating thermal expansion/contraction parameters (TECPs) needed to simulate cellular contraction using thermal contraction. The TECP values were calibrated by using published data on the deflections of muscular thin films (MTFs). Using these techniques, TECP values that suitably approximate experimental deflections can be determined by using experimental data obtained from cardiomyocyte MTFs. Furthermore, a sensitivity analysis was performed in order to investigate the contribution of individual variables, such as elastic modulus and layer thickness, to the final calibrated TECP for each calibration technique. Additionally, the TECP values are applicable to other types of biohybrid devices. Two non-MTF models were simulated based on devices reported in the existing literature. Copyright © 2016 Elsevier Ltd. All rights reserved.

  10. Improvements to Integrated Tradespace Analysis of Communications Architectures (ITACA) Network Loading Analysis Tool

    NASA Technical Reports Server (NTRS)

    Lee, Nathaniel; Welch, Bryan W.

    2018-01-01

    NASA's SCENIC project aims to simplify and reduce the cost of space mission planning by replicating the analysis capabilities of commercially licensed software, integrated with analysis parameters specific to SCaN assets and SCaN-supported user missions. SCENIC differs from current tools that perform similar analyses in that it (1) does not require any licensing fees, and (2) provides an all-in-one package for analysis capabilities that normally require add-ons or multiple tools to complete. As part of SCENIC's capabilities, the ITACA network loading analysis tool will be responsible for assessing the loading on a given network architecture and generating a network service schedule. ITACA will allow users to evaluate the quality of service of a given network architecture and determine whether or not the architecture will satisfy the mission's requirements. ITACA is currently under development, and the following improvements were made during the fall of 2017: optimization of runtime, augmentation of network asset pre-service configuration time, augmentation of Brent's method of root finding, augmentation of network asset FOV restrictions, augmentation of mission lifetimes, and the integration of a SCaN link budget calculation tool. The improvements resulted in (a) a 25% reduction in runtime, (b) more accurate contact window predictions when compared to STK (Registered Trademark) contact window predictions, and (c) increased fidelity through the use of specific SCaN asset parameters.

  11. The Digital electronic Guideline Library (DeGeL): a hybrid framework for representation and use of clinical guidelines.

    PubMed

    Shahar, Yuval; Young, Ohad; Shalom, Erez; Mayaffit, Alon; Moskovitch, Robert; Hessing, Alon; Galperin, Maya

    2004-01-01

    We propose to present a poster (and potentially also a demonstration of the implemented system) summarizing the current state of our work on a hybrid, multiple-format representation of clinical guidelines that facilitates conversion of guidelines from free text to a formal representation. We describe a distributed Web-based architecture (DeGeL) and a set of tools using the hybrid representation. The tools enable performing tasks such as guideline specification, semantic markup, search, retrieval, visualization, eligibility determination, runtime application and retrospective quality assessment. The representation includes four parallel formats: Free text (one or more original sources); semistructured text (labeled by the target guideline-ontology semantic labels); semiformal text (which includes some control specification); and a formal, machine-executable representation. The specification, indexing, search, retrieval, and browsing tools are essentially independent of the ontology chosen for guideline representation, but editing the semi-formal and formal formats requires ontology-specific tools, which we have developed in the case of the Asbru guideline-specification language. The four formats support increasingly sophisticated computational tasks. The hybrid guidelines are stored in a Web-based library. All tools, such as for runtime guideline application or retrospective quality assessment, are designed to operate on all representations. We demonstrate the hybrid framework by providing examples from the semantic markup and search tools.

  12. High performance computing enabling exhaustive analysis of higher order single nucleotide polymorphism interaction in Genome Wide Association Studies.

    PubMed

    Goudey, Benjamin; Abedini, Mani; Hopper, John L; Inouye, Michael; Makalic, Enes; Schmidt, Daniel F; Wagner, John; Zhou, Zeyu; Zobel, Justin; Reumann, Matthias

    2015-01-01

    Genome-wide association studies (GWAS) are a common approach for systematic discovery of single nucleotide polymorphisms (SNPs) that are associated with a given disease. The univariate analysis approaches commonly employed may miss important SNP associations that only appear through multivariate analysis in complex diseases. However, multivariate SNP analysis is currently limited by its inherent computational complexity. In this work, we present a computational framework that harnesses supercomputers. Based on our results, we estimate that a three-way interaction analysis of GWAS data with 1.1 million SNPs would require over 5.8 years on the full "Avoca" IBM Blue Gene/Q installation at the Victorian Life Sciences Computation Initiative. This is hundreds of times faster than estimates for other CPU-based methods and four times faster than runtimes estimated for GPU methods, indicating how the improvement in the level of hardware applied to interaction analysis may alter the types of analysis that can be performed. Furthermore, the same analysis would take under 3 months on the currently largest IBM Blue Gene/Q supercomputer, "Sequoia", at the Lawrence Livermore National Laboratory, assuming linear scaling is maintained, as our results suggest. Given that the implementation used in this study can be further optimised, this runtime means it is becoming feasible to carry out exhaustive analysis of higher-order interaction studies on large modern GWAS.
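
    The quoted runtimes are consistent with simple linear scaling across machine sizes. Assuming Avoca is a 4-rack and Sequoia a 96-rack Blue Gene/Q (the rack counts are our assumption, not stated in the abstract), the back-of-envelope check reads:

        avoca_years = 5.8
        speedup = 96 / 4                      # 24x more racks under linear scaling
        print(avoca_years / speedup * 12)     # ~2.9 months, i.e. "under 3 months"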

  13. Implementing a Parallel Image Edge Detection Algorithm Based on the Otsu-Canny Operator on the Hadoop Platform.

    PubMed

    Cao, Jianfang; Chen, Lichao; Wang, Min; Tian, Yun

    2018-01-01

    The Canny operator is widely used to detect edges in images. However, as the size of the image dataset increases, the edge detection performance of the Canny operator decreases and its runtime becomes excessive. To improve both, we propose a parallel design and implementation of an Otsu-optimized Canny operator using the MapReduce parallel programming model running on the Hadoop platform. The Otsu algorithm is used to optimize the Canny operator's dual threshold and improve the edge detection performance, while the MapReduce parallel programming model facilitates parallel processing for the Canny operator to solve the processing speed and communication cost problems that occur when the Canny edge detection algorithm is applied to big data. For the experiments, we constructed datasets of different scales from the Pascal VOC2012 image database. The proposed parallel Otsu-Canny edge detection algorithm performs better than other traditional edge detection algorithms. The parallel approach reduced the running time by approximately 67.2% on a Hadoop cluster of 5 nodes with a dataset of 60,000 images. Overall, our approach speeds up the system by approximately 3.4 times when processing large-scale datasets. The proposed algorithm thus demonstrates both better edge detection performance and improved time performance.
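
    The per-image kernel that each map task would apply can be sketched as follows with OpenCV; the MapReduce wrapping is omitted, and deriving Canny's dual threshold as (0.5·t, t) from the Otsu threshold t is a common heuristic that may differ from the paper's exact mapping.

        import cv2

        def otsu_canny(path):
            img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
            # Otsu picks the global threshold maximizing between-class variance
            t, _ = cv2.threshold(img, 0, 255,
                                 cv2.THRESH_BINARY + cv2.THRESH_OTSU)
            return cv2.Canny(img, 0.5 * t, t)  # Otsu-derived dual threshold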

  14. CrossTalk: The Journal of Defense Software Engineering. Volume 20, Number 9, September 2007

    DTIC Science & Technology

    2007-09-01

    underlying application framework, e.g., Java Enterprise Edition or .NET. This increases the risk that consumer Web services not based on the same...weaknesses and vulnerabilities that are targeted by attackers and malicious code. For example, Apache Axis 2 enables a Java developer to simply...load his/her Java objects into the Axis SOAP engine. At runtime, it is the SOAP engine that determines which incoming SOAP request messages should be

  15. A topological hierarchy for functions on triangulated surfaces.

    PubMed

    Bremer, Peer-Timo; Edelsbrunner, Herbert; Hamann, Bernd; Pascucci, Valerio

    2004-01-01

    We combine topological and geometric methods to construct a multiresolution representation for a function over a two-dimensional domain. In a preprocessing stage, we create the Morse-Smale complex of the function and progressively simplify its topology by cancelling pairs of critical points. Based on a simple notion of dependency among these cancellations, we construct a hierarchical data structure supporting traversal and reconstruction operations similarly to traditional geometry-based representations. We use this data structure to extract topologically valid approximations that satisfy error bounds provided at runtime.

  16. Exploring business process modelling paradigms and design-time to run-time transitions

    NASA Astrophysics Data System (ADS)

    Caron, Filip; Vanthienen, Jan

    2016-09-01

    The business process management literature describes a multitude of approaches (e.g. imperative, declarative or event-driven) that each result in a different mix of process flexibility, compliance, effectiveness and efficiency. Although the use of a single approach over the process lifecycle is often assumed, transitions between approaches at different phases in the process lifecycle may also be considered. This article explores several business process strategies by analysing the approaches at different phases in the process lifecycle as well as the various transitions.

  17. Method for resource control in parallel environments using program organization and run-time support

    NASA Technical Reports Server (NTRS)

    Ekanadham, Kattamuri (Inventor); Moreira, Jose Eduardo (Inventor); Naik, Vijay Krishnarao (Inventor)

    2001-01-01

    A system and method for dynamic scheduling and allocation of resources to parallel applications during the course of their execution. By establishing well-defined interactions between an executing job and the parallel system, the system and method support dynamic reconfiguration of processor partitions, dynamic distribution and redistribution of data, communication among cooperating applications, and various other monitoring actions. The interactions occur only at specific points in the execution of the program where the aforementioned operations can be performed efficiently.

  18. Method for resource control in parallel environments using program organization and run-time support

    NASA Technical Reports Server (NTRS)

    Ekanadham, Kattamuri (Inventor); Moreira, Jose Eduardo (Inventor); Naik, Vijay Krishnarao (Inventor)

    1999-01-01

    A system and method for dynamic scheduling and allocation of resources to parallel applications during the course of their execution. By establishing well-defined interactions between an executing job and the parallel system, the system and method support dynamic reconfiguration of processor partitions, dynamic distribution and redistribution of data, communication among cooperating applications, and various other monitoring actions. The interactions occur only at specific points in the execution of the program where the aforementioned operations can be performed efficiently.

  19. Traveler Trustworthy Autonomy

    NASA Technical Reports Server (NTRS)

    Skoog, Mark A.

    2016-01-01

    NASA's Armstrong Flight Research Center has been engaged in the development of highly automatic safety systems for aviation since the mid-1980s. For the past three years, under Seedling and Center Innovation funding, this work has moved toward the development of a software architecture applicable to autonomous safety. This work is now broadening and accelerating to address the airworthiness issues surrounding making a case for trustworthy autonomy. The software architecture is called the expandable variable-autonomy architecture (EVAA) and utilizes a run-time assurance approach to safety assurance.

  20. Earth Global Reference Atmospheric Model (Earth-GRAM) GRAM Virtual Meeting

    NASA Technical Reports Server (NTRS)

    White, Patrick

    2017-01-01

    What is Earth-GRAM? It provides monthly means and standard deviations for any point in the atmosphere, with monthly, geographic, and altitude variation. Earth-GRAM is a C++ software package, currently distributed as Earth-GRAM 2016. Atmospheric variables included: pressure, density, temperature, horizontal and vertical winds, speed of sound, and atmospheric constituents. It is used by the engineering community because of its ability to create atmospheric dispersions at a rapid runtime, and is often embedded in trajectory simulation software. It is not a forecast model and does not readily capture localized atmospheric effects.

  1. The ADAMS interactive interpreter

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rietscha, E.R.

    1990-12-17

    The ADAMS (Advanced DAta Management System) project is exploring next generation database technology. Database management does not follow the usual programming paradigm. Instead, the database dictionary provides an additional name space environment that should be interactively created and tested before writing application code. This document describes the implementation and operation of the ADAMS Interpreter, an interactive interface to the ADAMS data dictionary and runtime system. The Interpreter executes individual statements of the ADAMS Interface Language, providing a fast, interactive mechanism to define and access persistent databases. 5 refs.

  2. Usage Automata

    NASA Astrophysics Data System (ADS)

    Bartoletti, Massimo

    Usage automata are an extension of finite state automata, with some additional features (e.g. parameters and guards) that improve their expressivity. Usage automata are expressive enough to model security requirements of real-world applications; at the same time, they are simple enough to be amenable to static analysis, e.g. they can be model-checked against abstractions of program usages. We study here some foundational aspects of usage automata. In particular, we discuss their expressive power, and their effective use in run-time mechanisms for enforcing usage policies.

  3. HOPE: Just-in-time Python compiler for astrophysical computations

    NASA Astrophysics Data System (ADS)

    Akeret, Joel; Gamper, Lukas; Amara, Adam; Refregier, Alexandre

    2014-11-01

    HOPE is a specialized Python just-in-time (JIT) compiler designed for numerical astrophysical applications. HOPE focuses on a subset of the language and is able to translate Python code into C++ while performing numerical optimization on mathematical expressions at runtime. To enable the JIT compilation, the user only needs to add a decorator to the function definition. By using HOPE, the user benefits from being able to write common numerical code in Python while getting the performance of a compiled implementation.
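
    Typical usage looks like the sketch below, assuming the decorator is exposed as hope.jit (per the package documentation); on first call the loop is translated to C++ and compiled.

        import numpy as np
        import hope

        @hope.jit
        def sum_of_squares(x):
            # plain Python loop that HOPE translates to optimized C++
            total = 0.0
            for i in range(x.shape[0]):
                total += x[i] * x[i]
            return total

        print(sum_of_squares(np.arange(1e6)))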

  4. The roofline model: A pedagogical tool for program analysis and optimization

    DOE PAGES

    Williams, Samuel; Patterson, David; Oliker, Leonid; ...

    2008-08-01

    This article consists of a collection of slides from the authors' conference presentation. The Roofline model is a visually intuitive figure for kernel analysis and optimization. We believe undergraduates will find it useful in assessing performance and scalability limitations. It is easily extended to other architectural paradigms, and to other metrics: performance (sort, graphics, crypto, ...) and bandwidth (L2, PCIe, ...). Furthermore, performance counters could be used to generate a runtime-specific roofline that would greatly aid optimization.
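
    The model itself reduces to a one-line bound: attainable performance is the minimum of peak compute and peak memory bandwidth times arithmetic intensity. A minimal sketch with illustrative numbers:

        def roofline(peak_gflops, peak_gbps, arithmetic_intensity):
            """Attainable GFLOP/s for a kernel with the given flop/byte ratio."""
            return min(peak_gflops, peak_gbps * arithmetic_intensity)

        # A 0.25 flop/byte kernel on a 100 GFLOP/s, 25 GB/s machine is
        # bandwidth-bound at 6.25 GFLOP/s; past 4 flop/byte it becomes
        # compute-bound at the 100 GFLOP/s peak.
        print(roofline(100.0, 25.0, 0.25))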

  5. Self-Avoiding Walks Over Adaptive Triangular Grids

    NASA Technical Reports Server (NTRS)

    Heber, Gerd; Biswas, Rupak; Gao, Guang R.; Saini, Subhash (Technical Monitor)

    1999-01-01

    Space-filling curves are a popular approach, based on a geometric embedding, for linearizing computational meshes. We present a new O(n log n) combinatorial algorithm for constructing a self-avoiding walk through a two-dimensional mesh containing n triangles. We show that for hierarchical adaptive meshes, the algorithm can be locally adapted and easily parallelized by taking advantage of the regularity of the refinement rules. The proposed approach should be very useful in the runtime partitioning and load balancing of adaptive unstructured grids.

  6. The Implementation of a Multi-Backend Database System (MDBS). Part I. Software Engineering Strategies and Efforts Towards a Prototype MDBS.

    DTIC Science & Technology

    1983-06-01

    for DEC PDP-11 systems. MAINSAIL was developed and is marketed with a set of integrated tools for program development. The syntax of the language is...stack, and to test for stack-full and stack-empty conditions. This technique is useful in enforcing data integrity and in controlling concurrent...and market MAINSAIL. The language is distinguished by its portability. The same compiler and runtime system, both written in MAINSAIL, are the basis

  7. ModelPlex: Verified Runtime Validation of Verified Cyber-Physical System Models

    DTIC Science & Technology

    2014-07-01

    nondeterministic choice (〈∪〉), deterministic assignment (〈:=〉) and logical connectives (∧r etc.) replace current facts with simpler ones or branch...By sequent proof rule ∃r, this existentially quantified variable is instantiated with an arbitrary term θ, which is often a new logical variable...that is implicitly existentially quantified [27]. Weakening (Wr) removes facts that are no longer necessary. (〈∗〉) ∃X〈x:=X〉φ / 〈x:=∗〉φ   (∃r) Γ ⊢ φ(θ...

  8. Evaluating the operations capability of Freedom's Data Management System

    NASA Technical Reports Server (NTRS)

    Sowizral, Henry A.

    1990-01-01

    Three areas of Data Management System (DMS) performance are examined: raw processor speed, the subjective speed of the Lynx OS X-Window system, and the operational capacity of the Runtime Object Database (RODB). It is concluded that the proposed processor will operate at its specified rate of speed and that the X-Window system operates within users' subjective needs. It is also concluded that the RODB cannot provide the required level of service, even with a two-orders-of-magnitude (100-fold) improvement in speed.

  9. TECA: A Parallel Toolkit for Extreme Climate Analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Prabhat, Mr; Ruebel, Oliver; Byna, Surendra

    2012-03-12

    We present TECA, a parallel toolkit for detecting extreme events in large climate datasets. Modern climate datasets expose parallelism across a number of dimensions: spatial locations, timesteps and ensemble members. We design TECA to exploit these modes of parallelism and demonstrate a prototype implementation for detecting and tracking three classes of extreme events: tropical cyclones, extra-tropical cyclones and atmospheric rivers. We process a modern TB-sized CAM5 simulation dataset with TECA, and demonstrate good runtime performance for the three case studies.

  10. SNLSimMagic v 2.0

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    This software is an iOS (Apple) Augmented Reality (AR) application that runs on the iPhone and iPad. It is designed to scan in a photograph or graphic and "play" an associated video. This release, SNLSimMagic, was built using the Wikitude Augmented Reality (AR) software development kit (SDK) integrated into an Apple iOS SDK application together with the Cordova libraries. These codes enable the generation of runtime targets using cloud recognition and developer-defined target features, which are then accessed by means of a custom application.

  11. A Survey of Scattering, Attenuation, and Size Spectra Studies of Bubble Layers and Plumes Beneath the Air-Sea Interface.

    DTIC Science & Technology

    1991-08-30

    authors exploit the spatial resolution benefits of nonlinear bubble response (at the sum frequency) to the double frequency excitation by two...interaction method is the computational requirement. Although exact runtimes for MIM are not given, and it apparently does have speed advantages over...Frequencies," J. Acoust. Soc. Am. 75(5), 1473-1477 (1984). [136] T.D.K. Ngoc, E.R. Franchi, and B.B. Adams, "Modeling of Ocean Surface Spectrum and

  12. On the Suitability of MPI as a PGAS Runtime

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Daily, Jeffrey A.; Vishnu, Abhinav; Palmer, Bruce J.

    2014-12-18

    Partitioned Global Address Space (PGAS) models are emerging as a popular alternative to MPI models for designing scalable applications. At the same time, MPI remains a ubiquitous communication subsystem due to its standardization, high performance, and availability on leading platforms. In this paper, we explore the suitability of using MPI as a scalable PGAS communication subsystem. We focus on the Remote Memory Access (RMA) communication in PGAS models, which typically includes get, put, and atomic memory operations. We perform an in-depth exploration of design alternatives based on MPI. These alternatives include using a semantically-matching interface such as MPI-RMA, as well as not-so-intuitive interfaces such as MPI two-sided with a combination of multi-threading and dynamic process management. With an in-depth exploration of these alternatives and their shortcomings, we propose a novel design which is facilitated by the data-centric view in PGAS models. This design leverages a combination of highly tuned MPI two-sided semantics and an automatic, user-transparent split of MPI communicators to provide asynchronous progress. We implement the asynchronous progress ranks approach and other approaches within the Communication Runtime for Exascale, which is a communication subsystem for Global Arrays. Our performance evaluation spans pure communication benchmarks, graph community detection and sparse matrix-vector multiplication kernels, and a computational chemistry application. The utility of our proposed PR-based approach is demonstrated by a 2.17x speed-up on 1008 processors over the other MPI-based designs.
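
    For reference, the RMA operations under discussion look as follows when expressed through mpi4py's one-sided (MPI-RMA) interface; this is the semantically-matching path the paper evaluates, not its proposed two-sided progress-rank design.

        # run with: mpiexec -n 2 python rma_demo.py
        from mpi4py import MPI
        import numpy as np

        comm = MPI.COMM_WORLD
        rank = comm.Get_rank()

        local = np.zeros(10, dtype="d")
        win = MPI.Win.Create(local, comm=comm)    # expose local buffer to all ranks

        win.Fence()                               # open an RMA access epoch
        if rank == 0:
            win.Put(np.arange(10, dtype="d"), 1)  # one-sided write into rank 1
        win.Fence()                               # close epoch; data now visible
        if rank == 1:
            print(local)                          # [0. 1. ... 9.]
        win.Free()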

  13. CAreDroid: Adaptation Framework for Android Context-Aware Applications

    PubMed Central

    Elmalaki, Salma; Wanner, Lucas; Srivastava, Mani

    2015-01-01

    Context-awareness is the ability of software systems to sense and adapt to their physical environment. Many contemporary mobile applications adapt to changing locations, connectivity states, available computational and energy resources, and proximity to other users and devices. Nevertheless, there is little systematic support for context-awareness in contemporary mobile operating systems. Because of this, application developers must build their own context-awareness adaptation engines, dealing directly with sensors and polluting application code with complex adaptation decisions. In this paper, we introduce CAreDroid, a framework designed to decouple the application logic from the complex adaptation decisions in Android context-aware applications. In this framework, developers are required only to focus on the application logic, by providing a list of methods that are sensitive to certain contexts along with the permissible operating ranges under those contexts. At run time, CAreDroid monitors the context of the physical environment and intercepts calls to sensitive methods, activating only the blocks of code that best fit the current physical context. CAreDroid is implemented as part of the Android runtime system. By pushing context monitoring and adaptation into the runtime system, CAreDroid eases the development of context-aware applications and increases their efficiency. In particular, case study applications implemented using CAreDroid are shown to have (1) at least half as many lines of code and (2) at least 10× better execution-time efficiency compared to equivalent context-aware applications that use only standard Android APIs. PMID:26834512

  14. An Improved Neutron Transport Algorithm for HZETRN2006

    NASA Astrophysics Data System (ADS)

    Slaba, Tony

    NASA's new space exploration initiative includes plans for a long-term human presence in space, thereby placing new emphasis on space radiation analyses. In particular, a systematic effort of verification, validation and uncertainty quantification of the tools commonly used for radiation analysis for vehicle design and mission planning has begun. In this paper, the numerical error associated with energy discretization in HZETRN2006 is addressed; large errors in the low-energy portion of the neutron fluence spectrum are produced by a numerical truncation error in the transport algorithm. It is shown that the truncation error results from the narrow energy domain of the neutron elastic spectral distributions, and that an extremely fine energy grid is required to adequately resolve the problem under the current formulation. Since adding a sufficient number of energy points would render the code computationally inefficient, we revisit the light-ion transport theory developed for HZETRN2006 and focus on neutron elastic interactions. The new approach numerically integrates with adequate resolution in the energy domain without affecting the run-time of the code and is easily incorporated into the current code. Efforts were also made to optimize the computational efficiency of the light-ion propagator; a brief discussion of these efforts is given along with run-time comparisons between the original and updated codes. Convergence testing is then completed by running the code for various environments and shielding materials with many different energy grids to ensure stability of the proposed method.

  15. High-throughput, 384-well, LC-MS/MS CYP inhibition assay using automation, cassette-analysis technique, and streamlined data analysis.

    PubMed

    Halladay, Jason S; Delarosa, Erlie Marie; Tran, Daniel; Wang, Leslie; Wong, Susan; Khojasteh, S Cyrus

    2011-08-01

    Here we describe a high-capacity, high-throughput, automated, 384-well CYP inhibition assay using well-known HLM-based MS probes. We provide consistently robust IC50 values at the lead optimization stage of the drug discovery process. Our method uses the Agilent Technologies/Velocity11 BioCel 1200 system, timesaving techniques for sample analysis, and streamlined data processing steps. For each experiment, we generate IC50 values for up to 344 compounds and positive controls for five major CYP isoforms (probe substrate): CYP1A2 (phenacetin), CYP2C9 ((S)-warfarin), CYP2C19 ((S)-mephenytoin), CYP2D6 (dextromethorphan), and CYP3A4/5 (testosterone and midazolam). Each compound is incubated separately at four concentrations with each CYP probe substrate under the optimized incubation conditions. Each incubation is quenched with acetonitrile containing the deuterated internal standard of the respective metabolite for each probe substrate. To minimize the number of samples to be analyzed by LC-MS/MS and reduce the amount of valuable MS runtime, we utilize the timesaving techniques of cassette analysis (pooling the incubation samples at the end of each CYP probe incubation into one) and column switching (reducing the amount of MS runtime). Finally, we compare the IC50 results for the five major CYP isoforms obtained with our method to values reported in the literature.

  16. Conditional Random Fields for Fast, Large-Scale Genome-Wide Association Studies

    PubMed Central

    Huang, Jim C.; Meek, Christopher; Kadie, Carl; Heckerman, David

    2011-01-01

    Understanding the role of genetic variation in human diseases remains an important problem to be solved in genomics. An important component of such variation consists of variations at single sites in DNA, or single nucleotide polymorphisms (SNPs). Typically, the problem of associating particular SNPs to phenotypes has been confounded by hidden factors such as the presence of population structure, family structure or cryptic relatedness in the sample of individuals being analyzed. Such confounding factors lead to a large number of spurious associations and missed associations. Various statistical methods have been proposed to account for such confounding factors, such as linear mixed-effect models (LMMs) or methods that adjust data based on a principal components analysis (PCA), but these methods either suffer from low power or cease to be tractable for larger numbers of individuals in the sample. Here we present a statistical model for conducting genome-wide association studies (GWAS) that accounts for such confounding factors. Our method's runtime scales quadratically in the number of individuals being studied, with only a modest loss in statistical power as compared to LMM-based and PCA-based methods when testing on synthetic data generated from a generalized LMM. Applying our method to both real and synthetic human genotype/phenotype data, we demonstrate the ability of our model to correct for confounding factors while requiring significantly less runtime than LMMs. We have implemented methods for fitting these models, which are available at http://www.microsoft.com/science. PMID:21765897

  17. Model-assisted template extraction SRAF application to contact holes patterns in high-end flash memory device fabrication

    NASA Astrophysics Data System (ADS)

    Seoud, Ahmed; Kim, Juhwan; Ma, Yuansheng; Jayaram, Srividya; Hong, Le; Chae, Gyu-Yeol; Lee, Jeong-Woo; Park, Dae-Jin; Yune, Hyoung-Soon; Oh, Se-Young; Park, Chan-Ha

    2018-03-01

    Sub-resolution assist feature (SRAF) insertion techniques have been effectively used for a long time now to increase process latitude in the lithography patterning process. Rule-based SRAF and model-based SRAF are complementary solutions, and each has its own benefits, depending on the objectives of applications and the criticality of the impact on manufacturing yield, efficiency, and productivity. Rule-based SRAF provides superior geometric output consistency and faster runtime performance, but the associated recipe development time can be of concern. Model-based SRAF provides better coverage for more complicated pattern structures in terms of shapes and sizes, with considerably less time required for recipe development, although consistency and performance may be impacted. In this paper, we introduce a new model-assisted template extraction (MATE) SRAF solution, which employs decision tree learning in a model-based solution to provide the benefits of both rule-based and model-based SRAF insertion approaches. The MATE solution is designed to automate the creation of rules/templates for SRAF insertion, and is based on the SRAF placement predicted by model-based solutions. The MATE SRAF recipe provides optimum lithographic quality in relation to various manufacturing aspects in a very short time, compared to traditional methods of rule optimization. Experiments were done using memory device pattern layouts to compare the MATE solution to existing model-based SRAF and pixelated SRAF approaches, based on lithographic process window quality, runtime performance, and geometric output consistency.

  18. Two-phase Computerized Planning of Cryosurgery Using Bubble-packing and Force-field Analogy

    PubMed Central

    Tanaka, Daigo; Shimada, Kenji; Rabin, Yoed

    2007-01-01

    Background: Cryosurgery is the destruction of undesired tissues by freezing, as in prostate cryosurgery, for example. Minimally-invasive cryosurgery is currently performed by means of an array of cryoprobes, each in the shape of a long hypodermic needle. The optimal arrangement of the cryoprobes, which is known to have a dramatic effect on the quality of the cryoprocedure, remains an art held by the cryosurgeon, based on the cryosurgeon's experience and “rules of thumb.” An automated computerized technique for cryosurgery planning is the subject matter of the current report, in an effort to improve the quality of cryosurgery. Method of Approach: A two-phase optimization method is proposed for this purpose, based on two previous and independent developments by this research team. Phase I is based on a bubble-packing method, previously used as an efficient method for finite element meshing. Phase II is based on a force-field analogy method, which has proven to be robust at the expense of a typically long runtime. Results: As a proof-of-concept, results are demonstrated on a 2D case of a prostate cross-section. The major contribution of this study is to affirm that in many instances cryosurgery planning can be performed without extremely expensive simulations of bioheat transfer, achieved in Phase I. Conclusions: This new method of planning has proven to reduce planning runtime from hours to minutes, making automated planning practical in a clinical time frame. PMID:16532617

  19. Enhanced intelligent water drops algorithm for multi-depot vehicle routing problem

    PubMed Central

    Akutsah, Francis; Olusanya, Micheal O.; Adewumi, Aderemi O.

    2018-01-01

    The intelligent water drop algorithm is a swarm-based metaheuristic algorithm, inspired by the characteristics of water drops in a river and the environmental changes resulting from the action of the flowing river. Since its appearance as an alternative stochastic optimization method, the algorithm has found applications in solving a wide range of combinatorial and functional optimization problems. This paper presents an improved intelligent water drop algorithm for solving multi-depot vehicle routing problems. A simulated annealing algorithm was introduced into the proposed algorithm as a local search metaheuristic to prevent the intelligent water drop algorithm from getting trapped in local minima and to improve its solution quality. In addition, some potentially problematic issues associated with using simulated annealing, including high computational runtime and the exponential calculation in the acceptance probability, are investigated; this exponential calculation is computationally expensive for simulated annealing based techniques. Therefore, in order to maximize the performance of the intelligent water drop algorithm using simulated annealing, a cheaper way of calculating the acceptance probability is considered. The performance of the proposed hybrid algorithm is evaluated on 33 standard test problems, with the results obtained compared with the solutions offered by four well-known techniques from the literature. Experimental results and statistical tests show that the new method possesses outstanding performance in terms of both solution quality and runtime. In addition, the proposed algorithm is suitable for solving large-scale problems. PMID:29554662
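
    The acceptance computation at issue is the Metropolis rule, sketched below; short-circuiting on improving moves avoids the exponential in the common case, though the paper's specific cheaper formulation is not reproduced here.

        import math, random

        def metropolis_accept(delta, temperature):
            """Standard simulated annealing acceptance: take improvements
            outright, take worsening moves with probability exp(-delta/T)."""
            if delta <= 0:
                return True                # improvement: skip the exponential
            return random.random() < math.exp(-delta / temperature)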

  20. kmos: A lattice kinetic Monte Carlo framework

    NASA Astrophysics Data System (ADS)

    Hoffmann, Max J.; Matera, Sebastian; Reuter, Karsten

    2014-07-01

    Kinetic Monte Carlo (kMC) simulations have emerged as a key tool for microkinetic modeling in heterogeneous catalysis and other materials applications. Systems, where site-specificity of all elementary reactions allows a mapping onto a lattice of discrete active sites, can be addressed within the particularly efficient lattice kMC approach. To this end we describe the versatile kmos software package, which offers a most user-friendly implementation, execution, and evaluation of lattice kMC models of arbitrary complexity in one- to three-dimensional lattice systems, involving multiple active sites in periodic or aperiodic arrangements, as well as site-resolved pairwise and higher-order lateral interactions. Conceptually, kmos achieves a maximum runtime performance which is essentially independent of lattice size by generating code for the efficiency-determining local update of available events that is optimized for a defined kMC model. For this model definition and the control of all runtime and evaluation aspects kmos offers a high-level application programming interface. Usage proceeds interactively, via scripts, or a graphical user interface, which visualizes the model geometry, the lattice occupations and rates of selected elementary reactions, while allowing on-the-fly changes of simulation parameters. We demonstrate the performance and scaling of kmos with the application to kMC models for surface catalytic processes, where for given operation conditions (temperature and partial pressures of all reactants) central simulation outcomes are catalytic activity and selectivities, surface composition, and mechanistic insight into the occurrence of individual elementary processes in the reaction network.
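
    The efficiency-determining inner step for which kmos generates model-specialized code is, in its generic textbook (rejection-free) form, the following; kmos replaces the linear scan with model-optimized local updates.

        import math, random

        def kmc_step(rates, t):
            """One rejection-free kMC step: choose an event with probability
            proportional to its rate, then advance time by an exponential wait."""
            total = sum(rates)
            r, acc = random.random() * total, 0.0
            chosen = len(rates) - 1                # fallback against round-off
            for i, rate in enumerate(rates):
                acc += rate
                if r < acc:
                    chosen = i
                    break
            t += -math.log(1.0 - random.random()) / total  # argument in (0, 1]
            return chosen, t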

  1. Learning-based 3D surface optimization from medical image reconstruction

    NASA Astrophysics Data System (ADS)

    Wei, Mingqiang; Wang, Jun; Guo, Xianglin; Wu, Huisi; Xie, Haoran; Wang, Fu Lee; Qin, Jing

    2018-04-01

    Mesh optimization has been studied mainly from the graphical point of view, focusing on 3D surfaces obtained by optical and laser scanners, even though isosurfaced meshes from medical image reconstruction suffer from both staircase artifacts and noise: isotropic filters lead to shape distortion, while anisotropic ones maintain pseudo-features. We present a data-driven method for automatically removing these medical artifacts while not introducing new ones. We treat mesh optimization as a combination of vertex filtering and facet filtering in two stages: offline training and runtime optimization. Specifically, we first detect staircases based on the scanning direction of CT/MRI scanners and design a staircase-sensitive Laplacian filter (vertex-based) to remove them; we then design a unilateral filtered facet normal descriptor (uFND) for measuring the geometry around each facet of a given mesh, and learn regression functions from a set of medical meshes and their high-resolution reference counterparts that map the uFNDs to the facet normals of the reference meshes (facet-based). At runtime, we first apply the staircase-sensitive Laplacian filter to an input MC (Marching Cubes) mesh, then filter the mesh facet normal field using the learned regression functions, and finally deform the mesh to match the new normal field, obtaining a compact approximation of the high-resolution reference model. Tests show that our algorithm achieves higher-quality results than previous approaches regarding surface smoothness and surface accuracy.

  2. PIPER: Performance Insight for Programmers and Exascale Runtimes: Guiding the Development of the Exascale Software Stack

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mellor-Crummey, John

    The PIPER project set out to develop methodologies and software for measurement, analysis, attribution, and presentation of performance data for extreme-scale systems. Goals of the project were to support analysis of massive multi-scale parallelism, heterogeneous architectures, and multi-faceted performance concerns, and to support both post-mortem performance analysis, to identify program features that contribute to problematic performance, and on-line performance analysis, to drive adaptation. This final report summarizes the research and development activity at Rice University as part of the PIPER project. Producing a complete suite of performance tools for exascale platforms during the course of this project was impossible, since both hardware and software for exascale systems are still a moving target. For that reason, the project focused broadly on: the development of new techniques for measurement and analysis of performance on modern parallel architectures; enhancements to HPCToolkit's software infrastructure to support our research goals and use on sophisticated applications; engaging developers of multithreaded runtimes to explore how support for tools should be integrated into their designs; engaging operating system developers with feature requests for enhanced monitoring support; engaging vendors with requests to add the hardware measurement capabilities and software interfaces needed by tools as they design new components of HPC platforms, including processors, accelerators and networks; and collaborations with partners interested in using HPCToolkit to analyze and tune scalable parallel applications.

  3. Enhanced intelligent water drops algorithm for multi-depot vehicle routing problem.

    PubMed

    Ezugwu, Absalom E; Akutsah, Francis; Olusanya, Micheal O; Adewumi, Aderemi O

    2018-01-01

    The intelligent water drop algorithm is a swarm-based metaheuristic algorithm, inspired by the characteristics of water drops in a river and the environmental changes resulting from the action of the flowing river. Since its appearance as an alternative stochastic optimization method, the algorithm has found applications in solving a wide range of combinatorial and functional optimization problems. This paper presents an improved intelligent water drop algorithm for solving multi-depot vehicle routing problems. A simulated annealing algorithm was introduced into the proposed algorithm as a local search metaheuristic to prevent the intelligent water drop algorithm from getting trapped in local minima and to improve its solution quality. In addition, some potentially problematic issues associated with using simulated annealing, including high computational runtime and the exponential calculation in the acceptance probability, are investigated; this exponential calculation is computationally expensive for simulated annealing based techniques. Therefore, in order to maximize the performance of the intelligent water drop algorithm using simulated annealing, a cheaper way of calculating the acceptance probability is considered. The performance of the proposed hybrid algorithm is evaluated on 33 standard test problems, with the results obtained compared with the solutions offered by four well-known techniques from the literature. Experimental results and statistical tests show that the new method possesses outstanding performance in terms of both solution quality and runtime. In addition, the proposed algorithm is suitable for solving large-scale problems.

  4. FPGA-Based High-Performance Embedded Systems for Adaptive Edge Computing in Cyber-Physical Systems: The ARTICo³ Framework.

    PubMed

    Rodríguez, Alfonso; Valverde, Juan; Portilla, Jorge; Otero, Andrés; Riesgo, Teresa; de la Torre, Eduardo

    2018-06-08

    Cyber-Physical Systems are experiencing a paradigm shift in which processing has been relocated to the distributed sensing layer and is no longer performed in a centralized manner. This approach, usually referred to as Edge Computing, demands the use of hardware platforms that are able to manage the steadily increasing requirements in computing performance, while keeping energy efficiency and the adaptability imposed by the interaction with the physical world. In this context, SRAM-based FPGAs and their inherent run-time reconfigurability, when coupled with smart power management strategies, are a suitable solution. However, they usually fail in user accessibility and ease of development. In this paper, an integrated framework to develop FPGA-based high-performance embedded systems for Edge Computing in Cyber-Physical Systems is presented. This framework provides a hardware-based processing architecture, an automated toolchain, and a runtime to transparently generate and manage reconfigurable systems from high-level system descriptions without additional user intervention. Moreover, it provides users with support for dynamically adapting the available computing resources to switch the working point of the architecture in a solution space defined by computing performance, energy consumption and fault tolerance. Results show that it is indeed possible to explore this solution space at run time and prove that the proposed framework is a competitive alternative to software-based edge computing platforms, being able to provide not only faster solutions, but also higher energy efficiency for computing-intensive algorithms with significant levels of data-level parallelism.

  5. CAreDroid: Adaptation Framework for Android Context-Aware Applications.

    PubMed

    Elmalaki, Salma; Wanner, Lucas; Srivastava, Mani

    2015-09-01

    Context-awareness is the ability of software systems to sense and adapt to their physical environment. Many contemporary mobile applications adapt to changing locations, connectivity states, available computational and energy resources, and proximity to other users and devices. Nevertheless, there is little systematic support for context-awareness in contemporary mobile operating systems. Because of this, application developers must build their own context-awareness adaptation engines, dealing directly with sensors and polluting application code with complex adaptation decisions. In this paper, we introduce CAreDroid, a framework designed to decouple the application logic from the complex adaptation decisions in Android context-aware applications. In this framework, developers are required only to focus on the application logic, by providing a list of methods that are sensitive to certain contexts along with the permissible operating ranges under those contexts. At run time, CAreDroid monitors the context of the physical environment and intercepts calls to sensitive methods, activating only the blocks of code that best fit the current physical context. CAreDroid is implemented as part of the Android runtime system. By pushing context monitoring and adaptation into the runtime system, CAreDroid eases the development of context-aware applications and increases their efficiency. In particular, case study applications implemented using CAreDroid are shown to have (1) at least half as many lines of code and (2) at least 10× better execution-time efficiency compared to equivalent context-aware applications that use only standard Android APIs.

  6. Hydronic Heating Retrofits for Low-Rise Multifamily Buildings: Boiler Control Replacement and Monitoring

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dentz, J.; Henderson, H.; Varshney, K.

    2014-09-01

    The ARIES Collaborative, a U.S. Department of Energy Building America research team, partnered with NeighborWorks America affiliate Homeowners' Rehab Inc. (HRI) of Cambridge, Massachusetts, to study improvements to the central hydronic heating system in one of the nonprofit's housing developments. The heating controls in the three-building, 42-unit Columbia Cambridge Alliance for Spanish Tenants housing development were upgraded. Fuel use in the development was excessive compared to similar properties. A poorly insulated thermal envelope contributed to high energy bills, but adding wall insulation was not cost-effective or practical. The more cost-effective option was improving heating system efficiency. Efficient operation of the heating system faced several obstacles, including inflexible boiler controls and failed thermostatic radiator valves. Boiler controls were replaced with systems that offer temperature setbacks, one of which controls heat based on apartment temperature in addition to outdoor temperature. Utility bill analysis shows that post-retrofit weather-normalized heating energy use was reduced by 10%-31% (average of 19%). The indoor temperature cutoff reduced boiler runtime (and therefore heating fuel consumption) by 28% in the one building in which it was implemented. Nearly all savings were obtained at night, which had a lower indoor temperature cutoff (68°F) than daytime (73°F). This implies that the outdoor reset curve was appropriately adjusted for this building for daytime operation. Nighttime setback of the heating system supply water temperature had no discernible impact on boiler runtime or gas bills.
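
    The control logic described above (outdoor reset with an indoor-temperature cutoff and night setback) can be sketched as follows; the indoor cutoffs match the study, but the linear curve and all other numbers are illustrative assumptions, not HRI's actual settings.

        def supply_setpoint(t_out, t_in, night):
            """Hypothetical outdoor-reset control with indoor cutoff and
            night setback; numbers other than the cutoffs are illustrative."""
            cutoff = 68.0 if night else 73.0       # indoor cutoffs from the study
            if t_in >= cutoff:
                return None                        # indoor cutoff: no call for heat
            lo_out, hi_out = 0.0, 60.0             # outdoor design range, deg F
            hot, cool = 180.0, 120.0               # supply temps at those extremes
            frac = min(max((t_out - lo_out) / (hi_out - lo_out), 0.0), 1.0)
            setpoint = hot + frac * (cool - hot)   # hotter water in colder weather
            return setpoint - (10.0 if night else 0.0)  # night supply setback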

  7. Technology Solutions Case Study: Boiler Control Replacement for Hydronically Heated Multifamily Buildings, Cambridge, Massachusetts

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    None

    2014-11-01

    The ARIES Collaborative, a U.S. Department of Energy Building America research team, partnered with NeighborWorks America affiliate Homeowners' Rehab Inc. (HRI) of Cambridge, Massachusetts, to study improvements to the central hydronic heating system in one of the nonprofit's housing developments. The heating controls in the three-building, 42-unit Columbia Cambridge Alliance for Spanish Tenants housing development were upgraded. Fuel use in the development was excessive compared to similar properties. A poorly insulated thermal envelope contributed to high energy bills, but adding wall insulation was not cost-effective or practical. The more cost-effective option was improving heating system efficiency, which faced several obstacles, including inflexible boiler controls and failed thermostatic radiator valves. Boiler controls were replaced with systems that offer temperature setbacks, including one that controls heat based on apartment temperature in addition to outdoor temperature. Utility bill analysis shows that post-retrofit weather-normalized heating energy use was reduced by 10%-31% (average of 19%). The indoor temperature cutoff reduced boiler runtime (and therefore heating fuel consumption) by 28% in the one building in which it was implemented. Nearly all savings were obtained at night, when the indoor temperature cutoff (68°F) was lower than during the day (73°F). This implies that the outdoor reset curve was appropriately adjusted for this building for daytime operation. Nighttime setback of the heating system supply water temperature had no discernible impact on boiler runtime or gas bills.

  8. Statistical study of defects caused by primary knock-on atoms in fcc Cu and bcc W using molecular dynamics

    NASA Astrophysics Data System (ADS)

    Warrier, M.; Bhardwaj, U.; Hemani, H.; Schneider, R.; Mutzke, A.; Valsakumar, M. C.

    2015-12-01

    We report on Molecular Dynamics (MD) simulations carried out in fcc Cu and bcc W using the Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) code to study (i) the statistical variations in the number of interstitials and vacancies produced by energetic primary knock-on atoms (PKA) (0.1-5 keV) directed in random directions and (ii) the in-cascade cluster size distributions. It is seen that around 60-80 random directions have to be explored for the average number of displaced atoms to become steady in the case of fcc Cu, whereas for bcc W around 50-60 random directions need to be explored. The number of Frenkel pairs produced in the MD simulations is compared with that from the Binary Collision Approximation Monte Carlo (BCA-MC) code SDTRIM-SP and with the results of the NRT model. It is seen that a proper choice of the damage energy, i.e. the energy required to create a stable interstitial, is essential for the BCA-MC results to match the MD results. On the computational front, in-situ processing avoids writing and reading several terabytes of atomic position data when exploring a large number of random directions, at no cost in run-time because the extra processing time is offset by the time saved in I/O.
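
    The NRT comparison above relies on the standard displacement estimate N_d = 0.8 E_dam / (2 E_d). A minimal sketch, assuming commonly quoted threshold displacement energies for Cu and W (individual studies tune both the threshold and the damage energy):

    ```python
    def nrt_displacements(e_damage_ev, e_d_ev):
        """NRT estimate of stable Frenkel pairs: N_d = 0.8 * E_dam / (2 * E_d)."""
        if e_damage_ev < e_d_ev:
            return 0.0
        if e_damage_ev < 2.0 * e_d_ev / 0.8:
            return 1.0
        return 0.8 * e_damage_ev / (2.0 * e_d_ev)

    # E_d: commonly quoted threshold displacement energies (study-dependent)
    for metal, e_d in (("fcc Cu", 30.0), ("bcc W", 90.0)):
        print(metal, round(nrt_displacements(5000.0, e_d), 1))  # 5 keV damage energy
    ```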

  9. Implementing a Parallel Image Edge Detection Algorithm Based on the Otsu-Canny Operator on the Hadoop Platform

    PubMed Central

    Wang, Min; Tian, Yun

    2018-01-01

    The Canny operator is widely used to detect edges in images. However, as the size of the image dataset increases, the edge detection performance of the Canny operator decreases and its runtime becomes excessive. To improve the runtime and edge detection performance of the Canny operator, in this paper, we propose a parallel design and implementation for an Otsu-optimized Canny operator using a MapReduce parallel programming model that runs on the Hadoop platform. The Otsu algorithm is used to optimize the Canny operator's dual threshold and improve the edge detection performance, while the MapReduce parallel programming model facilitates parallel processing for the Canny operator to solve the processing speed and communication cost problems that occur when the Canny edge detection algorithm is applied to big data. For the experiments, we constructed datasets of different scales from the Pascal VOC2012 image database. The proposed parallel Otsu-Canny edge detection algorithm performs better than traditional edge detection algorithms. The parallel approach reduced the running time by approximately 67.2% on a Hadoop cluster architecture consisting of 5 nodes with a dataset of 60,000 images. Overall, our approach speeds up the system by approximately 3.4 times when processing large-scale datasets, demonstrating the clear superiority of our method. The proposed algorithm demonstrates both better edge detection performance and improved runtime performance. PMID:29861711
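
    A single-node sketch of the map step, assuming OpenCV is available: Otsu's global threshold supplies the Canny high threshold, and half of it serves as the low threshold (a common heuristic; the paper's exact coupling may differ). In the Hadoop setting, each mapper would apply this function to one image of the dataset.

    ```python
    import cv2  # OpenCV

    def otsu_canny(gray):
        """Derive the Canny dual threshold from Otsu's threshold (heuristic)."""
        high, _ = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        low = 0.5 * high                  # common choice for the lower threshold
        return cv2.Canny(gray, low, high)

    gray = cv2.imread("example.jpg", cv2.IMREAD_GRAYSCALE)  # path is a placeholder
    edges = otsu_canny(gray)
    ```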

  10. HARP: A Dynamic Inertial Spectral Partitioner

    NASA Technical Reports Server (NTRS)

    Simon, Horst D.; Sohn, Andrew; Biswas, Rupak

    1997-01-01

    Partitioning unstructured graphs is central to the parallel solution of computational science and engineering problems. Spectral partitioners, such as recursive spectral bisection (RSB), have proven effective in generating high-quality partitions of realistically sized meshes. The major problem that hindered their widespread use was their long execution times. This paper presents a new inertial spectral partitioner called HARP. The main objective of the proposed approach is to quickly partition meshes at runtime in a manner that works efficiently for real applications in the context of distributed-memory machines. The underlying principle of HARP is to find the eigenvectors of the unpartitioned vertices and then project them onto the eigenvectors of the original mesh. Results for various meshes ranging in size from 1,000 to 100,000 vertices indicate that HARP can indeed partition meshes rapidly at runtime. Experimental results show that our largest mesh can be partitioned sequentially in only a few seconds on an SP2, which is several times faster than other spectral partitioners, while maintaining the solution quality of the proven RSB method. A parallel MPI version of HARP has also been implemented on the IBM SP2 and Cray T3E. Parallel HARP, running on 64 processors of the SP2 and T3E, can partition a mesh containing more than 100,000 vertices into 64 subgrids in about half a second. These results indicate that graph partitioning can now be truly embedded in dynamically-changing real-world applications.
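
    For reference, the plain spectral-bisection step that RSB applies recursively (and that HARP accelerates; this is not HARP itself) amounts to one eigenvector computation on the graph Laplacian:

    ```python
    import numpy as np

    def spectral_bisection(adj):
        """Split vertices by the median of the Fiedler vector (plain RSB step)."""
        lap = np.diag(adj.sum(axis=1)) - adj     # graph Laplacian L = D - A
        _, vecs = np.linalg.eigh(lap)            # eigenvalues in ascending order
        fiedler = vecs[:, 1]                     # second-smallest eigenpair
        return fiedler >= np.median(fiedler)     # boolean part labels

    # toy 6-vertex path graph 0-1-2-3-4-5
    adj = np.zeros((6, 6))
    for i in range(5):
        adj[i, i + 1] = adj[i + 1, i] = 1.0
    print(spectral_bisection(adj))               # splits the path into two halves
    ```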

  11. Machine Learning Based Online Performance Prediction for Runtime Parallelization and Task Scheduling

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Li, J; Ma, X; Singh, K

    2008-10-09

    With the emerging many-core paradigm, parallel programming must extend beyond its traditional realm of scientific applications. Converting existing sequential applications as well as developing next-generation software requires assistance from hardware, compilers and runtime systems to exploit parallelism transparently within applications. These systems must decompose applications into tasks that can be executed in parallel and then schedule those tasks to minimize load imbalance. However, many systems lack a priori knowledge about the execution time of all tasks needed to perform effective load balancing with low scheduling overhead. In this paper, we approach this fundamental problem using machine learning techniques, first generating performance models for all tasks and then applying those models to perform automatic performance prediction across program executions. We also extend an existing scheduling algorithm to use the generated task cost estimates for online task partitioning and scheduling. We implement the above techniques in the pR framework, which transparently parallelizes scripts in the popular R language, and evaluate their performance and overhead with both a real-world application and a large number of synthetic representative test scripts. Our experimental results show that our proposed approach significantly improves task partitioning and scheduling, with maximum improvements of 21.8%, 40.3% and 22.1% and average improvements of 15.9%, 16.9% and 4.2% for LMM (a real R application) and synthetic test cases with independent and dependent tasks, respectively.
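
    A minimal sketch of the two ingredients, assuming scikit-learn and invented task features: a regression model learned from past executions predicts task costs, and a greedy longest-processing-time heuristic uses the predictions to balance load. The pR framework's actual scheduler is more sophisticated.

    ```python
    import heapq
    import numpy as np
    from sklearn.linear_model import LinearRegression

    # features of past tasks (e.g., input size, parameter count) and measured
    # times; all numbers are invented for illustration
    X_hist = np.array([[1e3, 2], [5e3, 2], [1e4, 4], [5e4, 4], [1e5, 8]])
    t_hist = np.array([0.01, 0.05, 0.12, 0.55, 1.30])
    model = LinearRegression().fit(X_hist, t_hist)

    def schedule(task_features, n_workers):
        """Greedy longest-processing-time assignment using predicted task costs."""
        costs = model.predict(np.asarray(task_features))
        workers = [(0.0, w, []) for w in range(n_workers)]   # (load, id, task ids)
        heapq.heapify(workers)
        for i in sorted(range(len(costs)), key=lambda i: -costs[i]):
            load, w, tasks = heapq.heappop(workers)          # least-loaded worker
            heapq.heappush(workers, (load + costs[i], w, tasks + [i]))
        return sorted(workers, key=lambda x: x[1])

    for load, w, tasks in schedule([[2e4, 4], [8e4, 8], [3e3, 2]], n_workers=2):
        print("worker %d: tasks %s, predicted load %.2fs" % (w, tasks, load))
    ```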

  12. A Fault Oblivious Extreme-Scale Execution Environment

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    McKie, Jim

    The FOX project, funded under the ASCR X-stack I program, developed systems software and runtime libraries for a new approach to the data and work distribution for massively parallel, fault oblivious application execution. Our work was motivated by the premise that exascale computing systems will provide a thousand-fold increase in parallelism and a proportional increase in failure rate relative to today's machines. To deliver the capability of exascale hardware, the systems software must provide the infrastructure to support existing applications while simultaneously enabling efficient execution of new programming models that naturally express dynamic, adaptive, irregular computation; coupled simulations; and massive data analysis in a highly unreliable hardware environment with billions of threads of execution. Our OS research has prototyped new methods to provide efficient resource sharing, synchronization, and protection in a many-core compute node. We have experimented with alternative task/dataflow programming models and shown scalability in some cases to hundreds of thousands of cores. Much of our software is in active development through open source projects. Concepts from FOX are being pursued in next generation exascale operating systems. Our OS work focused on adaptive, application-tailored OS services optimized for multi- and many-core processors. We developed a new operating system, NIX, that supports role-based allocation of cores to processes and was released as open source. We contributed to the IBM FusedOS project, which promoted the concept of latency-optimized and throughput-optimized cores. We built a task queue library based on a distributed, fault tolerant key-value store and identified scaling issues. A second fault tolerant task parallel library was developed, based on the Linda tuple space model, that used low level interconnect primitives for optimized communication. We designed fault tolerance mechanisms for task parallel computations employing work stealing for load balancing that scaled to the largest existing supercomputers. Finally, we implemented the Elastic Building Blocks runtime, a library to manage object-oriented distributed software components. To support the research, we won two INCITE awards for time on Intrepid (BG/P) and Mira (BG/Q). Much of our work has had impact in the OS and runtime community through the ASCR Exascale OS/R workshop and report, leading to the research agenda of the Exascale OS/R program. Our project was, however, also affected by attrition of multiple PIs. While the PIs continued to participate and offer guidance as time permitted, losing these key individuals was unfortunate both for the project and for the DOE HPC community.
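
    Of the runtime pieces listed, the work-stealing load balancer is the simplest to sketch. The toy below shows only the stealing discipline (pop your own deque from the tail, steal from a victim's head) with a deliberately naive termination check; the fault-tolerance machinery that was the project's real contribution is omitted.

    ```python
    import collections
    import random
    import threading

    class Worker(threading.Thread):
        def __init__(self, wid, deques, results):
            super().__init__()
            self.wid, self.deques, self.results = wid, deques, results

        def run(self):
            mine = self.deques[self.wid]
            while True:
                try:
                    task = mine.pop()            # own deque: take from the tail
                except IndexError:
                    victims = [d for i, d in enumerate(self.deques) if i != self.wid]
                    random.shuffle(victims)
                    for v in victims:
                        try:
                            task = v.popleft()   # victim's deque: steal from the head
                            break
                        except IndexError:
                            continue
                    else:
                        return                   # naive termination: nothing to steal
                self.results.append((self.wid, task()))

    deques = [collections.deque() for _ in range(4)]
    for n in range(20):                          # all work starts on worker 0
        deques[0].append(lambda n=n: n * n)
    results = []
    workers = [Worker(i, deques, results) for i in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(len(results), "tasks completed")       # 20: every task ran exactly once
    ```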

  13. Testing MODFLOW-LGR for simulating flow around buried Quaternary valleys - synthetic test cases

    NASA Astrophysics Data System (ADS)

    Vilhelmsen, T. N.; Christensen, S.

    2009-12-01

    In this study the Local Grid Refinement (LGR) method developed for MODFLOW-2005 (Mehl and Hill, 2005) is utilized to describe groundwater flow in areas containing buried Quaternary valley structures. The tests are conducted as comparative analyses between simulations run with a globally refined model, a locally refined model, and a globally coarse model, respectively. The models vary from simple one-layer models to more complex ones with up to 25 model layers. The comparisons of accuracy are conducted within the locally refined area and focus on water budgets, simulated heads, and simulated particle traces. Simulations made with the globally refined model are used as reference (regarded as “true” values). As expected, for all test cases the application of local grid refinement produced more accurate results than the globally coarse model. A significant advantage of utilizing MODFLOW-LGR was that it allows an increased number of model layers to better resolve complex geology within local areas. This resulted in more accurate simulations than using either a globally coarse model grid or a locally refined model with lower geological resolution. Improved accuracy in the latter case could not be expected beforehand, because the difference in geological resolution between the coarse parent model and the refined child model contradicts the assumptions of the Darcy-weighted interpolation used in MODFLOW-LGR. With respect to model runtimes, the runtime for the locally refined model was sometimes much longer than for the globally refined model, even when the closure criteria were relaxed compared to the globally refined model. These results contradict those presented by Mehl and Hill (2005). Furthermore, in the complex cases it took some testing (model runs) to identify the closure criteria and the damping factor that secured convergence, accurate solutions, and reasonable runtimes. For our cases this is judged to be a serious disadvantage of applying MODFLOW-LGR. Another disadvantage in the studied cases was that the MODFLOW-LGR results proved to be somewhat dependent on the correction method used at the parent-child model interface. This indicates that when applying MODFLOW-LGR there is a need for thorough and case-specific considerations regarding the choice of correction method. Reference: Mehl, S., and Hill, M. C. (2005). MODFLOW-2005, the U.S. Geological Survey modular ground-water model: Documentation of shared node Local Grid Refinement (LGR) and the Boundary Flow and Head (BFH) Package. U.S. Geological Survey Techniques and Methods 6-A12.

  14. YAMM - YET ANOTHER MENU MANAGER

    NASA Technical Reports Server (NTRS)

    Mazer, A. S.

    1994-01-01

    One of the most time-consuming yet necessary tasks in writing any piece of interactive software is the development of a user interface. Yet Another Menu Manager, YAMM, is an application-independent menuing package designed to remove much of the difficulty, and save much of the time, inherent in implementing the front ends of large packages. Written in C for UNIX-based operating systems, YAMM provides a complete menuing front end for a wide variety of applications, with provisions for terminal independence, user-specific configurations, and dynamic creation of menu trees. Applications running under the menu package consist of two parts: a description of the menu configuration and the body of application code. The menu configuration is used at runtime to define the menu structure and any non-standard keyboard mappings and terminal capabilities. Menu definitions specify individual menus within the menu tree. The names used in a definition may be either a reference to an application function or the name of another menu defined within the menu configuration. Application parameters are entered using data entry screens which allow for required and optional parameters, tables, and legal-value lists. Both automatic and application-specific error checking are available. Help is available for both menu operation and specific applications. The YAMM program was written in C for execution on a Sun Microsystems workstation running SunOS, based on the Berkeley (4.2bsd) version of UNIX. During development, YAMM was used on both 68020 and SPARC architectures, running SunOS versions 3.5 and 4.0. YAMM should be portable to most other UNIX-based systems. It has a central memory requirement of approximately 232K bytes. The standard distribution medium for this program is one 0.25-inch streaming magnetic tape cartridge in UNIX tar format. It is also available on a 3.5-inch diskette in UNIX tar format. YAMM was developed in 1988 and last updated in 1990. YAMM is a copyrighted work with all copyright vested in NASA.

  15. An Ada implementation of the network manager for the advanced information processing system

    NASA Technical Reports Server (NTRS)

    Nagle, Gail A.

    1986-01-01

    From an implementation standpoint, the Ada language provided many features which facilitated the data and procedure abstraction process. The language supported a design which was dynamically flexible (despite strong typing), modular, and self-documenting. Adequate training of programmers requires access to an efficient compiler which supports full Ada. When the performance issues for real time processing are finally addressed by more stringent requirements for tasking features and the development of efficient run-time environments for embedded systems, the full power of the language will be realized.

  16. JAVA PathFinder

    NASA Technical Reports Server (NTRS)

    Mehlitz, Peter

    2005-01-01

    JPF is an explicit-state software model checker for Java bytecode. Today, JPF is a Swiss army knife for all sorts of runtime-based verification purposes. Essentially, JPF is a Java virtual machine that executes your program not just once (like a normal VM), but theoretically in all possible ways, checking for property violations like deadlocks or unhandled exceptions along all potential execution paths. If it finds an error, JPF reports the whole execution that leads to it. Unlike a normal debugger, JPF keeps track of every step on the way to the defect.
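
    A toy explicit-state search in the same spirit (JPF itself executes real Java bytecode and handles vastly larger state spaces): enumerate every interleaving of two threads that acquire locks in opposite order, and flag non-final states with no successors, which exposes the classic circular-wait deadlock.

    ```python
    from collections import deque

    ACQ = {1: ["A", "B"], 2: ["B", "A"]}      # opposite lock order: classic bug

    def advance(t, pc, held):
        q = list(pc)
        q[t - 1] += 1
        return (q[0], q[1], held["A"], held["B"])

    def successors(state):
        pc, held = state[:2], {"A": state[2], "B": state[3]}
        for t in (1, 2):
            i = pc[t - 1]
            if i < 2:                          # pc 0/1: acquire next lock if free
                if held[ACQ[t][i]] == 0:
                    new = dict(held)
                    new[ACQ[t][i]] = t
                    yield advance(t, pc, new)
            elif i == 2:                       # pc 2: release both locks
                new = {k: (0 if v == t else v) for k, v in held.items()}
                yield advance(t, pc, new)

    seen, frontier = {(0, 0, 0, 0)}, deque([(0, 0, 0, 0)])
    while frontier:                            # BFS over the full state space
        s = frontier.popleft()
        succ = list(successors(s))
        if not succ and s[:2] != (3, 3):       # stuck but not normal termination
            print("deadlock reachable:", s)    # prints (1, 1, 1, 2): circular wait
        for n in succ:
            if n not in seen:
                seen.add(n)
                frontier.append(n)
    ```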

  17. User Interface Technology Transfer to NASA's Virtual Wind Tunnel System

    NASA Technical Reports Server (NTRS)

    vanDam, Andries

    1998-01-01

    Funded by NASA grants for four years, the Brown Computer Graphics Group has developed novel 3D user interfaces for desktop and immersive scientific visualization applications. This past grant period supported the design and development of a software library, the 3D Widget Library, which supports the construction and run-time management of 3D widgets. The 3D Widget Library is a mechanism for transferring user interface technology from the Brown Graphics Group to the Virtual Wind Tunnel system at NASA Ames as well as the public domain.

  18. Porting of the transfer-matrix method for multilayer thin-film computations on graphics processing units

    NASA Astrophysics Data System (ADS)

    Limmer, Steffen; Fey, Dietmar

    2013-07-01

    Thin-film computations are often a time-consuming task during optical design. An efficient way to accelerate these computations with the help of graphics processing units (GPUs) is described. Significant speed-ups can be achieved. We investigate the circumstances under which the best speed-up values can be expected, comparing different GPUs among themselves and against a modern CPU. Furthermore, the effect of thickness modulation on the speed-up and the runtime behavior depending on the input data are examined.
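
    For orientation, the serial kernel being ported is the textbook characteristic-matrix recurrence of the transfer-matrix method. A normal-incidence sketch with illustrative layer data (not the paper's test cases):

    ```python
    import numpy as np

    def reflectance(n_layers, d_layers, n_in, n_sub, wavelength):
        """Stack reflectance at normal incidence via characteristic matrices."""
        m = np.eye(2, dtype=complex)
        for n, d in zip(n_layers, d_layers):
            delta = 2 * np.pi * n * d / wavelength       # phase thickness
            layer = np.array([[np.cos(delta), 1j * np.sin(delta) / n],
                              [1j * n * np.sin(delta), np.cos(delta)]])
            m = m @ layer
        b, c = m @ np.array([1.0, n_sub])                # apply substrate admittance
        r = (n_in * b - c) / (n_in * b + c)              # amplitude reflectance
        return abs(r) ** 2

    # quarter-wave MgF2-like layer on glass at 550 nm (illustrative numbers)
    print(reflectance([1.38], [550 / (4 * 1.38)], 1.0, 1.52, 550.0))  # ~0.013
    ```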

  19. Taking Lessons Learned from a Proxy Application to a Full Application for SNAP and PARTISN

    DOE PAGES

    Womeldorff, Geoffrey Alan; Payne, Joshua Estes; Bergen, Benjamin Karl

    2017-06-09

    SNAP is a proxy application which simulates the computational motion of a neutral particle transport code, PARTISN. In this work, we adapted parts of SNAP separately: we re-implemented the iterative shell of SNAP in the task-model runtime Legion, showing an improvement over the original schedule, and we created multiple Kokkos implementations of the computational kernel of SNAP, displaying performance similar to the native Fortran. We then translated our Kokkos experiments in SNAP to PARTISN, which necessitated engineering development, regression testing, and further thought.

  20. The Wang Landau parallel algorithm for the simple grids. Optimizing OpenMPI parallel implementation

    NASA Astrophysics Data System (ADS)

    Kussainov, A. S.

    2017-12-01

    The Wang-Landau Monte Carlo algorithm was implemented to calculate the density of states of several simple spin lattices. The energy space was split between the individual threads and balanced according to the expected runtime of the individual processes. A custom spin clustering mechanism, necessary to overcome the critical slowdown in certain energy subspaces, was devised. Stable reconstruction of the density of states was of primary importance. Some data post-processing techniques were applied to produce the expected smooth density of states.
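
    A minimal serial Wang-Landau sketch for a 12-spin 1D Ising ring, chosen because its exact density of states is known for checking. The paper's parallel scheme splits this energy range among threads and adds clustering moves; both are omitted here.

    ```python
    import math
    import random

    L = 12
    levels = list(range(-L, L + 1, 4))        # allowed energies of the Ising ring
    lng = dict.fromkeys(levels, 0.0)          # running estimate of ln g(E)
    hist = dict.fromkeys(levels, 0)
    spins = [1] * L                           # all-up start: energy -L
    e, lnf = -L, 1.0

    while lnf > 1e-4:
        i = random.randrange(L)
        de = 2 * spins[i] * (spins[i - 1] + spins[(i + 1) % L])
        # accept the flip with probability min(1, g(E)/g(E'))
        if random.random() < math.exp(min(0.0, lng[e] - lng[e + de])):
            spins[i] = -spins[i]
            e += de
        lng[e] += lnf                          # update the (possibly new) current bin
        hist[e] += 1
        total = sum(hist.values())
        if total > 5000 and min(hist.values()) > 0.8 * total / len(hist):
            lnf /= 2.0                         # histogram flat enough: refine f
            hist = dict.fromkeys(hist, 0)

    # check against the exact ln g(E) = ln(2 * C(L, k)) with E = -L + 2k;
    # agreement within a few percent is typical for this crude flatness test
    for e in levels:
        k = (e + L) // 2
        print(e, round(lng[e] - lng[-L] + math.log(2), 2),
              round(math.log(2 * math.comb(L, k)), 2))
    ```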

  1. Dynamic data distributions in Vienna Fortran

    NASA Technical Reports Server (NTRS)

    Chapman, Barbara; Mehrotra, Piyush; Moritsch, Hans; Zima, Hans

    1993-01-01

    Vienna Fortran is a machine-independent language extension of Fortran, which is based upon the Single-Program-Multiple-Data (SPMD) paradigm and allows the user to write programs for distributed-memory systems using global addresses. The language features focus mainly on the issue of distributing data across virtual processor structures. We discuss those features of Vienna Fortran that allow the data distributions of arrays to change dynamically, depending on runtime conditions. The relevant language features are presented, their implementation is outlined, and their use in applications is described.
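
    The index arithmetic behind the two classic distributions such a language can switch between at run time is simple; the sketch below shows owner-computes mappings for BLOCK and CYCLIC layouts (illustrative only, not Vienna Fortran syntax).

    ```python
    def block_owner(i, n, p):
        """BLOCK layout: contiguous chunks of ceil(n/p) elements per processor."""
        return i // -(-n // p)                   # -(-n // p) is ceiling division

    def cyclic_owner(i, n, p):
        """CYCLIC layout: element i lives on processor i mod p."""
        return i % p

    n, p = 16, 4
    print([block_owner(i, n, p) for i in range(n)])   # [0,0,0,0, 1,1,1,1, ...]
    print([cyclic_owner(i, n, p) for i in range(n)])  # [0,1,2,3, 0,1,2,3, ...]
    ```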

  2. Earth Global Reference Atmospheric Model (GRAM) Overview and Updates: DOLWG Meeting

    NASA Technical Reports Server (NTRS)

    White, Patrick

    2017-01-01

    What is Earth-GRAM (Global Reference Atmospheric Model): Provides monthly mean and standard deviation for any point in atmosphere - Monthly, Geographic, and Altitude Variation; Earth-GRAM is a C++ software package - Currently distributed as Earth-GRAM 2016; Atmospheric variables included: pressure, density, temperature, horizontal and vertical winds, speed of sound, and atmospheric constituents; Used by engineering community because of ability to create dispersions in atmosphere at a rapid runtime - Often embedded in trajectory simulation software; Not a forecast model; Does not readily capture localized atmospheric effects.

  3. Preventing SQL Code Injection by Combining Static and Runtime Analysis

    DTIC Science & Technology

    2008-05-01

    attacker changes the developer’s intended structure of an SQL command by inserting new SQL keywords or operators. (Su and Wassermann provide a... FROM books WHERE author = ' ' GROUP BY rating. We use the symbol as a placeholder for the indeterminate part of the command (in this... dialects of SQL.) In our model, we mark transitions that correspond to externally defined strings with the symbol. To illustrate, Figure 2 shows the SQL
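
    A drastically simplified analogue of the idea recoverable from this snippet: treat user input as a placeholder in the query template, and accept the input only if it leaves the command's keyword/operator skeleton unchanged. The tokenizer and keyword set below are invented for illustration; the paper's actual model is automaton-based.

    ```python
    import re

    KEYWORDS = {"select", "from", "where", "group", "by", "or", "and",
                "union", "insert", "drop", "'", ";", "-"}

    def skeleton(sql):
        """Sequence of SQL keywords/operators in the command, ignoring literals."""
        tokens = re.findall(r"[A-Za-z_]+|[^\sA-Za-z_0-9]", sql.lower())
        return [t for t in tokens if t in KEYWORDS]

    def safe(template, user_input):
        """Legal iff the input leaves the keyword/operator structure unchanged."""
        return skeleton(template.format("x")) == skeleton(template.format(user_input))

    tpl = "SELECT * FROM books WHERE author = '{}' GROUP BY rating"
    print(safe(tpl, "Tolkien"))        # True: structure preserved
    print(safe(tpl, "' OR '1'='1"))    # False: injected quotes and OR
    ```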

  4. Synthesizing Dynamic Programming Algorithms from Linear Temporal Logic Formulae

    NASA Technical Reports Server (NTRS)

    Rosu, Grigore; Havelund, Klaus

    2001-01-01

    The problem of testing a linear temporal logic (LTL) formula on a finite execution trace of events, generated by an executing program, occurs naturally in runtime analysis of software. We present an algorithm which takes an LTL formula and generates an efficient dynamic programming algorithm. The generated algorithm tests whether the LTL formula is satisfied by a finite trace of events given as input. The generated algorithm runs in linear time, its constant depending on the size of the LTL formula. The memory needed is constant, also depending on the size of the formula.
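
    A generic evaluator in this style (the paper generates specialized code per formula, which is faster): walk the trace backwards, keeping for every subformula its truth value at the current and at the next position.

    ```python
    # Finite-trace LTL via backwards dynamic programming. Formulas as tuples:
    # ("p", name), ("not", f), ("and", f, g), ("next", f),
    # ("until", f, g), ("eventually", f), ("always", f).

    def evaluate(formula, trace):
        def subs(f):                       # postorder: children before parents
            out = []
            for arg in f[1:]:
                if isinstance(arg, tuple):
                    out += subs(arg)
            return out + [f]

        order = subs(formula)
        now, nxt = {}, {}
        for i in range(len(trace) - 1, -1, -1):
            state, last = trace[i], i == len(trace) - 1
            nxt, now = now, {}
            for f in order:
                op = f[0]
                if op == "p":
                    v = f[1] in state
                elif op == "not":
                    v = not now[f[1]]
                elif op == "and":
                    v = now[f[1]] and now[f[2]]
                elif op == "next":
                    v = (not last) and nxt[f[1]]
                elif op == "eventually":
                    v = now[f[1]] or ((not last) and nxt[f])
                elif op == "always":
                    v = now[f[1]] and (last or nxt[f])
                else:  # "until"
                    v = now[f[2]] or (now[f[1]] and (not last) and nxt[f])
                now[f] = v
        return now[formula]

    # "every request is eventually acknowledged", checked on a 4-event trace
    f = ("always", ("not", ("and", ("p", "req"),
                            ("not", ("eventually", ("p", "ack"))))))
    print(evaluate(f, [{"req"}, set(), {"ack"}, set()]))   # True
    ```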

  5. Integrating an object system into CLIPS: Language design and implementation issues

    NASA Technical Reports Server (NTRS)

    Auburn, Mark

    1990-01-01

    This paper describes the reasons why an object system with integrated pattern-matching and object-oriented programming facilities is desirable for CLIPS and how it is possible to integrate such a system into CLIPS while maintaining the run-time performance and the low memory usage for which CLIPS is known. The requirements for an object system in CLIPS that includes object-oriented programming and integrated pattern-matching are discussed and various techniques for optimizing the object system and its integration with the pattern-matcher are presented.

  6. Universal Serial Bus Architecture for Removable Media (USB-ARM)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    2011-03-09

    USB-ARM creates operating system drivers which sit between removable media and the user and applications. The drivers isolate the media and submit the contents of the media to a virtual machine containing an entire scanning system. This scanning system may include traditional anti-virus, but also allows more detailed analysis of files, including dynamic run-time analysis, helping to prevent "zero-day" threats not already identified in anti-virus signatures. Once cleared, the media is presented to the operating system, at which point it becomes available to users and applications.

  7. Finding Top-kappa Unexplained Activities in Video

    DTIC Science & Technology

    2012-03-09

    parameters that define a UAP instance affect the running time by varying the values of each parameter while keeping the others fixed to a default... value. Runtime of Top-k TUA. Table 1 reports the values we considered for each parameter along with the corresponding default value.

    Parameter    Values                  Default value
    k            1, 2, 5, All            All
    τ            0.4, 0.6, 0.8           0.6
    L            160, 200, 240, 280      200
    # worlds     7E+04, 4E+05, 2E+07     2E+07

    TABLE 1: Parameter values used in

  8. An Architecture for Continuous Data Quality Monitoring in Medical Centers.

    PubMed

    Endler, Gregor; Schwab, Peter K; Wahl, Andreas M; Tenschert, Johannes; Lenz, Richard

    2015-01-01

    In the medical domain, data quality is very important. Since requirements and data change frequently, continuous and sustainable monitoring and improvement of data quality is necessary. Working together with managers of medical centers, we developed an architecture for a data quality monitoring system. The architecture enables domain experts to adapt the system during runtime to match their specifications using a built-in rule system. It also allows arbitrarily complex analyses to be integrated into the monitoring cycle. We evaluate our architecture by matching its components to the well-known data quality methodology TDQM.
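
    A minimal sketch of the adapt-at-run-time rule idea, with invented rule names and records; the actual architecture adds monitoring cycles, arbitrarily complex integrated analyses, and alignment with TDQM.

    ```python
    class QualityMonitor:
        """Rules are plain predicates over a record; experts can add or replace
        them while the system is running (all names invented for illustration)."""

        def __init__(self):
            self.rules = {}

        def add_rule(self, name, predicate):
            self.rules[name] = predicate         # hot-swap: no restart needed

        def check(self, records):
            report = {name: [] for name in self.rules}
            for i, rec in enumerate(records):
                for name, ok in self.rules.items():
                    if not ok(rec):
                        report[name].append(i)   # indices of violating records
            return report

    m = QualityMonitor()
    m.add_rule("has_patient_id", lambda r: bool(r.get("patient_id")))
    m.add_rule("plausible_age", lambda r: 0 <= r.get("age", -1) <= 120)
    print(m.check([{"patient_id": "p1", "age": 43}, {"age": 250}]))
    # {'has_patient_id': [1], 'plausible_age': [1]}
    ```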

  9. Thermal Control System Automation Project (TCSAP)

    NASA Technical Reports Server (NTRS)

    Boyer, Roger L.

    1991-01-01

    Information is given in viewgraph form on the Space Station Freedom (SSF) Thermal Control System Automation Project (TCSAP). Topics covered include the assembly of the External Thermal Control System (ETCS); the ETCS functional schematic; the baseline Fault Detection, Isolation, and Recovery (FDIR), including the development of a knowledge based system (KBS) for application of rule based reasoning to the SSF ETCS; TCSAP software architecture; the High Fidelity Simulator architecture; the TCSAP Runtime Object Database (RODB) data flow; KBS functional architecture and logic flow; TCSAP growth and evolution; and TCSAP relationships.

  10. Dshell++: A Component Based, Reusable Space System Simulation Framework

    NASA Technical Reports Server (NTRS)

    Lim, Christopher S.; Jain, Abhinandan

    2009-01-01

    This paper describes the multi-mission Dshell++ simulation framework for high fidelity, physics-based simulation of spacecraft, robotic manipulation and mobility systems. Dshell++ is a C++/Python library which uses modern script-driven, object-oriented techniques to allow component reuse and a dynamic run-time interface for complex, high-fidelity simulation of spacecraft and robotic systems. The goal of the Dshell++ architecture is to manage the inherent complexity of physics-based simulations while supporting component model reuse across missions. The framework provides several features that support a large degree of simulation configurability and usability.

  11. Comparison of Control Group Generating Methods.

    PubMed

    Szekér, Szabolcs; Fogarassy, György; Vathy-Fogarassy, Ágnes

    2017-01-01

    Retrospective studies suffer from drawbacks such as selection bias. As the selection of the control group has a significant impact on the evaluation of the results, it is very important to find the proper method to generate the most appropriate control group. In this paper we suggest two nearest neighbors based control group selection methods that aim to achieve good matching between the individuals of case and control groups. The effectiveness of the proposed methods is evaluated by runtime and accuracy tests and the results are compared to the classical stratified sampling method.
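
    One plausible reading of nearest-neighbor control selection, assuming scikit-learn (the paper proposes two specific variants; this shows only a generic greedy 1:1 matching step):

    ```python
    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def select_controls(cases, candidates):
        """For each case, pick its nearest not-yet-used candidate control."""
        nn = NearestNeighbors(n_neighbors=len(candidates)).fit(candidates)
        _, idx = nn.kneighbors(cases)
        chosen, used = [], set()
        for row in idx:                        # candidates sorted by distance
            pick = next(j for j in row if j not in used)
            used.add(pick)
            chosen.append(pick)
        return chosen

    cases = np.array([[65, 1.0], [40, 0.0]])           # e.g. (age, comorbidity)
    candidates = np.array([[66, 1.0], [39, 0.0], [70, 0.0], [41, 0.0]])
    print(select_controls(cases, candidates))          # [0, 1]
    ```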

  12. HAL/S-FC compiler system functional specification

    NASA Technical Reports Server (NTRS)

    1974-01-01

    The functional requirements to be met by the HAL/S-FC compiler, and the hardware and software compatibilities between the compiler system and the environment in which it operates are defined. Associated runtime facilities and the interface with the Software Development Laboratory are specified. The construction of the HAL/S-FC system as functionally separate units and the interfaces between those units is described. An overview of the system's capabilities is presented and the hardware/operating system requirements are specified. The computer-dependent aspects of the HAL/S-FC are also specified. Compiler directives are included.

  13. Taking Lessons Learned from a Proxy Application to a Full Application for SNAP and PARTISN

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Womeldorff, Geoffrey Alan; Payne, Joshua Estes; Bergen, Benjamin Karl

    SNAP is a proxy application which simulates the computational motion of a neutral particle transport code, PARTISN. In this work, we adapted parts of SNAP separately: we re-implemented the iterative shell of SNAP in the task-model runtime Legion, showing an improvement over the original schedule, and we created multiple Kokkos implementations of the computational kernel of SNAP, displaying performance similar to the native Fortran. We then translated our Kokkos experiments in SNAP to PARTISN, which necessitated engineering development, regression testing, and further thought.

  14. Production scheduling and rescheduling with genetic algorithms.

    PubMed

    Bierwirth, C; Mattfeld, D C

    1999-01-01

    A general model for job shop scheduling is described which applies to static, dynamic and non-deterministic production environments. Next, a Genetic Algorithm is presented which solves the job shop scheduling problem. This algorithm is tested in a dynamic environment under different workload situations. In addition, a highly efficient decoding procedure is proposed which strongly improves the quality of schedules. Finally, the technique is tested for scheduling and rescheduling in a non-deterministic environment. Experiments show that conventional methods of production control are clearly outperformed at reasonable run-time costs.
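
    A tiny GA sketch of the encode/decode/evolve loop: permutation chromosomes are decoded by list scheduling onto identical machines and evolved with order crossover and swap mutation. Real job-shop decoding, as in the paper, is considerably more involved.

    ```python
    import random

    JOBS = [5, 3, 8, 2, 7, 4, 6, 1]             # processing times (invented)
    M = 3                                        # identical machines

    def makespan(perm):
        loads = [0] * M
        for j in perm:                           # earliest-free-machine decoding
            loads[loads.index(min(loads))] += JOBS[j]
        return max(loads)

    def crossover(a, b):                         # order crossover (OX)
        i, j = sorted(random.sample(range(len(a)), 2))
        hole = set(a[i:j])
        rest = [g for g in b if g not in hole]
        return rest[:i] + a[i:j] + rest[i:]

    pop = [random.sample(range(len(JOBS)), len(JOBS)) for _ in range(30)]
    for gen in range(100):
        pop.sort(key=makespan)                   # elitist: best schedules first
        child = crossover(*random.sample(pop[:10], 2))
        i, j = random.sample(range(len(child)), 2)
        child[i], child[j] = child[j], child[i]  # swap mutation
        pop[-1] = child                          # replace the worst individual
    pop.sort(key=makespan)
    print(makespan(pop[0]), pop[0])              # total work 36 -> optimum is 12
    ```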

  15. Coordinated Fault Tolerance for High-Performance Computing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dongarra, Jack; Bosilca, George; et al.

    2013-04-08

    Our work to meet our goal of end-to-end fault tolerance has focused on two areas: (1) improving fault tolerance in various software currently available and widely used throughout the HEC domain and (2) using fault information exchange and coordination to achieve holistic, systemwide fault tolerance and understanding how to design and implement interfaces for integrating fault tolerance features for multiple layers of the software stack—from the application, math libraries, and programming language runtime to other common system software such as jobs schedulers, resource managers, and monitoring tools.

  16. Closing the Gap Between Specification and Programming: VDM++ and SCALA

    NASA Technical Reports Server (NTRS)

    Havelund, Klaus

    2011-01-01

    We argue that a modern programming language such as Scala offers a level of succinctness, which makes it suitable for program and systems specification as well as for high-level programming. We illustrate this by comparing the language with the Vdm++ specification language. The comparison also identifies areas where Scala perhaps could be improved, inspired by Vdm++. We furthermore illustrate Scala's potential as a specification language by augmenting it with a combination of parameterized state machines and temporal logic, defined as a library, thereby forming an expressive but simple runtime verification framework.

  17. An object oriented Python interface for atomistic simulations

    NASA Astrophysics Data System (ADS)

    Hynninen, T.; Himanen, L.; Parkkinen, V.; Musso, T.; Corander, J.; Foster, A. S.

    2016-01-01

    Programmable simulation environments allow one to monitor and control calculations efficiently and automatically before, during, and after runtime. Environments directly accessible in a programming environment can be interfaced with powerful external analysis tools and extensions to enhance the functionality of the core program, and by incorporating a flexible object based structure, the environments make building and analysing computational setups intuitive. In this work, we present a classical atomistic force field with an interface written in Python language. The program is an extension for an existing object based atomistic simulation environment.

  18. The European quantum technologies flagship programme

    NASA Astrophysics Data System (ADS)

    Riedel, Max F.; Binosi, Daniele; Thew, Rob; Calarco, Tommaso

    2017-09-01

    Quantum technologies, such as quantum communication, computation, simulation as well as sensors and metrology, address and manipulate individual quantum states and make use of superposition and entanglement. Both companies and governments have realised the high disruptive potential of this technology. Consequently, the European Commission has announced an ambitious flagship programme to start in 2018. Here, we sum up the history leading to the quantum technologies flagship programme and outline its envisioned goals and structure. We also give an overview of the strategic research agenda for quantum communication, which the flagship will pursue during its 10-year runtime.

  19. Java application for the superposition T-matrix code to study the optical properties of cosmic dust aggregates

    NASA Astrophysics Data System (ADS)

    Halder, P.; Chakraborty, A.; Deb Roy, P.; Das, H. S.

    2014-09-01

    In this paper, we report the development of a Java application for the superposition T-matrix code, JaSTA (Java Superposition T-matrix App), to study the light scattering properties of aggregate structures. It has been developed using Netbeans 7.1.2, a Java integrated development environment (IDE). JaSTA uses the double-precision superposition codes for multi-sphere clusters in random orientation developed by Mackowski and Mishchenko (1996). It consists of a graphical user interface (GUI) at the front end and a database of related data at the back end. Together, the interactive GUI and the database package enable a user to set the relevant input parameters (namely, wavelength, complex refractive indices, grain size, etc.) and study the related optical properties of cosmic dust (namely, extinction, polarization, etc.) instantly, i.e., with zero computational time, which increases the user's efficiency. The database of JaSTA currently covers a few sets of input parameters, with a plan to create a large database in the future. The application also has an option where users can compile and run the scattering code directly for aggregates in the GUI environment. JaSTA aims to provide convenient and quick analysis of the optical properties for use in different fields like planetary science, atmospheric science, nano science, etc. The current version of this software is developed for the Linux and Windows platforms to study the light scattering properties of small aggregates; it will be extended to larger aggregates using parallel codes in the future.
    Catalogue identifier: AETB_v1_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AETB_v1_0.html
    Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
    Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
    No. of lines in distributed program, including test data, etc.: 571570
    No. of bytes in distributed program, including test data, etc.: 120226886
    Distribution format: tar.gz
    Programming language: Java, Fortran95
    Computer: Any Windows or Linux system capable of hosting a Java runtime environment, Java3D and a Fortran95 compiler; developed on a 2.40 GHz Intel Core i3
    Operating system: Any Windows or Linux system capable of hosting a Java runtime environment, Java3D and a Fortran95 compiler
    RAM: Ranging from a few Mbytes to several Gbytes, depending on the input parameters
    Classification: 1.3
    External routines: jfreechart-1.0.14 [1] (free plotting library for Java), j3d-jre-1.5.2 [2] (3D visualization)
    Nature of problem: Optical properties of cosmic dust aggregates
    Solution method: Java application based on Mackowski and Mishchenko's superposition T-matrix code
    Restrictions: The program is designed for single-processor systems
    Additional comments: The distribution file for this program is over 120 Mbytes and therefore is not delivered directly when download or email is requested; instead, an html file giving details of how the program can be obtained is sent
    Running time: Ranging from a few minutes to several hours, depending on the input parameters
    References: [1] http://www.jfree.org/index.html [2] https://java3d.java.net/

  20. Fast and Exact Fiber Surfaces for Tetrahedral Meshes.

    PubMed

    Klacansky, Pavol; Tierny, Julien; Carr, Hamish; Geng, Zhao

    2017-07-01

    Isosurfaces are fundamental geometrical objects for the analysis and visualization of volumetric scalar fields. Recent work has generalized them to bivariate volumetric fields with fiber surfaces, the pre-image of polygons in range space. However, the existing algorithm for their computation is approximate, and is limited to closed polygons. Moreover, its runtime performance does not allow instantaneous updates of the fiber surfaces upon user edits of the polygons. Overall, these limitations prevent a reliable and interactive exploration of the space of fiber surfaces. This paper introduces the first algorithm for the exact computation of fiber surfaces in tetrahedral meshes. It assumes no restriction on the topology of the input polygon, handles degenerate cases and better captures sharp features induced by polygon bends. The algorithm also allows visualization of individual fibers on the output surface, better illustrating their relationship with data features in range space. To enable truly interactive exploration sessions, we further improve the runtime performance of this algorithm. In particular, we show that it is trivially parallelizable and that it scales nearly linearly with the number of cores. Further, we study acceleration data-structures both in geometrical domain and range space and we show how to generalize interval trees used in isosurface extraction to fiber surface extraction. Experiments demonstrate the superiority of our algorithm over previous work, both in terms of accuracy and running time, with up to two orders of magnitude speedups. This improvement enables interactive edits of range polygons with instantaneous updates of the fiber surface for exploration purpose. A VTK-based reference implementation is provided as additional material to reproduce our results.
