Ada (trademark) projects at NASA. Runtime environment issues and recommendations
NASA Technical Reports Server (NTRS)
Roy, Daniel M.; Wilke, Randall W.
1988-01-01
Ada practitioners should use this document to discuss and establish common short term requirements for Ada runtime environments. The major current Ada runtime environment issues are identified through the analysis of some of the Ada efforts at NASA and other research centers. The runtime environment characteristics of major compilers are compared while alternate runtime implementations are reviewed. Modifications and extensions to the Ada Language Reference Manual to address some of these runtime issues are proposed. Three classes of projects focusing on the most critical runtime features of Ada are recommended, including a range of immediately feasible full scale Ada development projects. Also, a list of runtime features and procurement issues is proposed for consideration by the vendors, contractors and the government.
Compiler analysis for irregular problems in FORTRAN D
NASA Technical Reports Server (NTRS)
Vonhanxleden, Reinhard; Kennedy, Ken; Koelbel, Charles; Das, Raja; Saltz, Joel
1992-01-01
We developed a dataflow framework which provides a basis for rigorously defining strategies to make use of runtime preprocessing methods for distributed memory multiprocessors. In many programs, several loops access the same off-processor memory locations. Our runtime support gives us a mechanism for tracking and reusing copies of off-processor data. A key aspect of our compiler analysis strategy is to determine when it is safe to reuse copies of off-processor data. Another crucial function of the compiler analysis is to identify situations which allow runtime preprocessing overheads to be amortized. This dataflow analysis will make it possible to effectively use the results of interprocedural analysis in our efforts to reduce interprocessor communication and the need for runtime preprocessing.
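As a flavor of the inspector/executor pattern that underlies this kind of runtime preprocessing, the sketch below builds a communication schedule once and notes where it could be reused; the type names and function signatures are illustrative assumptions, not the actual FORTRAN D runtime interface.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical communication schedule produced by an inspector pass: the set of
// globally indexed elements this processor must fetch from other processors.
struct CommSchedule {
    std::vector<std::size_t> offProcessorIndices;
};

// Inspector: scan the indirection array of an irregular loop and record every
// access that falls outside the locally owned index range [localBegin, localEnd).
CommSchedule inspect(const std::vector<std::size_t>& indexArray,
                     std::size_t localBegin, std::size_t localEnd) {
    std::unordered_set<std::size_t> remote;
    for (std::size_t idx : indexArray)
        if (idx < localBegin || idx >= localEnd)
            remote.insert(idx);
    return CommSchedule{std::vector<std::size_t>(remote.begin(), remote.end())};
}

// Executor: gather the remote elements named by the schedule (in a real runtime
// this is a message exchange), then run the loop body on local plus gathered data.
// If a later loop uses the same indirection array and the data are unchanged, the
// same schedule and the gathered copies can be reused without re-inspection.
void execute(const CommSchedule& schedule /*, data arrays ... */) {
    // gather(schedule.offProcessorIndices);
    // ... loop body ...
}
```

The compiler analysis described in the abstract is what decides, statically, when such a schedule and its gathered copies remain valid across loops.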
DOE Office of Scientific and Technical Information (OSTI.GOV)
Baker, Gavin Matthew; Bettencourt, Matthew Tyler; Bova, Steven W.
2015-09-01
This report provides in-depth information and analysis to help create a technical road map for developing next-generation programming models and runtime systems that support Advanced Simulation and Computing (ASC) workload requirements. The focus herein is on asynchronous many-task (AMT) models and runtime systems, which are of great interest in the context of exascale computing, as they hold the promise to address key issues associated with future extreme-scale computer architectures. This report includes a thorough qualitative and quantitative examination of three best-of-class AMT runtime systems, Charm++, Legion, and Uintah, all of which are in use as part of the Predictive Science Academic Alliance Program II (PSAAP-II) Centers. The studies focus on each of the runtimes' programmability, performance, and mutability. Through the experiments and analysis presented, several overarching findings emerge. From a performance perspective, AMT runtimes show tremendous potential for addressing extreme-scale challenges. Empirical studies show an AMT runtime can mitigate performance heterogeneity inherent to the machine itself and that Message Passing Interface (MPI) and AMT runtimes perform comparably under balanced conditions. From a programmability and mutability perspective, however, none of the runtimes in this study are currently ready for use in developing production-ready Sandia ASC applications. The report concludes by recommending a co-design path forward, wherein application, programming model, and runtime system developers work together to define requirements and solutions. Such a requirements-driven co-design approach benefits the community as a whole, with widespread community engagement mitigating risk for both application developers and high-performance computing runtime system developers.
Hines, Michael L; Eichner, Hubert; Schürmann, Felix
2008-08-01
Neuron tree topology equations can be split into two subtrees and solved on different processors with no change in accuracy, stability, or computational effort; communication costs involve only sending and receiving two double precision values by each subtree at each time step. Splitting cells is useful in attaining load balance in neural network simulations, especially when there is a wide range of cell sizes and the number of cells is about the same as the number of processors. For compute-bound simulations load balance results in almost ideal runtime scaling. Application of the cell splitting method to two published network models exhibits good runtime scaling on twice as many processors as could be effectively used with whole-cell balancing.
Angular-contact ball-bearing internal load estimation algorithm using runtime adaptive relaxation
NASA Astrophysics Data System (ADS)
Medina, H.; Mutu, R.
2017-07-01
An algorithm to estimate internal loads for single-row angular-contact ball bearings due to externally applied thrust loads and high operating speeds is presented. A new runtime adaptive relaxation procedure and blending function is proposed which ensures algorithm stability whilst also reducing the number of iterations needed to reach convergence, leading to an average reduction in computation time in excess of 80%. The model is validated against a 218 angular-contact bearing and shows excellent agreement with published results.
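The paper's specific relaxation procedure is not reproduced here, but the general pattern of an under-relaxed fixed-point iteration whose relaxation factor is adapted at runtime and blended back toward its nominal value can be sketched as follows; the update rule, blending weights, and convergence test are illustrative assumptions only.

```cpp
#include <cmath>
#include <functional>
#include <limits>

// Solve x = f(x) by a damped fixed-point iteration whose relaxation factor
// omega is adapted at runtime: it is reduced when the residual grows and
// blended back toward 1 when the iteration is converging smoothly.
double adaptiveRelaxedFixedPoint(const std::function<double(double)>& f,
                                 double x, double tol = 1e-10, int maxIter = 1000) {
    double omega = 1.0;
    double prevResidual = std::numeric_limits<double>::infinity();
    for (int k = 0; k < maxIter; ++k) {
        double fx = f(x);
        double residual = std::fabs(fx - x);
        if (residual < tol) break;
        if (residual > prevResidual)
            omega *= 0.5;                       // diverging: damp harder
        else
            omega = 0.7 * omega + 0.3 * 1.0;    // converging: blend back toward 1
        x = (1.0 - omega) * x + omega * fx;     // relaxed update
        prevResidual = residual;
    }
    return x;
}
```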
Using Runtime Analysis to Guide Model Checking of Java Programs
NASA Technical Reports Server (NTRS)
Havelund, Klaus; Norvig, Peter (Technical Monitor)
2001-01-01
This paper describes how two runtime analysis algorithms, an existing data race detection algorithm and a new deadlock detection algorithm, have been implemented to analyze Java programs. Runtime analysis is based on the idea of executing the program once and observing the generated run to extract various kinds of information. This information can then be used to predict whether other, different runs may violate some properties of interest, in addition, of course, to demonstrating whether the generated run itself violates such properties. These runtime analyses can be performed stand-alone to generate a set of warnings. It is furthermore demonstrated how these warnings can be used to guide a model checker, thereby reducing the search space. The described techniques have been implemented in the home-grown Java model checker called Java PathFinder.
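A common way such runtime deadlock analysis works is to record, during one run, the order in which locks are nested and to warn if the resulting lock-order graph contains a cycle, even when that run itself never deadlocked. The sketch below is a simplified illustration of that idea, not the JPaX implementation; all names are hypothetical.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Lock-order graph: an edge A -> B means some thread acquired B while holding A.
using LockGraph = std::map<std::string, std::set<std::string>>;

// Called by the instrumented run each time a lock is acquired.
void recordAcquire(LockGraph& g, const std::vector<std::string>& held,
                   const std::string& acquired) {
    for (const auto& h : held) g[h].insert(acquired);
}

// Depth-first search for a cycle; a cycle indicates a potential deadlock.
bool hasCycleFrom(const LockGraph& g, const std::string& node,
                  std::set<std::string>& onPath, std::set<std::string>& done) {
    if (onPath.count(node)) return true;
    if (done.count(node)) return false;
    onPath.insert(node);
    auto it = g.find(node);
    if (it != g.end())
        for (const auto& next : it->second)
            if (hasCycleFrom(g, next, onPath, done)) return true;
    onPath.erase(node);
    done.insert(node);
    return false;
}

bool potentialDeadlock(const LockGraph& g) {
    std::set<std::string> onPath, done;
    for (const auto& [node, successors] : g)
        if (hasCycleFrom(g, node, onPath, done)) return true;
    return false;
}
```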
An Analytical Framework for Runtime of a Class of Continuous Evolutionary Algorithms.
Zhang, Yushan; Hu, Guiwu
2015-01-01
Although there have been many studies on the runtime of evolutionary algorithms in discrete optimization, relatively few theoretical results have been proposed on continuous optimization, such as evolutionary programming (EP). This paper proposes an analysis of the runtime of two EP algorithms based on Gaussian and Cauchy mutations, using an absorbing Markov chain. Given a constant variation, we calculate the runtime upper bound of special Gaussian mutation EP and Cauchy mutation EP. Our analysis reveals that the upper bounds are impacted by individual number, problem dimension number n, searching range, and the Lebesgue measure of the optimal neighborhood. Furthermore, we provide conditions whereby the average runtime of the considered EP can be no more than a polynomial of n. The condition is that the Lebesgue measure of the optimal neighborhood is larger than a combinatorial calculation of an exponential and the given polynomial of n.
Asynchronous Runtimes in Action: An Introspective Framework for a Next Gen Runtime
DOE Office of Scientific and Technical Information (OSTI.GOV)
Suetterlein, Joshua D.; Landwehr, Joshua B.; Marquez, Andres
2016-05-23
One of the most critical challenges that new high performance systems face is the lack of system software support for these large scale systems. Investment in system stack components is essential in the development, debugging and optimization of the new emerging programming models. These emerging models have the promise to better utilize the vast hardware resources available in current and future systems. To aid in the development of applications and new system stacks, runtimes, as instances of their respective execution models, need to provide facilities to introspect their inner workings and allow an in-depth attribution of performance bottlenecks and computational patterns. In other words, the runtime systems need to reduce their opacity to observers so that users of a novel program execution model can adapt their designs to fit the intended model usage, regardless of the layer that they are working on. This design/development loop (akin to co-design) enables synergistic opportunities across the entire computational stack. This paper presents the design and implementation of a simple “gray” box performance attribution harness running inside a fine grain runtime system: the Open Community Runtime (OCR). We showcase what such a framework can indicate regarding the runtime behavior while running at scale. To this end, we have designed a set of synthetic scenarios aimed to test the runtime at its best and worst cases. We present an analysis of the most important runtime features, properties and idiosyncrasies that will affect the development of new runtime features, algorithmic selection, and application development.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Atzeni, Simone; Ahn, Dong; Gopalakrishnan, Ganesh
2017-01-12
Archer is built on top of the LLVM/Clang compilers that support OpenMP. It applies static and dynamic analysis techniques to detect data races in OpenMP programs while incurring very low runtime and memory overhead. Static analyses identify data-race-free OpenMP regions and exclude them from runtime analysis, which is performed by the ThreadSanitizer included in LLVM/Clang.
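For context, the kind of defect such tools report is the classic unsynchronized update inside a parallel region. The small example below is ours, not taken from the Archer paper; it races on `sum` unless the reduction or atomic noted in the comments is used.

```cpp
#include <cstdio>
#include <omp.h>

int main() {
    const int n = 1000000;
    long sum = 0;
    // Data race: every thread reads and writes `sum` without synchronization.
    // A dynamic race detector such as ThreadSanitizer flags the conflicting accesses.
    #pragma omp parallel for            // fix: add reduction(+:sum)
    for (int i = 0; i < n; ++i) {
        sum += i;                       // alternative fix: guard with #pragma omp atomic
    }
    std::printf("sum = %ld\n", sum);
    return 0;
}
```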
Ultra-high performance size-exclusion chromatography in polar solvents.
Vancoillie, Gertjan; Vergaelen, Maarten; Hoogenboom, Richard
2016-12-23
Size-exclusion chromatography (SEC) is amongst the most widely used polymer characterization methods in both academic and industrial polymer research, allowing the determination of molecular weight and distribution parameters, i.e. the dispersity (Ɖ), of unknown polymers. The many advantages, including accuracy, reproducibility and low sample consumption, have contributed to the worldwide success of this analytical technique. The current generation of SEC systems has a stationary phase mostly containing highly porous styrene-divinylbenzene particles, allowing for a size-based separation of various polymers in solution but limiting the flow rate and solvent compatibility. Recently, sub-2 μm ethylene-bridged hybrid (BEH) packing materials have become available for SEC analysis. These packing materials can not only withstand much higher pressures, up to 15,000 psi, but also show high spatial stability towards different solvents. Combining these BEH columns with ultra-high performance LC (UHPLC) technology opens up UHP-SEC analysis, showing strongly reduced runtimes and unprecedented solvent compatibility. In this work, this novel characterization technique was compared to conventional SEC using both highly viscous and highly polar solvents as eluent, namely N,N-dimethylacetamide (DMAc), N,N-dimethylformamide (DMF) and methanol, focusing on the suitability of the BEH columns for analysis of highly functional polymers. The results show a high functional group compatibility comparable with conventional SEC, with remarkably short runtimes and enhanced resolution in methanol.
Compilation time analysis to minimize run-time overhead in preemptive scheduling on multiprocessors
NASA Astrophysics Data System (ADS)
Wauters, Piet; Lauwereins, Rudy; Peperstraete, J.
1994-10-01
This paper describes a scheduling method for hard real-time Digital Signal Processing (DSP) applications implemented on a multiprocessor. Due to the very high operating frequencies of DSP applications (typically hundreds of kHz), run-time overhead should be kept as small as possible. Because static scheduling introduces very little run-time overhead, it is used as much as possible. Dynamic pre-emption of tasks is allowed if and only if it leads to better performance in spite of the extra run-time overhead. We essentially combine static scheduling with dynamic pre-emption using static priorities. Since we are dealing with hard real-time applications, we must be able to guarantee at compile-time that all timing requirements will be satisfied at run-time. We will show that our method performs at least as well as any static scheduling method. It also reduces the total amount of dynamic pre-emptions compared with run-time methods such as deadline-monotonic scheduling.
Run-time parallelization and scheduling of loops
NASA Technical Reports Server (NTRS)
Saltz, Joel H.; Mirchandaney, Ravi; Crowley, Kay
1991-01-01
Run-time methods are studied to automatically parallelize and schedule iterations of a do loop in certain cases where compile-time information is inadequate. The methods presented involve execution time preprocessing of the loop. At compile-time, these methods set up the framework for performing a loop dependency analysis. At run-time, wavefronts of concurrently executable loop iterations are identified. Using this wavefront information, loop iterations are reordered for increased parallelism. Symbolic transformation rules are used to produce: inspector procedures that perform execution time preprocessing, and executors or transformed versions of source code loop structures. These transformed loop structures carry out the calculations planned in the inspector procedures. Performance results are presented from experiments conducted on the Encore Multimax. These results illustrate that run-time reordering of loop indexes can have a significant impact on performance.
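A simplified version of the inspector's wavefront computation is sketched below: given, for each iteration, the earlier iterations it depends on, assign each iteration to the first wavefront in which all of its predecessors have completed. The data structures are illustrative, not the paper's actual primitives.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// deps[i] lists iterations that must complete before iteration i may run
// (loop-carried dependences on earlier iterations only).
// Returns wave[i], the wavefront number of iteration i; iterations that share
// a wavefront number can execute concurrently in the executor.
std::vector<int> computeWavefronts(const std::vector<std::vector<int>>& deps) {
    std::vector<int> wave(deps.size(), 0);
    for (std::size_t i = 0; i < deps.size(); ++i) {
        int w = 0;
        for (int d : deps[i])
            w = std::max(w, wave[d] + 1);   // run one wave after the latest predecessor
        wave[i] = w;
    }
    return wave;
}
```

The executor then reorders the loop so that all iterations of wavefront 0 run first, then wavefront 1, and so on, which is the source of the parallelism reported in the abstract.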
A ROSE-based OpenMP 3.0 Research Compiler Supporting Multiple Runtime Libraries
DOE Office of Scientific and Technical Information (OSTI.GOV)
Liao, C; Quinlan, D; Panas, T
2010-01-25
OpenMP is a popular and evolving programming model for shared-memory platforms. It relies on compilers for optimal performance and to target modern hardware architectures. A variety of extensible and robust research compilers are key to OpenMP's sustainable success in the future. In this paper, we present our efforts to build an OpenMP 3.0 research compiler for C, C++, and Fortran using the ROSE source-to-source compiler framework. Our goal is to support OpenMP research for ourselves and others. We have extended ROSE's internal representation to handle all of the OpenMP 3.0 constructs and facilitate their manipulation. Since OpenMP research is often complicated by the tight coupling of the compiler translations and the runtime system, we present a set of rules to define a common OpenMP runtime library (XOMP) on top of multiple runtime libraries. These rules additionally define how to build a set of translations targeting XOMP. Our work demonstrates how to reuse OpenMP translations across different runtime libraries. This work simplifies OpenMP research by decoupling the problematic dependence between the compiler translations and the runtime libraries. We present an evaluation of our work by demonstrating an analysis tool for OpenMP correctness. We also show how XOMP can be defined using both GOMP and Omni and present comparative performance results against other OpenMP compilers.
Application configuration selection for energy-efficient execution on multicore systems
Wang, Shinan; Luo, Bing; Shi, Weisong; ...
2015-09-21
Balanced performance and energy consumption are incorporated in the design of modern computer systems. Several runtime factors, such as concurrency levels, thread mapping strategies, and dynamic voltage and frequency scaling (DVFS), should be considered in order to achieve optimal energy efficiency for a workload. Selecting appropriate run-time factors, however, is one of the most challenging tasks because the run-time factors are architecture-specific and workload-specific. While most existing works concentrate on either static analysis of the workload or run-time prediction results, we present a hybrid two-step method that utilizes concurrency levels and DVFS settings to achieve the energy-efficient configuration for a workload. The experimental results based on a Xeon E5620 server with the NPB and PARSEC benchmark suites show that the model is able to predict the energy-efficient configuration accurately. On average, an additional 10% EDP (Energy Delay Product) saving is obtained by using run-time DVFS for the entire system. An off-line optimal solution is used to compare with the proposed scheme. Finally, the experimental results show that the average extra EDP saved by the optimal solution is within 5% on selected parallel benchmarks.
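The Energy Delay Product used here as the figure of merit is simply energy multiplied by execution time. A minimal sketch of choosing the configuration (concurrency level plus DVFS setting) that minimizes measured or predicted EDP might look as follows; all names and fields are illustrative assumptions, not the paper's model.

```cpp
#include <limits>
#include <vector>

struct Config {
    int threads;          // concurrency level
    int frequencyMHz;     // DVFS setting
    double energyJoules;  // measured or predicted energy
    double timeSeconds;   // measured or predicted runtime
};

// Energy Delay Product: lower is better.
double edp(const Config& c) { return c.energyJoules * c.timeSeconds; }

// Pick the candidate configuration with the smallest EDP (candidates assumed non-empty).
Config selectBest(const std::vector<Config>& candidates) {
    Config best = candidates.front();
    double bestEdp = std::numeric_limits<double>::max();
    for (const auto& c : candidates)
        if (edp(c) < bestEdp) { bestEdp = edp(c); best = c; }
    return best;
}
```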
Instrumentation of Java Bytecode for Runtime Analysis
NASA Technical Reports Server (NTRS)
Goldberg, Allen; Havelund, Klaus
2003-01-01
This paper describes JSpy, a system for high-level instrumentation of Java bytecode, and its use with JPaX, our system for runtime analysis of Java programs. JPaX monitors the execution of temporal logic formulas and performs predictive analysis of deadlocks and data races. JSpy's input is an instrumentation specification, which consists of a collection of rules, where a rule is a predicate/action pair. The predicate is a conjunction of syntactic constraints on a Java statement, and the action is a description of logging information to be inserted in the bytecode corresponding to the statement. JSpy is built using JTrek, an instrumentation package at a lower level of abstraction.
Rule Systems for Runtime Verification: A Short Tutorial
NASA Astrophysics Data System (ADS)
Barringer, Howard; Havelund, Klaus; Rydeheard, David; Groce, Alex
In this tutorial, we introduce two rule-based systems for on- and off-line trace analysis, RuleR and LogScope. RuleR is a conditional rule-based system, which has a simple and easily implemented algorithm for effective runtime verification, and into which one can compile a wide range of temporal logics and other specification formalisms used for runtime verification. Specifications can be parameterized with data, or even with specifications, allowing for temporal logic combinators to be defined. We outline a number of simple syntactic extensions of core RuleR that can lead to further conciseness of specification while still enabling easy and efficient implementation. RuleR is implemented in Java and we will demonstrate its ease of use in monitoring Java programs. LogScope is a derivation of RuleR adding a simple, very user-friendly temporal logic. It was developed in Python, specifically for supporting testing of spacecraft flight software for NASA's 2011 Mars mission MSL (Mars Science Laboratory). The system has been applied by test engineers to analysis of log files generated by running the flight software. Detailed logging is already part of the system design approach, and hence there is no added instrumentation overhead caused by this approach. While post-mortem log analysis prevents the autonomous reaction to problems possible with traditional runtime verification, it provides a powerful tool for test automation. A new system is being developed that integrates features from both RuleR and LogScope.
A Scala DSL for RETE-Based Runtime Verification
NASA Technical Reports Server (NTRS)
Havelund, Klaus
2013-01-01
Runtime verification (RV) consists in part of checking execution traces against formalized specifications. Several systems have emerged, most of which support specification notations based on state machines, regular expressions, temporal logic, or grammars. The field of Artificial Intelligence (AI) has for an even longer period of time studied rule-based production systems, which at a closer look appear to be relevant for RV, although seemingly focused on slightly different application domains, such as for example business processes and expert systems. The core algorithm in many of these systems is the Rete algorithm. We have implemented a Rete-based runtime verification system, named LogFire (originally intended for offline log analysis but also applicable to online analysis), as an internal DSL in the Scala programming language, using Scala's support for defining DSLs. This combination appears attractive from a practical point of view. Our contribution is in part conceptual in arguing that such rule-based frameworks originating from AI may be suited for RV.
Runtime Analysis of Linear Temporal Logic Specifications
NASA Technical Reports Server (NTRS)
Giannakopoulou, Dimitra; Havelund, Klaus
2001-01-01
This report presents an approach to checking a running program against its Linear Temporal Logic (LTL) specifications. LTL is a widely used logic for expressing properties of programs viewed as sets of executions. Our approach consists of translating LTL formulae to finite-state automata, which are used as observers of the program behavior. The translation algorithm we propose modifies standard LTL to Büchi automata conversion techniques to generate automata that check finite program traces. The algorithm has been implemented in a tool, which has been integrated with the generic JPaX framework for runtime analysis of Java programs.
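To give a flavor of what such a finite-trace observer does, consider the property G(request -> F acknowledge), "every request is eventually acknowledged." A hand-written monitor with the same effect on finite traces (our illustration, not the tool's generated automaton; the event names are assumptions) is sketched below.

```cpp
#include <string>
#include <vector>

// Finite-trace monitor for G(request -> F acknowledge):
// the property holds on a finite trace iff, at the end of the trace,
// no request remains without a later acknowledge.
bool monitorRequestsAcknowledged(const std::vector<std::string>& trace) {
    bool pending = false;                  // a request seen with no acknowledge after it
    for (const auto& event : trace) {
        if (event == "request") pending = true;
        else if (event == "acknowledge") pending = false;
    }
    return !pending;                       // accept only if nothing is left pending
}
```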
Apparatuses and Methods for Producing Runtime Architectures of Computer Program Modules
NASA Technical Reports Server (NTRS)
Abi-Antoun, Marwan Elia (Inventor); Aldrich, Jonathan Erik (Inventor)
2013-01-01
Apparatuses and methods for producing run-time architectures of computer program modules. One embodiment includes creating an abstract graph from the computer program module and from containment information corresponding to the computer program module, wherein the abstract graph has nodes including types and objects, and wherein the abstract graph relates an object to a type, and wherein for a specific object the abstract graph relates the specific object to a type containing the specific object; and creating a runtime graph from the abstract graph, wherein the runtime graph is a representation of the true runtime object graph, wherein the runtime graph represents containment information such that, for a specific object, the runtime graph relates the specific object to another object that contains the specific object.
A Cross-Platform Infrastructure for Scalable Runtime Application Performance Analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dongarra, Jack; Moore, Shirley; Miller, Bart; Hollingsworth, Jeffrey
2005-03-15
The purpose of this project was to build an extensible cross-platform infrastructure to facilitate the development of accurate and portable performance analysis tools for current and future high performance computing (HPC) architectures. Major accomplishments include tools and techniques for multidimensional performance analysis, as well as improved support for dynamic performance monitoring of multithreaded and multiprocess applications. Previous performance tool development has been limited by the burden of having to re-write a platform-dependent low-level substrate for each architecture/operating system pair in order to obtain the necessary performance data from the system. Manual interpretation of performance data is not scalable for large-scale, long-running applications. The infrastructure developed by this project provides a foundation for building portable and scalable performance analysis tools, with the end goal being to provide application developers with the information they need to analyze, understand, and tune the performance of terascale applications on HPC architectures. The backend portion of the infrastructure provides runtime instrumentation capability and access to hardware performance counters, with thread-safety for shared memory environments and a communication substrate to support instrumentation of multiprocess and distributed programs. Front-end interfaces provide tool developers with a well-defined, platform-independent set of calls for requesting performance data. End-user tools have been developed that demonstrate runtime data collection, on-line and off-line analysis of performance data, and multidimensional performance analysis. The infrastructure is based on two underlying performance instrumentation technologies. These technologies are the PAPI cross-platform library interface to hardware performance counters and the cross-platform Dyninst library interface for runtime modification of executable images. The Paradyn and KOJAK projects have made use of this infrastructure to build performance measurement and analysis tools that scale to long-running programs on large parallel and distributed systems and that automate much of the search for performance bottlenecks.
MESA: Message-Based System Analysis Using Runtime Verification
NASA Technical Reports Server (NTRS)
Shafiei, Nastaran; Tkachuk, Oksana; Mehlitz, Peter
2017-01-01
In this paper, we present a novel approach and framework for run-time verification of large, safety-critical messaging systems. This work was motivated by verifying the System Wide Information Management (SWIM) project of the Federal Aviation Administration (FAA). SWIM provides live air traffic, site and weather data streams for the whole National Airspace System (NAS), which can easily amount to several hundred messages per second. Such safety-critical systems cannot be instrumented; therefore, verification and monitoring have to happen using a nonintrusive approach, by connecting to a variety of network interfaces. Due to a large number of potential properties to check, the verification framework needs to support efficient formulation of properties with a suitable Domain Specific Language (DSL). Our approach is to utilize a distributed system that is geared towards connectivity and scalability and interface it at the message queue level to a powerful verification engine. We implemented our approach in the tool called MESA: Message-Based System Analysis, which leverages the open source projects RACE (Runtime for Airspace Concept Evaluation) and TraceContract. RACE is a platform for instantiating and running highly concurrent and distributed systems and enables connectivity to SWIM and scalability. TraceContract is a runtime verification tool that allows for checking traces against properties specified in a powerful DSL. We applied our approach to verify a SWIM service against several requirements. We found errors such as duplicate and out-of-order messages.
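The two properties mentioned at the end, duplicate and out-of-order messages, can be checked with very little per-stream state. The sketch below is our illustration rather than MESA's DSL, and it assumes each message carries a sequence number that should be strictly increasing.

```cpp
#include <cstdint>
#include <string>
#include <unordered_set>
#include <vector>

struct Finding {
    std::uint64_t sequence;  // offending sequence number
    std::string kind;        // "duplicate" or "out-of-order"
};

// Scan a stream of sequence numbers and report duplicates and reorderings.
std::vector<Finding> checkStream(const std::vector<std::uint64_t>& sequenceNumbers) {
    std::vector<Finding> findings;
    std::unordered_set<std::uint64_t> seen;
    std::uint64_t last = 0;
    bool haveLast = false;
    for (std::uint64_t s : sequenceNumbers) {
        if (!seen.insert(s).second)
            findings.push_back({s, "duplicate"});
        else if (haveLast && s < last)
            findings.push_back({s, "out-of-order"});
        if (!haveLast || s > last) { last = s; haveLast = true; }
    }
    return findings;
}
```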
Active Storage with Analytics Capabilities and I/O Runtime System for Petascale Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Choudhary, Alok
Computational scientists must understand results from experimental, observational and computational simulation generated data to gain insights and perform knowledge discovery. As systems approach the petascale range, problems that were unimaginable a few years ago are within reach. With the increasing volume and complexity of data produced by ultra-scale simulations and high-throughput experiments, understanding the science is largely hampered by the lack of comprehensive I/O, storage, acceleration of data manipulation, analysis, and mining tools. Scientists require techniques, tools and infrastructure to facilitate better understanding of their data, in particular the ability to effectively perform complex data analysis, statistical analysis and knowledge discovery. The goal of this work is to enable more effective analysis of scientific datasets through the integration of enhancements in the I/O stack, from active storage support at the file system layer to MPI-IO and high-level I/O library layers. We propose to provide software components to accelerate data analytics, mining, I/O, and knowledge discovery for large-scale scientific applications, thereby increasing productivity of both scientists and the systems. Our approaches include 1) designing the interfaces in high-level I/O libraries, such as parallel netCDF, for applications to activate data mining operations at the lower I/O layers; 2) enhancing MPI-IO runtime systems to incorporate the functionality developed as a part of the runtime system design; 3) developing parallel data mining programs as part of the runtime library for the server-side file system in the PVFS file system; and 4) prototyping an active storage cluster, which will utilize multicore CPUs, GPUs, and FPGAs to carry out the data mining workload.
2017-06-01
List of tables (excerpt): Table 2.1, training time statistics from Jones' [15] thesis; Table 2.2, evaluation runtime statistics from Camp's thesis for a single image; Table 2.3, training and evaluation runtime statistics from Sharpe's thesis; Table 2.4, Sharpe's screenshot detector results for combinations of training resources available and time required for each algorithm Jones [15] tested.
Porting DubaiSat-2 Flight Software to RTEMS: A Feasibility Study
NASA Astrophysics Data System (ADS)
Khoory, Mohammed; Al Shamsi, Zakareyya; Al Midfa, Ibrahim
2015-09-01
This paper details the process taken by EIAST to study RTEMS as a potential real-time operating system for future space missions. The direction was to attempt to run the DubaiSat-2 flight software under RTEMS 4.10.2 with as little modification to the original source as possible. The implementation used a “translation layer” to translate system calls used by the DS-2 flight software into RTEMS system calls. The RTEMS RTL project was integrated to satisfy the run-time loading requirement, and some differences in the filesystem were encountered and worked around. The implementation was tested for performance and stability, and comparisons were made. The conclusion is that RTEMS provides an adequate base for future space missions, with certain advantages over other RTOSs including cost, a smaller executable size, and control over the source. Drawbacks include the slow speed of loading tasks during runtime and some filesystem integrity issues during unexpected reboots.
Adaptive runtime for a multiprocessing API
Antao, Samuel F.; Bertolli, Carlo; Eichenberger, Alexandre E.; O'Brien, John K.
2016-11-15
A computer-implemented method includes selecting a runtime for executing a program. The runtime includes a first combination of feature implementations, where each feature implementation implements a feature of an application programming interface (API). Execution of the program is monitored, and the execution uses the runtime. Monitor data is generated based on the monitoring. A second combination of feature implementations are selected, by a computer processor, where the selection is based at least in part on the monitor data. The runtime is modified by activating the second combination of feature implementations to replace the first combination of feature implementations.
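The mechanism described, swapping one combination of feature implementations for another based on monitor data, can be pictured with ordinary indirect dispatch. The sketch below is a hypothetical illustration, not the patented implementation; the feature, the monitor metric, and the switching threshold are all assumptions.

```cpp
#include <functional>

// Two interchangeable implementations of one API feature (e.g., a loop scheduler).
void staticSchedule()  { /* low overhead, good for balanced work */ }
void dynamicSchedule() { /* higher overhead, good for irregular work */ }

struct Runtime {
    std::function<void()> scheduleFeature = staticSchedule;  // first combination

    // Called with monitor data gathered while the program executes under this runtime.
    void adapt(double observedLoadImbalance) {
        // Illustrative selection policy: activate the replacement implementation
        // once the monitored imbalance crosses a threshold.
        scheduleFeature = (observedLoadImbalance > 0.2) ? dynamicSchedule
                                                        : staticSchedule;
    }
};

int main() {
    Runtime rt;
    rt.scheduleFeature();   // execute using the first combination
    rt.adapt(0.35);         // monitor data triggers selection of the second combination
    rt.scheduleFeature();   // subsequent execution uses the replacement implementation
    return 0;
}
```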
Adaptive runtime for a multiprocessing API
Antao, Samuel F.; Bertolli, Carlo; Eichenberger, Alexandre E.; O'Brien, John K.
2016-10-11
A computer-implemented method includes selecting a runtime for executing a program. The runtime includes a first combination of feature implementations, where each feature implementation implements a feature of an application programming interface (API). Execution of the program is monitored, and the execution uses the runtime. Monitor data is generated based on the monitoring. A second combination of feature implementations are selected, by a computer processor, where the selection is based at least in part on the monitor data. The runtime is modified by activating the second combination of feature implementations to replace the first combination of feature implementations.
NASA Astrophysics Data System (ADS)
Myre, Joseph M.
Heterogeneous computing systems have recently come to the forefront of the High-Performance Computing (HPC) community's interest. HPC computer systems that incorporate special purpose accelerators, such as Graphics Processing Units (GPUs), are said to be heterogeneous. Large scale heterogeneous computing systems have consistently ranked highly on the Top500 list since the beginning of the heterogeneous computing trend. By using heterogeneous computing systems that consist of both general purpose processors and special-purpose accelerators, the speed and problem size of many simulations could be dramatically increased. Ultimately this results in enhanced simulation capabilities that allow, in some cases for the first time, the execution of parameter space and uncertainty analyses, model optimizations, and other inverse modeling techniques that are critical for scientific discovery and engineering analysis. However, simplifying the usage and optimization of codes for heterogeneous computing systems remains a challenge. This is particularly true for scientists and engineers for whom understanding HPC architectures and undertaking performance analysis may not be primary research objectives. To enable scientists and engineers to remain focused on their primary research objectives, a modular environment for geophysical inversion and run-time autotuning on heterogeneous computing systems is presented. This environment is composed of three major components: 1) CUSH---a framework for reducing the complexity of programming heterogeneous computer systems, 2) geophysical inversion routines which can be used to characterize physical systems, and 3) run-time autotuning routines designed to determine configurations of heterogeneous computing systems in an attempt to maximize the performance of scientific and engineering codes. Using three case studies, a lattice-Boltzmann method, a non-negative least squares inversion, and a finite-difference fluid flow method, it is shown that this environment provides scientists and engineers with means to reduce the programmatic complexity of their applications, to perform geophysical inversions for characterizing physical systems, and to determine high-performing run-time configurations of heterogeneous computing systems using a run-time autotuner.
Formal Specifications for an Electrical Power Grid System Stability and Reliability
2015-09-01
We analyze the power grid system requirements and express the critical runtime behavior using first-order logic. First, we identify observable ... Verification System, and Type systems, to name a few [5]. Theorem proving's specification dimension is dependent on the expressive power of the formal ...
Load Index Metrics for an Optimized Management of Web Services: A Systematic Evaluation
Souza, Paulo S. L.; Santana, Regina H. C.; Santana, Marcos J.; Zaluska, Ed; Faical, Bruno S.; Estrella, Julio C.
2013-01-01
The lack of precision in predicting service performance through load indices may lead to wrong decisions regarding the use of web services, compromising service performance and raising platform cost unnecessarily. This paper presents experimental studies to qualify the behaviour of load indices in the web service context. The experiments consider three services that generate controlled and significant server demands, four levels of workload for each service and six distinct execution scenarios. The evaluation considers three relevant perspectives: the capability for representing recent workloads, the capability for predicting near-future performance and finally stability. Eight different load indices were analysed, including the JMX Average Time index (proposed in this paper) specifically designed to address the limitations of the other indices. A systematic approach is applied to evaluate the different load indices, considering a multiple linear regression model based on the stepwise-AIC method. The results show that the load indices studied represent the workload to some extent; however, in contrast to expectations, most of them do not exhibit a coherent correlation with service performance and this can result in stability problems. The JMX Average Time index is an exception, showing a stable behaviour which is tightly coupled to the service runtime for all executions. Load indices are used to predict the service runtime and therefore their inappropriate use can lead to decisions that will impact negatively on both service performance and execution cost. PMID:23874776
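The evaluation relies on fitting regression models (selected with stepwise AIC) that relate load indices to the observed service runtime. The single-predictor least-squares fit below illustrates the basic relationship being tested; the data names are hypothetical and the paper's actual models are multiple regressions.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Ordinary least squares for: runtime = a + b * loadIndex.
// Returns {a, b}; assumes both vectors have the same nonzero length.
std::pair<double, double> fitRuntimeOnLoadIndex(const std::vector<double>& loadIndex,
                                                const std::vector<double>& runtime) {
    const std::size_t n = loadIndex.size();
    double sx = 0, sy = 0, sxx = 0, sxy = 0;
    for (std::size_t i = 0; i < n; ++i) {
        sx  += loadIndex[i];
        sy  += runtime[i];
        sxx += loadIndex[i] * loadIndex[i];
        sxy += loadIndex[i] * runtime[i];
    }
    const double b = (n * sxy - sx * sy) / (n * sxx - sx * sx);  // slope
    const double a = (sy - b * sx) / n;                          // intercept
    return {a, b};
}
```

A load index that "tightly couples" to service runtime, as the JMX Average Time index is reported to do, is one for which such a fit has both a meaningful slope and a high coefficient of determination across scenarios.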
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hollman, David; Lifflander, Jonathon; Wilke, Jeremiah
2017-03-14
DARMA is a portability layer for asynchronous many-task (AMT) runtime systems. AMT runtime systems show promise to mitigate challenges imposed by next generation high performance computing architectures. However, current runtime system technologies are not production-ready. DARMA is a portability layer that seeks to insulate application developers from idiosyncrasies of individual runtime systems, thereby facilitating application-developer use of these technologies. DARMA comprises a frontend application programming interface (API) for application developers, a backend API for runtime system developers, and a translation layer that translates frontend API calls into backend API calls. Application developers use C++ abstractions to annotate both data and tasks in their code. The DARMA translation layer uses C++ template metaprogramming to capture data-task dependencies, and provides this information to a potential backend runtime system via a series of backend API calls.
An Improved Neutron Transport Algorithm for HZETRN2006
NASA Astrophysics Data System (ADS)
Slaba, Tony
NASA's new space exploration initiative includes plans for long-term human presence in space, thereby placing new emphasis on space radiation analyses. In particular, a systematic effort of verification, validation and uncertainty quantification of the tools commonly used for radiation analysis for vehicle design and mission planning has begun. In this paper, the numerical error associated with energy discretization in HZETRN2006 is addressed; large errors in the low-energy portion of the neutron fluence spectrum are produced due to a numerical truncation error in the transport algorithm. It is shown that the truncation error results from the narrow energy domain of the neutron elastic spectral distributions, and that an extremely fine energy grid is required in order to adequately resolve the problem under the current formulation. Since adding a sufficient number of energy points will render the code computationally inefficient, we revisit the light-ion transport theory developed for HZETRN2006 and focus on neutron elastic interactions. The new approach that is developed numerically integrates with adequate resolution in the energy domain without affecting the run-time of the code and is easily incorporated into the current code. Efforts were also made to optimize the computational efficiency of the light-ion propagator; a brief discussion of the efforts is given along with run-time comparisons between the original and updated codes. Convergence testing is then completed by running the code for various environments and shielding materials with many different energy grids to ensure stability of the proposed method.
NASA Technical Reports Server (NTRS)
Agrawal, Gagan; Sussman, Alan; Saltz, Joel
1993-01-01
Scientific and engineering applications often involve structured meshes. These meshes may be nested (for multigrid codes) and/or irregularly coupled (called multiblock or irregularly coupled regular mesh problems). A combined runtime and compile-time approach for parallelizing these applications on distributed memory parallel machines in an efficient and machine-independent fashion was described. A runtime library which can be used to port these applications on distributed memory machines was designed and implemented. The library is currently implemented on several different systems. To further ease the task of application programmers, methods were developed for integrating this runtime library with compilers for HPF-like parallel programming languages. How this runtime library was integrated with the Fortran 90D compiler being developed at Syracuse University is discussed. Experimental results to demonstrate the efficacy of our approach are presented. A multiblock Navier-Stokes solver template and a multigrid code were experimented with. Our experimental results show that our primitives have low runtime communication overheads. Further, the compiler parallelized codes perform within 20 percent of the code parallelized by manually inserting calls to the runtime library.
Runtime Detection of C-Style Errors in UPC Code
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pirkelbauer, P; Liao, C; Panas, T
2011-09-29
Unified Parallel C (UPC) extends the C programming language (ISO C 99) with explicit parallel programming support for the partitioned global address space (PGAS), which provides a global memory space with localized partitions to each thread. Like its ancestor C, UPC is a low-level language that emphasizes code efficiency over safety. The absence of dynamic (and static) safety checks allows programmer oversights and software flaws that can be hard to spot. In this paper, we present an extension of a dynamic analysis tool, ROSE-Code Instrumentation and Runtime Monitor (ROSE-CIRM), for UPC to help programmers find C-style errors involving the global address space. Built on top of the ROSE source-to-source compiler infrastructure, the tool instruments source files with code that monitors operations and keeps track of changes to the system state. The resulting code is linked to a runtime monitor that observes the program execution and finds software defects. We describe the extensions to ROSE-CIRM that were necessary to support UPC. We discuss complications that arise from parallel code and our solutions. We test ROSE-CIRM against a runtime error detection test suite, and present performance results obtained from running error-free codes. ROSE-CIRM is released as part of the ROSE compiler under a BSD-style open source license.
Malware detection and analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chiang, Ken; Lloyd, Levi; Crussell, Jonathan
Embodiments of the invention describe systems and methods for malicious software detection and analysis. A binary executable comprising obfuscated malware on a host device may be received, and incident data indicating a time when the binary executable was received and identifying processes operating on the host device may be recorded. The binary executable is analyzed via a scalable plurality of execution environments, including one or more non-virtual execution environments and one or more virtual execution environments, to generate runtime data and deobfuscation data attributable to the binary executable. At least some of the runtime data and deobfuscation data attributable to the binary executable is stored in a shared database, while at least some of the incident data is stored in a private, non-shared database.
2014 Runtime Systems Summit. Runtime Systems Report
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sarkar, Vivek; Budimlic, Zoran; Kulkani, Milind
2016-09-19
This report summarizes runtime system challenges for exascale computing that follow from the fundamental challenges for exascale systems that have been well studied in past reports, e.g., [6, 33, 34, 32, 24]. Some of the key exascale challenges that pertain to runtime systems include parallelism, energy efficiency, memory hierarchies, data movement, heterogeneous processors and memories, resilience, performance variability, dynamic resource allocation, performance portability, and interoperability with legacy code. In addition to summarizing these challenges, the report also outlines different approaches to addressing these significant challenges that have been pursued by research projects in the DOE-sponsored X-Stack and OS/R programs. Since there is often confusion as to what exactly the term “runtime system” refers to in the software stack, we include a section on taxonomy to clarify the terminology used by participants in these research projects. In addition, we include a section on deployment opportunities for vendors and government labs to build on the research results from these projects. Finally, this report is also intended to provide a framework for discussing future research and development investments for exascale runtime systems, and for clarifying the role of runtime systems in exascale software.
Methodology for fast detection of false sharing in threaded scientific codes
Chung, I-Hsin; Cong, Guojing; Murata, Hiroki; Negishi, Yasushi; Wen, Hui-Fang
2014-11-25
A profiling tool identifies a code region with a false sharing potential. A static analysis tool classifies variables and arrays in the identified code region. A mapping detection library correlates memory access instructions in the identified code region with variables and arrays in the identified code region while a processor is running the identified code region. The mapping detection library identifies one or more instructions at risk, in the identified code region, which are subject to an analysis by a false sharing detection library. A false sharing detection library performs a run-time analysis of the one or more instructions at risk while the processor is re-running the identified code region. The false sharing detection library determines, based on the performed run-time analysis, whether two different portions of the cache memory line are accessed by the generated binary code.
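The defect being detected, false sharing, arises when logically independent per-thread data happen to share a cache line. A minimal C++ illustration of the problem and the usual padding fix is shown below; it is our example, not the tool's output, and the 64-byte line size is an assumption.

```cpp
#include <thread>
#include <vector>

constexpr int kThreads = 4;

// Counters packed next to each other: each thread writes only its own element,
// yet the elements share a cache line, so every write invalidates that line in
// the other cores' caches (false sharing), degrading performance.
long packedCounters[kThreads];

// The usual fix: pad/align each counter to its own cache line.
struct alignas(64) PaddedCounter { long value = 0; };
PaddedCounter paddedCounters[kThreads];

void work(int id, long iterations) {
    for (long i = 0; i < iterations; ++i) {
        ++packedCounters[id];        // falsely shared
        ++paddedCounters[id].value;  // not shared
    }
}

int main() {
    std::vector<std::thread> threads;
    for (int t = 0; t < kThreads; ++t) threads.emplace_back(work, t, 1000000L);
    for (auto& th : threads) th.join();
    return 0;
}
```

Note that the packed version contains no data race, which is exactly why false sharing needs the kind of memory-access correlation described in the abstract rather than an ordinary race detector.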
An Adaptive Cross-Architecture Combination Method for Graph Traversal
DOE Office of Scientific and Technical Information (OSTI.GOV)
You, Yang; Song, Shuaiwen; Kerbyson, Darren J.
2014-06-18
Breadth-First Search (BFS) is widely used in many real-world applications including computational biology, social networks, and electronic design automation. The combination method, using both top-down and bottom-up techniques, is the most effective BFS approach. However, current combination methods rely on trial-and-error and exhaustive search to locate the optimal switching point, which may cause significant runtime overhead. To solve this problem, we design an adaptive method based on regression analysis to predict an optimal switching point for the combination method at runtime within less than 0.1% of the BFS execution time.
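The top-down/bottom-up switch at the heart of combination BFS is usually driven by comparing the work implied by the current frontier with the work remaining in the unvisited part of the graph. The schematic below shows that decision with a fixed threshold; the paper's contribution is, in effect, to replace such fixed, trial-and-error thresholds with a regression-based prediction made at runtime. The parameter value is illustrative.

```cpp
#include <cstddef>

enum class Direction { TopDown, BottomUp };

// Hypothetical switching policy for hybrid BFS.
// frontierEdges:  edges incident to the current frontier (top-down work estimate)
// unvisitedEdges: edges incident to still-unvisited vertices (bottom-up work estimate)
Direction chooseDirection(std::size_t frontierEdges, std::size_t unvisitedEdges,
                          double alpha = 14.0) {
    // Switch to bottom-up when the frontier becomes large relative to the
    // remaining unvisited portion of the graph, and back when it shrinks.
    return (frontierEdges * alpha > unvisitedEdges) ? Direction::BottomUp
                                                    : Direction::TopDown;
}
```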
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tock, Yoav; Mandler, Benjamin; Moreira, Jose
2013-01-01
As HPC systems and applications get bigger and more complex, we are approaching an era in which resiliency and run-time elasticity concerns become paramount. We offer a building block for an alternative resiliency approach in which computations will be able to make progress while components fail, in addition to enabling a dynamic set of nodes throughout a computation lifetime. The core of our solution is a hierarchical scalable membership service providing eventual consistency semantics. An attribute replication service is used for hierarchy organization, and is exposed to external applications. Our solution is based on P2P technologies and provides resiliency and elastic runtime support at ultra-large scales. The resulting middleware is general purpose while exploiting HPC platform unique features and architecture. We have implemented and tested this system on BlueGene/P with Linux, and using worst-case analysis, evaluated the service scalability as effective for up to 1M nodes.
MOLAR: Modular Linux and Adaptive Runtime Support for HEC OS/R Research
DOE Office of Scientific and Technical Information (OSTI.GOV)
Frank Mueller
2009-02-05
MOLAR is a multi-institution research effort that concentrates on adaptive, reliable, and efficient operating and runtime system solutions for ultra-scale high-end scientific computing on the next generation of supercomputers. This research addresses the challenges outlined by the FAST-OS (forum to address scalable technology for runtime and operating systems) and HECRTF (high-end computing revitalization task force) activities by providing a modular Linux and adaptable runtime support for high-end computing operating and runtime systems. The MOLAR research has the following goals to address these issues. (1) Create a modular and configurable Linux system that allows customized changes based on the requirements of the applications, runtime systems, and cluster management software. (2) Build runtime systems that leverage the OS modularity and configurability to improve efficiency, reliability, scalability, ease-of-use, and provide support to legacy and promising programming models. (3) Advance computer reliability, availability and serviceability (RAS) management systems to work cooperatively with the OS/R to identify and preemptively resolve system issues. (4) Explore the use of advanced monitoring and adaptation to improve application performance and predictability of system interruptions. The overall goal of the research conducted at NCSU is to develop scalable algorithms for high-availability without single points of failure and without single points of control.
Highly accurate fast lung CT registration
NASA Astrophysics Data System (ADS)
Rühaak, Jan; Heldmann, Stefan; Kipshagen, Till; Fischer, Bernd
2013-03-01
Lung registration in thoracic CT scans has received much attention in the medical imaging community. Possible applications range from follow-up analysis, motion correction for radiation therapy, monitoring of air flow and pulmonary function to lung elasticity analysis. In a clinical environment, runtime is always a critical issue, ruling out quite a few excellent registration approaches. In this paper, a highly efficient variational lung registration method based on minimizing the normalized gradient fields distance measure with curvature regularization is presented. The method ensures diffeomorphic deformations by an additional volume regularization. Supplemental user knowledge, like a segmentation of the lungs, may be incorporated as well. The accuracy of our method was evaluated on 40 test cases from clinical routine. In the EMPIRE10 lung registration challenge, our scheme ranks third, with respect to various validation criteria, out of 28 algorithms with an average landmark distance of 0.72 mm. The average runtime is about 1:50 min on a standard PC, making it by far the fastest approach of the top-ranking algorithms. Additionally, the ten publicly available DIR-Lab inhale-exhale scan pairs were registered to subvoxel accuracy at computation times of only 20 seconds. Our method thus combines very attractive runtimes with state-of-the-art accuracy in a unique way.
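For reference, the normalized gradient fields (NGF) distance measure minimized here is commonly written as below, where the edge parameter ε suppresses the influence of noise; the exact variant and the curvature/volume regularization weights used by the authors may differ from this generic form.

```latex
\[
  \mathcal{D}^{\mathrm{NGF}}(T,R) \;=\; \int_{\Omega}
  1 - \bigl\langle n_{\varepsilon}(T,x),\, n_{\varepsilon}(R,x) \bigr\rangle^{2} \, dx,
  \qquad
  n_{\varepsilon}(I,x) \;=\; \frac{\nabla I(x)}{\sqrt{\lVert \nabla I(x)\rVert^{2} + \varepsilon^{2}}}.
\]
```

Intuitively, the measure rewards locations where the two images have parallel (or anti-parallel) gradients, i.e., aligned edges, which is well suited to mono-modal CT-to-CT lung registration.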
Alternatives to Re-Planning: Methods for Plan Re-Evaluation at Runtime
NASA Technical Reports Server (NTRS)
Benazera, Emmanuel
2005-01-01
Current planning algorithms have difficulty handling the complexity that is due to an increase in domain uncertainty, and especially in the case of multi-dimensional continuous spaces. Therefore, they produce plans that do not take into account numerous situations that can occur at runtime, such as faults or other changes in the planning domain itself. Thus there is a gap between the plan generation and the reality experienced at runtime. Here we present two methods that allow the plan conditionals to be revised w.r.t. uncertainty on the system as estimated at runtime.
Preventing Run-Time Bugs at Compile-Time Using Advanced C++
DOE Office of Scientific and Technical Information (OSTI.GOV)
Neswold, Richard
When writing software, we develop algorithms that tell the computer what to do at run-time. Our solutions are easier to understand and debug when they are properly modeled using class hierarchies, enumerations, and a well-factored API. Unfortunately, even with these design tools, we end up having to debug our programs at run-time. Worse still, debugging an embedded system changes its dynamics, making it tough to find and fix concurrency issues. This paper describes techniques using C++ to detect run-time bugs *at compile time*. A concurrency library, developed at Fermilab, is used for examples in illustrating these techniques.
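Typical of the techniques meant here are type-level checks that turn whole classes of run-time mistakes into compile errors. The fragment below is a generic illustration, not the Fermilab concurrency library; it uses a strong unit type and `static_assert` so that mixing incompatible quantities or misconfiguring a constant simply does not compile.

```cpp
#include <cstdint>

// Strongly typed durations: a Milliseconds value cannot be passed where a
// Microseconds value is expected, so unit-mixing bugs are caught at compile time.
template <std::int64_t TicksPerSecond>
struct Duration {
    std::int64_t ticks;
    constexpr Duration operator+(Duration other) const { return {ticks + other.ticks}; }
};

using Milliseconds = Duration<1000>;
using Microseconds = Duration<1000000>;

constexpr Milliseconds timeout{250};
// constexpr Microseconds bad = timeout;   // error: no conversion between unit types

// Compile-time validation of a configuration constant.
constexpr int kQueueDepth = 64;
static_assert((kQueueDepth & (kQueueDepth - 1)) == 0,
              "queue depth must be a power of two for the ring buffer");

int main() { return static_cast<int>((timeout + Milliseconds{50}).ticks); }
```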
Automated Run-Time Mission and Dialog Generation
2007-03-01
Subject terms (excerpt): ... Processing, Social Network Analysis, Simulation, Automated Scenario Generation. Table-of-contents fragments list sections on Social Networks and on Mission and Dialog Generation.
High-throughput Analysis of Large Microscopy Image Datasets on CPU-GPU Cluster Platforms
Teodoro, George; Pan, Tony; Kurc, Tahsin M.; Kong, Jun; Cooper, Lee A. D.; Podhorszki, Norbert; Klasky, Scott; Saltz, Joel H.
2014-01-01
Analysis of large pathology image datasets offers significant opportunities for the investigation of disease morphology, but the resource requirements of analysis pipelines limit the scale of such studies. Motivated by a brain cancer study, we propose and evaluate a parallel image analysis application pipeline for high throughput computation of large datasets of high resolution pathology tissue images on distributed CPU-GPU platforms. To achieve efficient execution on these hybrid systems, we have built runtime support that allows us to express the cancer image analysis application as a hierarchical data processing pipeline. The application is implemented as a coarse-grain pipeline of stages, where each stage may be further partitioned into another pipeline of fine-grain operations. The fine-grain operations are efficiently managed and scheduled for computation on CPUs and GPUs using performance aware scheduling techniques along with several optimizations, including architecture aware process placement, data locality conscious task assignment, data prefetching, and asynchronous data copy. These optimizations are employed to maximize the utilization of the aggregate computing power of CPUs and GPUs and minimize data copy overheads. Our experimental evaluation shows that the cooperative use of CPUs and GPUs achieves significant improvements on top of GPU-only versions (up to 1.6×) and that the execution of the application as a set of fine-grain operations provides more opportunities for runtime optimizations and attains better performance than coarser-grain, monolithic implementations used in other works. An implementation of the cancer image analysis pipeline using the runtime support was able to process an image dataset consisting of 36,848 4Kx4K-pixel image tiles (about 1.8TB uncompressed) in less than 4 minutes (150 tiles/second) on 100 nodes of a state-of-the-art hybrid cluster system. PMID:25419546
Runtime Performance Monitoring Tool for RTEMS System Software
NASA Astrophysics Data System (ADS)
Cho, B.; Kim, S.; Park, H.; Kim, H.; Choi, J.; Chae, D.; Lee, J.
2007-08-01
RTEMS is a commercial-grade real-time operating system that supports multi-processor computers. However, there are not many development tools for RTEMS. In this paper, we report a new RTEMS-based runtime performance monitoring tool. We have implemented a lightweight runtime monitoring task with an extension to the RTEMS APIs. Using our tool, software developers can verify various performance-related parameters during runtime. Our tool can be used during the software development phase and in in-orbit operation as well. Our implemented target agent is lightweight and has small overhead using the SpaceWire interface. Efforts to reduce overhead and to add other monitoring parameters are currently under research.
Towards Run-time Assurance of Advanced Propulsion Algorithms
NASA Technical Reports Server (NTRS)
Wong, Edmond; Schierman, John D.; Schlapkohl, Thomas; Chicatelli, Amy
2014-01-01
This paper covers the motivation and rationale for investigating the application of run-time assurance methods as a potential means of providing safety assurance for advanced propulsion control systems. Certification is becoming increasingly infeasible for such systems using current verification practices. Run-time assurance systems hold the promise of certifying these advanced systems by continuously monitoring the state of the feedback system during operation and reverting to a simpler, certified system if anomalous behavior is detected. The discussion will also cover initial efforts underway to apply a run-time assurance framework to NASA's model-based engine control approach. Preliminary experimental results are presented and discussed.
MPI Runtime Error Detection with MUST: Advances in Deadlock Detection
Hilbrich, Tobias; Protze, Joachim; Schulz, Martin; ...
2013-01-01
The widely used Message Passing Interface (MPI) is complex and rich. As a result, application developers require automated tools to avoid and to detect MPI programming errors. We present the Marmot Umpire Scalable Tool (MUST) that detects such errors with significantly increased scalability. We present improvements to our graph-based deadlock detection approach for MPI, which cover future MPI extensions. Our enhancements also check complex MPI constructs that no previous graph-based detection approach handled correctly. Finally, we present optimizations for the processing of MPI operations that reduce runtime deadlock detection overheads. Existing approaches often require O(p) analysis time per MPI operation, for p processes. We empirically observe that our improvements lead to sub-linear or better analysis time per operation for a wide range of real-world applications.
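The core of graph-based deadlock detection is finding a cycle in a wait-for graph built from blocking MPI operations. The sketch below shows that idea for a plain AND-semantics graph using depth-first search; MUST's actual detector handles a richer AND/OR wait-for model and many more MPI constructs.

```python
def find_deadlock(wait_for):
    """Detect a cycle in a simple AND-semantics wait-for graph.
    wait_for maps a process rank to the set of ranks it is blocked on."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {p: WHITE for p in wait_for}
    stack = []

    def dfs(p):
        color[p] = GREY
        stack.append(p)
        for q in wait_for.get(p, ()):
            if color.get(q, WHITE) == GREY:          # back edge: cycle found
                return stack[stack.index(q):] + [q]
            if color.get(q, WHITE) == WHITE:
                cycle = dfs(q)
                if cycle:
                    return cycle
        color[p] = BLACK
        stack.pop()
        return None

    for p in wait_for:
        if color[p] == WHITE:
            cycle = dfs(p)
            if cycle:
                return cycle
    return None

# Rank 0 waits for 1 (e.g. a blocking recv), 1 waits for 2, 2 waits for 0: deadlock.
print(find_deadlock({0: {1}, 1: {2}, 2: {0}}))
# No cycle: 0 waits for 1, 1 waits for nobody.
print(find_deadlock({0: {1}, 1: set()}))
```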
Experiments with Test Case Generation and Runtime Analysis
NASA Technical Reports Server (NTRS)
Artho, Cyrille; Drusinsky, Doron; Goldberg, Allen; Havelund, Klaus; Lowry, Mike; Pasareanu, Corina; Rosu, Grigore; Visser, Willem; Koga, Dennis (Technical Monitor)
2003-01-01
Software testing is typically an ad hoc process where human testers manually write many test inputs and expected test results, perhaps automating their execution in a regression suite. This process is cumbersome and costly. This paper reports preliminary results on an approach to further automate this process. The approach consists of combining automated test case generation, based on systematically exploring the program's input domain, with runtime analysis, where execution traces are monitored and verified against temporal logic specifications, or analyzed using advanced algorithms for detecting concurrency errors such as data races and deadlocks. The approach suggests generating specifications dynamically, per input instance, rather than statically once and for all. The paper describes experiments with variants of this approach in the context of two examples, a planetary rover controller and a spacecraft fault protection system.
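The runtime-analysis half of the approach can be pictured as checking an execution trace against a temporal property. Below is a minimal Python sketch for the classic response property (every request is eventually followed by a grant); the event names and trace are invented, and the real system checks far richer temporal logic specifications and concurrency properties.

```python
def check_response(trace, trigger="request", response="grant"):
    """Check the response property G(trigger -> F response) over a finite trace:
    every 'trigger' event must be followed later by a 'response' event.
    Returns (holds, index_of_unanswered_trigger_or_None)."""
    pending = None
    for i, event in enumerate(trace):
        if event == trigger and pending is None:
            pending = i
        elif event == response:
            pending = None
    return (pending is None), pending

# Tiny hand-written traces standing in for monitored controller runs.
print(check_response(["request", "work", "grant", "request", "grant"]))   # (True, None)
print(check_response(["request", "work", "request", "grant", "request"])) # (False, 4)
```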
Estimating job runtime for CMS analysis jobs
NASA Astrophysics Data System (ADS)
Sfiligoi, I.
2014-06-01
The basic premise of pilot systems is to create an overlay scheduling system on top of leased resources. By definition, leases have a limited lifetime, so any job that is scheduled on such resources must finish before the lease is over, or it will be killed and all the computation is wasted. In order to effectively schedule jobs to resources, the pilot system thus requires the expected runtime of the users' jobs. Past studies have shown that relying on user-provided estimates is not a valid strategy, so the system should try to make an estimate by itself. This paper provides a study of the historical data obtained from the Compact Muon Solenoid (CMS) experiment's Analysis Operations submission system. Clear patterns are observed, suggesting that predicting an expected job-lifetime range is achievable with a high confidence level in this environment.
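A simple way to turn such historical data into an expected job-lifetime range is to keep per-category runtime histories and report empirical quantiles. The sketch below illustrates that idea with invented keys and numbers; it is not the CMS Analysis Operations or pilot-system code.

```python
from collections import defaultdict

class RuntimeEstimator:
    """Estimate an expected runtime range from historical runtimes, grouped by a
    job key such as (user, task type). Purely illustrative."""
    def __init__(self):
        self.history = defaultdict(list)

    def record(self, key, runtime_s):
        self.history[key].append(runtime_s)

    def estimate_range(self, key, lo_q=0.10, hi_q=0.95):
        data = sorted(self.history.get(key, []))
        if not data:
            return None
        def quantile(q):
            # crude empirical quantile by index
            idx = min(int(q * len(data)), len(data) - 1)
            return data[idx]
        return quantile(lo_q), quantile(hi_q)

est = RuntimeEstimator()
for r in (1200, 1350, 1100, 4000, 1250, 1300, 1280):   # synthetic runtimes in seconds
    est.record(("alice", "analysis"), r)
print(est.estimate_range(("alice", "analysis")))        # (1100, 4000)
```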
A domain-specific compiler for a parallel multiresolution adaptive numerical simulation environment
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rajbhandari, Samyam; Kim, Jinsung; Krishnamoorthy, Sriram
This paper describes the design and implementation of a layered domain-specific compiler to support MADNESS---Multiresolution ADaptive Numerical Environment for Scientific Simulation. MADNESS is a high-level software environment for the solution of integral and differential equations in many dimensions, using adaptive and fast harmonic analysis methods with guaranteed precision. MADNESS uses k-d trees to represent spatial functions and implements operators like addition, multiplication, differentiation, and integration on the numerical representation of functions. The MADNESS runtime system provides global namespace support and a task-based execution model including futures. MADNESS is currently deployed on massively parallel supercomputers and has enabled many science advances. Due to the highly irregular and statically unpredictable structure of the k-d trees representing the spatial functions encountered in MADNESS applications, only purely runtime approaches to optimization have previously been implemented in the MADNESS framework. This paper describes a layered domain-specific compiler developed to address some performance bottlenecks in MADNESS. The newly developed static compile-time optimizations, in conjunction with the MADNESS runtime support, enable significant performance improvement for the MADNESS framework.
NASA Astrophysics Data System (ADS)
Christensen, C.; Summa, B.; Scorzelli, G.; Lee, J. W.; Venkat, A.; Bremer, P. T.; Pascucci, V.
2017-12-01
Massive datasets are becoming more common due to increasingly detailed simulations and higher resolution acquisition devices. Yet accessing and processing these huge data collections for scientific analysis is still a significant challenge. Solutions that rely on extensive data transfers are increasingly untenable and often impossible due to lack of sufficient storage at the client side as well as insufficient bandwidth to conduct such large transfers, which in some cases could entail petabytes of data. Large-scale remote computing resources can be useful, but utilizing such systems typically entails some form of offline batch processing with long delays, data replications, and substantial cost for any mistakes. Both types of workflows can severely limit the flexible exploration and rapid evaluation of new hypotheses that are crucial to the scientific process and thereby impede scientific discovery. In order to facilitate interactivity in both analysis and visualization of these massive data ensembles, we introduce a dynamic runtime system suitable for progressive computation and interactive visualization of arbitrarily large, disparately located spatiotemporal datasets. Our system includes an embedded domain-specific language (EDSL) that allows users to express a wide range of data analysis operations in a simple and abstract manner. The underlying runtime system transparently resolves issues such as remote data access and resampling while at the same time maintaining interactivity through progressive and interruptible processing. Computations involving large amounts of data can be performed remotely in an incremental fashion that dramatically reduces data movement, while the client receives updates progressively, thereby remaining robust to fluctuating network latency or limited bandwidth. This system facilitates interactive, incremental analysis and visualization of massive remote datasets up to petabytes in size. Our system is now available for general use in the community through both Docker and Anaconda.
NASA Technical Reports Server (NTRS)
Chien, Andrew A.; Karamcheti, Vijay; Plevyak, John; Sahrawat, Deepak
1993-01-01
Concurrent object-oriented languages, particularly fine-grained approaches, reduce the difficulty of large scale concurrent programming by providing modularity through encapsulation while exposing large degrees of concurrency. Despite these programmability advantages, such languages have historically suffered from poor efficiency. This paper describes the Concert project whose goal is to develop portable, efficient implementations of fine-grained concurrent object-oriented languages. Our approach incorporates aggressive program analysis and program transformation with careful information management at every stage from the compiler to the runtime system. The paper discusses the basic elements of the Concert approach along with a description of the potential payoffs. Initial performance results and specific plans for system development are also detailed.
Accountable Information Flow for Java-Based Web Applications
2010-01-01
[Figure 2: The Swift architecture. Components shown: runtime library, Swift server runtime, Java servlet framework, HTTP, Web server, Web browser.] On the server, the Java application code links against Swift's server-side run-time library, which in turn sits on top of the standard Java servlet ... (AFRL-RI-RS-TR-2010-9, Final Technical Report, January 2010: Accountable Information Flow for Java-Based Web Applications.)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Landwehr, Joshua B.; Suetterlein, Joshua D.; Marquez, Andres
2016-05-16
Since 2012, the U.S. Department of Energy's X-Stack program has been developing solutions including runtime systems, programming models, languages, compilers, and tools for the Exascale system software to address crucial performance and power requirements. Fine grain programming models and runtime systems show a great potential to efficiently utilize the underlying hardware. Thus, they are essential to many X-Stack efforts. An abundant amount of small tasks can better utilize the vast parallelism available on current and future machines. Moreover, finer tasks can recover faster and adapt better, due to a decrease in state and control. Nevertheless, current applications have been written to exploit old paradigms (such as Communicating Sequential Processes and Bulk Synchronous Parallel processing). To fully utilize the advantages of these new systems, applications need to be adapted to these new paradigms. As part of the applications' porting process, in-depth characterization studies, focused on both application characteristics and runtime features, need to take place to fully understand the application performance bottlenecks and how to resolve them. This paper presents a characterization study for a novel high performance runtime system, called the Open Community Runtime, using key HPC kernels as its vehicle. This study has the following contributions: one of the first high performance, fine grain, distributed memory runtime systems implementing the OCR standard (version 0.99a); and a characterization study of key HPC kernels in terms of runtime primitives running on both intra- and inter-node environments. Running on a general purpose cluster, we have found up to a 1635x relative speed-up for a parallel tiled Cholesky kernel on 128 nodes with 16 cores each and a 1864x relative speed-up for a parallel tiled Smith-Waterman kernel on 128 nodes with 30 cores.
An overview of the Opus language and runtime system
NASA Technical Reports Server (NTRS)
Mehrotra, Piyush; Haines, Matthew
1994-01-01
We have recently introduced a new language, called Opus, which provides a set of Fortran language extensions that allow for integrated support of task and data parallelism. It also provides shared data abstractions (SDA's) as a method for communication and synchronization among these tasks. In this paper, we first provide a brief description of the language features and then focus on both the language-dependent and language-independent parts of the runtime system that support the language. The language-independent portion of the runtime system supports lightweight threads across multiple address spaces, and is built upon existing lightweight thread and communication systems. The language-dependent portion of the runtime system supports conditional invocation of SDA methods and distributed SDA argument handling.
Dynamic analysis methods for detecting anomalies in asynchronously interacting systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kumar, Akshat; Solis, John Hector; Matschke, Benjamin
2014-01-01
Detecting modifications to digital system designs, whether malicious or benign, is problematic due to the complexity of the systems being analyzed. Moreover, static analysis techniques and tools can only be used during the initial design and implementation phases to verify safety and liveness properties. It is computationally intractable to guarantee that any previously verified properties still hold after a system, or even a single component, has been produced by a third-party manufacturer. In this paper we explore new approaches for creating a robust system design by investigating highly-structured computational models that simplify verification and analysis. Our approach avoids the need to fully reconstruct the implemented system by incorporating a small verification component that dynamically detects deviations from the design specification at run-time. The first approach encodes information extracted from the original system design algebraically into a verification component. During run-time this component randomly queries the implementation for trace information and verifies that no design-level properties have been violated. If any deviation is detected then a pre-specified fail-safe or notification behavior is triggered. Our second approach utilizes a partitioning methodology to view liveness and safety properties as a distributed decision task and the implementation as a proposed protocol that solves this task. Thus the problem of verifying safety and liveness properties is translated to that of verifying that the implementation solves the associated decision task. We develop upon results from distributed systems and algebraic topology to construct a learning mechanism for verifying safety and liveness properties from samples of run-time executions.
Petersson, K J F; Friberg, L E; Karlsson, M O
2010-10-01
Computer models of biological systems grow more complex as computing power increases. Often these models are defined as differential equations, and no analytical solutions exist. Numerical integration is used to approximate the solution; this can be computationally intensive, time consuming, and account for a large proportion of the total computer runtime. The performance of different integration methods depends on the mathematical properties of the differential equation system at hand. In this paper we investigate the possibility of runtime gains by calculating parts of, or the whole, differential equation system at given time intervals, outside of the differential equation solver. This approach was tested on nine models defined as differential equations, with the goal of reducing runtime while maintaining model fit, based on the objective function value. The software used was NONMEM. In four models the computational runtime was successfully reduced (by 59-96%). The differences in parameter estimates, compared to using only the differential equation solver, were less than 12% for all fixed effects parameters. For the variance parameters, estimates were within 10% for the majority of the parameters. Population and individual predictions were similar, and the differences in OFV were between 1 and -14 units. When computational runtime seriously affects the usefulness of a model, we suggest evaluating this approach for repetitive elements of model building and evaluation, such as covariate inclusions or bootstraps.
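The idea of evaluating part of the differential equation system on a fixed time grid outside the solver can be illustrated with a toy example: an "expensive" time-varying model component is pre-tabulated on a coarse grid and interpolated inside the right-hand side. The model below is invented and has nothing to do with the actual NONMEM models used in the study.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical "expensive" time-varying model component; in the paper this
# would be part of the full differential equation system.
def expensive_component(t):
    return 1.0 + 0.5 * np.sin(0.1 * t)

# Pre-evaluate the expensive component on a coarse grid, outside the ODE solver.
coarse_t = np.linspace(0.0, 100.0, 21)
coarse_vals = expensive_component(coarse_t)

def rhs_exact(t, y):
    return -0.1 * expensive_component(t) * y          # evaluated at every solver step

def rhs_interpolated(t, y):
    k = np.interp(t, coarse_t, coarse_vals)            # cheap lookup plus interpolation
    return -0.1 * k * y

sol_exact = solve_ivp(rhs_exact, (0.0, 100.0), [100.0], rtol=1e-8)
sol_fast = solve_ivp(rhs_interpolated, (0.0, 100.0), [100.0], rtol=1e-8)
print(sol_exact.y[0, -1], sol_fast.y[0, -1])           # nearly identical end values
```

The trade-off mirrors the one reported above: runtime drops because the costly term is no longer evaluated inside the solver, at the price of a small, controllable approximation error.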
AEOSS runtime manual for system analysis on Advanced Earth-Orbital Spacecraft Systems
NASA Technical Reports Server (NTRS)
Lee, Hwa-Ping
1990-01-01
The Advanced Earth-Orbital Spacecraft System (AEOSS) enables users to project the required power, weight, and cost for a generic earth-orbital spacecraft system. These variables are calculated on the component and subsystem levels, and then on the system level. The six included subsystems are electric power, thermal control, structure, auxiliary propulsion, attitude control, and communication, command, and data handling. The costs are computed using statistically determined models that were derived from spacecraft flown in the past and categorized into classes according to their functions and structural complexity. Selected design and performance analyses for essential components and subsystems are also provided. AEOSS allows a user to enter known values of these parameters, in whole or in part, at all levels. All of this information is of vital importance to project managers of subsystems or of a spacecraft system. AEOSS is specially tailored software coded using the relational database program 4th Dimension from Acius, in a Macintosh version. Because of the licensing agreements, two versions of the AEOSS documents were prepared. This version, the AEOSS Runtime Manual, is permitted to be distributed with a finite number of copies of the restrictive 4D Runtime version. It can perform all contained applications without any programming alterations.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Haas, Nicholas Q; Gillen, Robert E; Karnowski, Thomas P
MathWorks' MATLAB is widely used in academia and industry for prototyping, data analysis, data processing, etc. Many users compile their programs using the MATLAB Compiler to run on workstations/computing clusters via the free MATLAB Compiler Runtime (MCR). The MCR facilitates the execution of code calling Application Programming Interface (API) functions from both base MATLAB and MATLAB toolboxes. In a Linux environment, a sizable number of third-party runtime dependencies (i.e., shared libraries) are necessary. Unfortunately, to the MATLAB community's knowledge, these dependencies are not documented, leaving system administrators and/or end-users to find and install the necessary libraries, either in response to runtime errors resulting from missing libraries or by inspecting the header information of the MCR's Executable and Linkable Format (ELF) libraries to determine which ones are missing from the system. To address these shortcomings, Docker images based on Community Enterprise Operating System (CentOS) 7, a derivative of Red Hat Enterprise Linux (RHEL) 7, containing recent (2015-2017) MCR releases and their dependencies were created. These images, along with a provided sample Docker Compose YAML script, can be used to create a simulated computing cluster where binaries created with the MATLAB Compiler can be executed using a sample Slurm Workload Manager script.
Environment Modeling Using Runtime Values for JPF-Android
NASA Technical Reports Server (NTRS)
van der Merwe, Heila; Tkachuk, Oksana; Nel, Seal; van der Merwe, Brink; Visser, Willem
2015-01-01
Software applications are developed to be executed in a specific environment. This environment includes external native libraries to add functionality to the application and drivers to fire the application execution. For testing and verification, the environment of an application is abstracted and simplified using models or stubs. Empty stubs, returning default values, are simple to generate automatically, but they do not perform well when the application expects specific return values. Symbolic execution is used to find input parameters for drivers and return values for library stubs, but it struggles to detect the values of complex objects. In this work-in-progress paper, we explore an approach to generate drivers and stubs based on values collected during runtime instead of using default values. Entry points and methods that need to be modeled are instrumented to log their parameters and return values. The instrumented applications are then executed using a driver and instrumented libraries. The values collected during runtime are used to generate driver and stub values on the fly that improve coverage during verification by enabling the execution of code that previously crashed or was missed. We are implementing this approach to improve the environment model of JPF-Android, our model checking and analysis tool for Android applications.
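The record-and-replay idea can be sketched in a few lines: instrument the methods of interest to log their return values during a real run, then generate stubs that replay those values during verification. The decorator, method name, and recorded value below are hypothetical; JPF-Android's actual instrumentation works on Java bytecode rather than Python functions.

```python
import functools

RECORDED = {}   # method name -> list of observed return values

def record(func):
    """During the instrumented run, log every return value of the wrapped
    method (a stand-in for the bytecode instrumentation described above)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        RECORDED.setdefault(func.__name__, []).append(result)
        return result
    return wrapper

def make_stub(name, default=None):
    """During verification, replay recorded values instead of defaults, so code
    paths that depend on realistic values can still be explored."""
    values = iter(RECORDED.get(name, []))
    def stub(*args, **kwargs):
        return next(values, default)
    return stub

# --- instrumented run against the real environment ---
@record
def get_device_id():
    return "355458061189396"          # hypothetical value observed at runtime

get_device_id()

# --- verification run against the generated stub ---
stub = make_stub("get_device_id", default="")
print(stub())    # replays the recorded value instead of an empty default
```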
Quantified Event Automata: Towards Expressive and Efficient Runtime Monitors
NASA Technical Reports Server (NTRS)
Barringer, Howard; Falcone, Ylies; Havelund, Klaus; Reger, Giles; Rydeheard, David
2012-01-01
Runtime verification is the process of checking a property on a trace of events produced by the execution of a computational system. Runtime verification techniques have recently focused on parametric specifications where events take data values as parameters. These techniques exist on a spectrum inhabited by both efficient and expressive techniques. These characteristics are usually shown to be conflicting: in state-of-the-art solutions, efficiency is obtained at the cost of expressiveness and vice versa. To seek a solution to this conflict we explore a new point on the spectrum by defining an alternative runtime verification approach. We introduce a new formalism for concisely capturing expressive specifications with parameters. Our technique is more expressive than the currently most efficient techniques while at the same time allowing for optimizations.
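The parametric flavour of such formalisms can be pictured as slicing the event trace by its data parameters and running a small automaton per slice. The sketch below does this for an invented open/close property; it conveys only the slicing idea, not the quantified event automata semantics or their optimizations.

```python
def monitor_open_close(trace):
    """Slice the trace by the file parameter and run one tiny two-state check per
    slice: every 'open' must eventually be matched by a 'close'."""
    state = {}                     # file -> 'open' | 'closed'
    violations = []
    for event, f in trace:
        if event == "open":
            if state.get(f) == "open":
                violations.append(("double open", f))
            state[f] = "open"
        elif event == "close":
            if state.get(f) != "open":
                violations.append(("close without open", f))
            state[f] = "closed"
    still_open = [f for f, s in state.items() if s == "open"]
    return violations, still_open

trace = [("open", "a.txt"), ("open", "b.txt"), ("close", "a.txt")]
print(monitor_open_close(trace))   # ([], ['b.txt'])
```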
Optimized Temporal Monitors for SystemC
NASA Technical Reports Server (NTRS)
Tabakov, Deian; Rozier, Kristin Y.; Vardi, Moshe Y.
2012-01-01
SystemC is a modeling language built as an extension of C++. Its growing popularity and the increasing complexity of designs have motivated research efforts aimed at the verification of SystemC models using assertion-based verification (ABV), where the designer asserts properties that capture the design intent in a formal language such as PSL or SVA. The model then can be verified against the properties using runtime or formal verification techniques. In this paper we focus on automated generation of runtime monitors from temporal properties. Our focus is on minimizing runtime overhead, rather than monitor size or monitor-generation time. We identify four issues in monitor generation: state minimization, alphabet representation, alphabet minimization, and monitor encoding. We conduct extensive experimentation and identify a combination of settings that offers the best performance in terms of runtime overhead.
Establish and Evaluate Ada Runtime Features of Interest for Real-Time Systems
1989-02-15
[OCR residue from the report front matter and table of contents. Recoverable details: title "Establish and Evaluate Ada Runtime Features of Interest for Real-Time Systems"; contract number MDA 903-87-D-0056; IITRI project number T06168; sections include 2.0 Selection Process Overview and 2.1 Real-Time Systems Identification.]
NASA Astrophysics Data System (ADS)
Kern, Bastian; Jöckel, Patrick
2016-10-01
Numerical climate and weather models have advanced to finer scales, accompanied by large amounts of output data. The model systems hit the input and output (I/O) bottleneck of modern high-performance computing (HPC) systems. We aim to apply diagnostic methods online during the model simulation instead of applying them as a post-processing step to written output data, to reduce the amount of I/O. To include diagnostic tools into the model system, we implemented a standardised, easy-to-use interface based on the Modular Earth Submodel System (MESSy) into the ICOsahedral Non-hydrostatic (ICON) modelling framework. The integration of the diagnostic interface into the model system is briefly described. Furthermore, we present a prototype implementation of an advanced online diagnostic tool for the aggregation of model data onto a user-defined regular coarse grid. This diagnostic tool will be used to reduce the amount of model output in future simulations. Performance tests of the interface and of two different diagnostic tools show that the interface itself introduces no overhead in the form of additional runtime to the model system. The diagnostic tools, however, have a significant impact on the model system's runtime. This overhead strongly depends on the characteristics and implementation of the diagnostic tool. A diagnostic tool with high inter-process communication introduces large overhead, whereas the additional runtime of a diagnostic tool without inter-process communication is low. We briefly describe our efforts to reduce the additional runtime from the diagnostic tools, and present a brief analysis of memory consumption. Future work will focus on optimisation of the memory footprint and the I/O operations of the diagnostic interface.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bergen, Ben; Moss, Nicholas; Charest, Marc Robert Joseph
FleCSI is a compile-time configurable framework designed to support multi-physics application development. As such, FleCSI attempts to provide a very general set of infrastructure design patterns that can be specialized and extended to suit the needs of a broad variety of solver and data requirements. Current support includes multi-dimensional mesh topology, mesh geometry, and mesh adjacency information, n-dimensional hashed-tree data structures, graph partitioning interfaces, and dependency closures. FleCSI also introduces a functional programming model with control, execution, and data abstractions that are consistent with both MPI and state-of-the-art task-based runtimes such as Legion and Charm++. The FleCSI abstraction layer provides the developer with insulation from the underlying runtime, while allowing support for multiple runtime systems, including conventional models like asynchronous MPI. The intent is to give developers a concrete set of user-friendly programming tools that can be used now, while allowing flexibility in choosing runtime implementations and optimizations that can be applied to architectures and runtimes that arise in the future. The control and execution models in FleCSI also provide formal nomenclature for describing poorly understood concepts like kernels and tasks.
Framework for architecture-independent run-time reconfigurable applications
NASA Astrophysics Data System (ADS)
Lehn, David I.; Hudson, Rhett D.; Athanas, Peter M.
2000-10-01
Configurable Computing Machines (CCMs) have emerged as a technology with the computational benefits of custom ASICs as well as the flexibility and reconfigurability of general-purpose microprocessors. Significant effort from the research community has focused on techniques to move this reconfigurability from a rapid application development tool to a run-time tool. This requires the ability to change the hardware design while the application is executing and is known as Run-Time Reconfiguration (RTR). Widespread acceptance of run-time reconfigurable custom computing depends upon the existence of high-level automated design tools. Such tools must reduce the designer's effort to port applications between different platforms as the architecture, hardware, and software evolve. A Java implementation of a high-level application framework, called Janus, is presented here. In this environment, developers create Java classes that describe the structural behavior of an application. The framework allows hardware and software modules to be freely mixed and interchanged. A compilation phase of the development process analyzes the structure of the application and adapts it to the target platform. Janus is capable of structuring the run-time behavior of an application to take advantage of the memory and computational resources available.
Organisational Pattern Driven Recovery Mechanisms
NASA Astrophysics Data System (ADS)
Giacomo, Valentina Di; Presenza, Domenico; Riccucci, Carlo
The process of reaction to system failures and security attacks is strongly influenced by its infrastructural, procedural and organisational settings. Analysis of reaction procedures and practices from different domains (Air Traffic Management, response to computer security incidents, response to emergencies, recovery in the chemical process industry) highlights three key requirements for this activity: smooth collaboration and coordination among responders, accurate monitoring and management of resources, and the ability to adapt pre-established reaction plans to the actual context. The SERENITY Reaction Mechanisms (SRM) is the subsystem of the SERENITY Run-time Framework aimed at providing SERENITY-aware AmI settings (i.e., socio-technical systems with highly distributed dynamic services) with functionalities to implement application-specific reaction strategies. The SRM uses SERENITY Organisational S&D Patterns as run-time models to drive these three key functionalities.
Jagged Tiling for Intra-tile Parallelism and Fine-Grain Multithreading
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shrestha, Sunil; Manzano Franco, Joseph B.; Marquez, Andres
In this paper, we have developed a novel methodology that takes into consideration multithreaded many-core designs to better utilize memory/processing resources and improve memory residence on tileable applications. It takes advantage of polyhedral analysis and transformation in the form of PLUTO, combined with a highly optimized fine-grain tile runtime to exploit parallelism at all levels. The main contributions of this paper include the introduction of multi-hierarchical tiling techniques that increase intra-tile parallelism, and a data-flow inspired runtime library that allows the expression of parallel tiles with an efficient synchronization registry. Our current implementation shows performance improvements on an Intel Xeon Phi board of up to 32.25% against instances produced by state-of-the-art compiler frameworks for selected stencil applications.
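The multi-level tiling structure can be illustrated with a two-level tiled reduction: outer tiles correspond to coarse tasks and inner tiles to the fine-grain tasks executed within them. The sketch below shows only that nesting; it does not reproduce PLUTO's polyhedral transformations, the skewed (jagged) tile shapes, or the synchronization registry of the runtime.

```python
import numpy as np

def hierarchically_tiled_sum(a, outer=64, inner=8):
    """Two-level tiling of a reduction over a 2-D array: outer tiles map naturally
    to coarse tasks, inner tiles to fine-grain tasks within them."""
    n, m = a.shape
    total = 0.0
    for i0 in range(0, n, outer):                               # coarse (inter-tile) level
        for j0 in range(0, m, outer):
            for i1 in range(i0, min(i0 + outer, n), inner):     # fine (intra-tile) level
                for j1 in range(j0, min(j0 + outer, m), inner):
                    total += a[i1:i1 + inner, j1:j1 + inner].sum()
    return total

a = np.arange(100 * 100, dtype=float).reshape(100, 100)
print(hierarchically_tiled_sum(a), a.sum())   # identical results
```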
Monitoring Distributed Real-Time Systems: A Survey and Future Directions
NASA Technical Reports Server (NTRS)
Goodloe, Alwyn E.; Pike, Lee
2010-01-01
Runtime monitors have been proposed as a means to increase the reliability of safety-critical systems. In particular, this report addresses runtime monitors for distributed hard real-time systems. This class of systems has had little attention from the monitoring community. The need for monitors is shown by discussing examples of avionic systems failure. We survey related work in the field of runtime monitoring. Several potential monitoring architectures for distributed real-time systems are presented along with a discussion of how they might be used to monitor properties of interest.
Nonadiabatic holonomic quantum computation in decoherence-free subspaces.
Xu, G F; Zhang, J; Tong, D M; Sjöqvist, Erik; Kwek, L C
2012-10-26
Quantum computation that combines the coherence stabilization virtues of decoherence-free subspaces and the fault tolerance of geometric holonomic control is of great practical importance. Some schemes of adiabatic holonomic quantum computation in decoherence-free subspaces have been proposed in the past few years. However, nonadiabatic holonomic quantum computation in decoherence-free subspaces, which avoids the long run-time requirement while retaining all the robustness advantages, remains an open problem. Here, we demonstrate how to realize nonadiabatic holonomic quantum computation in decoherence-free subspaces. By using only three neighboring physical qubits undergoing collective dephasing to encode one logical qubit, we realize a universal set of quantum gates.
Runtime Verification of Pacemaker Functionality Using Hierarchical Fuzzy Colored Petri-nets.
Majma, Negar; Babamir, Seyed Morteza; Monadjemi, Amirhassan
2017-02-01
Today, implanted medical devices are increasingly used for many patients with diverse health problems. However, several runtime problems and errors are reported by the relevant organizations, some even resulting in patient death. One such device is the pacemaker. The pacemaker is a device that helps the patient regulate the heartbeat by connecting to the cardiac vessels. This device is directed by its software, so any failure in this software causes a serious malfunction. Therefore, this study aims at a better way to monitor the device's software behavior in order to decrease the failure risk. Accordingly, we supervise the runtime function and status of the software. Software verification here means examining, through the running software, the limitations and needs of the system's users. In this paper, a method to verify the pacemaker software, based on the fuzzy function of the device, is presented. The functional limitations of the device are identified and expressed as fuzzy rules, and the device is then verified using a hierarchical Fuzzy Colored Petri-net (FCPN) formed from these software limits. Building on our previous studies, which used 1) Fuzzy Petri-nets (FPN) to verify insulin pumps, 2) Colored Petri-nets (CPN) to verify the pacemaker, and 3) a software agent with Petri-net based knowledge to verify the pacemaker, the runtime behavior of the pacemaker software is examined here with an HFCPN. This is a step forward compared to the earlier work. The HFCPN in this paper reduces the complexity compared to the FPN and CPN used in our previous studies. By presenting the Petri-net (PN) in a hierarchical form, the verification runtime decreased by 90.61% compared to the verification runtime in the earlier work. Since an inference engine is needed for runtime verification, we used the HFCPN to enhance the performance of the inference engine.
A Type System For Certified Runtime Type Analysis
2002-12-01
Block-Parallel Data Analysis with DIY2
DOE Office of Scientific and Technical Information (OSTI.GOV)
Morozov, Dmitriy; Peterka, Tom
DIY2 is a programming model and runtime for block-parallel analytics on distributed-memory machines. Its main abstraction is block-structured data parallelism: data are decomposed into blocks; blocks are assigned to processing elements (processes or threads); computation is described as iterations over these blocks, and communication between blocks is defined by reusable patterns. By expressing computation in this general form, the DIY2 runtime is free to optimize the movement of blocks between slow and fast memories (disk and flash vs. DRAM) and to concurrently execute blocks residing in memory with multiple threads. This enables the same program to execute in-core, out-of-core, serial, parallel, single-threaded, multithreaded, or combinations thereof. This paper describes the implementation of the main features of the DIY2 programming model and optimizations to improve performance. DIY2 is evaluated on benchmark test cases to establish baseline performance for several common patterns and on larger complete analysis codes running on large-scale HPC machines.
Estimation Accuracy on Execution Time of Run-Time Tasks in a Heterogeneous Distributed Environment.
Liu, Qi; Cai, Weidong; Jin, Dandan; Shen, Jian; Fu, Zhangjie; Liu, Xiaodong; Linge, Nigel
2016-08-30
Distributed computing has achieved tremendous development since cloud computing was proposed in 2006, and has played a vital role in promoting the rapid growth of data collecting and analysis models, e.g., Internet of Things, Cyber-Physical Systems, Big Data Analytics, etc. Hadoop has become a data convergence platform for sensor networks. As one of the core components, MapReduce facilitates allocating, processing and mining of collected large-scale data, where speculative execution strategies help solve straggler problems. However, there is still no efficient solution for accurately estimating the execution time of run-time tasks, which can affect task allocation and distribution in MapReduce. In this paper, task execution data have been collected and employed for the estimation. A two-phase regression (TPR) method is proposed to predict the finishing time of each task accurately. Detailed data of each task have been analyzed, and a detailed analysis report is given. According to the results, the prediction accuracy of concurrent tasks' execution time can be improved, in particular for some regular jobs.
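The flavour of a two-phase regression estimator can be conveyed by fitting separate linear models to the early and late phases of a task's progress and extrapolating the later phase to completion. The split point, data, and model below are invented simplifications, not the paper's actual TPR formulation.

```python
import numpy as np

def predict_finish_time(elapsed, progress, split=0.5):
    """Simplified two-phase regression: fit one linear model to samples below a
    progress split point and another above it, then extrapolate the later phase
    to progress = 1.0."""
    elapsed, progress = np.asarray(elapsed, float), np.asarray(progress, float)
    late = progress >= split
    phase = late if late.sum() >= 2 else np.ones_like(progress, dtype=bool)
    slope, intercept = np.polyfit(progress[phase], elapsed[phase], 1)
    return slope * 1.0 + intercept

# Synthetic map task that slows down after its first half.
progress = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
elapsed  = [  5,  10,  15,  20,  30,  40,  50]
print(round(predict_finish_time(elapsed, progress), 1))   # extrapolated finish time (s)
```

Fitting only the later phase avoids the optimistic bias that a single straight line through the fast early samples would produce, which is the intuition behind separating the phases.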
Wasza, Jakob; Bauer, Sebastian; Hornegger, Joachim
2012-01-01
In recent years, range imaging (RI) techniques have been proposed for patient positioning and respiration analysis in motion compensation. Yet, current RI-based approaches for patient positioning employ rigid-body transformations, thus neglecting free-form deformations induced by respiratory motion. Furthermore, RI-based respiration analysis relies on non-rigid registration techniques with run-times of several seconds. In this paper we propose a real-time framework based on RI to perform respiratory motion compensated positioning and non-rigid surface deformation estimation in a joint manner. The core of our method is pre-procedurally obtained 4-D shape priors that drive the intra-procedural alignment of the patient to the reference state, simultaneously yielding a rigid-body table transformation and a free-form deformation accounting for respiratory motion. We show that our method outperforms conventional alignment strategies by a factor of 3.0 and 2.3 in rotation and translation accuracy, respectively. Using a GPU-based implementation, we achieve run-times of 40 ms.
The SERENITY Runtime Framework
NASA Astrophysics Data System (ADS)
Crespo, Beatriz Gallego-Nicasio; Piñuela, Ana; Soria-Rodriguez, Pedro; Serrano, Daniel; Maña, Antonio
The SERENITY Runtime Framework (SRF) provides support for applications at runtime, by managing S&D Solutions and monitoring the systems' context. The main functionality of the SRF, amongst others, is to provide S&D Solutions, by means of Executable Components, in response to applications' security requirements. The runtime environment is defined in the SRF through the S&D Library and Context Manager components. The S&D Library is a local S&D Artefact repository, and stores S&D Classes, S&D Patterns and S&D Implementations. The Context Manager component is in charge of storing and managing the information used by the SRF to select the most appropriate S&D Pattern for a given scenario. The SRF also manages the execution of Executable Components, the running realizations of the S&D Patterns, including their instantiation, deactivation and control; provides communication and monitoring mechanisms; and covers the recovery and reconfiguration aspects, completing the list of tasks performed by the SRF.
ATDM LANL FleCSI: Topology and Execution Framework
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bergen, Benjamin Karl
FleCSI is a compile-time configurable C++ framework designed to support multi-physics application development. As such, FleCSI attempts to provide a very general set of infrastructure design patterns that can be specialized and extended to suit the needs of a broad variety of solver and data requirements. This means that FleCSI is potentially useful to many different ECP projects. Current support includes multidimensional mesh topology, mesh geometry, and mesh adjacency information, n-dimensional hashed-tree data structures, graph partitioning interfaces, and dependency closures (to identify data dependencies between distributed-memory address spaces). FleCSI introduces a functional programming model with control, execution, and data abstractions that are consistent with state-of-the-art task-based runtimes such as Legion and Charm++. The model also provides support for fine-grained, data-parallel execution with backend support for runtimes such as OpenMP and C++17. The FleCSI abstraction layer provides the developer with insulation from the underlying runtimes, while allowing support for multiple runtime systems, including conventional models like asynchronous MPI. The intent is to give developers a concrete set of user-friendly programming tools that can be used now, while allowing flexibility in choosing runtime implementations and optimizations that can be applied to architectures and runtimes that arise in the future. This project is essential to the ECP Ristra Next-Generation Code project, part of ASC ATDM, because it provides a hierarchically parallel programming model that is consistent with the design of modern system architectures, but which allows for the straightforward expression of algorithmic parallelism in a portably performant manner.
Convergence Acceleration and Documentation of CFD Codes for Turbomachinery Applications
NASA Technical Reports Server (NTRS)
Marquart, Jed E.
2005-01-01
The development and analysis of turbomachinery components for industrial and aerospace applications has been greatly enhanced in recent years through the advent of computational fluid dynamics (CFD) codes and techniques. Although the use of this technology has greatly reduced the time required to perform analysis and design, there still remains much room for improvement in the process. In particular, there is a steep learning curve associated with most turbomachinery CFD codes, and the computation times need to be reduced in order to facilitate their integration into standard work processes. Two turbomachinery codes have recently been developed by Dr. Daniel Dorney (MSFC) and Dr. Douglas Sondak (Boston University). These codes are entitled Aardvark (for 2-D and quasi 3-D simulations) and Phantom (for 3-D simulations). The codes utilize the General Equation Set (GES), structured grid methodology, and overset O- and H-grids. The codes have been used with success by Drs. Dorney and Sondak, as well as others within the turbomachinery community, to analyze engine components and other geometries. One of the primary objectives of this study was to establish a set of parametric input values which will enhance convergence rates for steady state simulations, as well as reduce the runtime required for unsteady cases. The goal is to reduce the turnaround time for CFD simulations, thus permitting more design parametrics to be run within a given time period. In addition, other code enhancements to reduce runtimes were investigated and implemented. The other primary goal of the study was to develop enhanced users' manuals for Aardvark and Phantom. These manuals are intended to answer most questions for new users, as well as provide valuable detailed information for the experienced user. The existence of detailed users' manuals will enable new users to become proficient with the codes, as well as reducing the dependency of new users on the code authors. In order to achieve the objectives listed, the following tasks were accomplished: 1) Parametric Study Of Preconditioning Parameters And Other Code Inputs; 2) Code Modifications To Reduce Runtimes; 3) Investigation Of Compiler Options To Reduce Code Runtime; and 4) Development/Enhancement of Users' Manuals for Aardvark and Phantom.
Runtime Speculative Software-Only Fault Tolerance
2012-06-01
reliability of RSFT, an in-depth analysis of its window of vulnerability is also discussed and measured via simulated fault injection. The performance...propagation of faults through the entire program. For optimal performance, these techniques have to use heroic alias analysis to find the minimum set of...affect program output. No program source code or alias analysis is needed to analyze the fault propagation ahead of time. 2.3 Limitations of Existing
Improved Air Combat Awareness; with AESA and Next-Generation Signal Processing
2002-09-01
[Slide/figure residue. Recoverable items: competence network topics (building techniques, software development environment, communication, computer architecture, modeling, real-time programming, radar); memory access with skewed load and store at 3.2 GB/s bandwidth; performance of 400 MFLOPS; runtime environment comprising custom runtime routines, driver routines, and hardware.]
Runtime support for data parallel tasks
NASA Technical Reports Server (NTRS)
Haines, Matthew; Hess, Bryan; Mehrotra, Piyush; Vanrosendale, John; Zima, Hans
1994-01-01
We have recently introduced a set of Fortran language extensions that allow for integrated support of task and data parallelism, and provide for shared data abstractions (SDA's) as a method for communications and synchronization among these tasks. In this paper we discuss the design and implementation issues of the runtime system necessary to support these extensions, and discuss the underlying requirements for such a system. To test the feasibility of this approach, we implement a prototype of the runtime system and use this to support an abstract multidisciplinary optimization (MDO) problem for aircraft design. We give initial results and discuss future plans.
Goudey, Benjamin; Abedini, Mani; Hopper, John L; Inouye, Michael; Makalic, Enes; Schmidt, Daniel F; Wagner, John; Zhou, Zeyu; Zobel, Justin; Reumann, Matthias
2015-01-01
Genome-wide association studies (GWAS) are a common approach for systematic discovery of single nucleotide polymorphisms (SNPs) which are associated with a given disease. Univariate analysis approaches commonly employed may miss important SNP associations that only appear through multivariate analysis in complex diseases. However, multivariate SNP analysis is currently limited by its inherent computational complexity. In this work, we present a computational framework that harnesses supercomputers. Based on our results, we estimate that a three-way interaction analysis on 1.1 million-SNP GWAS data would require over 5.8 years on the full "Avoca" IBM Blue Gene/Q installation at the Victorian Life Sciences Computation Initiative. This is hundreds of times faster than estimates for other CPU-based methods and four times faster than runtimes estimated for GPU methods, indicating how the improvement in the level of hardware applied to interaction analysis may alter the types of analysis that can be performed. Furthermore, the same analysis would take under 3 months on the currently largest IBM Blue Gene/Q supercomputer "Sequoia" at the Lawrence Livermore National Laboratory, assuming linear scaling is maintained as our results suggest. Given that the implementation used in this study can be further optimised, this runtime means it is becoming feasible to carry out exhaustive analysis of higher order interaction studies on large modern GWAS.
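The combinatorics behind these runtimes is easy to reproduce. The sketch below counts the pairwise and three-way tests for 1.1 million SNPs and converts the three-way count into years, assuming a hypothetical sustained test rate; the rate is chosen only to make the arithmetic visible and is not a figure reported by the authors, although it lands near the quoted 5.8 years.

```python
from math import comb

n_snps = 1_100_000

pairs = comb(n_snps, 2)
triples = comb(n_snps, 3)
print(f"pairwise tests:  {pairs:.3e}")    # ~6.05e11
print(f"three-way tests: {triples:.3e}")  # ~2.22e17

# Back-of-the-envelope runtime, assuming a hypothetical sustained rate of
# 1.2e9 three-way tests per second for the whole machine (illustrative only).
rate = 1.2e9
years = triples / rate / (3600 * 24 * 365)
print(f"~{years:.1f} years at {rate:.1e} tests/s")
```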
Detecting Runtime Anomalies in AJAX Applications through Trace Analysis
2011-08-10
statements by adding the instrumentation to the GWT UI classes, leaving the user code untouched. Some content management frameworks such as Drupal [12]... References: "Google Web Toolkit," http://code.google.com/webtoolkit/; [12] "Form generation - Drupal API," http://api.drupal.org/api/group/form_api/6.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gong, Zhenhuan; Boyuka, David; Zou, X
The size and scope of cutting-edge scientific simulations are growing much faster than the I/O and storage capabilities of their run-time environments. The growing gap is exacerbated by exploratory, data-intensive analytics, such as querying simulation data with multivariate, spatio-temporal constraints, which induces heterogeneous access patterns that stress the performance of the underlying storage system. Previous work addresses data layout and indexing techniques to improve query performance for a single access pattern, which is not sufficient for complex analytics jobs. We present PARLO, a parallel run-time layout optimization framework, to achieve multi-level data layout optimization for scientific applications at run-time, before data is written to storage. The layout schemes optimize for heterogeneous access patterns with user-specified priorities. PARLO is integrated with ADIOS, a high-performance parallel I/O middleware for large-scale HPC applications, to achieve user-transparent, lightweight layout optimization for scientific datasets. It offers simple XML-based configuration for users to achieve flexible layout optimization without the need to modify or recompile application codes. Experiments show that PARLO improves performance by 2 to 26 times for queries with heterogeneous access patterns compared to state-of-the-art scientific database management systems. Compared to traditional post-processing approaches, its underlying run-time layout optimization achieves a 56% savings in processing time and a reduction in storage overhead of up to 50%. PARLO also exhibits a low run-time resource requirement, while limiting the performance impact on running applications to a reasonable level.
Global Deployment Analysis System Algorithm Description (With Updates)
1998-09-01
Global Deployment Analysis System Algorithm Description (with Updates). By Noetics, Inc., for the U.S. Army Concepts Analysis Agency. Contract ... This Algorithm Description for the Global Deployment Analysis System (GDAS) was prepared by Noetics ... support for Paradox Runtime will be provided by the GDAS developers, CAA and Noetics Inc., and not by Borland International. GDAS for Windows has ...
Specification-based Error Recovery: Theory, Algorithms, and Usability
2013-02-01
transmuting the specification to an implementation at run-time and reducing the performance overhead. A suite of techniques and tools were designed...
The SERENITY Runtime Monitoring Framework
NASA Astrophysics Data System (ADS)
Spanoudakis, George; Kloukinas, Christos; Mahbub, Khaled
This chapter describes SERENITY’s approach to runtime monitoring and the framework that has been developed to support it. Runtime monitoring is required in SERENITY in order to check for violations of security and dependability properties which are necessary for the correct operation of the security and dependability solutions that are available from the SERENITY framework. This chapter discusses how such properties are specified and monitored. The chapter focuses on the activation and execution of monitoring activities using S&D Patterns and the actions that may be undertaken following the detection of property violations. The approach is demonstrated in reference to one of the industrial case studies of the SERENITY project.
NASA Astrophysics Data System (ADS)
Wahl, N.; Hennig, P.; Wieser, H. P.; Bangert, M.
2017-07-01
The sensitivity of intensity-modulated proton therapy (IMPT) treatment plans to uncertainties can be quantified and mitigated with robust/min-max and stochastic/probabilistic treatment analysis and optimization techniques. Those methods usually rely on sparse random, importance, or worst-case sampling. Inevitably, this imposes a trade-off between computational speed and accuracy of the uncertainty propagation. Here, we investigate analytical probabilistic modeling (APM) as an alternative for uncertainty propagation and minimization in IMPT that does not rely on scenario sampling. APM propagates probability distributions over range and setup uncertainties via a Gaussian pencil-beam approximation into moments of the probability distributions over the resulting dose in closed form. It supports arbitrary correlation models and allows for efficient incorporation of fractionation effects regarding random and systematic errors. We evaluate the trade-off between run-time and accuracy of APM uncertainty computations on three patient datasets. Results are compared against reference computations facilitating importance and random sampling. Two approximation techniques to accelerate uncertainty propagation and minimization based on probabilistic treatment plan optimization are presented. Runtimes are measured on CPU and GPU platforms; dosimetric accuracy is quantified in comparison to a sampling-based benchmark (5000 random samples). APM accurately propagates range and setup uncertainties into dose uncertainties at competitive run-times (GPU ≤ 5 min). The resulting standard deviation (expectation value) of dose shows average global γ(3%/3 mm) pass rates between 94.2% and 99.9% (98.4% and 100.0%). All investigated importance sampling strategies provided less accuracy at higher run-times considering only a single fraction. Considering fractionation, APM uncertainty propagation and treatment plan optimization was proven to be possible at constant time complexity, while run-times of sampling-based computations are linear in the number of fractions. Using sum sampling within APM, uncertainty propagation can only be accelerated at the cost of reduced accuracy in variance calculations. For probabilistic plan optimization, we were able to approximate the necessary pre-computations within seconds, yielding treatment plans of similar quality as gained from exact uncertainty propagation. APM is suited to enhance the trade-off between speed and accuracy in uncertainty propagation and probabilistic treatment plan optimization, especially in the context of fractionation. This brings fully-fledged APM computations within reach of clinical application.
Wahl, N; Hennig, P; Wieser, H P; Bangert, M
2017-06-26
The sensitivity of intensity-modulated proton therapy (IMPT) treatment plans to uncertainties can be quantified and mitigated with robust/min-max and stochastic/probabilistic treatment analysis and optimization techniques. Those methods usually rely on sparse random, importance, or worst-case sampling. Inevitably, this imposes a trade-off between computational speed and accuracy of the uncertainty propagation. Here, we investigate analytical probabilistic modeling (APM) as an alternative for uncertainty propagation and minimization in IMPT that does not rely on scenario sampling. APM propagates probability distributions over range and setup uncertainties via a Gaussian pencil-beam approximation into moments of the probability distributions over the resulting dose in closed form. It supports arbitrary correlation models and allows for efficient incorporation of fractionation effects regarding random and systematic errors. We evaluate the trade-off between run-time and accuracy of APM uncertainty computations on three patient datasets. Results are compared against reference computations facilitating importance and random sampling. Two approximation techniques to accelerate uncertainty propagation and minimization based on probabilistic treatment plan optimization are presented. Runtimes are measured on CPU and GPU platforms; dosimetric accuracy is quantified in comparison to a sampling-based benchmark (5000 random samples). APM accurately propagates range and setup uncertainties into dose uncertainties at competitive run-times (GPU ≤ 5 min). The resulting standard deviation (expectation value) of dose shows average global γ(3%/3 mm) pass rates between 94.2% and 99.9% (98.4% and 100.0%). All investigated importance sampling strategies provided less accuracy at higher run-times considering only a single fraction. Considering fractionation, APM uncertainty propagation and treatment plan optimization was proven to be possible at constant time complexity, while run-times of sampling-based computations are linear in the number of fractions. Using sum sampling within APM, uncertainty propagation can only be accelerated at the cost of reduced accuracy in variance calculations. For probabilistic plan optimization, we were able to approximate the necessary pre-computations within seconds, yielding treatment plans of similar quality as gained from exact uncertainty propagation. APM is suited to enhance the trade-off between speed and accuracy in uncertainty propagation and probabilistic treatment plan optimization, especially in the context of fractionation. This brings fully-fledged APM computations within reach of clinical application.
A Compiler and Run-time System for Network Programming Languages
2012-01-01
A Compiler and Run-time System for Network Programming Languages. Christopher Monsanto (Princeton University), Nate Foster (Cornell University), Rob... Reference: N. Foster, R. Harrison, M. Freedman, C. Monsanto, J. Rexford, A. Story, and D. Walker. Frenetic: A network programming language. In ICFP, Sep 2011. [10] A...
A Simplified Method for Implementing Run-Time Polymorphism in Fortran95
Decyk, Viktor K.; Norton, Charles D.
2004-01-01
This paper discusses a simplified technique for software emulation of inheritance and run-time polymorphism in Fortran95. This technique involves retaining the same type throughout an inheritance hierarchy, so that only functions which are modified in a derived class need to be implemented.
Murphy, Mark; Alley, Marcus; Demmel, James; Keutzer, Kurt; Vasanawala, Shreyas; Lustig, Michael
2012-06-01
We present l₁-SPIRiT, a simple algorithm for auto calibrating parallel imaging (acPI) and compressed sensing (CS) that permits an efficient implementation with clinically-feasible runtimes. We propose a CS objective function that minimizes cross-channel joint sparsity in the wavelet domain. Our reconstruction minimizes this objective via iterative soft-thresholding, and integrates naturally with iterative self-consistent parallel imaging (SPIRiT). Like many iterative magnetic resonance imaging reconstructions, l₁-SPIRiT's image quality comes at a high computational cost. Excessively long runtimes are a barrier to the clinical use of any reconstruction approach, and thus we discuss our approach to efficiently parallelizing l₁-SPIRiT and to achieving clinically-feasible runtimes. We present parallelizations of l₁-SPIRiT for both multi-GPU systems and multi-core CPUs, and discuss the software optimization and parallelization decisions made in our implementation. The performance of these alternatives depends on the processor architecture, the size of the image matrix, and the number of parallel imaging channels. Fundamentally, achieving fast runtime requires the correct trade-off between cache usage and parallelization overheads. We demonstrate image quality via a case from our clinical experimentation, using a custom 3DFT spoiled gradient echo (SPGR) sequence with up to 8× acceleration via Poisson-disc undersampling in the two phase-encoded directions.
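The thresholding core of such reconstructions can be illustrated with plain iterative soft-thresholding (ISTA) on a small synthetic compressed-sensing problem. The sketch below is single-channel and uses an identity sparsity basis, so it omits the wavelet-domain joint sparsity and the SPIRiT consistency term; the matrix sizes, regularization weight, and iteration count are arbitrary.

```python
import numpy as np

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(A, b, lam=0.1, iters=1000):
    """Iterative soft-thresholding for min 0.5*||Ax - b||^2 + lam*||x||_1."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2          # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - b)                     # gradient of the data-consistency term
        x = soft_threshold(x - step * grad, step * lam)
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((80, 100))
x_true = np.zeros(100)
x_true[[3, 17, 42]] = [1.5, -2.0, 0.7]
b = A @ x_true

x_hat = ista(A, b, lam=0.1)
print(sorted(int(i) for i in np.argsort(-np.abs(x_hat))[:3]))   # typically the planted support
```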
Optimizing ROOT’s Performance Using C++ Modules
NASA Astrophysics Data System (ADS)
Vassilev, Vassil
2017-10-01
ROOT comes with a C++-compliant interpreter, cling. Cling needs to understand the content of the libraries in order to interact with them. Exposing the full shared library descriptors to the interpreter at runtime translates into an increased memory footprint. ROOT’s exploratory programming concepts allow implicit and explicit runtime shared library loading. This requires the interpreter to load the library descriptor. Re-parsing of the descriptors’ content has a noticeable effect on runtime performance. The present state-of-the-art lazy parsing technique brings the runtime performance to reasonable levels but proves to be fragile and can introduce correctness issues. An elegant solution is to load information from the descriptor lazily and in a non-recursive way. The LLVM community advances its C++ Modules technology, providing an I/O-efficient, on-disk representation capable of reducing build times and peak memory usage. The feature is standardized as a C++ technical specification. C++ Modules are a flexible concept, which can be employed to match CMS and other experiments’ requirements for ROOT: to optimize both runtime memory usage and performance. Cling technically “inherits” the feature; however, tweaking it to ROOT scale and beyond is a complex endeavor. The paper discusses the status of C++ Modules in the context of ROOT, supported by a few preliminary performance results. It shows a step-by-step migration plan and describes potential challenges which could appear.
A Matter of Time: Faster Percolator Analysis via Efficient SVM Learning for Large-Scale Proteomics.
Halloran, John T; Rocke, David M
2018-05-04
Percolator is an important tool for greatly improving the results of a database search and subsequent downstream analysis. Using support vector machines (SVMs), Percolator recalibrates peptide-spectrum matches based on the learned decision boundary between targets and decoys. To improve analysis time for large-scale data sets, we update Percolator's SVM learning engine through software and algorithmic optimizations rather than heuristic approaches that necessitate the careful study of their impact on learned parameters across different search settings and data sets. We show that by optimizing Percolator's original learning algorithm, l2-SVM-MFN, large-scale SVM learning requires only about a third of the original runtime. Furthermore, we show that by employing the widely used Trust Region Newton (TRON) algorithm instead of l2-SVM-MFN, large-scale Percolator SVM learning is reduced to only about a fifth of the original runtime. Importantly, these speedups only affect the speed at which Percolator converges to a global solution and do not alter recalibration performance. The upgraded versions of both l2-SVM-MFN and TRON are optimized within the Percolator codebase for multithreaded and single-thread use and are available under the Apache license at bitbucket.org/jthalloran/percolator_upgrade.
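The abstract includes no code; as a hedged sketch of the general target/decoy recalibration idea (not the Percolator codebase), the example below trains a primal L2-loss linear SVM with scikit-learn's liblinear solver, which internally uses a trust-region Newton method, and rescores synthetic PSM features by their signed distance to the learned boundary.

```python
# Illustrative only: a toy target/decoy rescoring step in the spirit of
# Percolator, using scikit-learn's liblinear solver (primal squared-hinge SVM).
# Features and data are synthetic stand-ins for real PSM attributes.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(42)
n = 2000
# Hypothetical PSM features: search-engine score, delta score, absolute mass error.
targets = np.column_stack([rng.normal(3.0, 1.0, n), rng.normal(1.0, 0.5, n),
                           np.abs(rng.normal(0.0, 5.0, n))])
decoys  = np.column_stack([rng.normal(1.5, 1.0, n), rng.normal(0.5, 0.5, n),
                           np.abs(rng.normal(0.0, 10.0, n))])
X = np.vstack([targets, decoys])
y = np.concatenate([np.ones(n), -np.ones(n)])          # +1 target, -1 decoy

svm = LinearSVC(loss="squared_hinge", dual=False, C=1.0)  # primal L2-loss SVM
svm.fit(X, y)

# Recalibrated PSM scores: signed distance to the learned decision boundary.
rescored = svm.decision_function(X)
print("mean rescored target:", rescored[:n].mean(), " decoy:", rescored[n:].mean())
```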
Runtime visualization of the human arterial tree.
Insley, Joseph A; Papka, Michael E; Dong, Suchuan; Karniadakis, George; Karonis, Nicholas T
2007-01-01
Large-scale simulation codes typically execute for extended periods of time and often on distributed computational resources. Because these simulations can run for hours, or even days, scientists like to get feedback about the state of the computation and the validity of its results as it runs. It is also important that these capabilities be made available with little impact on the performance and stability of the simulation. Visualizing and exploring data in the early stages of the simulation can help scientists identify problems early, potentially avoiding a situation where a simulation runs for several days, only to discover that an error with an input parameter caused both time and resources to be wasted. We describe an application that aids in the monitoring and analysis of a simulation of the human arterial tree. The application provides researchers with high-level feedback about the state of the ongoing simulation and enables them to investigate particular areas of interest in greater detail. The application also offers monitoring information about the amount of data produced and data transfer performance among the various components of the application.
Ahn, J M; Lee, J H; Choi, S W; Kim, W E; Omn, K S; Park, S K; Kim, W G; Roh, J R; Min, B G
1998-03-01
The moving actuator type total artificial heart (TAH) developed at Seoul National University has numerous design improvements based upon the digital signal processor (DSP). These improvements include the implantability of all electronics, an automatic control algorithm, and extension of the battery run-time in connection with an amorphous silicon solar system (SS). The implantable electronics consist of the motor drive, main processor, intelligent Li ion battery management (LIBM) based upon the DSP, telemetry system, and transcutaneous energy transmission (TET) system. Major changes in the implantable electronics include decreasing the temperature rise by over 21 degrees C on the motor drive, volume reduction (40 x 55 x 33 mm, 7 cell assembly) of the battery pack using a Li ion battery (3.6 V/cell, 900 mA.h), and improvement of the battery run-time (over 40 min) while providing a cardiac output (CO) of 5 L/min at 100 mm Hg afterload when the external battery for testing is connected with the SS (2.5 W, 192.192, 1 kg) for external battery recharge or partial TAH drive. The phase locked loop (PLL) based telemetry system was implemented to improve stability, and an error correction DSP algorithm was programmed to achieve high accuracy. A field focused light emitting diode (LED), miniature in size and mounted on the pancake type TET coils, was used to obtain low light scattering along the propagation path, similar to the optical property of a laser. The TET operating resonance frequency was self tuned in a range of 360 to 410 kHz to provide enough power even at high afterloads. An automatic cardiac output regulation algorithm was developed based on interventricular pressure analysis and carried out successfully in several animal experiments. All electronics have been evaluated in vitro and in vivo and prepared for implantation of the TAH. Substantial progress has been made in designing a completely implantable TAH at the preclinical stage.
Estimation Accuracy on Execution Time of Run-Time Tasks in a Heterogeneous Distributed Environment
Liu, Qi; Cai, Weidong; Jin, Dandan; Shen, Jian; Fu, Zhangjie; Liu, Xiaodong; Linge, Nigel
2016-01-01
Distributed Computing has achieved tremendous development since cloud computing was proposed in 2006, and has played a vital role in promoting the rapid growth of data collection and analysis models, e.g., Internet of Things, Cyber-Physical Systems, Big Data Analytics, etc. Hadoop has become a data convergence platform for sensor networks. As one of the core components, MapReduce facilitates allocating, processing and mining of collected large-scale data, where speculative execution strategies help solve straggler problems. However, there is still no efficient solution for accurate estimation of the execution time of run-time tasks, which can affect task allocation and distribution in MapReduce. In this paper, task execution data have been collected and employed for the estimation. A two-phase regression (TPR) method is proposed to predict the finishing time of each task accurately. Detailed data on each task have attracted interest, and a detailed analysis report has been produced. According to the results, the prediction accuracy of concurrent tasks’ execution time can be improved, in particular for some regular jobs. PMID:27589753
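As a generic illustration of a two-phase regression (the paper's exact TPR formulation is not given in the abstract), the sketch below fits two line segments to synthetic task-progress samples, selects the breakpoint by least squares, and extrapolates the second phase to predict the finishing time.

```python
# Generic two-phase (piecewise linear) regression sketch; the paper's actual
# TPR method may differ.  All task-progress data below are synthetic.
import numpy as np

def fit_line(x, y):
    A = np.column_stack([x, np.ones_like(x)])
    coef, res, *_ = np.linalg.lstsq(A, y, rcond=None)
    sse = float(res[0]) if res.size else float(np.sum((A @ coef - y) ** 2))
    return coef, sse

def two_phase_fit(x, y):
    best = None
    for b in range(2, len(x) - 2):                     # candidate breakpoints
        c1, e1 = fit_line(x[:b], y[:b])
        c2, e2 = fit_line(x[b:], y[b:])
        if best is None or e1 + e2 < best[0]:
            best = (e1 + e2, x[b], c1, c2)
    return best                                        # (sse, breakpoint, phase1, phase2)

# Synthetic progress samples: a slow setup phase followed by a faster phase.
t = np.linspace(0, 100, 50)
progress = np.where(t < 40, 0.2 * t, 8 + 0.9 * (t - 40))
progress += np.random.default_rng(1).normal(0, 0.5, t.size)

sse, bp, p1, p2 = two_phase_fit(t, progress)
finish_time = (100.0 - p2[1]) / p2[0]                  # extrapolate phase 2 to 100% progress
print(f"breakpoint ~ {bp:.1f}s, predicted finish ~ {finish_time:.1f}s")
```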
DOE Office of Scientific and Technical Information (OSTI.GOV)
Janjusic, Tommy; Kartsaklis, Christos
Application analysis is facilitated through a number of program profiling tools. The tools vary in their complexity, ease of deployment, design, and profiling detail. Specifically, understanding, analyzing, and optimizing are of particular importance for scientific applications, where minor changes in code paths and data-structure layout can have profound effects. Understanding how intricate data-structures are accessed and how a given memory system responds is a complex task. In this paper we describe a trace profiling tool, Glprof, specifically aimed to lessen the burden on the programmer to pin-point heavily involved data-structures during an application's run-time, and to understand data-structure run-time usage. Moreover, we showcase the tool's modularity using additional cache simulation components. We elaborate on the tool's design and features. Finally we demonstrate the application of our tool in the context of Spec benchmarks using the Glprof profiler and two concurrently running cache simulators, PPC440 and AMD Interlagos.
NASA Technical Reports Server (NTRS)
Lee, Nathaniel; Welch, Bryan W.
2018-01-01
NASA's SCENIC project aims to simplify and reduce the cost of space mission planning by replicating the analysis capabilities of commercially licensed software which are integrated with relevant analysis parameters specific to SCaN assets and SCaN supported user missions. SCENIC differs from current tools that perform similar analyses in that it 1) does not require any licensing fees, 2) will provide an all-in-one package for various analysis capabilities that normally requires add-ons or multiple tools to complete. As part of SCENIC's capabilities, the ITACA network loading analysis tool will be responsible for assessing the loading on a given network architecture and generating a network service schedule. ITACA will allow users to evaluate the quality of service of a given network architecture and determine whether or not the architecture will satisfy the mission's requirements. ITACA is currently under development, and the following improvements were made during the fall of 2017: optimization of runtime, augmentation of network asset pre-service configuration time, augmentation of Brent's method of root finding, augmentation of network asset FOV restrictions, augmentation of mission lifetimes, and the integration of a SCaN link budget calculation tool. The improvements resulted in (a) 25% reduction in runtime, (b) more accurate contact window predictions when compared to STK(Registered Trademark) contact window predictions, and (c) increased fidelity through the use of specific SCaN asset parameters.
An Overview of the Runtime Verification Tool Java PathExplorer
NASA Technical Reports Server (NTRS)
Havelund, Klaus; Rosu, Grigore; Clancy, Daniel (Technical Monitor)
2002-01-01
We present an overview of the Java PathExplorer runtime verification tool, in short referred to as JPAX. JPAX can monitor the execution of a Java program and check that it conforms with a set of user provided properties formulated in temporal logic. JPAX can in addition analyze the program for concurrency errors such as deadlocks and data races. The concurrency analysis requires no user provided specification. The tool facilitates automated instrumentation of a program's bytecode, which when executed will emit an event stream, the execution trace, to an observer. The observer dispatches the incoming event stream to a set of observer processes, each performing a specialized analysis, such as the temporal logic verification, the deadlock analysis and the data race analysis. Temporal logic specifications can be formulated by the user in the Maude rewriting logic, where Maude is a high-speed rewriting system for equational logic, but here extended with executable temporal logic. The Maude rewriting engine is then activated as an event driven monitoring process. Alternatively, temporal specifications can be translated into efficient automata, which check the event stream. JPAX can be used during program testing to gain increased information about program executions, and can potentially furthermore be applied during operation to survey safety critical systems.
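A minimal sketch of the observer idea (not JPAX itself, which instruments Java bytecode and uses Maude specifications): an event-stream monitor that checks one simple property, namely that every acquired lock is eventually released by the same thread. The trace below is hypothetical.

```python
# Minimal illustration (not JPAX): an observer consumes an event trace emitted
# by an instrumented program and checks that every acquired lock is eventually
# released by the acquiring thread.
class LockMonitor:
    def __init__(self):
        self.held = {}                      # (thread, lock) -> acquire position

    def observe(self, i, event):
        kind, thread, lock = event
        if kind == "acquire":
            self.held[(thread, lock)] = i
        elif kind == "release":
            self.held.pop((thread, lock), None)

    def finish(self):
        return [f"lock {l} acquired by {t} at event {i} but never released"
                for (t, l), i in self.held.items()]

trace = [("acquire", "T1", "A"), ("acquire", "T2", "B"),
         ("release", "T1", "A"), ("acquire", "T2", "A")]   # hypothetical trace
mon = LockMonitor()
for i, ev in enumerate(trace):
    mon.observe(i, ev)
print(mon.finish())   # -> T2 still holds B and A at the end of the trace
```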
NASA Technical Reports Server (NTRS)
Rogers, Pat
1992-01-01
The Ada Runtime Environment Working Group has, since 1985, developed and published the Catalog of Interface Features and Options (CIFO) for Ada runtime environments. These interfaces, expressed in legal Ada, provide 'hooks' into the runtime system to export both functionality and enhanced performance beyond that of 'vanilla' Ada implementations. Such enhancements include high- and low-level scheduling control, asynchronous communications facilities, predictable storage management facilities, and fast interrupt response. CIFO 3.0 represents the latest release, which incorporates the efforts of the European real time community as well as new interfaces and expansions of previous catalog entries. This presentation will give both an overview of the Catalog's contents and an 'insider's' view of the Catalog as a whole.
Towards Just-In-Time Partial Evaluation of Prolog
NASA Astrophysics Data System (ADS)
Bolz, Carl Friedrich; Leuschel, Michael; Rigo, Armin
We introduce a just-in-time specializer for Prolog. Just-in-time specialization attempts to unify the concepts and benefits of partial evaluation (PE) and just-in-time (JIT) compilation. It is a variant of PE that occurs purely at runtime, which lazily generates residual code and is constantly driven by runtime feedback.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Erez, Mattan; Yelick, Katherine; Sarkar, Vivek
The Dynamic, Exascale Global Address Space programming environment (DEGAS) project will develop the next generation of programming models and runtime systems to meet the challenges of Exascale computing. Our approach is to provide an efficient and scalable programming model that can be adapted to application needs through the use of dynamic runtime features and domain-specific languages for computational kernels. We address the following technical challenges: Programmability: Rich set of programming constructs based on a Hierarchical Partitioned Global Address Space (HPGAS) model, demonstrated in UPC++. Scalability: Hierarchical locality control, lightweight communication (extended GASNet), and efficient synchronization mechanisms (Phasers). Performance Portability: Just-in-time specialization (SEJITS) for generating hardware-specific code and scheduling libraries for domain-specific adaptive runtimes (Habanero). Energy Efficiency: Communication-optimal code generation to optimize energy efficiency by reducing data movement. Resilience: Containment Domains for flexible, domain-specific resilience, using state capture mechanisms and lightweight, asynchronous recovery mechanisms. Interoperability: Runtime and language interoperability with MPI and OpenMP to encourage broad adoption.
AMRZone: A Runtime AMR Data Sharing Framework For Scientific Applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Wenzhao; Tang, Houjun; Harenberg, Steven
Frameworks that facilitate runtime data sharing across multiple applications are of great importance for scientific data analytics. Although existing frameworks work well over uniform mesh data, they cannot effectively handle adaptive mesh refinement (AMR) data. The challenges in constructing an AMR-capable framework include: (1) designing an architecture that facilitates online AMR data management; (2) achieving a load-balanced AMR data distribution for the data staging space at runtime; and (3) building an effective online index to support the unique spatial data retrieval requirements for AMR data. Towards addressing these challenges to support runtime AMR data sharing across scientific applications, we present the AMRZone framework. Experiments over real-world AMR datasets demonstrate AMRZone's effectiveness at achieving a balanced workload distribution, reading/writing large-scale datasets with thousands of parallel processes, and satisfying queries with spatial constraints. Moreover, AMRZone's performance and scalability are even comparable with existing state-of-the-art work when tested over uniform mesh data with up to 16384 cores; in the best case, our framework achieves a 46% performance improvement.
Runtime and Architecture Support for Efficient Data Exchange in Multi-Accelerator Applications.
Cabezas, Javier; Gelado, Isaac; Stone, John E; Navarro, Nacho; Kirk, David B; Hwu, Wen-Mei
2015-05-01
Heterogeneous parallel computing applications often process large data sets that require multiple GPUs to jointly meet their needs for physical memory capacity and compute throughput. However, the lack of high-level abstractions in previous heterogeneous parallel programming models forces programmers to resort to multiple code versions, complex data copy steps and synchronization schemes when exchanging data between multiple GPU devices, which results in high software development cost, poor maintainability, and even poor performance. This paper describes the HPE runtime system, and the associated architecture support, which enables a simple, efficient programming interface for exchanging data between multiple GPUs through either interconnects or cross-node network interfaces. The runtime and architecture support presented in this paper can also be used to support other types of accelerators. We show that the simplified programming interface reduces programming complexity. The research presented in this paper started in 2009. It has been implemented and tested extensively in several generations of HPE runtime systems and has been adopted into the NVIDIA GPU hardware and drivers for CUDA 4.0 and beyond since 2011. The availability of real hardware that supports key HPE features gives rise to a rare opportunity for studying the effectiveness of the hardware support by running important benchmarks on real runtime and hardware. Experimental results show that in an exemplar heterogeneous system, peer DMA and double-buffering, pinned buffers, and software techniques can improve the inter-accelerator data communication bandwidth by 2×. They can also improve the execution speed by 1.6× for a 3D finite difference, 2.5× for 1D FFT, and 1.6× for merge sort, all measured on real hardware. The proposed architecture support enables the HPE runtime to transparently deploy these optimizations under simple portable user code, allowing system designers to freely employ devices of different capabilities. We further argue that simple interfaces such as HPE are needed for most applications to benefit from advanced hardware features in practice.
Ckmeans.1d.dp: Optimal k-means Clustering in One Dimension by Dynamic Programming.
Wang, Haizhou; Song, Mingzhou
2011-12-01
The heuristic k-means algorithm, widely used for cluster analysis, does not guarantee optimality. We developed a dynamic programming algorithm for optimal one-dimensional clustering. The algorithm is implemented as an R package called Ckmeans.1d.dp. We demonstrate its advantage in optimality and runtime over the standard iterative k-means algorithm.
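For illustration, a compact O(k·n²) Python version of the underlying dynamic program is sketched below (the R package implements a faster variant of the same recurrence); within-cluster costs are computed from prefix sums, and the recurrence guarantees an optimal partition of the sorted data.

```python
# Compact O(k*n^2) dynamic program for optimal 1-D k-means; Ckmeans.1d.dp
# implements a faster variant of the same idea.
import numpy as np

def ckmeans_1d(x, k):
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    s1 = np.concatenate([[0.0], np.cumsum(x)])         # prefix sums
    s2 = np.concatenate([[0.0], np.cumsum(x * x)])

    def sse(i, j):  # within-cluster sum of squares of x[i..j] (inclusive)
        m = j - i + 1
        return s2[j + 1] - s2[i] - (s1[j + 1] - s1[i]) ** 2 / m

    D = np.full((k + 1, n + 1), np.inf)                 # D[q][i]: cost of first i points in q clusters
    B = np.zeros((k + 1, n + 1), dtype=int)             # start index of the last cluster
    D[0][0] = 0.0
    for q in range(1, k + 1):
        for i in range(q, n + 1):
            for j in range(q, i + 1):                   # last cluster is x[j-1 .. i-1]
                cost = D[q - 1][j - 1] + sse(j - 1, i - 1)
                if cost < D[q][i]:
                    D[q][i], B[q][i] = cost, j - 1
    # Backtrack the cluster boundaries.
    bounds, i = [], n
    for q in range(k, 0, -1):
        j = B[q][i]
        bounds.append((j, i))                           # cluster covers x[j:i]
        i = j
    return D[k][n], [x[a:b] for a, b in reversed(bounds)]

cost, clusters = ckmeans_1d([1.0, 1.1, 1.2, 5.0, 5.1, 9.8, 10.0, 10.2], 3)
print(cost, [c.tolist() for c in clusters])
```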
2011-01-01
A simple, sensitive and accurate stability-indicating HPLC method has been developed and validated for determination of varenicline (VRC) in its bulk form and pharmaceutical tablets. Chromatographic separation was achieved on a Zorbax Eclipse XDB-C8 column (150 mm × 4.6 mm i.d., particle size 5 μm, maintained at ambient temperature) with a mobile phase consisting of acetonitrile and 50 mM potassium dihydrogen phosphate buffer (10:90, v/v) with an apparent pH of 3.5 ± 0.1 and a flow rate of 1.0 ml/min. The detection wavelength was set at 235 nm. VRC was subjected to different accelerated stress conditions. The degradation products, if any, were well resolved from the pure drug with significantly different retention time values. The method was linear (r = 0.9998) over a concentration range of 2 - 14 μg/ml. The limit of detection and limit of quantitation were 0.38 and 1.11 μg/ml, respectively. The intra- and inter-assay precisions were satisfactory; the relative standard deviations did not exceed 2%. The accuracy of the method was proved; the mean recovery of VRC was 100.10 ± 1.08%. The proposed method has high throughput as the analysis involves a short run-time (~ 6 min). The method met the ICH/FDA regulatory requirements. The proposed method was successfully applied for the determination of VRC in bulk and tablets with acceptable accuracy and precision; the label claim percentages were 99.65 ± 0.32%. The results demonstrated that the method would be of great value when applied in quality control and stability studies for VRC. PMID:21672253
Secure and Resilient Functional Modeling for Navy Cyber-Physical Systems
2017-05-24
(Briefing chart excerpt) Functional Modeling Compiler (SCCT): FM Compiler and Key Performance Indicators (KPI), May 2018, pending. Model Management Backbone (SCCT): MMB demonstration. Planned items include implementing the agent-based distributed runtime, KPIs for single/multicore controllers and temporal/spatial domains, and integration of the model management backbone. Distributed Runtime (UCI): not started. Model Management Backbone (SCCT): not started. Siemens Corporation, Corporate Technology. Unrestricted.
A manual for PARTI runtime primitives
NASA Technical Reports Server (NTRS)
Berryman, Harry; Saltz, Joel
1990-01-01
Primitives are presented that are designed to help users efficiently program irregular problems (e.g., unstructured mesh sweeps, sparse matrix codes, adaptive mesh partial differential equations solvers) on distributed memory machines. These primitives are also designed for use in compilers for distributed memory multiprocessors. Communications patterns are captured at runtime, and the appropriate send and receive messages are automatically generated.
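As a toy, single-process illustration of the inspector/executor pattern these primitives support (not the PARTI library itself, which generates actual send/receive messages), the sketch below has an inspector examine an indirection array at runtime and build a gather schedule for off-processor indices, after which an executor fetches those values before the irregular loop runs.

```python
# Toy single-process simulation of the inspector/executor idea: the inspector
# inspects the indirection array at runtime and builds a gather schedule for
# off-processor elements; the executor fetches them once, then the irregular
# loop runs on purely local data.  The real primitives generate MPI-style
# message passing; this sketch only mimics the bookkeeping.
import numpy as np

NPROC, N = 4, 16                        # 4 "processors", 16 global elements
owner = lambda g: g // (N // NPROC)     # block distribution

def inspector(my_rank, indirection):
    """Return (schedule, local_map): which globals to fetch from which owner,
    and where each fetched global will live in the local ghost region."""
    off_proc = sorted({g for g in indirection if owner(g) != my_rank})
    schedule = {}
    for g in off_proc:
        schedule.setdefault(owner(g), []).append(g)
    local_map = {g: i for i, g in enumerate(off_proc)}
    return schedule, local_map

def executor(global_x, schedule):
    """Perform the 'communication': gather the scheduled values (simulated)."""
    ghosts = []
    for src in sorted(schedule):
        ghosts.extend(global_x[g] for g in schedule[src])
    return np.array(ghosts)

# Hypothetical irregular access pattern on rank 1 (owns globals 4..7).
global_x = np.arange(N, dtype=float) * 10
indices = [4, 5, 12, 3, 7, 12]                          # references to globals
sched, gmap = inspector(1, indices)
ghost = executor(global_x, sched)

# The loop body reads either the owned block or the gathered ghost values.
owned = global_x[4:8]
vals = [owned[g - 4] if owner(g) == 1 else ghost[gmap[g]] for g in indices]
print(sched, vals)
```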
An enhanced Ada run-time system for real-time embedded processors
NASA Technical Reports Server (NTRS)
Sims, J. T.
1991-01-01
An enhanced Ada run-time system has been developed to support real-time embedded processor applications. The primary focus of this development effort has been on the tasking system and the memory management facilities of the run-time system. The tasking system has been extended to support efficient and precise periodic task execution as required for control applications. Event-driven task execution providing a means of task-asynchronous control and communication among Ada tasks is supported in this system. Inter-task control is even provided among tasks distributed on separate physical processors. The memory management system has been enhanced to provide object allocation and protected access support for memory shared between disjoint processors, each of which is executing a distinct Ada program.
Runtime Verification in Context: Can Optimizing Error Detection Improve Fault Diagnosis
NASA Technical Reports Server (NTRS)
Dwyer, Matthew B.; Purandare, Rahul; Person, Suzette
2010-01-01
Runtime verification has primarily been developed and evaluated as a means of enriching the software testing process. While many researchers have pointed to its potential applicability in online approaches to software fault tolerance, there has been a dearth of work exploring the details of how that might be accomplished. In this paper, we describe how a component-oriented approach to software health management exposes the connections between program execution, error detection, fault diagnosis, and recovery. We identify both research challenges and opportunities in exploiting those connections. Specifically, we describe how recent approaches to reducing the overhead of runtime monitoring aimed at error detection might be adapted to reduce the overhead and improve the effectiveness of fault diagnosis.
Toward real-time performance benchmarks for Ada
NASA Technical Reports Server (NTRS)
Clapp, Russell M.; Duchesneau, Louis; Volz, Richard A.; Mudge, Trevor N.; Schultze, Timothy
1986-01-01
The issue of real-time performance measurements for the Ada programming language through the use of benchmarks is addressed. First, the Ada notion of time is examined and a set of basic measurement techniques are developed. Then a set of Ada language features believed to be important for real-time performance are presented and specific measurement methods discussed. In addition, other important time-related features which are not explicitly part of the language but are part of the run-time system are also identified and measurement techniques developed. The measurement techniques are applied to the language and run-time system features and the results are presented.
Halvade-RNA: Parallel variant calling from transcriptomic data using MapReduce.
Decap, Dries; Reumers, Joke; Herzeel, Charlotte; Costanza, Pascal; Fostier, Jan
2017-01-01
Given the current cost-effectiveness of next-generation sequencing, the amount of DNA-seq and RNA-seq data generated is ever increasing. One of the primary objectives of NGS experiments is calling genetic variants. While highly accurate, most variant calling pipelines are not optimized to run efficiently on large data sets. However, as variant calling in genomic data has become common practice, several methods have been proposed to reduce runtime for DNA-seq analysis through the use of parallel computing. Determining the effectively expressed variants from transcriptomics (RNA-seq) data has only recently become possible, and as such does not yet benefit from efficiently parallelized workflows. We introduce Halvade-RNA, a parallel, multi-node RNA-seq variant calling pipeline based on the GATK Best Practices recommendations. Halvade-RNA makes use of the MapReduce programming model to create and manage parallel data streams on which multiple instances of existing tools such as STAR and GATK operate concurrently. Whereas the single-threaded processing of a typical RNA-seq sample requires ∼28h, Halvade-RNA reduces this runtime to ∼2h using a small cluster with two 20-core machines. Even on a single, multi-core workstation, Halvade-RNA can significantly reduce runtime compared to using multi-threading, thus providing for a more cost-effective processing of RNA-seq data. Halvade-RNA is written in Java and uses the Hadoop MapReduce 2.0 API. It supports a wide range of distributions of Hadoop, including Cloudera and Amazon EMR.
BFL: a node and edge betweenness based fast layout algorithm for large scale networks
Hashimoto, Tatsunori B; Nagasaki, Masao; Kojima, Kaname; Miyano, Satoru
2009-01-01
Background Network visualization would serve as a useful first step for analysis. However, current graph layout algorithms for biological pathways are insensitive to biologically important information, e.g. subcellular localization, biological node and graph attributes, or/and not available for large scale networks, e.g. more than 10000 elements. Results To overcome these problems, we propose the use of a biologically important graph metric, betweenness, a measure of network flow. This metric is highly correlated with many biological phenomena such as lethality and clusters. We devise a new fast parallel algorithm calculating betweenness to minimize the preprocessing cost. Using this metric, we also invent a node and edge betweenness based fast layout algorithm (BFL). BFL places the high-betweenness nodes to optimal positions and allows the low-betweenness nodes to reach suboptimal positions. Furthermore, BFL reduces the runtime by combining a sequential insertion algorithm with betweenness. For a graph with n nodes, this approach reduces the expected runtime of the algorithm to O(n²) when considering edge crossings, and to O(n log n) when considering only density and edge lengths. Conclusion Our BFL algorithm is compared against fast graph layout algorithms and approaches requiring intensive optimizations. For gene networks, we show that our algorithm is faster than all layout algorithms tested while providing readability on par with intensive optimization algorithms. We achieve a 1.4 second runtime for a graph with 4000 nodes and 12000 edges on a standard desktop computer. PMID:19146673
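As a hedged sketch of the ordering step only (not the full BFL placement), the snippet below uses networkx to compute node and edge betweenness on a stand-in graph and derives the insertion order in which high-betweenness nodes would be optimized first while the rest are inserted sequentially.

```python
# Sketch of the betweenness-ordering step only, not the full BFL algorithm.
# Requires the networkx package; the graph is a stand-in for a pathway.
import networkx as nx

G = nx.barabasi_albert_graph(200, 2, seed=1)           # hypothetical network
node_btw = nx.betweenness_centrality(G)
edge_btw = nx.edge_betweenness_centrality(G)

# High-betweenness nodes get placed (optimized) first; the remaining nodes are
# inserted sequentially into suboptimal positions, which is where BFL saves time.
insertion_order = sorted(G.nodes, key=node_btw.get, reverse=True)
backbone_edges = [e for e, b in sorted(edge_btw.items(), key=lambda kv: -kv[1])[:20]]
print(insertion_order[:5], backbone_edges[:3])
```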
Baig, Hasan; Madsen, Jan
2017-01-15
Simulation and behavioral analysis of genetic circuits is a standard approach of functional verification prior to their physical implementation. Many software tools have been developed to perform in silico analysis for this purpose, but none of them allow users to interact with the model during runtime. The runtime interaction gives the user a feeling of being in the lab performing a real world experiment. In this work, we present a user-friendly software tool named D-VASim (Dynamic Virtual Analyzer and Simulator), which provides a virtual laboratory environment to simulate and analyze the behavior of genetic logic circuit models represented in an SBML (Systems Biology Markup Language). Hence, SBML models developed in other software environments can be analyzed and simulated in D-VASim. D-VASim offers deterministic as well as stochastic simulation; and differs from other software tools by being able to extract and validate the Boolean logic from the SBML model. D-VASim is also capable of analyzing the threshold value and propagation delay of a genetic circuit model. D-VASim is available for Windows and Mac OS and can be downloaded from bda.compute.dtu.dk/downloads/. Contact: haba@dtu.dk, jama@dtu.dk.
Performance Analysis, Modeling and Scaling of HPC Applications and Tools
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bhatele, Abhinav
2016-01-13
Efficient use of supercomputers at DOE centers is vital for maximizing system throughput, minimizing energy costs and enabling science breakthroughs faster. This requires complementary efforts along several directions to optimize the performance of scientific simulation codes and the underlying runtimes and software stacks. This in turn requires providing scalable performance analysis tools and modeling techniques that can provide feedback to physicists and computer scientists developing the simulation codes and runtimes respectively. The PAMS project is using time allocations on supercomputers at ALCF, NERSC and OLCF to further the goals described above by performing research along the following fronts: 1. Scaling Study of HPC applications; 2. Evaluation of Programming Models; 3. Hardening of Performance Tools; 4. Performance Modeling of Irregular Codes; and 5. Statistical Analysis of Historical Performance Data. We are a team of computer and computational scientists funded by both DOE/NNSA and DOE/ASCR programs such as ECRP, XStack (Traleika Glacier, PIPER), ExaOSR (ARGO), SDMAV II (MONA) and PSAAP II (XPACC). This allocation will enable us to study big data issues when analyzing performance on leadership computing class systems and to assist the HPC community in making the most effective use of these resources.
Usability of a Runtime Environment for the Use of IMS Learning Design in Mixed Mode Higher Education
ERIC Educational Resources Information Center
Klebl, Michael
2006-01-01
Starting from the first public draft of IMS Learning Design in November 2002, a research project at the Catholic University Eichstaett-Ingolstadt in Germany was dedicated to the conceptual examination and empirical review of IMS Learning Design Level A. A prototypical runtime environment called "lab005" was developed. It was built based…
National Information Exchange Model (NIEM): DoD Adoption and Implications for C2 (Briefing Charts)
2014-06-18
(Briefing chart excerpt) An Information Exchange Package (IEP) is the data exchanged at runtime between data producers and data consumers of systems/applications. An Information Exchange Specification (IES) is the build-time description of the data to be exchanged, used by developers.
Implementation of a Learning Design Run-Time Environment for the .LRN Learning Management System
ERIC Educational Resources Information Center
del Cid, Jose Pablo Escobedo; de la Fuente Valentin, Luis; Gutierrez, Sergio; Pardo, Abelardo; Kloos, Carlos Delgado
2007-01-01
The IMS Learning Design specification aims at capturing the complete learning flow of courses, without being restricted to a particular pedagogical model. Such flow description for a course, called a Unit of Learning, must be able to be reproduced in different systems using a so called run-time environment. In the last few years there has been…
A manual for PARTI runtime primitives, revision 1
NASA Technical Reports Server (NTRS)
Das, Raja; Saltz, Joel; Berryman, Harry
1991-01-01
Primitives are presented that are designed to help users efficiently program irregular problems (e.g., unstructured mesh sweeps, sparse matrix codes, adaptive mesh partial differential equations solvers) on distributed memory machines. These primitives are also designed for use in compilers for distributed memory multiprocessors. Communications patterns are captured at runtime, and the appropriate send and receive messages are automatically generated.
DOE Office of Scientific and Technical Information (OSTI.GOV)
You, Yang; Song, Shuaiwen; Fu, Haohuan
2014-08-16
Support Vector Machine (SVM) has been widely used in data-mining and Big Data applications as modern commercial databases start to attach an increasing importance to the analytic capabilities. In recent years, SVM was adapted to the field of High Performance Computing for power/performance prediction, auto-tuning, and runtime scheduling. However, even at the risk of losing prediction accuracy due to insufficient runtime information, researchers can only afford to apply offline model training to avoid significant runtime training overhead. To address the challenges above, we designed and implemented MICSVM, a highly efficient parallel SVM for x86-based multi-core and many-core architectures, such as the Intel Ivy Bridge CPUs and the Intel Xeon Phi coprocessor (MIC).
Implementation of an Ada real-time executive: A case study
NASA Technical Reports Server (NTRS)
Laird, James D.; Burton, Bruce A.; Koppes, Mary R.
1986-01-01
Current Ada language implementations and runtime environments are immature, unproven and are a key risk area for real-time embedded computer systems (ECS). A test-case environment is provided in which the concerns of the real-time ECS community are addressed. A priority driven executive is selected to be implemented in the Ada programming language. The model selected is representative of real-time executives tailored for embedded systems used in missile, spacecraft, and avionics applications. An Ada-based design methodology is utilized, and two designs are considered. The first of these designs requires the use of vendor supplied runtime and tasking support. An alternative high-level design is also considered for an implementation requiring no vendor supplied runtime or tasking support. The former approach is carried through to implementation.
Static Extraction and Conformance Analysis of Hierarchical Runtime Architectural Structure
2010-05-14
(Excerpt) Example: CryptoDB. Architectural components are mapped to Java classes, e.g., CustomerManager corresponds to cryptodb.test.CustomerManager (also known as the "crypto consumer"), with CustomerManager.Receipts ... Figure 7.29: CryptoDB: Level-0 OOG with String objects. To better understand this communication, different domains were declared for plain-text (PLAIN), encrypted (CRYPTO), alias identifier (ALIASID), and key ...
An Incremental Life-cycle Assurance Strategy for Critical System Certification
2014-11-04
(Briefing chart excerpt) Embedded software systems introduce a new class of problems, not addressed by traditional system modeling and analysis, for safe aircraft operation. Data stream characteristics such as latency jitter affect control behavior. Why do system-level failures still occur despite fault tolerance techniques being deployed in systems? The embedded software system is a major source of ...
NASA Astrophysics Data System (ADS)
Yan, Hui; Wang, K. G.; Jones, Jim E.
2016-06-01
A parallel algorithm for large-scale three-dimensional phase-field simulations of phase coarsening is developed and implemented on high-performance architectures. From the large-scale simulations, a new kinetics of phase coarsening in the region of ultrahigh volume fraction is found. The parallel implementation is capable of harnessing the greater computer power available from high-performance architectures. The parallelized code enables an increase in three-dimensional simulation system size up to a 512³ grid cube. Through the parallelized code, practical runtime can be achieved for three-dimensional large-scale simulations, and the statistical significance of the results from these high resolution parallel simulations is greatly improved over that obtainable from serial simulations. A detailed performance analysis on speed-up and scalability is presented, showing good scalability which improves with increasing problem size. In addition, a model for prediction of runtime is developed, which shows a good agreement with actual run time from numerical tests.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mellor-Crummey, John
The PIPER project set out to develop methodologies and software for measurement, analysis, attribution, and presentation of performance data for extreme-scale systems. Goals of the project were to support analysis of massive multi-scale parallelism, heterogeneous architectures, multi-faceted performance concerns, and to support both post-mortem performance analysis to identify program features that contribute to problematic performance and on-line performance analysis to drive adaptation. This final report summarizes the research and development activity at Rice University as part of the PIPER project. Producing a complete suite of performance tools for exascale platforms during the course of this project was impossible since both hardware and software for exascale systems is still a moving target. For that reason, the project focused broadly on the development of new techniques for measurement and analysis of performance on modern parallel architectures, enhancements to HPCToolkit’s software infrastructure to support our research goals or use on sophisticated applications, engaging developers of multithreaded runtimes to explore how support for tools should be integrated into their designs, engaging operating system developers with feature requests for enhanced monitoring support, engaging vendors with requests that they add hardware measurement capabilities and software interfaces needed by tools as they design new components of HPC platforms including processors, accelerators and networks, and finally collaborations with partners interested in using HPCToolkit to analyze and tune scalable parallel applications.
Rideout, Jai Ram; He, Yan; Navas-Molina, Jose A; Walters, William A; Ursell, Luke K; Gibbons, Sean M; Chase, John; McDonald, Daniel; Gonzalez, Antonio; Robbins-Pianka, Adam; Clemente, Jose C; Gilbert, Jack A; Huse, Susan M; Zhou, Hong-Wei; Knight, Rob; Caporaso, J Gregory
2014-01-01
We present a performance-optimized algorithm, subsampled open-reference OTU picking, for assigning marker gene (e.g., 16S rRNA) sequences generated on next-generation sequencing platforms to operational taxonomic units (OTUs) for microbial community analysis. This algorithm provides benefits over de novo OTU picking (clustering can be performed largely in parallel, reducing runtime) and closed-reference OTU picking (all reads are clustered, not only those that match a reference database sequence with high similarity). Because more of our algorithm can be run in parallel relative to "classic" open-reference OTU picking, it makes open-reference OTU picking tractable on massive amplicon sequence data sets (though on smaller data sets, "classic" open-reference OTU clustering is often faster). We illustrate that here by applying it to the first 15,000 samples sequenced for the Earth Microbiome Project (1.3 billion V4 16S rRNA amplicons). To the best of our knowledge, this is the largest OTU picking run ever performed, and we estimate that our new algorithm runs in less than 1/5 the time that would be required by "classic" open-reference OTU picking. We show that subsampled open-reference OTU picking yields results that are highly correlated with those generated by "classic" open-reference OTU picking through comparisons on three well-studied datasets. An implementation of this algorithm is provided in the popular QIIME software package, which uses uclust for read clustering. All analyses were performed using QIIME's uclust wrappers, though we provide details (aided by the open-source code in our GitHub repository) that will allow implementation of subsampled open-reference OTU picking independently of QIIME (e.g., in a compiled programming language, where runtimes should be further reduced). Our analyses should generalize to other implementations of these OTU picking algorithms. Finally, we present a comparison of parameter settings in QIIME's OTU picking workflows and make recommendations on settings for these free parameters to optimize runtime without reducing the quality of the results. These optimized parameters can vastly decrease the runtime of uclust-based OTU picking in QIIME.
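To make the control flow concrete, the sketch below walks through the four steps of subsampled open-reference OTU picking with a trivial prefix-match stand-in for uclust; it illustrates only the structure of the algorithm, not the QIIME implementation, and all sequences are toy data.

```python
# Structural sketch only: closed-reference pass, de novo clustering of a
# subsample of failures to grow the reference, a second closed-reference pass,
# and a final de novo cleanup.  The prefix-match "clustering" is a toy stand-in
# for uclust; real OTU picking uses sequence similarity.
import random

def closed_ref(seqs, reference, w=8):
    """Assign each sequence to a reference OTU when its prefix matches the
    representative; return (assignments, failures)."""
    index = {rep[:w]: name for name, rep in reference.items()}
    hits, failures = {}, []
    for s in seqs:
        name = index.get(s[:w])
        if name:
            hits.setdefault(name, []).append(s)
        else:
            failures.append(s)
    return hits, failures

def de_novo(seqs, label, w=8):
    """Greedy stand-in for de novo clustering: a sequence joins the first OTU
    whose representative shares its prefix, otherwise it seeds a new OTU."""
    otus = {}
    for s in seqs:
        for members in otus.values():
            if members[0][:w] == s[:w]:
                members.append(s)
                break
        else:
            otus[f"{label}{len(otus)}"] = [s]
    return otus

def subsampled_open_ref(seqs, reference, fraction=0.1, seed=0):
    otus, failures = closed_ref(seqs, reference)                   # step 1: closed-reference
    random.seed(seed)
    sample = (random.sample(failures, max(1, int(fraction * len(failures))))
              if failures else [])
    new_ref = {n: m[0] for n, m in de_novo(sample, "new.ref.").items()}  # step 2: de novo on subsample
    more_hits, still_failing = closed_ref(failures, new_ref)       # step 3: closed-ref vs new OTUs
    otus.update(more_hits)
    otus.update(de_novo(still_failing, "new.cleanup."))             # step 4: de novo on the rest
    return otus

reference = {"ref.0": "ACGTACGTAAAA", "ref.1": "TTTTGGGGCCCC"}
reads = ["ACGTACGTTTTT", "TTTTGGGGAAAA", "GGGGCCCCAAAA", "GGGGCCCCTTTT", "CCCCAAAATTTT"]
print(subsampled_open_ref(reads, reference, fraction=0.5))
```

Step 2 is the part that can run in parallel over subsample chunks, which is where the runtime advantage over "classic" open-reference picking comes from.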
The roofline model: A pedagogical tool for program analysis and optimization
Williams, Samuel; Patterson, David; Oliker, Leonid; ...
2008-08-01
This article consists of a collection of slides from the authors' conference presentation. The Roofline model is a visually intuitive figure for kernel analysis and optimization. We believe undergraduates will find it useful in assessing performance and scalability limitations. It is easily extended to other architectural paradigms. It is easily extendable to other metrics: performance (sort, graphics, crypto, ...), bandwidth (L2, PCIe, ...). Furthermore, performance counters could be used to generate a runtime-specific roofline that would greatly aid the optimization.
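The roofline bound itself is a one-liner; the sketch below evaluates it for a few arithmetic intensities using hypothetical peak-compute and bandwidth numbers (kernels with intensity below the ridge point peak_gflops/peak_gbps are bandwidth-bound, those above it are compute-bound).

```python
# The roofline bound: attainable performance is the lesser of peak compute
# and (memory bandwidth x arithmetic intensity).  Peak numbers below are
# hypothetical placeholders, not measured machine values.
def roofline(arith_intensity_flops_per_byte, peak_gflops=500.0, peak_gbps=50.0):
    return min(peak_gflops, peak_gbps * arith_intensity_flops_per_byte)

for ai in (0.25, 1.0, 4.0, 16.0):
    print(f"AI={ai:5.2f} flop/byte -> {roofline(ai):6.1f} GFLOP/s attainable")
```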
Cooperative runtime monitoring
NASA Astrophysics Data System (ADS)
Hallé, Sylvain
2013-11-01
Requirements on message-based interactions can be formalised as an interface contract that specifies constraints on the sequence of possible messages that can be exchanged by multiple parties. At runtime, each peer can monitor incoming messages and check that the contract is correctly being followed by their respective senders. We introduce cooperative runtime monitoring, where a recipient 'delegates' its monitoring task to the sender, which is required to provide evidence that the message it sends complies with the contract. In turn, this evidence can be quickly checked by the recipient, which is then guaranteed of the sender's compliance to the contract without doing the monitoring computation by itself. A particular application of this concept is shown on web services, where service providers can monitor and enforce contract compliance of third-party clients at a small cost on the server side, while avoiding to certify or digitally sign them.
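As a hedged illustration of the delegation idea (not the paper's exact protocol or contract language), the sketch below uses a toy contract, "every request id must be answered before close": the sender attaches its claimed set of pending ids as evidence with each message, and the recipient only verifies that each claimed set is the correct one-step update of the previously verified one, rather than re-running the monitor over the whole message history.

```python
# Toy cooperative monitoring: evidence (the claimed pending-request set)
# travels with each message; the recipient performs a cheap incremental check.
def expected_update(pending, msg):
    kind, arg = msg
    if kind == "request":
        return pending | {arg}
    if kind == "response":
        return pending - {arg} if arg in pending else None   # response without request
    if kind == "close":
        return pending if not pending else None               # close with pending requests
    return pending

def recipient_check(stream):
    verified = set()
    for msg, claimed_pending in stream:                       # (message, evidence) pairs
        expect = expected_update(verified, msg)
        if expect is None or claimed_pending != expect:
            return f"contract violation or bad evidence at {msg}"
        verified = claimed_pending
    return "compliant"

# Sender side: produce messages together with the claimed pending set.
msgs = [("request", 1), ("request", 2), ("response", 1), ("response", 2), ("close", None)]
evidence, pending = [], set()
for m in msgs:
    pending = expected_update(pending, m)
    evidence.append((m, set(pending)))
print(recipient_check(evidence))          # -> compliant
```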
Bypassing Races in Live Applications with Execution Filters
2010-01-01
LOOM creates the needed locks and semaphores on demand. The first time a lock or semaphore is referenced by one of the inserted synchronization ... runtime. LOOM provides a flexible and safe language for developers to write execution filters that explicitly synchronize code. It then uses an ... first compile their application with LOOM. At runtime, to work around a race, an application developer writes an execution filter that synchronizes the
Argobots: A Lightweight Low-Level Threading and Tasking Framework
DOE Office of Scientific and Technical Information (OSTI.GOV)
Seo, Sangmin; Amer, Abdelhalim; Balaji, Pavan
In the past few decades, a number of user-level threading and tasking models have been proposed in the literature to address the shortcomings of OS-level threads, primarily with respect to cost and flexibility. Current state-of-the-art user-level threading and tasking models, however, are either too specific to applications or architectures or are not as powerful or flexible. In this paper, we present Argobots, a lightweight, low-level threading and tasking framework that is designed as a portable and performant substrate for high-level programming models or runtime systems. Argobots offers a carefully designed execution model that balances generality of functionality with providing a rich set of controls to allow specialization by the user or high-level programming model. We describe the design, implementation, and optimization of Argobots and present integrations with three example high-level models: OpenMP, MPI, and co-located I/O service. Evaluations show that (1) Argobots outperforms existing generic threading runtimes; (2) our OpenMP runtime offers more efficient interoperability capabilities than production OpenMP runtimes do; (3) when MPI interoperates with Argobots instead of Pthreads, it enjoys reduced synchronization costs and better latency hiding capabilities; and (4) I/O service with Argobots reduces interference with co-located applications, achieving performance competitive with that of the Pthreads version.
Compiler and Runtime Support for Programming in Adaptive Parallel Environments
1998-10-15
no other job is waiting for resources, and use a smaller number of processors when other jobs need resources. Setia et al. [15, 20] have shown that such ... [15] Vijay K. Naik, Sanjeev Setia, and Mark Squillante. Performance analysis of job scheduling policies in parallel supercomputing environments. In ... on networks of heterogeneous workstations. Technical Report CSE-94-012, Oregon Graduate Institute of Science and Technology, 1994. [20] Sanjeev Setia
Decaf: Decoupled Dataflows for In Situ High-Performance Workflows
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dreher, M.; Peterka, T.
Decaf is a dataflow system for the parallel communication of coupled tasks in an HPC workflow. The dataflow can perform arbitrary data transformations ranging from simply forwarding data to complex data redistribution. Decaf does this by allowing the user to allocate resources and execute custom code in the dataflow. All communication through the dataflow is efficient parallel message passing over MPI. The runtime for calling tasks is entirely message-driven; Decaf executes a task when all messages for the task have been received. Such a message-driven runtime allows cyclic task dependencies in the workflow graph, for example, to enact computational steering based on the result of downstream tasks. Decaf includes a simple Python API for describing the workflow graph. This allows Decaf to stand alone as a complete workflow system, but Decaf can also be used as the dataflow layer by one or more other workflow systems to form a heterogeneous task-based computing environment. In one experiment, we couple a molecular dynamics code with a visualization tool using the FlowVR and Damaris workflow systems and Decaf for the dataflow. In another experiment, we test the coupling of a cosmology code with Voronoi tessellation and density estimation codes using MPI for the simulation, the DIY programming model for the two analysis codes, and Decaf for the dataflow. Such workflows consisting of heterogeneous software infrastructures exist because components are developed separately with different programming models and runtimes, and this is the first time that such heterogeneous coupling of diverse components was demonstrated in situ on HPC systems.
A History-based Estimation for LHCb job requirements
NASA Astrophysics Data System (ADS)
Rauschmayr, Nathalie
2015-12-01
The main goal of a Workload Management System (WMS) is to find and allocate resources for the given tasks. The more and better job information the WMS receives, the easier it will be to accomplish its task, which directly translates into higher utilization of resources. Traditionally, the information associated with each job, like expected runtime, is defined beforehand by the Production Manager in the best case, and falls back to fixed arbitrary default values otherwise. In the case of LHCb's Workload Management System, no mechanisms are provided which automate the estimation of job requirements. As a result, much more CPU time is normally requested than actually needed. Particularly in the context of multicore jobs this presents a major problem, since single- and multicore jobs shall share the same resources. Consequently, grid sites need to rely on estimations given by the VOs in order to not decrease the utilization of their worker nodes when making multicore job slots available. The main reason for going to multicore jobs is the reduction of the overall memory footprint. Therefore, it also needs to be studied how the memory consumption of jobs can be estimated. A detailed workload analysis of past LHCb jobs is presented. It includes a study of job features and their correlation with runtime and memory consumption. Based on these features, a supervised learning algorithm is developed for history-based prediction. The aim is to learn over time how jobs’ runtime and memory consumption evolve under changes in experiment conditions and software versions. It will be shown that the estimation can be notably improved if experiment conditions are taken into account.
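As a hedged sketch of history-based estimation (the features below are hypothetical stand-ins, not LHCb's actual job metadata), the example trains a regression model on synthetic past jobs and predicts the runtime of new ones.

```python
# History-based runtime estimation sketch: learn a mapping from job features
# to runtime using past jobs, then predict for new jobs.  Features and data
# are synthetic placeholders for the kind of metadata a WMS would have.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 500
events        = rng.integers(1_000, 100_000, n)           # events to process
sw_version    = rng.integers(0, 5, n)                      # encoded application version
condition_tag = rng.integers(0, 3, n)                      # encoded experiment conditions
# Synthetic "historical" runtimes with version/condition dependent per-event cost.
runtime = events * (0.01 + 0.002 * sw_version + 0.001 * condition_tag) \
          * rng.lognormal(0, 0.1, n)

X = np.column_stack([events, sw_version, condition_tag])
model = GradientBoostingRegressor().fit(X[:400], runtime[:400])
pred = model.predict(X[400:])
rel_err = np.abs(pred - runtime[400:]) / runtime[400:]
print(f"median relative error on held-out jobs: {np.median(rel_err):.2%}")
```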
Webster, Victoria A; Nieto, Santiago G; Grosberg, Anna; Akkus, Ozan; Chiel, Hillel J; Quinn, Roger D
2016-10-01
In this study, new techniques for approximating the contractile properties of cells in biohybrid devices using Finite Element Analysis (FEA) have been investigated. Many current techniques for modeling biohybrid devices use individual cell forces to simulate the cellular contraction. However, such techniques result in long simulation runtimes. In this study we investigated the effect of the use of thermal contraction on simulation runtime. The thermal contraction model was significantly faster than models using individual cell forces, making it beneficial for rapidly designing or optimizing devices. Three techniques, Stoney's Approximation, a Modified Stoney's Approximation, and a Thermostat Model, were explored for calibrating thermal expansion/contraction parameters (TECPs) needed to simulate cellular contraction using thermal contraction. The TECP values were calibrated by using published data on the deflections of muscular thin films (MTFs). Using these techniques, TECP values that suitably approximate experimental deflections can be determined by using experimental data obtained from cardiomyocyte MTFs. Furthermore, a sensitivity analysis was performed in order to investigate the contribution of individual variables, such as elastic modulus and layer thickness, to the final calibrated TECP for each calibration technique. Additionally, the TECP values are applicable to other types of biohybrid devices. Two non-MTF models were simulated based on devices reported in the existing literature.
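As a hedged sketch of the arithmetic behind a Stoney-type calibration (the paper's FEA-based procedure is more involved than this closed-form step), the snippet below infers film stress from a measured curvature with the classical Stoney formula and converts it to an equivalent thermal contraction strain per unit temperature change, a TECP-like quantity; all material values are hypothetical.

```python
# Stoney-type calibration arithmetic, illustrative only; material properties
# and the measured curvature are hypothetical placeholders.
def stoney_film_stress(curvature, E_s, nu_s, t_s, t_f):
    """Classical Stoney approximation:
    sigma_f = E_s * t_s^2 * kappa / (6 * t_f * (1 - nu_s))."""
    return E_s * t_s**2 * curvature / (6.0 * t_f * (1.0 - nu_s))

def equivalent_tecp(sigma_f, E_f, nu_f, delta_T=1.0):
    """Equivalent (biaxial) thermal contraction coefficient that would produce
    sigma_f in a fully constrained film over a temperature change delta_T."""
    strain = sigma_f * (1.0 - nu_f) / E_f
    return strain / delta_T

# Hypothetical numbers: soft elastomer substrate, thin contractile cell layer.
kappa = 1.0 / 0.002             # measured curvature, 1/m (2 mm bending radius)
sigma = stoney_film_stress(kappa, E_s=1.5e6, nu_s=0.49, t_s=15e-6, t_f=5e-6)
tecp  = equivalent_tecp(sigma, E_f=10e3, nu_f=0.49, delta_T=1.0)
print(f"film stress ~ {sigma:.1f} Pa, equivalent contraction ~ {tecp:.3e} per K")
```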
NASA Astrophysics Data System (ADS)
Tan, F.; Wang, G.; Chen, C.; Ge, Z.
2016-12-01
Back-projection of teleseismic P waves [Ishii et al., 2005] has been widely used to image the rupture of earthquakes. Besides the conventional narrowband beamforming in the time domain, approaches in the frequency domain, such as MUSIC back-projection (Meng 2011) and compressive sensing (Yao et al., 2011), have been proposed to improve the resolution. Each method has its advantages and disadvantages and should be used appropriately in different cases. Therefore, a thorough study comparing and testing these methods is needed. We wrote a GUI program that puts the three methods together so that users can conveniently process the same data with different methods and compare the results. We then used all the methods to process several earthquake datasets, including the 2008 Wenchuan Mw7.9 earthquake and the 2011 Tohoku-Oki Mw9.0 earthquake, as well as theoretical seismograms of both simple sources and complex ruptures. Our results show differences in efficiency, accuracy and stability among the methods. Quantitative and qualitative analyses are applied to measure their dependence on data and parameters, such as station number, station distribution, grid size, calculation window length and so on. In general, back-projection makes it possible to get a good result in a very short time using fewer than 20 lines of high-quality data with a proper station distribution, but the swimming artifact can be significant. Some measures, for instance combining global seismic data, could help ameliorate this. MUSIC back-projection needs relatively more data to obtain a better and more stable result, which also means more time, since its runtime grows noticeably faster than that of back-projection as the number of stations increases. Compressive sensing deals more effectively with multiple sources in the same time window, but costs the most time due to repeated matrix computations. The resolution of all the methods is complicated and depends on many factors. An important one is the grid size, which in turn influences runtime significantly. More detailed results from this research may help people choose proper data, methods and parameters.
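As a hedged, toy illustration of conventional time-domain back-projection (real teleseismic processing uses global travel-time tables and aligned broadband P waveforms), the sketch below synthesizes arrivals from a point source in a homogeneous medium, stacks traces shifted by predicted travel times over a grid, and recovers the source at the node of maximum beam power.

```python
# Toy time-domain back-projection with a homogeneous velocity; station layout,
# velocity, and the source wavelet are all synthetic placeholders.
import numpy as np

rng = np.random.default_rng(3)
v, dt, nt = 6.0, 0.05, 800                       # km/s, s, samples
t = np.arange(nt) * dt
stations = rng.uniform(-100, 100, size=(12, 2))  # station coordinates (km)
source, t0 = np.array([20.0, -30.0]), 5.0        # true source and origin time

def pulse(t, tc, width=0.5):                     # simple source wavelet
    return np.exp(-((t - tc) / width) ** 2)

traces = np.array([pulse(t, t0 + np.linalg.norm(s - source) / v) for s in stations])
traces += 0.05 * rng.standard_normal(traces.shape)

# Back-project: stack traces shifted by predicted travel times for each node.
grid = np.array([[x, y] for x in np.arange(-50, 51, 5) for y in np.arange(-50, 51, 5)])
power = np.zeros(len(grid))
for k, node in enumerate(grid):
    shifts = (np.linalg.norm(stations - node, axis=1) / v / dt).astype(int)
    stack = np.zeros(nt)
    for tr, s in zip(traces, shifts):
        stack[: nt - s] += tr[s:]                # remove each predicted delay, then sum
    power[k] = np.sum(stack ** 2)

print("best node:", grid[np.argmax(power)], " true source:", source)
```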
Halladay, Jason S; Delarosa, Erlie Marie; Tran, Daniel; Wang, Leslie; Wong, Susan; Khojasteh, S Cyrus
2011-08-01
Here we describe a high-capacity, high-throughput, automated, 384-well CYP inhibition assay using well-known HLM-based MS probes. We provide consistently robust IC(50) values at the lead optimization stage of the drug discovery process. Our method uses the Agilent Technologies/Velocity11 BioCel 1200 system, timesaving techniques for sample analysis, and streamlined data processing steps. For each experiment, we generate IC(50) values for up to 344 compounds and positive controls for five major CYP isoforms (probe substrate): CYP1A2 (phenacetin), CYP2C9 ((S)-warfarin), CYP2C19 ((S)-mephenytoin), CYP2D6 (dextromethorphan), and CYP3A4/5 (testosterone and midazolam). Each compound is incubated separately at four concentrations with each CYP probe substrate under the optimized incubation conditions. Each incubation is quenched with acetonitrile containing the deuterated internal standard of the respective metabolite for each probe substrate. To minimize the number of samples to be analyzed by LC-MS/MS and reduce the amount of valuable MS runtime, we use the timesaving techniques of cassette analysis (pooling the incubation samples at the end of each CYP probe incubation into one) and column switching (reducing the amount of MS runtime). We also compare the IC(50) results obtained with our method for the five major CYP isoforms to values reported in the literature.
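IC(50) values of the kind described above are typically obtained by fitting remaining enzyme activity against inhibitor concentration. A minimal sketch of such a fit (a four-parameter logistic with made-up example data, not the BioCel/LC-MS/MS pipeline itself) is:

```python
import numpy as np
from scipy.optimize import curve_fit

def four_pl(conc, bottom, top, ic50, hill):
    """Four-parameter logistic: activity as a function of inhibitor concentration."""
    return bottom + (top - bottom) / (1.0 + (conc / ic50) ** hill)

# Hypothetical data: four inhibitor concentrations (uM) and % of control activity.
conc = np.array([0.1, 1.0, 10.0, 100.0])
activity = np.array([98.0, 85.0, 42.0, 12.0])

params, _ = curve_fit(four_pl, conc, activity, p0=[0.0, 100.0, 5.0, 1.0])
print("estimated IC50 (uM):", params[2])
```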
Optimization Strategies for Hardware-Based Cofactorization
NASA Astrophysics Data System (ADS)
Loebenberger, Daniel; Putzka, Jens
We use the specific structure of the inputs to the cofactorization step in the general number field sieve (GNFS) to optimize the runtime of the cofactorization step on a hardware cluster. An optimal distribution of bitlength-specific ECM modules is proposed and compared to existing ones. With our optimizations we obtain a speedup of between 17% and 33% for the cofactorization step of the GNFS compared to the runtime of an unoptimized cluster.
NASA Technical Reports Server (NTRS)
Vanderbilt, Peter
1999-01-01
This paper gives an overview of GXD, a framework facilitating publication and use of data from diverse data sources. GXD defines an object-oriented data model designed to represent a wide range of things including data, its metadata, resources, and query results. GXD also defines a data transport language, a dialect of XML, for representing instances of the data model. This language allows for a wide range of data source implementations by supporting both the direct incorporation of data and the specification of data by various rules. The GXD software library, prototyped in Java, includes client and server runtimes. The server runtime facilitates the generation of entities containing data encoded in the GXD transport language. The GXD client runtime interprets these entities (potentially from many data sources) to create an illusion of a globally interconnected data space, one that is independent of data source location and implementation.
Enhancing knowledge discovery from cancer genomics data with Galaxy
Albuquerque, Marco A.; Grande, Bruno M.; Ritch, Elie J.; Pararajalingam, Prasath; Jessa, Selin; Krzywinski, Martin; Grewal, Jasleen K.; Shah, Sohrab P.; Boutros, Paul C.
2017-01-01
The field of cancer genomics has demonstrated the power of massively parallel sequencing techniques to inform on the genes and specific alterations that drive tumor onset and progression. Although large comprehensive sequence data sets continue to be made increasingly available, data analysis remains an ongoing challenge, particularly for laboratories lacking dedicated resources and bioinformatics expertise. To address this, we have produced a collection of Galaxy tools that represent many popular algorithms for detecting somatic genetic alterations from cancer genome and exome data. We developed new methods for parallelization of these tools within Galaxy to accelerate runtime and have demonstrated their usability and summarized their runtimes on multiple cloud service providers. Some tools represent extensions or refinement of existing toolkits to yield visualizations suited to cohort-wide cancer genomic analysis. For example, we present Oncocircos and Oncoprintplus, which generate data-rich summaries of exome-derived somatic mutation. Workflows that integrate these to achieve data integration and visualizations are demonstrated on a cohort of 96 diffuse large B-cell lymphomas and enabled the discovery of multiple candidate lymphoma-related genes. Our toolkit is available from our GitHub repository as Galaxy tool and dependency definitions and has been deployed using virtualization on multiple platforms including Docker. PMID:28327945
Enhancing knowledge discovery from cancer genomics data with Galaxy.
Albuquerque, Marco A; Grande, Bruno M; Ritch, Elie J; Pararajalingam, Prasath; Jessa, Selin; Krzywinski, Martin; Grewal, Jasleen K; Shah, Sohrab P; Boutros, Paul C; Morin, Ryan D
2017-05-01
The field of cancer genomics has demonstrated the power of massively parallel sequencing techniques to inform on the genes and specific alterations that drive tumor onset and progression. Although large comprehensive sequence data sets continue to be made increasingly available, data analysis remains an ongoing challenge, particularly for laboratories lacking dedicated resources and bioinformatics expertise. To address this, we have produced a collection of Galaxy tools that represent many popular algorithms for detecting somatic genetic alterations from cancer genome and exome data. We developed new methods for parallelization of these tools within Galaxy to accelerate runtime and have demonstrated their usability and summarized their runtimes on multiple cloud service providers. Some tools represent extensions or refinement of existing toolkits to yield visualizations suited to cohort-wide cancer genomic analysis. For example, we present Oncocircos and Oncoprintplus, which generate data-rich summaries of exome-derived somatic mutation. Workflows that integrate these to achieve data integration and visualizations are demonstrated on a cohort of 96 diffuse large B-cell lymphomas and enabled the discovery of multiple candidate lymphoma-related genes. Our toolkit is available from our GitHub repository as Galaxy tool and dependency definitions and has been deployed using virtualization on multiple platforms including Docker. © The Author 2017. Published by Oxford University Press.
Typing Local Control and State Using Flow Analysis
NASA Astrophysics Data System (ADS)
Guha, Arjun; Saftoiu, Claudiu; Krishnamurthi, Shriram
Programs written in scripting languages employ idioms that confound conventional type systems. In this paper, we highlight one important set of related idioms: the use of local control and state to reason informally about types. To address these idioms, we formalize run-time tags and their relationship to types, and use these to present a novel strategy to integrate typing with flow analysis in a modular way. We demonstrate that in our separation of typing and flow analysis, each component remains conventional, their composition is simple, but the result can handle these idioms better than either one alone.
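An illustration of the idiom in question, where control flow and a run-time tag check informally narrow a value's type before use, is shown below (in Python rather than the scripting languages studied in the paper):

```python
def describe(value):
    # The run-time tag check on the next line is what a flow-sensitive type
    # system must track: inside the branch, `value` can be treated as a str.
    if isinstance(value, str):
        return value.upper()          # safe only because of the guard above
    # Outside the guarded branch the value is known not to be a str here.
    return "non-string: " + repr(value)

print(describe("hello"))   # HELLO
print(describe(42))        # non-string: 42
```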
Increasing the Runtime Speed of Case-Based Plan Recognition
2015-05-01
...number of situations that the robot might reasonably be expected to encounter. This requires efficient indexing schemes to ensure that plan retrieval...
Determination of the Underlying Task Scheduling Algorithm for an Ada Runtime System
1989-12-01
...was also curious as to how well I could model the test cases with Ada programs. In particular, I wanted to see whether I could model the equal arrival... parameter relationships required to detect the execution of individual algorithms. These test cases were modeled using Ada programs. Then, the... results were analyzed to determine whether the Ada programs were capable of revealing the task scheduling algorithm used by the Ada run-time system.
Accelerating semantic graph databases on commodity clusters
DOE Office of Scientific and Technical Information (OSTI.GOV)
Morari, Alessandro; Castellana, Vito G.; Haglin, David J.
We are developing a full software system for accelerating semantic graph databases on commodity clusters that scales to hundreds of nodes while maintaining constant query throughput. Our framework comprises a SPARQL-to-C++ compiler, a library of parallel graph methods, and a custom multithreaded runtime layer, which provides a Partitioned Global Address Space (PGAS) programming model with fork/join parallelism and automatic load balancing over commodity clusters. We present preliminary results for the compiler and for the runtime.
2016-02-01
...system consists of a high-fidelity hardware simulation using field programmable gate arrays (FPGAs), with a set of runtime services (ConcreteWare)... "perimeter protection, patch, and pray" is not aligned with the threat. Programmers will not bail us out of this situation (by writing defect-free code)... hosted on a Field Programmable Gate Array (FPGA), with a set of runtime services (ConcreteWare) running on the hardware. Secure applications can be...
Argobots: A Lightweight Low-Level Threading and Tasking Framework
DOE Office of Scientific and Technical Information (OSTI.GOV)
Seo, Sangmin; Amer, Abdelhalim; Balaji, Pavan
In the past few decades, a number of user-level threading and tasking models have been proposed in the literature to address the shortcomings of OS-level threads, primarily with respect to cost and flexibility. Current state-of-the-art user-level threading and tasking models, however, either are too specific to applications or architectures or are not as powerful or flexible. In this paper, we present Argobots, a lightweight, low-level threading and tasking framework that is designed as a portable and performant substrate for high-level programming models or runtime systems. Argobots offers a carefully designed execution model that balances generality of functionality with providing a rich set of controls to allow specialization by end users or high-level programming models. We describe the design, implementation, and performance characterization of Argobots and present integrations with three high-level models: OpenMP, MPI, and colocated I/O services. Evaluations show that (1) Argobots, while providing richer capabilities, is competitive with existing simpler generic threading runtimes; (2) our OpenMP runtime offers more efficient interoperability capabilities than production OpenMP runtimes do; (3) when MPI interoperates with Argobots instead of Pthreads, it enjoys reduced synchronization costs and better latency-hiding capabilities; and (4) I/O services with Argobots reduce interference with colocated applications while achieving performance competitive with that of a Pthreads approach.
Argobots: A Lightweight Low-Level Threading and Tasking Framework
Seo, Sangmin; Amer, Abdelhalim; Balaji, Pavan; ...
2017-10-24
In the past few decades, a number of user-level threading and tasking models have been proposed in the literature to address the shortcomings of OS-level threads, primarily with respect to cost and flexibility. Current state-of-the-art user-level threading and tasking models, however, are either too specific to applications or architectures or are not as powerful or flexible. In this article, we present Argobots, a lightweight, low-level threading and tasking framework that is designed as a portable and performant substrate for high-level programming models or runtime systems. Argobots offers a carefully designed execution model that balances generality of functionality with providing a rich set of controls to allow specialization by the user or high-level programming model. Here, we describe the design, implementation, and optimization of Argobots and present integrations with three example high-level models: OpenMP, MPI, and co-located I/O service. Evaluations show that (1) Argobots outperforms existing generic threading runtimes; (2) our OpenMP runtime offers more efficient interoperability capabilities than production OpenMP runtimes do; (3) when MPI interoperates with Argobots instead of Pthreads, it enjoys reduced synchronization costs and better latency hiding capabilities; and (4) I/O service with Argobots reduces interference with co-located applications, achieving performance competitive with that of the Pthreads version.
Static and Dynamic Frequency Scaling on Multicore CPUs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bao, Wenlei; Hong, Changwan; Chunduri, Sudheer
2016-12-28
Dynamic voltage and frequency scaling (DVFS) adapts CPU power consumption by modifying a processor's operating frequency (and the associated voltage). Typical approaches employing DVFS involve default strategies such as running at the lowest or the highest frequency, or observing the CPU's runtime behavior and dynamically adapting the voltage/frequency configuration based on CPU usage. In this paper, we argue that many previous approaches suffer from inherent limitations, such as not accounting for the processor-specific impact of frequency changes on energy for different workload types. We first propose a lightweight runtime-based approach to automatically adapt the frequency based on the CPU workload, which is agnostic of the processor characteristics. We then show that further improvements can be achieved for affine kernels in the application, using a compile-time characterization instead of run-time monitoring to select the frequency and the number of CPU cores to use. Our framework relies on a one-time energy characterization of CPU-specific DVFS profiles followed by a compile-time categorization of loop-based code segments in the application. These are combined to determine a priori the frequency and the number of cores to use to execute the application so as to optimize energy or energy-delay product, outperforming the runtime approach. Extensive evaluation on 60 benchmarks and five multi-core CPUs shows that our approach systematically outperforms the powersave Linux governor, while improving overall performance.
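On Linux, the frequency settings that such approaches manipulate are exposed through the cpufreq sysfs interface. A minimal sketch of reading, and (with root privileges) pinning, a core's frequency range is shown below; the file paths are the standard cpufreq ones, but their availability and writability depend on the platform and governor.

```python
from pathlib import Path

CPUFREQ = Path("/sys/devices/system/cpu/cpu0/cpufreq")

def read(name):
    # Read one cpufreq attribute for cpu0 (values are in kHz).
    return (CPUFREQ / name).read_text().strip()

def set_range(khz_min, khz_max):
    # Requires root; constrains cpu0 to the given frequency range.
    (CPUFREQ / "scaling_min_freq").write_text(str(khz_min))
    (CPUFREQ / "scaling_max_freq").write_text(str(khz_max))

if __name__ == "__main__":
    print("governor:", read("scaling_governor"))
    print("current :", read("scaling_cur_freq"), "kHz")
    print("hw range:", read("cpuinfo_min_freq"), "-", read("cpuinfo_max_freq"), "kHz")
```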
Universal Serial Bus Architecture for Removable Media (USB-ARM)
DOE Office of Scientific and Technical Information (OSTI.GOV)
2011-03-09
USB-ARM creates operating system drivers which sit between removable media and the user and applications. The drivers isolate the media and submit the contents of the media to a virtual machine containing an entire scanning system. This scanning system may include traditional anti-virus, but also allows more detailed analysis of files, including dynamic run-time analysis, helping to prevent "zero-day" threats not already identified in anti-virus signatures. Once cleared, the media is presented to the operating system, at which point it becomes available to users and applications.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sadayappan, Ponnuswamy
Exascale computing systems will provide a thousand-fold increase in parallelism and a proportional increase in failure rate relative to today's machines. Systems software for exascale machines must provide the infrastructure to support existing applications while simultaneously enabling efficient execution of new programming models that naturally express dynamic, adaptive, irregular computation; coupled simulations; and massive data analysis in a highly unreliable hardware environment with billions of threads of execution. We propose a new approach to the data and work distribution model provided by system software based on the unifying formalism of an abstract file system. The proposed hierarchical data model provides simple, familiar visibility and access to data structures through the file system hierarchy, while providing fault tolerance through selective redundancy. The hierarchical task model features work queues whose form and organization are represented as file system objects. Data and work are both first class entities. By exposing the relationships between data and work to the runtime system, information is available to optimize execution time and provide fault tolerance. The data distribution scheme provides replication (where desirable and possible) for fault tolerance and efficiency, and it is hierarchical to make it possible to take advantage of locality. The user, tools, and applications, including legacy applications, can interface with the data, work queues, and one another through the abstract file model. This runtime environment will provide multiple interfaces to support traditional Message Passing Interface applications, languages developed under DARPA's High Productivity Computing Systems program, as well as other, experimental programming models. We will validate our runtime system with pilot codes on existing platforms and will use simulation to validate for exascale-class platforms. In this final report, we summarize research results from the work done at the Ohio State University towards the larger goals of the project listed above.
Challenges in High-Assurance Runtime Verification
NASA Technical Reports Server (NTRS)
Goodloe, Alwyn E.
2016-01-01
Safety-critical systems are growing more complex and becoming increasingly autonomous. Runtime Verification (RV) has the potential to provide protections when a system cannot be assured by conventional means, but only if the RV itself can be trusted. In this paper, we proffer a number of challenges to realizing high-assurance RV and illustrate how we have addressed them in our research. We argue that high-assurance RV provides a rich target for automated verification tools in hope of fostering closer collaboration among the communities.
Affordance Templates for Shared Robot Control
NASA Technical Reports Server (NTRS)
Hart, Stephen; Dinh, Paul; Hambuchen, Kim
2014-01-01
This paper introduces the Affordance Template framework used to supervise task behaviors on the NASA-JSC Valkyrie robot at the 2013 DARPA Robotics Challenge (DRC) Trials. This framework provides graphical interfaces to human supervisors that are adjustable based on the run-time environmental context (e.g., size, location, and shape of objects that the robot must interact with, etc.). Additional improvements, described below, inject degrees of autonomy into instantiations of affordance templates at run-time in order to enable efficient human supervision of the robot for accomplishing tasks.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Williams, Samuel; Patterson, David; Oliker, Leonid
This article consists of a collection of slides from the authors' conference presentation. The Roofline model is a visually intuitive figure for kernel analysis and optimization. We believe undergraduates will find it useful in assessing performance and scalability limitations. It is easily extended to other architectural paradigms and to other metrics: performance (sort, graphics, crypto, ...) and bandwidth (L2, PCIe, ...). Furthermore, performance counters could be used to generate a runtime-specific roofline that would greatly aid optimization.
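The figure referred to is built from a single bound that is easy to restate. Using peak floating-point throughput P, sustained memory bandwidth B, and a kernel's arithmetic intensity I (flops per byte moved), the attainable performance is:

```latex
% Roofline bound: a kernel is either compute-bound or bandwidth-bound.
\[
  \text{Attainable GFLOP/s} \;=\; \min\bigl(\,P,\; B \times I\,\bigr),
  \qquad I = \frac{\text{flops}}{\text{bytes moved to/from memory}}
\]
```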
UAV Swarm Tactics: An Agent-Based Simulation and Markov Process Analysis
2013-06-01
...with every set several times, and to write a CSV file with the results. Rather than scripting the agent behavior deterministically, the agents should...
A Development Testbed for ALPS-Based Systems
1988-10-01
...allotted to the application because of size or power constraints). Given an underlying support ALPS architecture such as the d-ALPS architecture, a... resource on which it is assigned at runtime. A second representation problem is that most graph analysis algorithms treat either graphs with weighted links... subtask) associated with it but is treated like other links. In d-ALPS, as a priority precedence link, it would cause the binding of a processor: as a...
An Exploratory Analysis of Projected Navy Officer Inventory Strength Using Data Farming
2016-09-01
...model's run-time. In addition to the experimental design, this study includes a base case scenario to serve as a baseline for comparison... One objective of this study is to determine the risk in operating strength deviation presented by...
DOE Office of Scientific and Technical Information (OSTI.GOV)
Poliakoff, David; Legendre, Matt
2017-03-29
GOTCHA is a runtime API for intercepting function calls between shared libraries. It is intended to be used by HPC tools (i.e., performance analysis tools like Open/SpeedShop, HPCToolkit, TAU, etc.). These tools can use GOTCHA to intercept interesting functions, such as MPI functions, and collect performance metrics about those functions. We intend for this to be open-source software that gets adopted by other open-source tools used at LLNL.
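GOTCHA itself rewrites shared-library binding tables in C. As a language-neutral illustration of the underlying idea, the sketch below (plain Python, not the GOTCHA API) wraps a target function so a tool can observe each call and accumulate a timing metric before forwarding to the original:

```python
import functools
import time

def intercept(module, name, metrics):
    """Replace module.name with a wrapper that records time spent in it."""
    original = getattr(module, name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return original(*args, **kwargs)   # forward to the real function
        finally:
            metrics[name] = metrics.get(name, 0.0) + (time.perf_counter() - start)

    setattr(module, name, wrapper)

# Example: intercept json.dumps and accumulate the time spent in it.
import json
stats = {}
intercept(json, "dumps", stats)
json.dumps({"a": 1})
print(stats)
```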
Design for Run-Time Monitor on Cloud Computing
NASA Astrophysics Data System (ADS)
Kang, Mikyung; Kang, Dong-In; Yun, Mira; Park, Gyung-Leen; Lee, Junghoon
Cloud computing is a new information technology trend that moves computing and data away from desktops and portable PCs into large data centers. The basic principle of cloud computing is to deliver applications as services over the Internet, as well as the infrastructure. A cloud is a type of parallel and distributed system consisting of a collection of inter-connected and virtualized computers that are dynamically provisioned and presented as one or more unified computing resources. The large-scale distributed applications on a cloud require adaptive service-based software, which has the capability of monitoring system status changes, analyzing the monitored information, and adapting its service configuration while considering tradeoffs among multiple QoS features simultaneously. In this paper, we design a Run-Time Monitor (RTM), a system software component that monitors application behavior at run-time, analyzes the collected information, and optimizes resources in cloud computing. RTM monitors application software through library instrumentation, as well as the underlying hardware through performance counters, optimizing its computing configuration based on the analyzed data.
Colt: an experiment in wormhole run-time reconfiguration
NASA Astrophysics Data System (ADS)
Bittner, Ray; Athanas, Peter M.; Musgrove, Mark
1996-10-01
Wormhole run-time reconfiguration (RTR) is an attempt to create a refined computing paradigm for high performance computational tasks. By combining concepts from field programmable gate array (FPGA) technologies with data flow computing, the Colt/Stallion architecture achieves high utilization of hardware resources, and facilitates rapid run-time reconfiguration. Targeted mainly at DSP-type operations, the Colt integrated circuit -- a prototype wormhole RTR device -- compares favorably to contemporary DSP alternatives in terms of silicon area consumed per unit computation and in computing performance. Although emphasis has been placed on signal processing applications, general purpose computation has not been overlooked. Colt is a prototype that defines an architecture not only at the chip level but also in terms of an overall system design. As this system is realized, the concept of wormhole RTR will be applied to numerical computation and DSP applications including those common to image processing, communications systems, digital filters, acoustic processing, real-time control systems and simulation acceleration.
Pattern Driven Selection and Configuration of S&D Mechanisms at Runtime
NASA Astrophysics Data System (ADS)
Crespo, Beatriz Gallego-Nicasio; Piñuela, Ana; Soria-Rodriguez, Pedro; Serrano, Daniel; Maña, Antonio
In order to satisfy the requests of SERENITY-aware applications, the SERENITY Runtime Framework's main task is pattern selection, providing the application with the most suitable S&D Solution that satisfies the request. The result of this selection process depends on two main factors: the content of the S&D Library and the information stored and managed by the Context Manager. Three processes are involved: searching the S&D Library to obtain the initial set of candidates; filtering and ordering the collection based on the SRF configuration; and looping over the remaining S&D Artifacts to check S&D Pattern preconditions in order to select first the most suitable S&D Pattern and then the appropriate S&D Implementation for the environment conditions. Once the S&D Implementation is selected, the SERENITY Runtime Framework instantiates an Executable Component (EC) and provides the application with the necessary information and mechanism to make use of the EC.
Yu, Peng; Shaw, Chad A
2014-06-01
The Dirichlet-multinomial (DMN) distribution is a fundamental model for multicategory count data with overdispersion. This distribution has many uses in bioinformatics, including applications to metagenomics data, transcriptomics, and alternative splicing. The DMN distribution reduces to the multinomial distribution when the overdispersion parameter ψ is 0. Unfortunately, numerical computation of the DMN log-likelihood function by conventional methods results in instability in the neighborhood of ψ = 0. An alternative formulation circumvents this instability, but it leads to long runtimes that make it impractical for the large count data common in bioinformatics. We have developed a new method for computation of the DMN log-likelihood that solves the instability problem without incurring long runtimes. The new approach is composed of a novel formula and an algorithm to extend its applicability. Our numerical experiments show that this new method improves both the accuracy of log-likelihood evaluation and the runtime, the latter by several orders of magnitude, especially in high-count situations that are common in deep sequencing data. Using real metagenomic data, our method achieves a manyfold runtime improvement. Our method increases the feasibility of using the DMN distribution to model many high-throughput problems in bioinformatics. We have included in our work an R package giving access to this method and a vignette applying this approach to metagenomic data. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
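For reference, the DMN log-likelihood the paper evaluates can be written in terms of log-gamma functions, which is also how it is usually computed in code. A minimal sketch (standard alpha parametrization, not the paper's ψ reformulation or its stabilized formula) is:

```python
import numpy as np
from scipy.special import gammaln

def dmn_loglik(x, alpha):
    """Log-likelihood of one count vector x under a Dirichlet-multinomial
    with concentration parameters alpha (standard parametrization)."""
    x, alpha = np.asarray(x, float), np.asarray(alpha, float)
    n, a0 = x.sum(), alpha.sum()
    return (gammaln(n + 1) - gammaln(x + 1).sum()        # multinomial coefficient
            + gammaln(a0) - gammaln(n + a0)               # normalizing ratio
            + (gammaln(x + alpha) - gammaln(alpha)).sum())

print(dmn_loglik([3, 1, 6], [0.5, 0.5, 0.5]))
```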
Sam2bam: High-Performance Framework for NGS Data Preprocessing Tools
Cheng, Yinhe; Tzeng, Tzy-Hwa Kathy
2016-01-01
This paper introduces a high-throughput software tool framework called sam2bam that enables users to significantly speed up pre-processing for next-generation sequencing data. The sam2bam is especially efficient on single-node multi-core large-memory systems. It can reduce the runtime of data pre-processing in marking duplicate reads on a single node system by 156–186x compared with de facto standard tools. The sam2bam consists of parallel software components that can fully utilize multiple processors, available memory, high-bandwidth storage, and hardware compression accelerators, if available. The sam2bam provides file format conversion between well-known genome file formats, from SAM to BAM, as a basic feature. Additional features such as analyzing, filtering, and converting input data are provided by using plug-in tools, e.g., duplicate marking, which can be attached to sam2bam at runtime. We demonstrated that sam2bam could significantly reduce the runtime of next generation sequencing (NGS) data pre-processing from about two hours to about one minute for a whole-exome data set on a 16-core single-node system using up to 130 GB of memory. The sam2bam could reduce the runtime of NGS data pre-processing from about 20 hours to about nine minutes for a whole-genome sequencing data set on the same system using up to 711 GB of memory. PMID:27861637
2015-09-30
NPS-NRL-Rice-UIUC Collaboration on Navy Atmosphere... portability. There is still a gap in the OCCA support for Fortran programmers who do not have accelerator experience. Activities at Rice/Virginia Tech are... for automated data movement and for kernel optimization using source code analysis and run-time detective work. In this quarter the Rice/Virginia Tech...
A Model-Driven Co-Design Framework for Fusing Control and Scheduling Viewpoints.
Sundharam, Sakthivel Manikandan; Navet, Nicolas; Altmeyer, Sebastian; Havet, Lionel
2018-02-20
Model-Driven Engineering (MDE) is widely applied in industry to develop new software functions and integrate them into the existing run-time environment of a Cyber-Physical System (CPS). The design of a software component involves designers from various viewpoints such as control theory, software engineering, safety, etc. In practice, while a designer from one discipline focuses on the core aspects of his field (for instance, a control engineer concentrates on designing a stable controller), he neglects, or considers less important, the other engineering aspects (for instance, real-time software engineering or energy efficiency). This may cause some of the functional and non-functional requirements not to be met satisfactorily. In this work, we present a co-design framework based on timing tolerance contracts to address such design gaps between control and real-time software engineering. The framework consists of three steps: controller design, verified by jitter margin analysis along with co-simulation; software design, verified by a novel schedulability analysis; and run-time verification by monitoring the execution of the models on the target. This framework builds on CPAL (Cyber-Physical Action Language), an MDE design environment based on model interpretation, which enforces timing-realistic behavior in simulation through timing and scheduling annotations. The application of our framework is exemplified in the design of an automotive cruise control system.
A Model-Driven Co-Design Framework for Fusing Control and Scheduling Viewpoints
Navet, Nicolas; Havet, Lionel
2018-01-01
Model-Driven Engineering (MDE) is widely applied in industry to develop new software functions and integrate them into the existing run-time environment of a Cyber-Physical System (CPS). The design of a software component involves designers from various viewpoints such as control theory, software engineering, safety, etc. In practice, while a designer from one discipline focuses on the core aspects of his field (for instance, a control engineer concentrates on designing a stable controller), he neglects, or considers less important, the other engineering aspects (for instance, real-time software engineering or energy efficiency). This may cause some of the functional and non-functional requirements not to be met satisfactorily. In this work, we present a co-design framework based on timing tolerance contracts to address such design gaps between control and real-time software engineering. The framework consists of three steps: controller design, verified by jitter margin analysis along with co-simulation; software design, verified by a novel schedulability analysis; and run-time verification by monitoring the execution of the models on the target. This framework builds on CPAL (Cyber-Physical Action Language), an MDE design environment based on model interpretation, which enforces timing-realistic behavior in simulation through timing and scheduling annotations. The application of our framework is exemplified in the design of an automotive cruise control system. PMID:29461489
COVERT: A Framework for Finding Buffer Overflows in C Programs via Software Verification
2010-08-01
...is greater than the allocated size of B. In the case of a type-safe language or a language with runtime bounds checking (such as Java), an overflow... leads either to a (compile-time) type error or a (runtime) exception. In such languages, a buffer overflow can lead to a denial of service attack (i.e.,... of current and legacy software is written in unsafe languages (such as C or C++) that allow buffers to be overflowed with impunity. For reasons such as...
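The contrast drawn above between unchecked C accesses and languages with run-time bounds checking can be seen with a one-line experiment. In a bounds-checked language (Python here, standing in for the Java example in the text), an out-of-range index raises an exception instead of silently corrupting adjacent memory:

```python
B = [0] * 8            # the "allocated size" of the buffer is 8 elements

try:
    B[8] = 1           # index equals the allocated size: one past the end
except IndexError as exc:
    # Bounds-checked languages turn the overflow into a runtime exception;
    # in C the equivalent write would silently scribble past the buffer.
    print("overflow caught:", exc)
```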
A Runtime Performance Predictor for Selecting Tabu Tenures
NASA Technical Reports Server (NTRS)
Allen, John A.; Minton, Steven N.
1997-01-01
One of the drawbacks of parameter-based systems, such as tabu search, is the difficulty of finding the correct parameter for a particular problem. Often, rule-of-thumb advice is given that may have little or no applicability to the domain or problem instance at hand. This paper describes the application of a general technique, Runtime Performance Predictors (RPP), which can be used to determine, in an efficient manner, the correct tabu tenure for a particular problem instance. The details of the approach and a demonstration using a variant of GSAT are presented.
Moreira, Fernando X; Silva, Renata; André, Maria B; de Pinho, Paula G; Bastos, Maria L; Ruivo, João; Ruivo, Patrícia; Carmo, Helena
2018-07-01
The use of performance-enhancing drugs is common not only in humans but also in animal sports, including the racing of horses, greyhounds, and pigeons. The development of accurate analytical procedures to detect doping agents in sports is crucial in order to protect the fair play of the game, avoid financial fraud in the attribution of eventual awards and, even more importantly, to protect the animals from harmful drugs and/or dangerous dosage regimens. The present study aimed to develop and validate a method that enables the screening and confirmation of the presence of a beta-agonist (clenbuterol) and three corticosteroids (betamethasone, prednisolone and budesonide) in faeces from pigeons. The extraction procedure entailed the combination of liquid-liquid extraction with solid-phase extraction, and the analysis was performed by liquid chromatography coupled to tandem mass spectrometry, with a single 15-minute chromatographic run-time. The method was validated concerning selectivity, linearity (with coefficients of determination always >0.99), accuracy (87.5-114.9%), inter-day and intra-day precision, limits of detection (0.14-1.81 ng/g), limits of quantification (0.49-6.08 ng/g), stability, and extraction recovery (71.0%-99.3%). The method was successfully applied to the analysis of samples from two pigeons that had been orally administered betamethasone, demonstrating its suitability for doping control purposes. Copyright © 2018. Published by Elsevier B.V.
InterFace: A software package for face image warping, averaging, and principal components analysis.
Kramer, Robin S S; Jenkins, Rob; Burton, A Mike
2017-12-01
We describe InterFace, a software package for research in face recognition. The package supports image warping, reshaping, averaging of multiple face images, and morphing between faces. It also supports principal components analysis (PCA) of face images, along with tools for exploring the "face space" produced by PCA. The package uses a simple graphical user interface, allowing users to perform these sophisticated image manipulations without any need for programming knowledge. The program is available for download in the form of an app, which requires that users also have access to the (freely available) MATLAB Runtime environment.
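The PCA component of such a package reduces to a few lines of linear algebra. A minimal sketch of building a "face space" from vectorized, aligned images (random stand-in data, not the InterFace implementation) is:

```python
import numpy as np

# Stand-in data: 50 aligned face images, each flattened to a 100x100 pixel vector.
faces = np.random.rand(50, 100 * 100)

mean_face = faces.mean(axis=0)
centered = faces - mean_face

# Principal components via SVD of the centered data matrix.
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
components = Vt                      # rows are the "eigenfaces"
coords = centered @ components.T     # each face's coordinates in face space

# Reconstruct the first face from its first 10 principal components.
approx = mean_face + coords[0, :10] @ components[:10]
```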
Implementing Parquet equations using HPX
NASA Astrophysics Data System (ADS)
Kellar, Samuel; Wagle, Bibek; Yang, Shuxiang; Tam, Ka-Ming; Kaiser, Hartmut; Moreno, Juana; Jarrell, Mark
A new C++ runtime system (HPX) enables simulations of complex systems to run more efficiently on parallel and heterogeneous systems. This increased efficiency allows for solutions to larger simulations of the parquet approximation for a system with impurities. The relevance of the parquet equations depends upon the ability to solve systems that require long runs and large amounts of memory. These limitations, in addition to numerical complications arising from the stability of the solutions, necessitate running on large distributed systems. As computational resources trend towards the exascale and the limitations arising from computational resources vanish, the efficiency of large-scale simulations becomes a focus. HPX facilitates efficient simulations through intelligent overlapping of computation and communication. Simulations such as the parquet equations, which require the transfer of large amounts of data, should benefit from HPX implementations. Supported by the NSF EPSCoR Cooperative Agreement No. EPS-1003897 with additional support from the Louisiana Board of Regents.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lim, Hyun; Loiseau, Julien
FleCSI is a compile-time configurable framework designed to support multi-physics application development. As such, FleCSI provides a very general set of infrastructure design patterns that can be specialized and extended to suit the needs of a broad variety of solver and data requirements. FleCSI currently supports multi-dimensional mesh topology, geometry, and adjacency information, as well as n-dimensional hashed-tree data structures, graph partitioning interfaces, and dependency closures. FleCSI introduces a functional programming model with control, execution, and data abstractions that are consistent both with MPI and with state-of-the-art, task-based runtimes such as Legion and Charm++. The abstraction layer insulates developers from the underlying runtime, while allowing support for multiple runtime systems, including conventional models like asynchronous MPI. The intent is to provide developers with a concrete set of user-friendly programming tools that can be used now, while allowing flexibility in choosing runtime implementations and optimizations that can be applied to future architectures and runtimes. FleCSI's control and execution models provide formal nomenclature for describing poorly understood concepts such as kernels and tasks. FleCSI's data model provides a low-buy-in approach that makes it an attractive option for many application projects, as developers are not locked into particular layouts or data structure representations. FleCSI currently provides a parallel but not distributed implementation of binary, quad-, and oct-tree topology. This implementation is based on space-filling-curve domain decomposition (the Morton order). The current FleCSI version requires the implementation of a driver and a specialization driver. The role of the specialization driver is to provide the data distribution. This feature is not complete in the FleCSI code, and we provide it. The next step will be to incorporate it directly from FleCSPH into FleCSI once we reach a good level of performance. The driver then represents the general execution of the resolution without concern for data locality and communications. As FleCSI is still under development, its structure may change in the future, and we keep track of these changes in FleCSPH.
A Hartree-Fock Application Using UPC++ and the New DArray Library
Ozog, David; Kamil, Amir; Zheng, Yili; ...
2016-07-21
The Hartree-Fock (HF) method is the fundamental first step for incorporating quantum mechanics into many-electron simulations of atoms and molecules, and it is an important component of computational chemistry toolkits like NWChem. The GTFock code is an HF implementation that, while it does not have all the features in NWChem, represents crucial algorithmic advances that reduce communication and improve load balance by doing an up-front static partitioning of tasks, followed by work stealing whenever necessary. To enable innovations in algorithms and exploit next generation exascale systems, it is crucial to support quantum chemistry codes using expressive and convenient programming models and runtime systems that are also efficient and scalable. Here, this paper presents an HF implementation similar to GTFock using UPC++, a partitioned global address space model that includes flexible communication, asynchronous remote computation, and a powerful multidimensional array library. UPC++ offers runtime features that are useful for HF such as active messages, a rich calculus for array operations, hardware-supported fetch-and-add, and functions for ensuring asynchronous runtime progress. We present a new distributed array abstraction, DArray, that is convenient for the kinds of random-access array updates and linear algebra operations on block-distributed arrays with irregular data ownership. Finally, we analyze the performance of atomic fetch-and-add operations (relevant for load balancing) and runtime attentiveness, then compare various techniques and optimizations for each. Our optimized implementation of HF using UPC++ and the DArrays library shows up to 20% improvement over GTFock with Global Arrays at scales up to 24,000 cores.
A fundamental study of suction for Laminar Flow Control (LFC)
NASA Astrophysics Data System (ADS)
Watmuff, Jonathan H.
1992-10-01
This report covers the period forming the first year of the project. The aim is to experimentally investigate the effects of suction as a technique for Laminar Flow Control. Experiments are to be performed which require substantial modifications to be made to the experimental facility. Considerable effort has been spent developing new high performance constant temperature hot-wire anemometers for general purpose use in the Fluid Mechanics Laboratory. Twenty instruments have been delivered. An important feature of the facility is that it is totally automated under computer control. Unprecedentedly large quantities of data can be acquired and the results examined using the visualization tools developed specifically for studying the results of numerical simulations on graphics workstations. The experiment must be run for periods of up to a month at a time since the data is collected on a point-by-point basis. Several techniques were implemented to reduce the experimental run-time by a significant factor. Extra probes have been constructed and modifications have been made to the traverse hardware and to the real-time experimental code to enable multiple probes to be used. This will reduce the experimental run-time by the appropriate factor. Hot-wire calibration drift has been a frustrating problem owing to the large range of ambient temperatures experienced in the laboratory. The solution has been to repeat the calibrations at frequent intervals. However, the calibration process has consumed up to 40 percent of the run-time. A new method of correcting the drift is very nearly finalized, and when implemented it will also lead to a significant reduction in the experimental run-time.
A Hartree-Fock Application Using UPC++ and the New DArray Library
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ozog, David; Kamil, Amir; Zheng, Yili
The Hartree-Fock (HF) method is the fundamental first step for incorporating quantum mechanics into many-electron simulations of atoms and molecules, and it is an important component of computational chemistry toolkits like NWChem. The GTFock code is an HF implementation that, while it does not have all the features in NWChem, represents crucial algorithmic advances that reduce communication and improve load balance by doing an up-front static partitioning of tasks, followed by work stealing whenever necessary. To enable innovations in algorithms and exploit next generation exascale systems, it is crucial to support quantum chemistry codes using expressive and convenient programming models and runtime systems that are also efficient and scalable. Here, this paper presents an HF implementation similar to GTFock using UPC++, a partitioned global address space model that includes flexible communication, asynchronous remote computation, and a powerful multidimensional array library. UPC++ offers runtime features that are useful for HF such as active messages, a rich calculus for array operations, hardware-supported fetch-and-add, and functions for ensuring asynchronous runtime progress. We present a new distributed array abstraction, DArray, that is convenient for the kinds of random-access array updates and linear algebra operations on block-distributed arrays with irregular data ownership. Finally, we analyze the performance of atomic fetch-and-add operations (relevant for load balancing) and runtime attentiveness, then compare various techniques and optimizations for each. Our optimized implementation of HF using UPC++ and the DArrays library shows up to 20% improvement over GTFock with Global Arrays at scales up to 24,000 cores.
A fundamental study of suction for Laminar Flow Control (LFC)
NASA Technical Reports Server (NTRS)
Watmuff, Jonathan H.
1992-01-01
This report covers the period forming the first year of the project. The aim is to experimentally investigate the effects of suction as a technique for Laminar Flow Control. Experiments are to be performed which require substantial modifications to be made to the experimental facility. Considerable effort has been spent developing new high performance constant temperature hot-wire anemometers for general purpose use in the Fluid Mechanics Laboratory. Twenty instruments have been delivered. An important feature of the facility is that it is totally automated under computer control. Unprecedentedly large quantities of data can be acquired and the results examined using the visualization tools developed specifically for studying the results of numerical simulations on graphics workstations. The experiment must be run for periods of up to a month at a time since the data is collected on a point-by-point basis. Several techniques were implemented to reduce the experimental run-time by a significant factor. Extra probes have been constructed and modifications have been made to the traverse hardware and to the real-time experimental code to enable multiple probes to be used. This will reduce the experimental run-time by the appropriate factor. Hot-wire calibration drift has been a frustrating problem owing to the large range of ambient temperatures experienced in the laboratory. The solution has been to repeat the calibrations at frequent intervals. However, the calibration process has consumed up to 40 percent of the run-time. A new method of correcting the drift is very nearly finalized, and when implemented it will also lead to a significant reduction in the experimental run-time.
Fault-Tolerant and Elastic Streaming MapReduce with Decentralized Coordination
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kumbhare, Alok; Frincu, Marc; Simmhan, Yogesh
2015-06-29
The MapReduce programming model, due to its simplicity and scalability, has become an essential tool for processing large data volumes in distributed environments. Recent Stream Processing Systems (SPS) extend this model to provide low-latency analysis of high-velocity continuous data streams. However, integrating MapReduce with streaming poses challenges: first, runtime variations in data characteristics such as data rates and key distribution cause resource overload, which in turn leads to fluctuations in the Quality of Service (QoS); and second, the stateful reducers, whose state depends on the complete tuple history, necessitate efficient fault-recovery mechanisms to maintain the desired QoS in the presence of resource failures. We propose an integrated streaming MapReduce architecture leveraging the concept of consistent hashing to support runtime elasticity along with locality-aware data and state replication to provide efficient load balancing with low-overhead fault tolerance and parallel fault recovery from multiple simultaneous failures. Our evaluation on a private cloud shows up to a 2.8x improvement in peak throughput compared to the Apache Storm SPS, and a low recovery latency of 700-1500 ms from multiple failures.
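Consistent hashing, the mechanism this architecture leans on for elasticity, can be stated compactly: keys and workers are hashed onto the same ring, each key is owned by the first worker clockwise from it, and adding or removing a worker therefore remaps only a small fraction of keys. A minimal illustration (hypothetical worker names, no virtual nodes or replication) is:

```python
import bisect
import hashlib

def _h(s):
    # Map a string to a point on the hash ring.
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, workers):
        self.ring = sorted((_h(w), w) for w in workers)

    def owner(self, key):
        # First worker clockwise from the key's position on the ring.
        points = [p for p, _ in self.ring]
        i = bisect.bisect(points, _h(key)) % len(self.ring)
        return self.ring[i][1]

ring = HashRing(["worker-a", "worker-b", "worker-c"])
print(ring.owner("tuple-123"))   # only keys near an added/removed worker move
```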
DOE Office of Scientific and Technical Information (OSTI.GOV)
Seal, Sudip K; Perumalla, Kalyan S; Hirshman, Steven Paul
2013-01-01
Simulations that require solutions of block tridiagonal systems of equations rely on fast parallel solvers for runtime efficiency. Leading parallel solvers that are highly effective for general systems of equations, dense or sparse, are limited in scalability when applied to block tridiagonal systems. This paper presents scalability results as well as detailed analyses of two parallel solvers that exploit the special structure of block tridiagonal matrices to deliver superior performance, often by orders of magnitude. A rigorous analysis of their relative parallel runtimes is shown to reveal the existence of a critical block size that separates the parameter space spanned by the number of block rows, the block size and the processor count, into distinct regions that favor one or the other of the two solvers. Dependence of this critical block size on the above parameters as well as on machine-specific constants is established. These formal insights are supported by empirical results on up to 2,048 cores of a Cray XT4 system. To the best of our knowledge, this is the highest reported scalability for parallel block tridiagonal solvers to date.
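The serial baseline such parallel solvers compete against is the block Thomas algorithm, a forward elimination and back substitution over the block rows. A minimal dense-block sketch (no pivoting, assumed well-conditioned diagonal blocks) is:

```python
import numpy as np

def block_thomas(A, B, C, d):
    """Solve a block tridiagonal system whose i-th block row is
    A[i] x[i-1] + B[i] x[i] + C[i] x[i+1] = d[i], with b x b blocks.
    A[0] and C[-1] are ignored."""
    n = len(B)
    Bp, dp = [B[0].copy()], [d[0].copy()]
    for i in range(1, n):                      # forward elimination
        m = A[i] @ np.linalg.inv(Bp[i - 1])
        Bp.append(B[i] - m @ C[i - 1])
        dp.append(d[i] - m @ dp[i - 1])
    x = [None] * n
    x[-1] = np.linalg.solve(Bp[-1], dp[-1])
    for i in range(n - 2, -1, -1):             # back substitution
        x[i] = np.linalg.solve(Bp[i], dp[i] - C[i] @ x[i + 1])
    return np.array(x)
```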
Discrete Event Modeling and Massively Parallel Execution of Epidemic Outbreak Phenomena
DOE Office of Scientific and Technical Information (OSTI.GOV)
Perumalla, Kalyan S; Seal, Sudip K
2011-01-01
In complex phenomena such as epidemiological outbreaks, the intensity of inherent feedback effects and the significant role of transients in the dynamics make simulation the only effective method for proactive, reactive or post-facto analysis. The spatial scale, runtime speed, and behavioral detail needed in detailed simulations of epidemic outbreaks make it necessary to use large-scale parallel processing. Here, an optimistic parallel execution of a new discrete event formulation of a reaction-diffusion simulation model of epidemic propagation is presented to facilitate dramatically increasing the fidelity and speed with which epidemiological simulations can be performed. Rollback support needed during optimistic parallel execution is achieved by combining reverse computation with a small amount of incremental state saving. Parallel speedup of over 5,500 and other runtime performance metrics of the system are observed with weak-scaling execution on a small (8,192-core) Blue Gene/P system, while scalability with a weak-scaling speedup of over 10,000 is demonstrated on 65,536 cores of a large Cray XT5 system. Scenarios representing large population sizes exceeding several hundreds of millions of individuals in the largest cases are successfully exercised to verify model scalability.
Design and Development of a Run-Time Monitor for Multi-Core Architectures in Cloud Computing
Kang, Mikyung; Kang, Dong-In; Crago, Stephen P.; Park, Gyung-Leen; Lee, Junghoon
2011-01-01
Cloud computing is a new information technology trend that moves computing and data away from desktops and portable PCs into large data centers. The basic principle of cloud computing is to deliver applications as services over the Internet as well as infrastructure. A cloud is a type of parallel and distributed system consisting of a collection of inter-connected and virtualized computers that are dynamically provisioned and presented as one or more unified computing resources. The large-scale distributed applications on a cloud require adaptive service-based software, which has the capability of monitoring system status changes, analyzing the monitored information, and adapting its service configuration while considering tradeoffs among multiple QoS features simultaneously. In this paper, we design and develop a Run-Time Monitor (RTM) which is a system software to monitor the application behavior at run-time, analyze the collected information, and optimize cloud computing resources for multi-core architectures. RTM monitors application software through library instrumentation as well as underlying hardware through a performance counter optimizing its computing configuration based on the analyzed data. PMID:22163811
Design and development of a run-time monitor for multi-core architectures in cloud computing.
Kang, Mikyung; Kang, Dong-In; Crago, Stephen P; Park, Gyung-Leen; Lee, Junghoon
2011-01-01
Cloud computing is a new information technology trend that moves computing and data away from desktops and portable PCs into large data centers. The basic principle of cloud computing is to deliver applications as services over the Internet as well as infrastructure. A cloud is a type of parallel and distributed system consisting of a collection of inter-connected and virtualized computers that are dynamically provisioned and presented as one or more unified computing resources. The large-scale distributed applications on a cloud require adaptive service-based software, which has the capability of monitoring system status changes, analyzing the monitored information, and adapting its service configuration while considering tradeoffs among multiple QoS features simultaneously. In this paper, we design and develop a Run-Time Monitor (RTM) which is a system software to monitor the application behavior at run-time, analyze the collected information, and optimize cloud computing resources for multi-core architectures. RTM monitors application software through library instrumentation as well as underlying hardware through a performance counter optimizing its computing configuration based on the analyzed data.
Mentat/A: Medium grain parallel processing
NASA Technical Reports Server (NTRS)
Grimshaw, Andrew S.
1992-01-01
The objective of this project is to test the Algorithm to Architecture Mapping Model (ATAMM) firing rules using the Mentat run-time system and the Mentat Programming Language (MPL). A special version of Mentat, Mentat/A (Mentat/ATAMM) was constructed. This required changes to: (1) modify the run-time system to control queue length and inhibit actor firing until required data tokens are available and space is available in the input queues of all of the direct descendent actors; (2) disallow the specification of persistent object classes in the MPL; and (3) permit only decision free graphs in the MPL. We were successful in implementing the spirit of the plan, although some goals changed as we came to better understand the problem. We report on what we accomplished and the lessons we learned. The Mentat/A run-time system is discussed, and we briefly present the compiler. We present results for three applications and conclude with a summary and some observations. Appendix A contains a list of technical reports and published papers partially supported by the grant. Appendix B contains listings for the three applications.
Optimization and Control of Cyber-Physical Vehicle Systems
Bradley, Justin M.; Atkins, Ella M.
2015-01-01
A cyber-physical system (CPS) is composed of tightly-integrated computation, communication and physical elements. Medical devices, buildings, mobile devices, robots, transportation and energy systems can benefit from CPS co-design and optimization techniques. Cyber-physical vehicle systems (CPVSs) are rapidly advancing due to progress in real-time computing, control and artificial intelligence. Multidisciplinary or multi-objective design optimization maximizes CPS efficiency, capability and safety, while online regulation enables the vehicle to be responsive to disturbances, modeling errors and uncertainties. CPVS optimization occurs at design-time and at run-time. This paper surveys the run-time cooperative optimization or co-optimization of cyber and physical systems, which have historically been considered separately. A run-time CPVS is also cooperatively regulated or co-regulated when cyber and physical resources are utilized in a manner that is responsive to both cyber and physical system requirements. This paper surveys research that considers both cyber and physical resources in co-optimization and co-regulation schemes with applications to mobile robotic and vehicle systems. Time-varying sampling patterns, sensor scheduling, anytime control, feedback scheduling, task and motion planning and resource sharing are examined. PMID:26378541
A Concept for Run-Time Support of the Chapel Language
NASA Technical Reports Server (NTRS)
James, Mark
2006-01-01
A document presents a concept for run-time implementation of other concepts embodied in the Chapel programming language. (Now undergoing development, Chapel is intended to become a standard language for parallel computing that would surpass older such languages both in computational performance and in the efficiency with which pre-existing code can be reused and new code written.) The aforementioned other concepts are those of distributions, domains, allocations, and access, as defined in a separate document called "A Semantic Framework for Domains and Distributions in Chapel" and linked to a language specification defined in another separate document called "Chapel Specification 0.3." The concept presented in this report is the recognition that a data domain invented for Chapel offers a novel approach to distributing and processing data in a massively parallel environment. The concept is offered as a starting point for development of working descriptions of functions and data structures that would be necessary to implement interfaces to a compiler for transforming the aforementioned other concepts from their representations in Chapel source code to their run-time implementations.
Final Project Report. Scalable fault tolerance runtime technology for petascale computers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Krishnamoorthy, Sriram; Sadayappan, P
With the massive number of components comprising the forthcoming petascale computer systems, hardware failures will be routinely encountered during execution of large-scale applications. Due to the multidisciplinary, multiresolution, and multiscale nature of scientific problems that drive the demand for high end systems, applications place increasingly differing demands on the system resources: disk, network, memory, and CPU. In addition to MPI, future applications are expected to use advanced programming models such as those developed under the DARPA HPCS program as well as existing global address space programming models such as Global Arrays, UPC, and Co-Array Fortran. While there has been a considerable amount of work in fault tolerant MPI, with a number of strategies and extensions for fault tolerance proposed, virtually none of the advanced models proposed for emerging petascale systems is currently fault aware. To achieve fault tolerance, development of underlying runtime and OS technologies able to scale to the petascale level is needed. This project has evaluated a range of runtime techniques for fault tolerance for advanced programming models.
Power-constrained supercomputing
NASA Astrophysics Data System (ADS)
Bailey, Peter E.
As we approach exascale systems, power is turning from an optimization goal to a critical operating constraint. With power bounds imposed by both stakeholders and the limitations of existing infrastructure, achieving practical exascale computing will therefore rely on optimizing performance subject to a power constraint. However, this requirement should not add to the burden of application developers; optimizing the runtime environment given restricted power will primarily be the job of high-performance system software. In this dissertation, we explore this area and develop new techniques that extract maximum performance subject to a particular power constraint. These techniques include a method to find theoretical optimal performance, a runtime system that shifts power in real time to improve performance, and a node-level prediction model for selecting power-efficient operating points. We use a linear programming (LP) formulation to optimize application schedules under various power constraints, where a schedule consists of a DVFS state and number of OpenMP threads for each section of computation between consecutive message passing events. We also provide a more flexible mixed integer-linear (ILP) formulation and show that the resulting schedules closely match schedules from the LP formulation. Across four applications, we use our LP-derived upper bounds to show that current approaches trail optimal, power-constrained performance by up to 41%. This demonstrates limitations of current systems, and our LP formulation provides future optimization approaches with a quantitative optimization target. We also introduce Conductor, a run-time system that intelligently distributes available power to nodes and cores to improve performance. The key techniques used are configuration space exploration and adaptive power balancing. Configuration exploration dynamically selects the optimal thread concurrency level and DVFS state subject to a hardware-enforced power bound. Adaptive power balancing efficiently predicts where critical paths are likely to occur and distributes power to those paths. Greater power, in turn, allows increased thread concurrency levels, CPU frequency/voltage, or both. We describe these techniques in detail and show that, compared to the state-of-the-art technique of using statically predetermined, per-node power caps, Conductor leads to a best-case performance improvement of up to 30%, and an average improvement of 19.1%. At the node level, an accurate power/performance model will aid in selecting the right configuration from a large set of available configurations. We present a novel approach to generate such a model offline using kernel clustering and multivariate linear regression. Our model requires only two iterations to select a configuration, which provides a significant advantage over exhaustive search-based strategies. We apply our model to predict power and performance for different applications using arbitrary configurations, and show that our model, when used with hardware frequency-limiting in a runtime system, selects configurations with significantly higher performance at a given power limit than those chosen by frequency-limiting alone. When applied to a set of 36 computational kernels from a range of applications, our model accurately predicts power and performance; our runtime system based on the model maintains 91% of optimal performance while meeting power constraints 88% of the time. 
When the runtime system violates a power constraint, it exceeds the constraint by only 6% in the average case, while simultaneously achieving 54% more performance than an oracle. Through the combination of the above contributions, we hope to provide guidance and inspiration to research practitioners working on runtime systems for power-constrained environments. We also hope this dissertation will draw attention to the need for software and runtime-controlled power management under power constraints at various levels, from the processor level to the cluster level.
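The node-level idea of selecting a power-efficient operating point can be illustrated with a brute-force Python sketch: enumerate (DVFS state, thread count) configurations, keep those whose predicted power fits under the cap, and pick the one with the best predicted performance. The power and performance functions below are made-up toy models, not the dissertation's regression model.

    from itertools import product

    dvfs_states = [1.2, 1.6, 2.0, 2.4]        # GHz
    thread_counts = [4, 8, 16]

    def predicted_power(freq, threads):        # watts (toy model)
        return 20 + 4.0 * threads * (freq / 2.4) ** 2

    def predicted_perf(freq, threads):         # higher is better (toy model)
        return threads ** 0.8 * freq

    def best_config(power_cap_w):
        feasible = [(f, t) for f, t in product(dvfs_states, thread_counts)
                    if predicted_power(f, t) <= power_cap_w]
        if not feasible:
            return None
        return max(feasible, key=lambda ft: predicted_perf(*ft))

    print(best_config(80.0))   # highest-performing (freq, threads) under the cap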
TECA: A Parallel Toolkit for Extreme Climate Analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Prabhat, Mr; Ruebel, Oliver; Byna, Surendra
2012-03-12
We present TECA, a parallel toolkit for detecting extreme events in large climate datasets. Modern climate datasets expose parallelism across a number of dimensions: spatial locations, timesteps and ensemble members. We design TECA to exploit these modes of parallelism and demonstrate a prototype implementation for detecting and tracking three classes of extreme events: tropical cyclones, extra-tropical cyclones and atmospheric rivers. We process a modern TB-sized CAM5 simulation dataset with TECA, and demonstrate good runtime performance for the three case studies.
Next-Generation Climate Modeling Science Challenges for Simulation, Workflow and Analysis Systems
NASA Astrophysics Data System (ADS)
Koch, D. M.; Anantharaj, V. G.; Bader, D. C.; Krishnan, H.; Leung, L. R.; Ringler, T.; Taylor, M.; Wehner, M. F.; Williams, D. N.
2016-12-01
We will present two examples of current and future high-resolution climate-modeling research that are challenging existing simulation run-time I/O, model-data movement, storage and publishing, and analysis. In each case, we will consider lessons learned as current workflow systems are broken by these large-data science challenges, as well as strategies to repair or rebuild the systems. First we consider the science and workflow challenges to be posed by the CMIP6 multi-model HighResMIP, involving around a dozen modeling groups performing quarter-degree simulations, in 3-member ensembles for 100 years, with high-frequency (1-6 hourly) diagnostics, which is expected to generate over 4PB of data. An example of science derived from these experiments will be to study how resolution affects the ability of models to capture extreme-events such as hurricanes or atmospheric rivers. Expected methods to transfer (using parallel Globus) and analyze (using parallel "TECA" software tools) HighResMIP data for such feature-tracking by the DOE CASCADE project will be presented. A second example will be from the Accelerated Climate Modeling for Energy (ACME) project, which is currently addressing challenges involving multiple century-scale coupled high resolution (quarter-degree) climate simulations on DOE Leadership Class computers. ACME is anticipating production of over 5PB of data during the next 2 years of simulations, in order to investigate the drivers of water cycle changes, sea-level-rise, and carbon cycle evolution. The ACME workflow, from simulation to data transfer, storage, analysis and publication will be presented. Current and planned methods to accelerate the workflow, including implementing run-time diagnostics, and implementing server-side analysis to avoid moving large datasets will be presented.
Parcels v0.9: prototyping a Lagrangian ocean analysis framework for the petascale age
NASA Astrophysics Data System (ADS)
Lange, Michael; van Sebille, Erik
2017-11-01
As ocean general circulation models (OGCMs) move into the petascale age, where the output of single simulations exceeds petabytes of storage space, tools to analyse the output of these models will need to scale up too. Lagrangian ocean analysis, where virtual particles are tracked through hydrodynamic fields, is an increasingly popular way to analyse OGCM output, by mapping pathways and connectivity of biotic and abiotic particulates. However, the current software stack of Lagrangian ocean analysis codes is not dynamic enough to cope with the increasing complexity, scale and need for customization of use-cases. Furthermore, most community codes are developed for stand-alone use, making it a nontrivial task to integrate virtual particles at runtime of the OGCM. Here, we introduce the new Parcels code, which was designed from the ground up to be sufficiently scalable to cope with petascale computing. We highlight its API design that combines flexibility and customization with the ability to optimize for HPC workflows, following the paradigm of domain-specific languages. Parcels is primarily written in Python, utilizing the wide range of tools available in the scientific Python ecosystem, while generating low-level C code and using just-in-time compilation for performance-critical computation. We show a worked-out example of its API, and validate the accuracy of the code against seven idealized test cases. This version 0.9 of Parcels is focused on laying out the API, with future work concentrating on support for curvilinear grids, optimization, efficiency and at-runtime coupling with OGCMs.
Intensive care window: real-time monitoring and analysis in the intensive care environment.
Stylianides, Nikolas; Dikaiakos, Marios D; Gjermundrød, Harald; Panayi, George; Kyprianou, Theodoros
2011-01-01
This paper introduces a novel, open-source middleware framework for communication with medical devices and an application using the middleware named intensive care window (ICW). The middleware enables communication with intensive care unit bedside-installed medical devices over standard and proprietary communication protocol stacks. The ICW application facilitates the acquisition of vital signs and physiological parameters exported from patient-attached medical devices and sensors. Moreover, ICW provides runtime and post-analysis procedures for data annotation, data visualization, data query, and analysis. The ICW application can be deployed as a stand-alone solution or in conjunction with existing clinical information systems providing a holistic solution to inpatient medical condition monitoring, early diagnosis, and prognosis.
Evolving binary classifiers through parallel computation of multiple fitness cases.
Cagnoni, Stefano; Bergenti, Federico; Mordonini, Monica; Adorni, Giovanni
2005-06-01
This paper describes two versions of a novel approach to developing binary classifiers, based on two evolutionary computation paradigms: cellular programming and genetic programming. Such an approach achieves high computational efficiency both during evolution and at runtime. Evolution speed is optimized by allowing multiple solutions to be computed in parallel. Runtime performance is optimized explicitly, using parallel computation, in the case of cellular programming, or implicitly, by taking advantage of the intrinsic parallelism of bitwise operators on standard sequential architectures, in the case of genetic programming. The approach was tested on a digit recognition problem and compared with a reference classifier.
Distributed memory compiler design for sparse problems
NASA Technical Reports Server (NTRS)
Wu, Janet; Saltz, Joel; Berryman, Harry; Hiranandani, Seema
1991-01-01
A compiler and runtime support mechanism is described and demonstrated. The methods presented are capable of solving a wide range of sparse and unstructured problems in scientific computing. The compiler takes as input a FORTRAN 77 program enhanced with specifications for distributing data, and the compiler outputs a message passing program that runs on a distributed memory computer. The runtime support for this compiler is a library of primitives designed to efficiently support irregular patterns of distributed array accesses and irregular distributed array partitions. A variety of Intel iPSC/860 performance results obtained through the use of this compiler are presented.
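The inspector/executor pattern behind such runtime primitives can be sketched in a few lines of Python. The example below simulates a block-distributed array on a single machine (no actual message passing) purely to show how an inspector derives a communication schedule from the indices a loop will touch and how an executor then gathers the off-processor values; it is not the PARTI API.

    # Block distribution of a global array of length 12 over 3 "processors".
    NPROCS, BLOCK = 3, 4
    local = {p: [10 * p + i for i in range(BLOCK)] for p in range(NPROCS)}

    def owner(gidx):
        return gidx // BLOCK

    def inspector(my_rank, global_indices):
        """Inspector phase: build a communication schedule listing, per remote
        processor, which of the requested global indices it owns."""
        schedule = {}
        for g in global_indices:
            p = owner(g)
            if p != my_rank:
                schedule.setdefault(p, []).append(g)
        return schedule

    def executor(my_rank, global_indices, schedule):
        """Executor phase: 'fetch' off-processor elements (simulated here by a
        direct lookup) and combine them with locally owned elements."""
        fetched = {g: local[p][g - p * BLOCK]
                   for p, gs in schedule.items() for g in gs}
        out = []
        for g in global_indices:
            if owner(g) == my_rank:
                out.append(local[my_rank][g - my_rank * BLOCK])
            else:
                out.append(fetched[g])
        return out

    sched = inspector(0, [1, 5, 9, 2])      # indices 5 and 9 live off-processor
    print(executor(0, [1, 5, 9, 2], sched)) # -> [1, 11, 21, 2]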
Angiuoli, Samuel V; White, James R; Matalka, Malcolm; White, Owen; Fricke, W Florian
2011-01-01
The widespread popularity of genomic applications is threatened by the "bioinformatics bottleneck" resulting from uncertainty about the cost and infrastructure needed to meet increasing demands for next-generation sequence analysis. Cloud computing services have been discussed as potential new bioinformatics support systems but have not been evaluated thoroughly. We present benchmark costs and runtimes for common microbial genomics applications, including 16S rRNA analysis, microbial whole-genome shotgun (WGS) sequence assembly and annotation, WGS metagenomics and large-scale BLAST. Sequence dataset types and sizes were selected to correspond to outputs typically generated by small- to midsize facilities equipped with 454 and Illumina platforms, except for WGS metagenomics where sampling of Illumina data was used. Automated analysis pipelines, as implemented in the CloVR virtual machine, were used in order to guarantee transparency, reproducibility and portability across different operating systems, including the commercial Amazon Elastic Compute Cloud (EC2), which was used to attach real dollar costs to each analysis type. We found considerable differences in computational requirements, runtimes and costs associated with different microbial genomics applications. While all 16S analyses completed on a single-CPU desktop in under three hours, microbial genome and metagenome analyses utilized multi-CPU support of up to 120 CPUs on Amazon EC2, where each analysis completed in under 24 hours for less than $60. Representative datasets were used to estimate maximum data throughput on different cluster sizes and to compare costs between EC2 and comparable local grid servers. Although bioinformatics requirements for microbial genomics depend on dataset characteristics and the analysis protocols applied, our results suggest that smaller sequencing facilities (up to three Roche/454 or one Illumina GAIIx sequencer) invested in 16S rRNA amplicon sequencing, microbial single-genome and metagenomics WGS projects can achieve cost-efficient bioinformatics support using CloVR in combination with Amazon EC2 as an alternative to local computing centers.
Provenance for Runtime Workflow Steering and Validation in Computational Seismology
NASA Astrophysics Data System (ADS)
Spinuso, A.; Krischer, L.; Krause, A.; Filgueira, R.; Magnoni, F.; Muraleedharan, V.; David, M.
2014-12-01
Provenance systems may be offered by modern workflow engines to collect metadata about the data transformations at runtime. If combined with effective visualisation and monitoring interfaces, these provenance recordings can speed up the validation process of an experiment, suggesting interactive or automated interventions with immediate effects on the lifecycle of a workflow run. For instance, in the field of computational seismology, if we consider research applications performing long-lasting cross-correlation analysis and high-resolution simulations, the immediate notification of logical errors and the rapid access to intermediate results can produce reactions which foster more efficient progress of the research. These applications are often executed in secured and sophisticated HPC and HTC infrastructures, highlighting the need for a comprehensive framework that facilitates the extraction of fine-grained provenance and the development of provenance-aware components, leveraging the scalability characteristics of the adopted workflow engines, whose enactment can be mapped to different technologies (MPI, Storm clusters, etc.). This work looks at the adoption of W3C-PROV concepts and data model within a user-driven processing and validation framework for seismic data, supporting also computational and data management steering. Validation needs to balance automation with user intervention, considering the scientist as part of the archiving process. Therefore, the provenance data is enriched with community-specific metadata vocabularies and control messages, making an experiment reproducible and its description consistent with the community understandings. Moreover, it can contain user-defined terms and annotations. The current implementation of the system is supported by the EU-funded VERCE project (http://verce.eu). In addition to the provenance generation mechanisms, it provides a prototype browser-based user interface and a web API built on top of a NoSQL storage technology, experimenting with ways to ensure rapid and flexible access to the lineage traces. It supports the users with the visualisation of graphical products and offers combined operations to access and download the data, which may be selectively stored at runtime into dedicated data archives.
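As a rough illustration of recording PROV-style lineage at runtime, the sketch below emits a small JSON record for one workflow stage, listing the entities it used and generated plus community-specific annotations. The field names and vocabulary are invented for illustration and do not reproduce the W3C-PROV serialization or the VERCE schema.

    import json, time, uuid

    def prov_activity(name, inputs, outputs, annotations=None):
        """Record one workflow stage as a PROV-style activity with its used /
        generated entities. The field names are illustrative, not a real schema."""
        act_id = f"activity:{name}:{uuid.uuid4().hex[:8]}"
        return {
            "activity": {"id": act_id, "type": name,
                         "endedAtTime": time.strftime("%Y-%m-%dT%H:%M:%SZ")},
            "used": [{"entity": e} for e in inputs],
            "wasGeneratedBy": [{"entity": e, "activity": act_id} for e in outputs],
            "annotations": annotations or {},   # community-specific metadata
        }

    rec = prov_activity(
        "cross_correlation",
        inputs=["waveform/STA1.mseed", "waveform/STA2.mseed"],
        outputs=["xcorr/STA1-STA2.npy"],
        annotations={"station_pair": "STA1-STA2", "quality": "ok"},
    )
    print(json.dumps(rec, indent=2))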
A performance model for GPUs with caches
Dao, Thanh Tuan; Kim, Jungwon; Seo, Sangmin; ...
2014-06-24
To exploit the abundant computational power of the world's fastest supercomputers, an even workload distribution to the typically heterogeneous compute devices is necessary. While relatively accurate performance models exist for conventional CPUs, accurate performance estimation models for modern GPUs do not exist. This paper presents two accurate models for modern GPUs: a sampling-based linear model, and a model based on machine-learning (ML) techniques which improves the accuracy of the linear model and is applicable to modern GPUs with and without caches. We first construct the sampling-based linear model to predict the runtime of an arbitrary OpenCL kernel. Based on an analysis of NVIDIA GPUs' scheduling policies we determine the earliest sampling points that allow an accurate estimation. The linear model cannot capture well the significant effects that memory coalescing or caching as implemented in modern GPUs have on performance. We therefore propose a model based on ML techniques that takes several compiler-generated statistics about the kernel as well as the GPU's hardware performance counters as additional inputs to obtain a more accurate runtime performance estimation for modern GPUs. We demonstrate the effectiveness and broad applicability of the model by applying it to three different NVIDIA GPU architectures and one AMD GPU architecture. On an extensive set of OpenCL benchmarks, on average, the proposed model estimates the runtime performance with less than 7 percent error for a second-generation GTX 280 with no on-chip caches and less than 5 percent for the Fermi-based GTX 580 with hardware caches. On the Kepler-based GTX 680, the linear model has an error of less than 10 percent. On an AMD GPU architecture, the Radeon HD 6970, the model estimates runtime with an error of 8 percent. As a result, the proposed technique outperforms existing models by a factor of 5 to 6 in terms of accuracy.
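The sampling-based linear model can be illustrated with a minimal least-squares fit in Python: a few short "sampling" runs give (work size, runtime) pairs, and the fitted line predicts the runtime of a full-size launch. The numbers are invented and the single-feature fit is far simpler than the paper's model.

    import numpy as np

    # Made-up samples: (work items launched, measured runtime in ms) for a kernel.
    samples = np.array([[1_000,  0.9],
                        [4_000,  2.8],
                        [16_000, 10.5],
                        [64_000, 41.0]])

    # Fit runtime ~= a * work_items + b by ordinary least squares.
    A = np.column_stack([samples[:, 0], np.ones(len(samples))])
    (a, b), *_ = np.linalg.lstsq(A, samples[:, 1], rcond=None)

    def predict_runtime_ms(work_items):
        return a * work_items + b

    print(f"predicted runtime for 256k items: {predict_runtime_ms(256_000):.1f} ms")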
Impact of backwashing procedures on deep bed filtration productivity in drinking water treatment.
Slavik, Irene; Jehmlich, Alexander; Uhl, Wolfgang
2013-10-15
Backwash procedures for deep bed filters were evaluated and compared by means of a new integrated approach based on productivity. For this, different backwash procedures were experimentally evaluated by using a pilot plant for direct filtration. A standard backwash mode as applied in practice served as a reference and effluent turbidity was used as the criterion for filter run termination. The backwash water volumes needed, duration of the filter-to-waste period, time out of operation, total volume discharged and filter run-time were determined and used to calculate average filtration velocity and average productivity. Results for filter run-times, filter backwash volumes, and filter-to-waste volumes showed considerable differences between the backwash procedures. Thus, backwash procedures with additional clear flushing phases were characterised by an increased need for backwash water. However, this additional water consumption could not be compensated by savings during filter ripening. Compared to the reference backwash procedure, filter run-times were longer for both single-media and dual-media filters when air scour and air/water flush were optimised with respect to flow rates and the proportion of air and water. This means that drinking water production time is longer and less water is needed for filter bed cleaning. Also, backwashing with additional clear flushing phases resulted in longer filter run-times before turbidity breakthrough. However, regarding the productivity of the filtration process, it was shown that it was almost the same for all of the backwash procedures investigated in this study. Due to this unexpected finding, the relationships between filter bed cleaning, filter ripening and filtration performance were considered and important conclusions and new approaches for process optimisation and resource savings were derived. Copyright © 2013 Elsevier Ltd. All rights reserved.
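A simple way to picture the productivity comparison is as net water delivered per unit of total cycle time. The Python sketch below uses that assumed definition with made-up volumes to show how a longer filter run does not necessarily raise productivity once extra backwash and filter-to-waste volumes are accounted for; the paper's exact formula may differ.

    def filter_cycle_productivity(v_produced_m3, v_backwash_m3, v_to_waste_m3,
                                  run_time_h, downtime_h):
        """Net water delivered per hour of total cycle time (assumed definition)."""
        net_volume = v_produced_m3 - v_backwash_m3 - v_to_waste_m3
        return net_volume / (run_time_h + downtime_h)

    # Made-up example: a longer run-time is offset by larger backwash and
    # filter-to-waste volumes, leaving productivity nearly unchanged.
    standard = filter_cycle_productivity(480, 25, 10, run_time_h=40, downtime_h=1.5)
    extended = filter_cycle_productivity(560, 45, 15, run_time_h=47, downtime_h=2.0)
    print(f"standard: {standard:.1f} m3/h, extended flush: {extended:.1f} m3/h")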
Architectural Analysis of Dynamically Reconfigurable Systems
NASA Technical Reports Server (NTRS)
Lindvall, Mikael; Godfrey, Sally; Ackermann, Chris; Ray, Arnab; Yonkwa, Lyly
2010-01-01
Topics include: the problem (increased flexibility of architectural styles decreases analyzability, behavior emerges and varies depending on the configuration, whether the resulting system runs according to the intended design, and how architectural decisions can impede or facilitate testing); a top-down approach to architecture analysis, detection of defects and deviations, and architecture and its testability; currently targeted projects (GMSEC and CFS); analyzing software architectures; analyzing runtime events; actual architecture recognition; GMPUB in Dynamic SAVE; sample output from the new approach; taking message timing delays into account; CFS examples of architecture and testability; some recommendations for improved testability; and CFS examples of abstract interfaces, testability, and the opening of some internal details.
Feature Selection Methods for Zero-Shot Learning of Neural Activity.
Caceres, Carlos A; Roos, Matthew J; Rupp, Kyle M; Milsap, Griffin; Crone, Nathan E; Wolmetz, Michael E; Ratto, Christopher R
2017-01-01
Dimensionality poses a serious challenge when making predictions from human neuroimaging data. Across imaging modalities, large pools of potential neural features (e.g., responses from particular voxels, electrodes, and temporal windows) have to be related to typically limited sets of stimuli and samples. In recent years, zero-shot prediction models have been introduced for mapping between neural signals and semantic attributes, which allows for classification of stimulus classes not explicitly included in the training set. While choices about feature selection can have a substantial impact when closed-set accuracy, open-set robustness, and runtime are competing design objectives, no systematic study of feature selection for these models has been reported. Instead, a relatively straightforward feature stability approach has been adopted and successfully applied across models and imaging modalities. To characterize the tradeoffs in feature selection for zero-shot learning, we compared correlation-based stability to several other feature selection techniques on comparable data sets from two distinct imaging modalities: functional Magnetic Resonance Imaging and Electrocorticography. While most of the feature selection methods resulted in similar zero-shot prediction accuracies and spatial/spectral patterns of selected features, there was one exception: a novel feature/attribute correlation approach was able to achieve those accuracies with far fewer features, suggesting the potential for simpler prediction models that yield high zero-shot classification accuracy.
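The correlation-based stability baseline mentioned above can be sketched as follows: score each feature by how well its responses to the same stimuli agree across repeated presentations, then keep the most stable features. The data below are synthetic and the pipeline is deliberately simplified relative to the paper's analysis.

    import numpy as np

    rng = np.random.default_rng(0)
    n_stimuli, n_repeats, n_features = 60, 2, 500

    # Synthetic data: responses[r, s, f] = response of feature f to stimulus s in
    # repetition r. A handful of features are made reproducible on purpose.
    signal = rng.normal(size=(n_stimuli, n_features)) * (rng.random(n_features) < 0.05)
    responses = signal[None] + rng.normal(scale=1.0, size=(n_repeats, n_stimuli, n_features))

    def stability_scores(responses):
        """Correlation of each feature's stimulus profile across the two repeats."""
        a, b = responses[0], responses[1]
        a = (a - a.mean(0)) / a.std(0)
        b = (b - b.mean(0)) / b.std(0)
        return (a * b).mean(0)          # per-feature Pearson correlation

    scores = stability_scores(responses)
    top_k = np.argsort(scores)[::-1][:20]   # keep the 20 most stable features
    print(top_k)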
Highlights of X-Stack ExM Deliverable: MosaStore
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ripeanu, Matei
2016-07-20
This brief report highlights the experience gained with MosaStore, an exploratory part of the X-Stack project “ExM: System support for extreme-scale, many-task applications”. The ExM project proposed to use concurrent workflows supported by the Swift language and runtime as an innovative programming model to exploit parallelism in exascale computers. MosaStore aims to support this endeavor by improving storage support for workflow-based applications, more precisely by exploring the gains that can be obtained from co-designing the storage system and the workflow runtime engine. MosaStore has been developed primarily at the University of British Columbia.
A new order-theoretic characterisation of the polytime computable functions
Avanzini, Martin; Eguchi, Naohi; Moser, Georg
2015-01-01
We propose a new order-theoretic characterisation of the class of polytime computable functions. To this end we define the small polynomial path order (sPOP* for short). This termination order entails a new syntactic method to analyse the innermost runtime complexity of term rewrite systems fully automatically: for any rewrite system compatible with sPOP* that employs recursion up to depth d, the (innermost) runtime complexity is polynomially bounded of degree d. This bound is tight. Thus we obtain a direct correspondence between a syntactic (and easily verifiable) condition of a program and the asymptotic worst-case complexity of the program. PMID:26412933
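Stated in symbols (with rc denoting innermost runtime complexity and rd the recursion depth, notation assumed here), the headline result reads:

    % Assumed notation: rc^i_R(n) = innermost runtime complexity of R on inputs of size n,
    % rd(R) = depth of recursion employed by R.
    \[
      R \text{ compatible with } \mathrm{sPOP}^{*} \ \wedge\ \operatorname{rd}(R) \le d
      \;\Longrightarrow\;
      \mathrm{rc}^{\,i}_{R}(n) \in O(n^{d}),
    \]
    % and the degree d cannot be lowered in general (the bound is tight).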
Monitoring Object Library Usage and Changes
NASA Technical Reports Server (NTRS)
Owen, R. K.; Craw, James M. (Technical Monitor)
1995-01-01
The NASA Ames Numerical Aerodynamic Simulation program Aeronautics Consolidated Supercomputing Facility (NAS/ACSF) supercomputing center services over 1600 users, and has numerous analysts with root access. Several tools have been developed to monitor object library usage and changes. Some of the tools do "noninvasive" monitoring and other tools implement run-time logging even for object-only libraries. The run-time logging identifies who, when, and what is being used. The benefits are that real usage can be measured, unused libraries can be discontinued, training and optimization efforts can be focused at those numerical methods that are actually used. An overview of the tools will be given and the results will be discussed.
Onboard Run-Time Goal Selection for Autonomous Operations
NASA Technical Reports Server (NTRS)
Rabideau, Gregg; Chien, Steve; McLaren, David
2010-01-01
We describe an efficient, online goal selection algorithm for use onboard spacecraft and its use for selecting goals at runtime. Our focus is on the re-planning that must be performed in a timely manner on the embedded system where computational resources are limited. In particular, our algorithm generates near optimal solutions to problems with fully specified goal requests that oversubscribe available resources but have no temporal flexibility. By using a fast, incremental algorithm, goal selection can be postponed in a "just-in-time" fashion allowing requests to be changed or added at the last minute. This enables shorter response cycles and greater autonomy for the system under control.
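The kind of fast, incremental selection described above can be illustrated with a greedy Python sketch: goals carry a priority and a fixed resource cost, and the selector admits them in priority order while capacity remains, so a late-arriving request only triggers a cheap re-check rather than a full re-plan. This is an illustration of the problem setting, not the authors' near-optimal algorithm.

    def select_goals(requests, capacity):
        """Greedy selection for an oversubscribed resource.
        Each request: (name, priority, resource_use). Timing is assumed fixed,
        so feasibility reduces to a capacity check (illustrative simplification)."""
        selected, used = [], 0.0
        for name, priority, use in sorted(requests, key=lambda r: -r[1]):
            if used + use <= capacity:
                selected.append(name)
                used += use
        return selected, used

    plan, used = select_goals(
        [("image_siteA", 10, 40.0), ("downlink", 9, 35.0),
         ("image_siteB", 7, 50.0), ("calibrate", 3, 10.0)],
        capacity=100.0)
    print(plan, used)   # a late-arriving request can be re-checked the same way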
Holistic Context-Sensitivity for Run-Time Optimization of Flexible Manufacturing Systems.
Scholze, Sebastian; Barata, Jose; Stokic, Dragan
2017-02-24
Highly flexible manufacturing systems require continuous run-time (self-) optimization of processes with respect to diverse parameters, e.g., efficiency, availability, energy consumption etc. A promising approach for achieving (self-) optimization in manufacturing systems is the usage of the context sensitivity approach based on data streaming from a large number of sensors and other data sources. Cyber-physical systems play an important role as sources of information to achieve context sensitivity. Cyber-physical systems can be seen as complex intelligent sensors providing data needed to identify the current context under which the manufacturing system is operating. In this paper, it is demonstrated how context sensitivity can be used to realize a holistic solution for (self-) optimization of discrete flexible manufacturing systems, by making use of cyber-physical systems integrated in manufacturing systems/processes. A generic approach for context sensitivity, based on self-learning algorithms, is proposed, aimed at a variety of manufacturing systems. The new solution encompasses a run-time context extractor and an optimizer. Based on the self-learning module, both the context extractor and the optimizer continuously learn and improve their performance. The solution follows Service Oriented Architecture principles. The generic solution is developed and then applied to two very different manufacturing processes.
Preventing SQL Code Injection by Combining Static and Runtime Analysis
2008-05-01
attacker changes the developer’s intended structure of an SQL command by inserting new SQL keywords or operators. (Su and Wassermann provide a...FROM books WHERE author = ’ ’ GROUP BY rating We use symbol as a placeholder for the indeterminate part of the command (in this...dialects of SQL.) In our model, we mark transitions that correspond to externally defined strings with the symbol . To illustrate, Figure 2 shows the SQL
Synthesizing Dynamic Programming Algorithms from Linear Temporal Logic Formulae
NASA Technical Reports Server (NTRS)
Rosu, Grigore; Havelund, Klaus
2001-01-01
The problem of testing a linear temporal logic (LTL) formula on a finite execution trace of events, generated by an executing program, occurs naturally in runtime analysis of software. We present an algorithm which takes an LTL formula and generates an efficient dynamic programming algorithm. The generated algorithm tests whether the LTL formula is satisfied by a finite trace of events given as input. The generated algorithm runs in linear time, its constant depending on the size of the LTL formula. The memory needed is constant, also depending on the size of the formula.
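A direct (interpreted, rather than generated) version of such a dynamic-programming check can be written in a few lines of Python: working backwards over the finite trace, the truth value of each subformula at position i is derived from the values at i and i+1, with results memoized. Finite-trace semantics are assumed, with 'next' false at the last event; this is an illustration of the idea, not the generated algorithms from the paper.

    def check_ltl(formula, trace):
        """Evaluate an LTL formula over a finite trace (a list of sets of atomic
        propositions) by dynamic programming from the last event backwards.
        Assumed finite-trace semantics: 'next' is false at the final event."""
        n = len(trace)
        memo = {}

        def sat(f, i):
            key = (f, i)
            if key not in memo:
                op = f[0]
                if op == "atom":
                    v = f[1] in trace[i]
                elif op == "not":
                    v = not sat(f[1], i)
                elif op == "and":
                    v = sat(f[1], i) and sat(f[2], i)
                elif op == "or":
                    v = sat(f[1], i) or sat(f[2], i)
                elif op == "next":
                    v = i + 1 < n and sat(f[1], i + 1)
                elif op == "eventually":
                    v = sat(f[1], i) or (i + 1 < n and sat(f, i + 1))
                elif op == "always":
                    v = sat(f[1], i) and (i + 1 == n or sat(f, i + 1))
                elif op == "until":          # f[1] holds until f[2] holds
                    v = sat(f[2], i) or (sat(f[1], i) and i + 1 < n and sat(f, i + 1))
                else:
                    raise ValueError(f"unknown operator {op!r}")
                memo[key] = v
            return memo[key]

        return sat(formula, 0)

    # "whenever 'req' is observed, 'ack' eventually follows"
    spec = ("always", ("or", ("not", ("atom", "req")),
                       ("eventually", ("atom", "ack"))))
    print(check_ltl(spec, [{"req"}, set(), {"ack"}, set()]))   # True
    print(check_ltl(spec, [{"req"}, set(), set()]))            # False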
An object oriented Python interface for atomistic simulations
NASA Astrophysics Data System (ADS)
Hynninen, T.; Himanen, L.; Parkkinen, V.; Musso, T.; Corander, J.; Foster, A. S.
2016-01-01
Programmable simulation environments allow one to monitor and control calculations efficiently and automatically before, during, and after runtime. Environments directly accessible in a programming environment can be interfaced with powerful external analysis tools and extensions to enhance the functionality of the core program, and by incorporating a flexible object based structure, the environments make building and analysing computational setups intuitive. In this work, we present a classical atomistic force field with an interface written in Python language. The program is an extension for an existing object based atomistic simulation environment.
Lightweight Adaptation of Classifiers to Users and Contexts: Trends of the Emerging Domain
Vildjiounaite, Elena; Gimel'farb, Georgy; Kyllönen, Vesa; Peltola, Johannes
2015-01-01
Intelligent computer applications need to adapt their behaviour to contexts and users, but conventional classifier adaptation methods require long data collection and/or training times. Therefore classifier adaptation is often performed as follows: at design time application developers define typical usage contexts and provide reasoning models for each of these contexts, and then at runtime an appropriate model is selected from available ones. Typically, definition of usage contexts and reasoning models heavily relies on domain knowledge. However, in practice many applications are used in so diverse situations that no developer can predict them all and collect for each situation adequate training and test databases. Such applications have to adapt to a new user or unknown context at runtime just from interaction with the user, preferably in fairly lightweight ways, that is, requiring limited user effort to collect training data and limited time of performing the adaptation. This paper analyses adaptation trends in several emerging domains and outlines promising ideas, proposed for making multimodal classifiers user-specific and context-specific without significant user efforts, detailed domain knowledge, and/or complete retraining of the classifiers. Based on this analysis, this paper identifies important application characteristics and presents guidelines to consider these characteristics in adaptation design. PMID:26473165
Guaranteeing Isochronous Control of Networked Motion Control Systems Using Phase Offset Adjustment
Kim, Ikhwan; Kim, Taehyoun
2015-01-01
Guaranteeing isochronous transfer of control commands is an essential function for networked motion control systems. The adoption of real-time Ethernet (RTE) technologies may be profitable in guaranteeing deterministic transfer of control messages. However, unpredictable behavior of software in the motion controller often results in unexpectedly large deviation in control message transmission intervals, and thus leads to imprecise motion. This paper presents a simple and efficient heuristic to guarantee end-to-end isochronous control with very small jitter. The key idea of our approach is to adjust the phase offset of the control message transmission time in the motion controller by investigating the behavior of the motion control task. In realizing the idea, we performed a pre-runtime analysis to determine a safe and reliable phase offset and applied the phase offset to the runtime code of the motion controller by customizing an open-source based integrated development environment (IDE). We also constructed an EtherCAT-based motion control system testbed and performed extensive experiments on the testbed to verify the effectiveness of our approach. The experimental results show that our heuristic is highly effective even for a low-end embedded controller implemented with open-source software components under various configurations of control period and number of motor drives. PMID:26076407
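The phase-offset idea can be reduced to simple arithmetic: from pre-runtime measurements of when the motion-control task finishes within each cycle, release the control message after the worst observed completion time plus a margin, provided that still falls inside the control period. The Python sketch below uses made-up numbers and only illustrates that calculation; it is not the paper's heuristic.

    def choose_phase_offset(task_finish_times_us, cycle_period_us, margin_us=50):
        """Pick a phase offset (relative to the cycle start) at which the control
        message is released: later than the worst observed completion of the
        motion-control task, plus a margin, but still inside the cycle."""
        worst_case = max(task_finish_times_us)
        offset = worst_case + margin_us
        if offset >= cycle_period_us:
            raise ValueError("no safe offset: task may overrun the control period")
        return offset

    # Made-up pre-runtime measurements for a 1 ms EtherCAT-style cycle.
    measurements = [310, 355, 402, 388, 420, 397]       # microseconds
    print(choose_phase_offset(measurements, cycle_period_us=1000))   # -> 470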
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hansen, Timothy M.; Palmintier, Bryan; Suryanarayanan, Siddharth
As more Smart Grid technologies (e.g., distributed photovoltaic, spatially distributed electric vehicle charging) are integrated into distribution grids, static distribution simulations are no longer sufficient for performing modeling and analysis. GridLAB-D is an agent-based distribution system simulation environment that allows fine-grained end-user models, including geospatial and network topology detail. A problem exists in that, without outside intervention, once the GridLAB-D simulation begins execution, it will run to completion without allowing the real-time interaction of Smart Grid controls, such as home energy management systems and aggregator control. We address this lack of runtime interaction by designing a flexible communication interface, Bus.py (pronounced bus-dot-pie), that uses Python to pass messages between one or more GridLAB-D instances and a Smart Grid simulator. This work describes the design and implementation of Bus.py, discusses its usefulness in terms of some Smart Grid scenarios, and provides an example of an aggregator-based residential demand response system interacting with GridLAB-D through Bus.py. The small scale example demonstrates the validity of the interface and shows that an aggregator using said interface is able to control residential loads in GridLAB-D during runtime to cause a reduction in the peak load on the distribution system in (a) peak reduction and (b) time-of-use pricing cases.
Equation-free analysis of agent-based models and systematic parameter determination
NASA Astrophysics Data System (ADS)
Thomas, Spencer A.; Lloyd, David J. B.; Skeldon, Anne C.
2016-12-01
Agent-based models (ABMs) are increasingly used in social science, economics, mathematics, biology and computer science to describe time-dependent systems in circumstances where a description in terms of equations is difficult. Yet few tools are currently available for the systematic analysis of ABM behaviour. Numerical continuation and bifurcation analysis is a well-established tool for the study of deterministic systems. Recently, equation-free (EF) methods have been developed to extend numerical continuation techniques to systems where the dynamics are described at a microscopic scale and continuation of a macroscopic property of the system is considered. To date, the practical use of EF methods has been limited by: (1) the overhead of application-specific implementation; (2) the laborious configuration of problem-specific parameters; and (3) large ensemble sizes (potentially) leading to computationally restrictive run-times. In this paper we address these issues with our tool for the EF continuation of stochastic systems, which includes algorithms to systematically configure problem-specific parameters and enhance robustness to noise. Our tool is generic and can be applied to any 'black-box' simulator and determines the essential EF parameters prior to EF analysis. Robustness is significantly improved using our convergence-constraint with a corrector-repeat (C3R) method. This algorithm automatically detects outliers based on the dynamics of the underlying system, enabling both an order of magnitude reduction in ensemble size and continuation of systems at much higher levels of noise than classical approaches. We demonstrate our method with application to several ABM models, revealing parameter dependence, bifurcation and stability analysis of these complex systems, giving a deep understanding of the dynamical behaviour of the models in a way that is not otherwise easily obtainable. In each case we demonstrate our systematic parameter determination stage for configuring the system-specific EF parameters.
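The equation-free ingredient being wrapped around the "black-box" simulator is the coarse time-stepper: lift a macroscopic state to an ensemble of microscopic realizations, run the simulator, and restrict back to the macroscopic level. The Python sketch below does this for a toy stochastic rule; the micro model, ensemble sizes and parameters are invented, and the C3R outlier handling is not shown.

    import random

    def micro_step(state, p):
        """Toy 'black-box' agent-based rule: each agent switches on with a
        probability that depends on the macroscopic density and a parameter p."""
        density = sum(state) / len(state)
        return [1 if random.random() < p * density * (1 - density) + 0.1 else 0
                for _ in state]

    def lift(density, n_agents):
        return [1 if random.random() < density else 0 for _ in range(n_agents)]

    def restrict(state):
        return sum(state) / len(state)

    def coarse_time_stepper(density, p, n_agents=2000, n_copies=20, n_steps=10):
        """Lift -> run the microscopic simulator -> restrict, averaged over an
        ensemble of copies to tame stochasticity."""
        outcomes = []
        for _ in range(n_copies):
            state = lift(density, n_agents)
            for _ in range(n_steps):
                state = micro_step(state, p)
            outcomes.append(restrict(state))
        return sum(outcomes) / len(outcomes)

    # A macroscopic fixed point satisfies coarse_time_stepper(rho, p) ~= rho;
    # continuation then tracks that fixed point as the parameter p varies.
    print(coarse_time_stepper(0.3, p=2.0))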
Downey, Mark O; Rochfort, Simone
2008-08-01
A limitation of large-scale viticultural trials is the time and cost of comprehensive compositional analysis of the fruit by high-performance liquid chromatography (HPLC). In addition, separate methods have generally been required to identify and quantify different classes of metabolites. To address these shortcomings a reversed-phase HPLC method was developed to simultaneously separate the anthocyanins and flavonols present in grape skins. The method employs a methanol and water gradient acidified with 10% formic acid with a run-time of 48 min including re-equilibration. Identity of anthocyanins and flavonols in Shiraz (Vitis vinifera L.) skin was confirmed by mass spectral analysis.
Extracting the Essential Cartographic Functionality of Programs on the Web
NASA Astrophysics Data System (ADS)
Ledermann, Florian
2018-05-01
Following Aristotle, F. P. Brooks (1987) emphasizes the distinction between "essential difficulties" and "accidental difficulties" as a key challenge in software engineering. From the point of view of cartography, it would be desirable to identify the cartographic essence of a program and subject it to additional scrutiny, while its accidental properties, again from the point of view of cartography, are usually of lesser relevance to cartographic analysis. In this paper, two methods that facilitate extracting the cartographic essence of programs are presented: close reading of their source code, and the automated analysis of their runtime behavior. The advantages and shortcomings of both methods are discussed, followed by an outlook on future developments and potential applications.
Hettiarachchi, Kanaka; Talu, Esra; Longo, Marjorie L.; Dayton, Paul A.; Lee, Abraham P.
2007-01-01
This paper presents a new manufacturing method to generate monodisperse microbubble contrast agents with polydispersity index (σ) values of <2% through microfluidic flow-focusing. Micron-sized lipid shell-based perfluorocarbon (PFC) gas microbubbles for use as ultrasound contrast agents were produced using this method. The poly(dimethylsiloxane) (PDMS)-based devices feature expanding nozzle geometry with a 7 μm orifice width, and are robust enough for consistent production of microbubbles with runtimes lasting several hours. With high-speed imaging, we characterized relationships between channel geometry, liquid flow rate Q, and gas pressure P in controlling bubble sizes. By a simple optimization of the channel geometry and Q and P, bubbles with a mean diameter of <5 μm can be obtained, ideal for various ultrasonic imaging applications. This method demonstrates the potential of microfluidics as an efficient means for custom-designing ultrasound contrast agents with precise size distributions, different gas compositions and new shell materials for stabilization, and for future targeted imaging and therapeutic applications. PMID:17389962
Execution time supports for adaptive scientific algorithms on distributed memory machines
NASA Technical Reports Server (NTRS)
Berryman, Harry; Saltz, Joel; Scroggs, Jeffrey
1990-01-01
Optimizations are considered that are required for efficient execution of code segments that consists of loops over distributed data structures. The PARTI (Parallel Automated Runtime Toolkit at ICASE) execution time primitives are designed to carry out these optimizations and can be used to implement a wide range of scientific algorithms on distributed memory machines. These primitives allow the user to control array mappings in a way that gives an appearance of shared memory. Computations can be based on a global index set. Primitives are used to carry out gather and scatter operations on distributed arrays. Communications patterns are derived at runtime, and the appropriate send and receive messages are automatically generated.
Rapid Processing of Radio Interferometer Data for Transient Surveys
NASA Astrophysics Data System (ADS)
Bourke, S.; Mooley, K.; Hallinan, G.
2014-05-01
We report on a software infrastructure and pipeline developed to process large radio interferometer datasets. The pipeline is implemented using a radical redesign of the AIPS processing model. An infrastructure we have named AIPSlite is used to spawn, at runtime, minimal AIPS environments across a cluster. The pipeline then distributes and processes its data in parallel. The system is entirely free of the traditional AIPS distribution and is self-configuring at runtime. This software has so far been used to process an EVLA Stripe 82 transient survey and the data for the JVLA-COSMOS project, and has been used to process most of the EVLA L-Band data archive, imaging each integration to search for short-duration transients.
Execution time support for scientific programs on distributed memory machines
NASA Technical Reports Server (NTRS)
Berryman, Harry; Saltz, Joel; Scroggs, Jeffrey
1990-01-01
Optimizations are considered that are required for efficient execution of code segments that consists of loops over distributed data structures. The PARTI (Parallel Automated Runtime Toolkit at ICASE) execution time primitives are designed to carry out these optimizations and can be used to implement a wide range of scientific algorithms on distributed memory machines. These primitives allow the user to control array mappings in a way that gives an appearance of shared memory. Computations can be based on a global index set. Primitives are used to carry out gather and scatter operations on distributed arrays. Communications patterns are derived at runtime, and the appropriate send and receive messages are automatically generated.
An extension of the OpenModelica compiler for using Modelica models in a discrete event simulation
Nutaro, James
2014-11-03
In this article, a new back-end and run-time system is described for the OpenModelica compiler. This new back-end transforms a Modelica model into a module for the adevs discrete event simulation package, thereby extending adevs to encompass complex, hybrid dynamical systems. The new run-time system that has been built within the adevs simulation package supports models with state-events and time-events that comprise differential-algebraic systems with high index. Finally, although the procedure for effecting this transformation is based on adevs and the Discrete Event System Specification, it can be adapted to any discrete event simulation package.
SERENITY in e-Business and Smart Item Scenarios
NASA Astrophysics Data System (ADS)
Benameur, Azzedine; Khoury, Paul El; Seguran, Magali; Sinha, Smriti Kumar
SERENITY artefacts, such as classes, patterns, implementations and executable components for Security & Dependability (S&D), in addition to the Serenity Runtime Framework (SRF), are discussed in previous chapters. How to integrate these artefacts with applications in the Serenity approach is discussed here with two scenarios. The e-Business scenario is a standard loan origination process in a bank. The Smart Item scenario is an ambient-intelligence case study in which we take advantage of Smart Items to provide an electronic healthcare infrastructure for remote healthcare assistance. In both cases, we detail how the prototype implementations of the scenarios select proper executable components through the Serenity Runtime Framework and then demonstrate how these executable components of the S&D Patterns are deployed.
Satyanarayana Raju, T; Vishweshwari Kutty, O; Ganesh, V; Yadagiri Swamy, P
2012-08-01
Although a number of methods are available for evaluating Linezolid and its possible impurities, a single efficient method for separating its potential impurities, degradants and enantiomer remains unavailable. With the objective of developing an advanced method with shorter runtimes, a simple, precise, accurate stability-indicating LC method was developed for the determination of the purity of Linezolid drug substance and drug products in bulk samples and pharmaceutical dosage forms in the presence of its impurities and degradation products. This method is capable of separating all the related substances of Linezolid along with the chiral impurity. This method can also be used for the estimation of the assay of Linezolid in drug substance as well as in drug product. The method was developed using a Chiralpak IA (250 mm×4.6 mm, 5 μm) column. A mixture of acetonitrile, ethanol, n-butylamine and trifluoroacetic acid in a 96:4:0.10:0.16 (v/v/v/v) ratio was used as the mobile phase. The eluted compounds were monitored at 254 nm. Linezolid was subjected to the stress conditions of oxidative, acid, base, hydrolytic, thermal and photolytic degradation. The degradation products were well resolved from the main peak and its impurities, proving the stability-indicating power of the method. The developed method was validated as per International Conference on Harmonization (ICH) guidelines with respect to specificity, limit of detection, limit of quantification, precision, linearity, accuracy, robustness and system suitability.
The Automated Instrumentation and Monitoring System (AIMS) reference manual
NASA Technical Reports Server (NTRS)
Yan, Jerry; Hontalas, Philip; Listgarten, Sherry
1993-01-01
Whether a researcher is designing the 'next parallel programming paradigm,' another 'scalable multiprocessor' or investigating resource allocation algorithms for multiprocessors, a facility that enables parallel program execution to be captured and displayed is invaluable. Careful analysis of execution traces can help computer designers and software architects to uncover system behavior and to take advantage of specific application characteristics and hardware features. A software tool kit that facilitates performance evaluation of parallel applications on multiprocessors is described. The Automated Instrumentation and Monitoring System (AIMS) has four major software components: a source code instrumentor, which automatically inserts active event recorders into the program's source code before compilation; a run-time performance-monitoring library, which collects performance data; a trace file animation and analysis tool kit, which reconstructs program execution from the trace file; and a trace post-processor, which compensates for data collection overhead. Besides being used as a prototype for developing new techniques for instrumenting, monitoring, and visualizing parallel program execution, AIMS is also being incorporated into the run-time environments of various hardware test beds to evaluate their impact on user productivity. Currently, AIMS instrumentors accept FORTRAN and C parallel programs written for Intel's NX operating system on the iPSC family of multicomputers. A run-time performance-monitoring library for the iPSC/860 is included in this release. We plan to release monitors for other platforms (such as PVM and TMC's CM-5) in the near future. Performance data collected can be graphically displayed on workstations (e.g. Sun Sparc and SGI) supporting X-Windows (in particular, X11R5, Motif 1.1.3).
pong: fast analysis and visualization of latent clusters in population genetic data.
Behr, Aaron A; Liu, Katherine Z; Liu-Fang, Gracie; Nakka, Priyanka; Ramachandran, Sohini
2016-09-15
A series of methods in population genetics use multilocus genotype data to assign individuals membership in latent clusters. These methods belong to a broad class of mixed-membership models, such as latent Dirichlet allocation used to analyze text corpora. Inference from mixed-membership models can produce different output matrices when repeatedly applied to the same inputs, and the number of latent clusters is a parameter that is often varied in the analysis pipeline. For these reasons, quantifying, visualizing, and annotating the output from mixed-membership models are bottlenecks for investigators across multiple disciplines from ecology to text data mining. We introduce pong, a network-graphical approach for analyzing and visualizing membership in latent clusters with a native interactive D3.js visualization. pong leverages efficient algorithms for solving the Assignment Problem to dramatically reduce runtime while increasing accuracy compared with other methods that process output from mixed-membership models. We apply pong to 225 705 unlinked genome-wide single-nucleotide variants from 2426 unrelated individuals in the 1000 Genomes Project, and identify previously overlooked aspects of global human population structure. We show that pong outpaces current solutions by more than an order of magnitude in runtime while providing a customizable and interactive visualization of population structure that is more accurate than those produced by current tools. Availability and implementation: pong is freely available and can be installed using the Python package management system pip; pong's source code is available at https://github.com/abehr/pong. Contact: aaron_behr@alumni.brown.edu or sramachandran@brown.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
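The Assignment Problem step that pong relies on can be illustrated with a small, hedged sketch: aligning cluster labels between two runs of a mixed-membership model by maximizing the total similarity of matched clusters with SciPy's Hungarian-algorithm solver. The matrix names and similarity measure below are illustrative, not pong's actual implementation.

```python
# Sketch of aligning cluster labels between two mixed-membership runs by
# solving the Assignment Problem, as pong does conceptually. The similarity
# measure and variable names are illustrative.
import numpy as np
from scipy.optimize import linear_sum_assignment

def align_clusters(Q1, Q2):
    """Q1, Q2: (n_individuals, K) membership matrices from two runs."""
    sim = Q1.T @ Q2                    # similarity of run-1 cluster i and run-2 cluster j
    rows, cols = linear_sum_assignment(-sim)   # Hungarian algorithm, maximizing similarity
    return dict(zip(cols, rows))       # relabel run-2 clusters to run-1 labels

rng = np.random.default_rng(0)
Q1 = rng.dirichlet(np.ones(4), size=100)
Q2 = Q1[:, [2, 0, 3, 1]]               # same clusters, permuted labels
print(align_clusters(Q1, Q2))          # recovers the permutation
```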
On the Modeling and Management of Cloud Data Analytics
NASA Astrophysics Data System (ADS)
Castillo, Claris; Tantawi, Asser; Steinder, Malgorzata; Pacifici, Giovanni
A new era is dawning where vast amounts of data are subjected to intensive analysis in a cloud computing environment. Over the years, data about a myriad of things, ranging from user clicks to galaxies, have been accumulated, and continue to be collected, on storage media. The increasing availability of such data, along with the abundant supply of compute power and the urge to create useful knowledge, gave rise to a new data analytics paradigm in which data is subjected to intensive analysis, and additional data is created in the process. Meanwhile, a new cloud computing environment has emerged where seemingly limitless compute and storage resources are being provided to host computation and data for multiple users through virtualization technologies. Such a cloud environment is becoming the home for data analytics. Consequently, providing good performance at run-time to data analytics workloads is an important issue for cloud management. In this paper, we provide an overview of the data analytics and cloud environment landscapes, and investigate the performance management issues related to running data analytics in the cloud. In particular, we focus on topics such as workload characterization, profiling analytics applications and their pattern of data usage, cloud resource allocation, placement of computation and data and their dynamic migration in the cloud, and performance prediction. In solving such management problems one relies on various run-time analytic models. We discuss approaches for modeling and optimizing the dynamic data analytics workload in the cloud environment. All along, we use the Map-Reduce paradigm as an illustration of data analytics.
Implications of Responsive Space on the Flight Software Architecture
NASA Technical Reports Server (NTRS)
Wilmot, Jonathan
2006-01-01
The Responsive Space initiative has several implications for flight software that need to be addressed not only within the run-time element, but within the development infrastructure and software life-cycle process elements as well. The runtime element must at a minimum support Plug & Play, while the development and process elements need to incorporate methods to quickly generate the needed documentation, code, tests, and all of the artifacts required of flight-quality software. Very rapid response times go even further, and imply little or no new software development, requiring instead only predeveloped and certified software modules that can be integrated and tested through automated methods. These elements have typically been addressed individually with significant benefits, but it is when they are combined that they can have the greatest impact on Responsive Space. The Flight Software Branch at NASA's Goddard Space Flight Center has been developing the runtime, infrastructure and process elements needed for rapid integration with the Core Flight Software System (CFS) architecture. The CFS architecture consists of three main components: the core Flight Executive (cFE), the component catalog, and the Integrated Development Environment (IDE). This paper will discuss the design of the components, how they facilitate rapid integration, and lessons learned as the architecture is utilized for an upcoming spacecraft.
Traleika Glacier X-Stack Extension Final Report
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fryman, Joshua
The XStack Extension Project continued along the direction of the XStack program in exploring the software tools and frameworks that support a task-based community runtime, toward the goal of exascale programming. The momentum built as part of the XStack project, with the development of the task-based Open Community Runtime (OCR) and related tools, was carried through during the XStack Extension with the focus areas of easing application development, improving performance and supporting more features. The infrastructure set up for community-driven open-source development continued to be used toward these areas, with continued co-development of the runtime and applications. A variety of OCR programming environments were studied, as described in the Revolutionary Programming Environments and Applications sections, to assist with application development on OCR, and we developed the OCR Translator, a ROSE-based source-to-source compiler that parses high-level annotations in an MPI program to generate equivalent OCR code. Figure 2 compares the number of OCR objects needed to generate the 2D stencil workload using the translator against manual approaches based on an SPMD library or native coding. The rate of increase with the translator, as the number of ranks increases, is consistent with the other approaches. This is explored further in the OCR Translator section.
Optimized distributed computing environment for mask data preparation
NASA Astrophysics Data System (ADS)
Ahn, Byoung-Sup; Bang, Ju-Mi; Ji, Min-Kyu; Kang, Sun; Jang, Sung-Hoon; Choi, Yo-Han; Ki, Won-Tai; Choi, Seong-Woon; Han, Woo-Sung
2005-11-01
As the critical dimension (CD) becomes smaller, various resolution enhancement techniques (RET) are widely adopted. In developing sub-100nm devices, the complexity of optical proximity correction (OPC) is severely increased and applied OPC layers are expanded to non-critical layers. The transformation of designed pattern data by the OPC operation adds complexity, which causes runtime overheads in subsequent steps such as mask data preparation (MDP) and collapses the existing design hierarchy. Therefore, many mask shops exploit the distributed computing method to reduce the runtime of mask data preparation rather than exploit the design hierarchy. Distributed computing uses a cluster of computers connected to a local network. However, two factors limit the benefit of the distributed computing method in MDP. First, a sequential MDP job that uses the maximum number of available CPUs is not efficient compared to parallel MDP job execution, due to the characteristics of the input data. Second, the runtime improvement relative to the input cost is not sufficient because the scalability of fracturing tools is limited. In this paper, we discuss an optimal load-balancing environment that is useful in increasing the uptime of a distributed computing system by assigning an appropriate number of CPUs to each input design. We also describe the distributed processing (DP) parameter optimization required to obtain maximum throughput in MDP job processing.
Bayesian Model Selection under Time Constraints
NASA Astrophysics Data System (ADS)
Hoege, M.; Nowak, W.; Illman, W. A.
2017-12-01
Bayesian model selection (BMS) provides a consistent framework for rating and comparing models in multi-model inference. In cases where models of vastly different complexity compete with each other, we also face vastly different computational runtimes for those models. For instance, a time series of a quantity of interest can be simulated by an autoregressive process model that takes less than a second for one run, or by a partial-differential-equation-based model with runtimes of up to several hours or even days. Classical BMS is based on a quantity called Bayesian model evidence (BME). It determines the model weights in the selection process and represents a trade-off between the bias of a model and its complexity. In practice, however, the runtime of a model is another relevant weighting factor for model selection. Hence, we believe that it should be included, leading to an overall trade-off problem between bias, variance and computing effort. We approach this triple trade-off from the viewpoint of our ability to generate realizations of the models under a given computational budget. One way to obtain BME values is through sampling-based integration techniques. We argue from the fact that, under time constraints, more expensive models can be sampled much less often than faster models (in direct proportion to their runtime). The computed evidence in favor of a more expensive model is therefore statistically less significant than the evidence computed in favor of a faster model, since sampling-based strategies are always subject to statistical sampling error. We present a straightforward way to include this imbalance in the model weights that form the basis for model selection. Our approach follows directly from the idea of insufficient significance. It is based on a computationally cheap bootstrapping error estimate of the model evidence and is easy to implement. The approach is illustrated in a small synthetic modeling study.
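A minimal sketch of the core idea follows, under illustrative assumptions about priors, likelihoods, runtimes and budget: estimate BME by Monte Carlo integration with the number of model runs limited by the computational budget, and attach a bootstrap estimate of the resulting sampling error, which is larger for the slower model.

```python
# Sketch: estimate Bayesian model evidence (BME) by Monte Carlo sampling from
# the prior, with the number of affordable model runs set by the runtime per
# run under a fixed budget, plus a bootstrap estimate of the statistical error.
# Priors, likelihoods, runtimes and budget are made up for illustration.
import numpy as np

rng = np.random.default_rng(1)

def bme_with_error(likelihood, prior_sampler, runtime_per_run, budget, n_boot=500):
    n = max(2, int(budget / runtime_per_run))       # affordable model runs
    L = np.array([likelihood(prior_sampler()) for _ in range(n)])
    bme = L.mean()                                  # Monte Carlo BME estimate
    boot = rng.choice(L, size=(n_boot, n)).mean(axis=1)
    return bme, boot.std()                          # estimate and its bootstrap error

# Fast model: many affordable runs, small bootstrap error.
fast = bme_with_error(lambda x: np.exp(-0.5 * (x - 0.1) ** 2),
                      lambda: rng.normal(), runtime_per_run=1.0, budget=1000)
# Slow model: few affordable runs, larger bootstrap error penalizes its weight.
slow = bme_with_error(lambda x: np.exp(-0.5 * x ** 2),
                      lambda: rng.normal(), runtime_per_run=100.0, budget=1000)
print(fast, slow)
```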
Analysis and synthesis of abstract data types through generalization from examples
NASA Technical Reports Server (NTRS)
Wild, Christian
1987-01-01
The discovery of general patterns of behavior from a set of input/output examples can be a useful technique in the automated analysis and synthesis of software systems. These generalized descriptions of the behavior form a set of assertions which can be used for validation, program synthesis, program testing and run-time monitoring. Describing the behavior is treated as a learning process in which general patterns can be readily characterized. The learning algorithm must choose a transform function and define a subset of the transform space which is related to equivalence classes of behavior in the original domain. An algorithm for analyzing the behavior of abstract data types is presented and several examples are given. The use of the analysis for purposes of program synthesis is also discussed.
Taming the Wild: A Unified Analysis of Hogwild!-Style Algorithms.
De Sa, Christopher; Zhang, Ce; Olukotun, Kunle; Ré, Christopher
2015-12-01
Stochastic gradient descent (SGD) is a ubiquitous algorithm for a variety of machine learning problems. Researchers and industry have developed several techniques to optimize SGD's runtime performance, including asynchronous execution and reduced precision. Our main result is a martingale-based analysis that enables us to capture the rich noise models that may arise from such techniques. Specifically, we use our new analysis in three ways: (1) we derive convergence rates for the convex case (Hogwild!) with relaxed assumptions on the sparsity of the problem; (2) we analyze asynchronous SGD algorithms for non-convex matrix problems including matrix completion; and (3) we design and analyze an asynchronous SGD algorithm, called Buckwild!, that uses lower-precision arithmetic. We show experimentally that our algorithms run efficiently for a variety of problems on modern hardware.
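To make the Hogwild! setting concrete, here is a minimal, hedged sketch of lock-free asynchronous SGD on a least-squares problem. Because of Python's global interpreter lock this only illustrates the racy update pattern rather than genuine parallel speedup, and the problem sizes and learning rate are illustrative.

```python
# Minimal sketch of Hogwild!-style asynchronous SGD on least squares: worker
# threads update the shared parameter vector without any locking. Real
# Hogwild! additionally relies on sparse per-example updates; sizes and the
# learning rate here are illustrative.
import threading
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.01 * rng.normal(size=n)

w = np.zeros(d)                          # shared parameters, updated lock-free

def worker(steps=5000, lr=0.01):
    global w
    local_rng = np.random.default_rng()
    for _ in range(steps):
        i = local_rng.integers(n)        # pick one training example
        grad = (X[i] @ w - y[i]) * X[i]  # gradient of 0.5*(x_i.w - y_i)^2
        w -= lr * grad                   # racy read-modify-write, as in Hogwild!

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(np.linalg.norm(w - w_true))        # small despite unsynchronized updates
```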
Using Block-local Atomicity to Detect Stale-value Concurrency Errors
NASA Technical Reports Server (NTRS)
Artho, Cyrille; Havelund, Klaus; Biere, Armin
2004-01-01
Data races do not cover all kinds of concurrency errors. This paper presents a data-flow-based technique to find stale-value errors, which are not found by low-level and high-level data race algorithms. Stale values denote copies of shared data where the copy is no longer synchronized. The algorithm to detect such values works as a consistency check that does not require any assumptions or annotations of the program. It has been implemented as a static analysis in JNuke. The analysis is sound and requires only a single execution trace if implemented as a run-time checking algorithm. Being based on an analysis of Java bytecode, it encompasses the full program semantics, including arbitrarily complex expressions. Related techniques are more complex and more prone to over-reporting.
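The bug class targeted by the analysis can be illustrated with a small example (the paper's tool works on Java bytecode; this Python sketch only shows the pattern): a value read under synchronization is kept as a local copy and used after the lock is released, so the final write is based on stale data.

```python
# Illustration of the stale-value bug class described above. The balance is
# read under a lock, but the copy outlives the critical section, so the later
# write ignores any concurrent updates.
import threading

lock = threading.Lock()
balance = 100

def withdraw(amount):
    global balance
    with lock:
        local = balance           # copy of shared state, made safely
    # ... other work; another thread may change `balance` here ...
    with lock:
        balance = local - amount  # stale value: concurrent updates are lost

def correct_withdraw(amount):
    global balance
    with lock:                    # read-modify-write in one critical section
        balance -= amount
```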
An efficient pseudomedian filter for tiling microarrays.
Royce, Thomas E; Carriero, Nicholas J; Gerstein, Mark B
2007-06-07
Tiling microarrays are becoming an essential technology in the functional genomics toolbox. They have been applied to the tasks of novel transcript identification, elucidation of transcription factor binding sites, detection of methylated DNA and several other applications in several model organisms. These experiments are being conducted at increasingly finer resolutions as the microarray technology enjoys increasingly greater feature densities. The increased densities naturally lead to increased data analysis requirements. Specifically, the most widely employed algorithm for tiling array analysis involves smoothing observed signals by computing pseudomedians within sliding windows, an O(n² log n) calculation in each window. This poor time complexity is an issue for tiling array analysis and could prove to be a real bottleneck as tiling microarray experiments become grander in scope and finer in resolution. We therefore implemented Monahan's HLQEST algorithm that reduces the runtime complexity for computing the pseudomedian of n numbers to O(n log n) from O(n² log n). For a representative tiling microarray dataset, this modification reduced the smoothing procedure's runtime by nearly 90%. We then leveraged the fact that elements within sliding windows remain largely unchanged in overlapping windows (as one slides across genomic space) to further reduce computation by an additional 43%. This was achieved by the application of skip lists to maintaining a sorted list of values from window to window. This sorted list could be maintained with simple O(log n) inserts and deletes. We illustrate the favorable scaling properties of our algorithms with both time complexity analysis and benchmarking on synthetic datasets. Tiling microarray analyses that rely upon a sliding window pseudomedian calculation can require many hours of computation. We have eased this requirement significantly by implementing efficient algorithms that scale well with genomic feature density. This result not only speeds the current standard analyses, but also makes possible ones where many iterations of the filter may be required, such as might be required in a bootstrap or parameter estimation setting. Source code and executables are available at http://tiling.gersteinlab.org/pseudomedian/.
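A hedged sketch of the overall scheme: the pseudomedian (Hodges-Lehmann estimator) of each window is the median of all pairwise averages, and the sorted window contents are maintained incrementally as the window slides. The naive per-window computation and the bisect-based sorted list below merely stand in for Monahan's O(n log n) algorithm and the skip list used in the paper.

```python
# Sketch of a sliding-window pseudomedian filter. The pseudomedian of a window
# is the median of all pairwise averages (Walsh averages); it is computed
# naively here, whereas the paper uses Monahan's O(n log n) algorithm. The
# sorted window is maintained incrementally with bisect (locating entries in
# O(log n); a skip list, as in the paper, also makes insert/delete O(log n)).
from bisect import insort, bisect_left
from statistics import median

def pseudomedian(sorted_vals):
    pairs = [(sorted_vals[i] + sorted_vals[j]) / 2.0
             for i in range(len(sorted_vals))
             for j in range(i, len(sorted_vals))]
    return median(pairs)

def sliding_pseudomedian(signal, window):
    win = sorted(signal[:window])
    out = [pseudomedian(win)]
    for k in range(window, len(signal)):
        old, new = signal[k - window], signal[k]
        del win[bisect_left(win, old)]   # drop the element leaving the window
        insort(win, new)                 # insert the element entering the window
        out.append(pseudomedian(win))
    return out

print(sliding_pseudomedian([1, 9, 2, 8, 3, 7, 4], window=3))
```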
Detecting Heap-Spraying Code Injection Attacks in Malicious Web Pages Using Runtime Execution
NASA Astrophysics Data System (ADS)
Choi, Younghan; Kim, Hyoungchun; Lee, Donghoon
The growing use of web services is increasing web browser attacks exponentially. Most attacks use a technique called heap spraying because of its high success rate. Heap spraying executes malicious code without indicating the exact address of the code by copying it into many heap objects. For this reason, the attack has a high potential to succeed once the vulnerability is exploited. Attackers have recently begun using this technique because JavaScript makes it easy to allocate the heap memory area. This paper proposes a novel technique that detects heap spraying attacks by executing a heap object in a real environment, irrespective of the version and patch status of the web browser. This runtime execution is used to detect various forms of heap spraying attacks, such as encoding and polymorphism. Heap objects are executed after being filtered on the basis of patterns of heap spraying attacks, in order to reduce the overhead of the runtime execution. The patterns of heap spraying attacks are based on an analysis of how a web browser accesses benign web sites. The heap objects are forcibly executed by setting the instruction register to their address after they are loaded into memory. Thus, we can execute the malicious code without having to consider the version and patch status of the browser. An object is considered to contain malicious code if the execution reaches a call instruction and that instruction accesses the API of system libraries, such as kernel32.dll and ws_32.dll. To change registers and monitor execution flow, we used a debugger engine. A prototype, named HERAD (HEap spRAying Detector), is implemented and evaluated. In experiments, HERAD detects various forms of exploit code that emulation cannot detect, and some heap spraying attacks that NOZZLE cannot detect. Although it has an execution overhead, HERAD produces a low number of false alarms. The processing time of several minutes is negligible because our research focuses on detecting heap spraying. This research can be applied to existing systems that collect malicious codes, such as honeypots.
TOOKUIL: A case study in user interface development for safety code application
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gray, D.L.; Harkins, C.K.; Hoole, J.G.
1997-07-01
Traditionally, there has been a very high learning curve associated with using nuclear power plant (NPP) analysis codes. Even for seasoned plant analysts and engineers, the process of building or modifying an input model for present day NPP analysis codes is tedious, error prone, and time consuming. Current cost constraints and performance demands place an additional burden on today's safety analysis community. Advances in graphical user interface (GUI) technology have been applied to obtain significant productivity and quality assurance improvements for the Transient Reactor Analysis Code (TRAC) input model development. KAPL Inc. has developed an X Windows-based graphical user interface named TOOKUIL which supports the design and analysis process, acting as a preprocessor, runtime editor, help system, and post processor for TRAC. This paper summarizes the objectives of the project, the GUI development process and experiences, and the resulting end product, TOOKUIL.
Generalizing the extensibility of a dynamic geometry software
NASA Astrophysics Data System (ADS)
Herceg, Đorđe; Radaković, Davorka; Herceg, Dejana
2012-09-01
Plug-and-play visual components in a Dynamic Geometry Software (DGS) enable development of visually attractive, rich and highly interactive dynamic drawings. We are developing SLGeometry, a DGS that contains a custom programming language, a computer algebra system (CAS engine) and a graphics subsystem. The basic extensibility framework of SLGeometry supports dynamic addition of new functions from attribute-annotated classes that implement runtime metadata registration in code. We present a general plug-in framework for dynamic importing of arbitrary Silverlight user interface (UI) controls into SLGeometry at runtime. The CAS engine maintains a metadata storage that describes each imported visual component and enables two-way communication between the expressions stored in the engine and the UI controls on the screen.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Collins, Benjamin S.; Hamilton, Steven P.; Jarrett, Michael G.
This report describes the performance improvements made to the VERA Core Simulator (VERA-CS) during FY2016. The development of the VERA Core Simulator has focused on the capability needed to deplete physical reactors and help solve various problems; this capability required the accurate simulation of many operating cycles of a nuclear power plant. The first section of this report introduces two test problems used to assess the run-time performance of VERA-CS using a source dated February 2016. The next section provides a brief overview of the major modifications made to decrease the computational cost. Following the descriptions of the major improvements, the run-time for each improvement is shown. Conclusions on the work are presented, and further follow-on performance improvements are suggested.
Fiia: A Model-Based Approach to Engineering Collaborative Augmented Reality
NASA Astrophysics Data System (ADS)
Wolfe, Christopher; Smith, J. David; Phillips, W. Greg; Graham, T. C. Nicholas
Augmented reality systems often involve collaboration among groups of people. While there are numerous toolkits that aid the development of such augmented reality groupware systems (e.g., ARToolkit and Groupkit), there remains an enormous gap between the specification of an AR groupware application and its implementation. In this chapter, we present Fiia, a toolkit which simplifies the development of collaborative AR applications. Developers specify the structure of their applications using the Fiia modeling language, which abstracts details of networking and provides high-level support for specifying adapters between the physical and virtual world. The Fiia.Net runtime system then maps this conceptual model to a runtime implementation. We illustrate Fiia via Raptor, an augmented reality application used to help small groups collaboratively prototype video games.
GPU accelerated FDTD solver and its application in MRI.
Chi, J; Liu, F; Jin, J; Mason, D G; Crozier, S
2010-01-01
The finite difference time domain (FDTD) method is a popular technique for computational electromagnetics (CEM). The large computational power often required, however, has been a limiting factor for its applications. In this paper, we will present a graphics processing unit (GPU)-based parallel FDTD solver and its successful application to the investigation of a novel B1 shimming scheme for high-field magnetic resonance imaging (MRI). The optimized shimming scheme exhibits considerably improved transmit B1 profiles. The GPU implementation dramatically shortened the runtime of the FDTD simulation of electromagnetic fields compared with its CPU counterpart. The acceleration in runtime has made such investigation possible, and will pave the way for other studies of large-scale computational electromagnetic problems in modern MRI which were previously impractical.
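For orientation, the leapfrog update at the heart of FDTD looks roughly like the following 1D sketch in normalized units; each grid point is updated independently from its neighbors' values of the previous half-step, which is what makes the method map well onto GPU threads. The grid size, source and coefficients are illustrative, not the paper's 3D GPU implementation.

```python
# Minimal 1D FDTD sketch in normalized units showing the leapfrog field
# updates that the paper parallelizes in 3D on a GPU. Grid size, source and
# coefficients are illustrative only.
import numpy as np

nz, nt = 400, 1000
ez = np.zeros(nz)                        # electric field
hy = np.zeros(nz)                        # magnetic field
for n in range(nt):
    hy[:-1] += 0.5 * (ez[1:] - ez[:-1])  # update H from the curl of E
    ez[1:] += 0.5 * (hy[1:] - hy[:-1])   # update E from the curl of H
    ez[nz // 2] += np.exp(-((n - 30.0) / 10.0) ** 2)   # soft Gaussian source
print(float(np.abs(ez).max()))
```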
Feature Selection Methods for Zero-Shot Learning of Neural Activity
Caceres, Carlos A.; Roos, Matthew J.; Rupp, Kyle M.; Milsap, Griffin; Crone, Nathan E.; Wolmetz, Michael E.; Ratto, Christopher R.
2017-01-01
Dimensionality poses a serious challenge when making predictions from human neuroimaging data. Across imaging modalities, large pools of potential neural features (e.g., responses from particular voxels, electrodes, and temporal windows) have to be related to typically limited sets of stimuli and samples. In recent years, zero-shot prediction models have been introduced for mapping between neural signals and semantic attributes, which allows for classification of stimulus classes not explicitly included in the training set. While choices about feature selection can have a substantial impact when closed-set accuracy, open-set robustness, and runtime are competing design objectives, no systematic study of feature selection for these models has been reported. Instead, a relatively straightforward feature stability approach has been adopted and successfully applied across models and imaging modalities. To characterize the tradeoffs in feature selection for zero-shot learning, we compared correlation-based stability to several other feature selection techniques on comparable data sets from two distinct imaging modalities: functional Magnetic Resonance Imaging and Electrocorticography. While most of the feature selection methods resulted in similar zero-shot prediction accuracies and spatial/spectral patterns of selected features, there was one exception: a novel feature/attribute correlation approach was able to achieve those accuracies with far fewer features, suggesting the potential for simpler prediction models that yield high zero-shot classification accuracy. PMID:28690513
Lessons Learned from Autonomous Sciencecraft Experiment
NASA Technical Reports Server (NTRS)
Chien, Steve A.; Sherwood, Rob; Tran, Daniel; Cichy, Benjamin; Rabideau, Gregg; Castano, Rebecca; Davies, Ashley; Mandl, Dan; Frye, Stuart; Trout, Bruce;
2005-01-01
An Autonomous Science Agent has been flying onboard the Earth Observing One (EO-1) spacecraft since 2003. This software enables the spacecraft to autonomously detect and respond to science events occurring on the Earth, such as volcanoes, flooding, and snow melt. The package includes AI-based software systems that perform science data analysis, deliberative planning, and run-time robust execution. This software is in routine use to fly the EO-1 mission. In this paper we briefly review the agent architecture and discuss lessons learned from this multi-year flight effort pertinent to the deployment of software agents to critical applications.
Salko, Robert K.; Schmidt, Rodney C.; Avramova, Maria N.
2014-11-23
This study describes major improvements to the computational infrastructure of the CTF subchannel code so that full-core, pincell-resolved (i.e., one computational subchannel per real bundle flow channel) simulations can now be performed in much shorter run-times, either in stand-alone mode or as part of coupled-code multi-physics calculations. These improvements support the goals of the Department of Energy Consortium for Advanced Simulation of Light Water Reactors (CASL) Energy Innovation Hub to develop high fidelity multi-physics simulation tools for nuclear energy design and analysis.
Experiences in autotuning matrix multiplication for energy minimization on GPUs
Anzt, Hartwig; Haugen, Blake; Kurzak, Jakub; ...
2015-05-20
In this study, we report extensive results and analysis of autotuning the computationally intensive graphics processing unit (GPU) kernel for dense matrix–matrix multiplication in double precision. In contrast to traditional autotuning and/or optimization for runtime performance only, we also take energy efficiency into account. For kernels achieving equal performance, we show significant differences in their energy balance. We also identify memory throughput as the most influential metric that trades off performance and energy efficiency. Finally, the performance-optimal case ends up not being the most efficient kernel in overall resource use.
Automata-Based Verification of Temporal Properties on Running Programs
NASA Technical Reports Server (NTRS)
Giannakopoulou, Dimitra; Havelund, Klaus; Lan, Sonie (Technical Monitor)
2001-01-01
This paper presents an approach to checking a running program against its Linear Temporal Logic (LTL) specifications. LTL is a widely used logic for expressing properties of programs viewed as sets of executions. Our approach consists of translating LTL formulae to finite-state automata, which are used as observers of the program behavior. The translation algorithm we propose modifies standard LTL-to-Büchi automata conversion techniques to generate automata that check finite program traces. The algorithm has been implemented in a tool, which has been integrated with the generic JPaX framework for runtime analysis of Java programs.
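A hand-written sketch of the kind of finite-trace observer the approach generates automatically: a small state machine consumes program events and delivers a verdict at the end of the trace for a property such as "every request is eventually acknowledged." The event names are illustrative; the actual tool derives the automaton from an LTL formula.

```python
# Hand-coded finite-trace observer in the spirit described above; the real
# tool generates such automata from LTL formulae. Event names are illustrative.
def make_monitor():
    state = {"pending": False}
    def observe(event):
        if event == "request":
            state["pending"] = True
        elif event == "ack":
            state["pending"] = False
    def verdict_at_end():
        # On a finite trace the property fails if a request is still pending.
        return not state["pending"]
    return observe, verdict_at_end

observe, verdict = make_monitor()
for event in ["request", "work", "ack", "request", "work"]:
    observe(event)
print(verdict())   # False: the final request was never acknowledged
```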
Fitness Probability Distribution of Bit-Flip Mutation.
Chicano, Francisco; Sutton, Andrew M; Whitley, L Darrell; Alba, Enrique
2015-01-01
Bit-flip mutation is a common mutation operator for evolutionary algorithms applied to optimize functions over binary strings. In this paper, we develop results from the theory of landscapes and Krawtchouk polynomials to exactly compute the probability distribution of fitness values of a binary string undergoing uniform bit-flip mutation. We prove that this probability distribution can be expressed as a polynomial in p, the probability of flipping each bit. We analyze these polynomials and provide closed-form expressions for an easy linear problem (Onemax), and an NP-hard problem, MAX-SAT. We also discuss a connection of the results with runtime analysis.
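For Onemax the distribution is easy to state directly: if the parent string has k ones out of n, the offspring fitness is (k - X) + Y with X ~ Binomial(k, p) ones flipped off and Y ~ Binomial(n - k, p) zeros flipped on. The sketch below computes this by direct convolution; the paper instead obtains closed-form expressions via Krawtchouk polynomials.

```python
# Exact fitness distribution of Onemax under uniform bit-flip mutation with
# per-bit probability p, computed by direct convolution of the two binomial
# counts described above. Parameters are illustrative.
from math import comb

def onemax_mutation_pmf(n, k, p):
    pmf = [0.0] * (n + 1)
    for x in range(k + 1):                    # ones flipped to zero
        px = comb(k, x) * p**x * (1 - p)**(k - x)
        for y in range(n - k + 1):            # zeros flipped to one
            py = comb(n - k, y) * p**y * (1 - p)**(n - k - y)
            pmf[k - x + y] += px * py
    return pmf

dist = onemax_mutation_pmf(n=10, k=7, p=0.1)
print(sum(dist), dist[7])   # pmf sums to 1; probability fitness stays at 7
```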
Vector radiative transfer code SORD: Performance analysis and quick start guide
NASA Astrophysics Data System (ADS)
Korkin, Sergey; Lyapustin, Alexei; Sinyuk, Alexander; Holben, Brent; Kokhanovsky, Alexander
2017-10-01
We present a new open source polarized radiative transfer code SORD written in Fortran 90/95. SORD numerically simulates propagation of monochromatic solar radiation in a plane-parallel atmosphere over a reflecting surface using the method of successive orders of scattering (hence the name). Thermal emission is ignored. We did not improve the method in any way, but report the accuracy and runtime in 52 benchmark scenarios. This paper also serves as a quick start user's guide for the code available from ftp://maiac.gsfc.nasa.gov/pub/skorkin, from the JQSRT website, or from the corresponding (first) author.
Jamwal, Rohitash; Topletz, Ariel R.; Ramratnam, Bharat; Akhlaghi, Fatemeh
2017-01-01
Cannabis is used widely in the United States, both recreationally and for medical purposes. Current methods for analysis of cannabinoids in human biological specimens rely on a complex extraction process and lengthy analysis time. We established a rapid and simple assay for quantification of Δ9-tetrahydrocannabinol (THC), cannabidiol (CBD), 11-hydroxy Δ9-tetrahydrocannabinol (11-OH THC) and 11-nor-9-carboxy-Δ9-tetrahydrocannabinol (THC-COOH) in human plasma by U-HPLC-MS/MS using Δ9-tetrahydrocannabinol-D3 as the internal standard. Chromatographic separation was achieved on an Acquity BEH C18 column using a gradient comprising water (0.1% formic acid) and methanol (0.1% formic acid) over a 6 min run-time. Analytes from 200 µL plasma were extracted using acetonitrile (containing 1% formic acid and THC-D3). Mass spectrometry was performed in positive ionization mode, and the total ion chromatogram was used for quantification of analytes. The assay was validated according to guidelines set forth by the Food and Drug Administration of the United States. An eight-point calibration curve was fitted with quadratic regression (r² > 0.99) from 1.56 to 100 ng mL−1, and a lower limit of quantification (LLOQ) of 1.56 ng mL−1 was achieved. Accuracy and precision calculated from six calibration curves were between 85 and 115%, while the mean extraction recovery was >90% for all the analytes. Several plasma phospholipids eluted after the analytes and thus did not interfere with the assay. Bench-top, freeze-thaw, auto-sampler and short-term stability ranged from 92.7 to 106.8% of nominal values. Application of the method was evaluated by quantification of the analytes in human plasma from six subjects. PMID:28192758
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gentile, Ann C.; Brandt, James M.; Tucker, Thomas
2011-09-01
This report provides documentation for the completion of the Sandia Level II milestone 'Develop feedback system for intelligent dynamic resource allocation to improve application performance'. This milestone demonstrates the use of a scalable data collection, analysis and feedback system that enables insight into how an application is utilizing the hardware resources of a high performance computing (HPC) platform in a lightweight fashion. Further, we demonstrate utilizing the same mechanisms used for transporting data for remote analysis and visualization to provide low latency run-time feedback to applications. The ultimate goal of this body of work is performance optimization in the face of the ever increasing size and complexity of HPC systems.
GenomePeek—an online tool for prokaryotic genome and metagenome analysis
McNair, Katelyn; Edwards, Robert A.
2015-06-16
As increases in prokaryotic sequencing take place, a method to quickly and accurately analyze this data is needed. Previous tools are mainly designed for metagenomic analysis and have limitations, such as long runtimes and significant false positive error rates. The online tool GenomePeek (edwards.sdsu.edu/GenomePeek) was developed to analyze both single genome and metagenome sequencing files, quickly and with low error rates. GenomePeek uses a sequence assembly approach in which reads to a set of conserved genes are extracted, assembled and then aligned against a highly specific reference database. GenomePeek was found to be faster than traditional approaches while still keeping error rates low, as well as offering unique data visualization options.
Implementation of the Domino Sampling Waveform digitizer in the PIBETA experiment
NASA Astrophysics Data System (ADS)
Wang, Ying
The Domino Sampling Chip (DSC) waveform digitization system is a significant addition to the electronics arsenal of the PIBETA experiment. It is used to digitize waveforms from every photo tube in the detector. Through carefully programmed offline analysis of its raw data, collected during regular runtime, better timing and energy resolution are achieved compared with the FASTBUS results. More importantly, the geometric character of the digitized waveform, which contains information on the energy deposition of particle decays, can be utilized for particle identification, a great advantage that the regular unit does not possess. In addition to the FASTBUS data, incorporating DSC data through its offline analysis, including timing and energy offset and scale calibration, will contribute to a final, more precise result for the PIBETA experiment.
Fixed-angle plate osteosynthesis of the patella - an alternative to tension wiring?
Wild, M; Eichler, C; Thelen, S; Jungbluth, P; Windolf, J; Hakimi, M
2010-05-01
The goal of this study is to carry out a biomechanical evaluation of the stability of a bilateral, polyaxial, fixed-angle 2.7 mm plate system specifically designed for use on the patella. The results of this approach are then compared to the two currently most commonly used surgical techniques for patella fractures: modified anterior tension wiring with K-wires and cannulated lag screws with anterior tension wiring. A transient biomechanical analysis determining the material failure points of all osteosyntheses was conducted on 21 identical left polyurethane foam patellae, which were osteotomized horizontally. Evaluated were load (N), displacement (mm) and run-time (s) as well as elastic modulus (MPa), tensile strength (MPa) and strain at failure (%). With a maximum load capacity of 2396 (SD 492) N, the fixed-angle plate proved to be significantly stronger than the cannulated lag screws with anterior tension wiring (1015 (SD 246) N) and the modified anterior tension wiring (625 (SD 84.9) N). The fixed-angle plate displayed significantly greater stiffness and lower fracture gap dehiscence than the other osteosyntheses. Additionally, osteosynthesis deformation was found to be lower for the fixed-angle plate. A bilateral fixed-angle plate was the most rigid and stable osteosynthesis for horizontal patella fractures, with the least amount of fracture gap dehiscence. Further biomechanical trials performed under cyclic loading with fresh cadaver specimens should be done to determine whether a fixed-angle plate may be an alternative in the surgical treatment of patella fractures. Copyright 2009 Elsevier Ltd. All rights reserved.
Boulet, Lysiane; Faure, Patrice; Flore, Patrice; Montérémal, Julien; Ducros, Véronique
2017-06-01
Tryptophan (Trp) is an essential amino acid and the precursor of many biologically active substances such as kynurenine (KYN) and serotonin (5HT). Its metabolism is involved in different physiopathological states, such as cardiovascular diseases, cancer, immunomodulation or depression. Hence, the quantification of Trp catabolites, from both the KYN and 5HT pathways, might be useful for the discovery of novel diagnostic and follow-up biomarkers. We have developed a simple method for quantification of Trp and 8 of its metabolites, involved in both the KYN and 5HT pathways, using liquid chromatography coupled to tandem mass spectrometry. We also validated the method in human plasma samples, according to NF EN ISO 15189 criteria. Our method shows acceptable intra- and inter-day coefficients of variation (CV) (<12% and <16%, respectively). The linearity entirely covers the human plasma range. Stabilities of whole blood and of residues were determined, as well as the use of 2 different types of collection tube, enabling us to adapt our process. Matrix effects and reference values showed good agreement compared to the literature. We propose here a method allowing the simultaneous quantification of a panel of Trp catabolites never used before, to our knowledge. This method, with a quick chromatographic runtime (15 min) and simple sample preparation, has been validated according to NF EN ISO 15189 criteria. The method enables the detailed analysis of these metabolic pathways, which are thought to be involved in a number of pathological conditions. Copyright © 2017 Elsevier B.V. All rights reserved.
Parallel Clustering Algorithm for Large-Scale Biological Data Sets
Wang, Minchao; Zhang, Wu; Ding, Wang; Dai, Dongbo; Zhang, Huiran; Xie, Hao; Chen, Luonan; Guo, Yike; Xie, Jiang
2014-01-01
Background: Recent explosion of biological data brings a great challenge for the traditional clustering algorithms. With increasing scale of data sets, much larger memory and longer runtime are required for cluster identification problems. The affinity propagation algorithm outperforms many other classical clustering algorithms and is widely applied in biological research. However, the time and space complexity become a great bottleneck when handling large-scale data sets. Moreover, the similarity matrix, whose construction takes a long runtime, is required before running the affinity propagation algorithm, since the algorithm clusters data sets based on the similarities between data pairs. Methods: Two types of parallel architectures are proposed in this paper to accelerate the similarity matrix construction procedure and the affinity propagation algorithm. The memory-shared architecture is used to construct the similarity matrix, and the distributed system is taken for the affinity propagation algorithm, because of its large memory size and great computing capacity. An appropriate way of data partition and reduction is designed in our method, in order to minimize the global communication cost among processes. Results: A speedup of 100 is gained with 128 cores. The runtime is reduced from several hours to a few seconds, which indicates that the parallel algorithm is capable of handling large-scale data sets effectively. The parallel affinity propagation also achieves a good performance when clustering large-scale gene data (microarray) and detecting families in large protein superfamilies. PMID:24705246
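A hedged sketch of the embarrassingly parallel part, similarity-matrix construction: each worker process computes one row of negative squared Euclidean distances (a common similarity choice for affinity propagation). The row partitioning and sizes are illustrative and stand in for the shared-memory design described in the paper.

```python
# Sketch of parallel similarity-matrix construction for affinity propagation:
# rows are distributed across worker processes. Passing `data` with every task
# is wasteful; a shared-memory design, as in the paper, avoids this.
import numpy as np
from multiprocessing import Pool

def similarity_row(args):
    i, data = args
    return i, -np.sum((data - data[i]) ** 2, axis=1)   # negative squared distance

def similarity_matrix(data, n_workers=4):
    S = np.empty((len(data), len(data)))
    with Pool(n_workers) as pool:
        args = ((i, data) for i in range(len(data)))
        for i, row in pool.imap_unordered(similarity_row, args):
            S[i] = row
    return S

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 16))
    print(similarity_matrix(X).shape)   # (500, 500)
```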
Flow-Centric, Back-in-Time Debugging
NASA Astrophysics Data System (ADS)
Lienhard, Adrian; Fierz, Julien; Nierstrasz, Oscar
Conventional debugging tools present developers with means to explore the run-time context in which an error has occurred. In many cases this is enough to help the developer discover the faulty source code and correct it. However, rather often errors occur due to code that has executed in the past, leaving certain objects in an inconsistent state. The actual run-time error only occurs when these inconsistent objects are used later in the program. So-called back-in-time debuggers help developers step back through earlier states of the program and explore execution contexts not available to conventional debuggers. Nevertheless, even Back-in-Time Debuggers do not help answer the question, “Where did this object come from?” The Object-Flow Virtual Machine, which we have proposed in previous work, tracks the flow of objects to answer precisely such questions, but this VM does not provide dedicated debugging support to explore faulty programs. In this paper we present a novel debugger, called Compass, to navigate between conventional run-time stack-oriented control flow views and object flows. Compass enables a developer to effectively navigate from an object contributing to an error back-in-time through all the code that has touched the object. We present the design and implementation of Compass, and we demonstrate how flow-centric, back-in-time debugging can be used to effectively locate the source of hard-to-find bugs.
Hari, Pradip; Ko, Kevin; Koukoumidis, Emmanouil; Kremer, Ulrich; Martonosi, Margaret; Ottoni, Desiree; Peh, Li-Shiuan; Zhang, Pei
2008-10-28
Increasingly, spatial awareness plays a central role in many distributed and mobile computing applications. Spatially aware applications rely on information about the geographical position of compute devices and their supported services in order to support novel functionality. While many spatial application drivers already exist in mobile and distributed computing, very little systems research has explored how best to program these applications, to express their spatial and temporal constraints, and to allow efficient implementations on highly dynamic real-world platforms. This paper proposes the SARANA system architecture, which includes language and run-time system support for spatially aware and resource-aware applications. SARANA allows users to express spatial regions of interest, as well as trade-offs between quality of result (QoR), latency and cost. The goal is to produce applications that use resources efficiently and that can be run on diverse resource-constrained platforms ranging from laptops to personal digital assistants and to smart phones. SARANA's run-time system manages QoR and cost trade-offs dynamically by tracking resource availability and locations, brokering usage/pricing agreements and migrating programs to nodes accordingly. A resource cost model permeates the SARANA system layers, permitting users to express their resource needs and QoR expectations in units that make sense to them. Although we are still early in the system development, initial versions have been demonstrated on a nine-node system prototype.
Detecting Payload Attacks on Programmable Logic Controllers (PLCs)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yang, Huan
Programmable logic controllers (PLCs) play critical roles in industrial control systems (ICS). Providing hardware peripherals and firmware support for control programs (i.e., a PLC's "payload") written in languages such as ladder logic, PLCs directly receive sensor readings and control ICS physical processes. An attacker with access to PLC development software (e.g., by compromising an engineering workstation) can modify the payload program and cause severe physical damage to the ICS. To protect critical ICS infrastructure, we propose to model the runtime behaviors of legitimate PLC payload programs and use runtime behavior monitoring in PLC firmware to detect payload attacks. By monitoring the I/O access patterns, network access patterns, and payload program timing characteristics, our proposed firmware-level detection mechanism can detect abnormal runtime behaviors of a malicious PLC payload. Using our proof-of-concept implementation, we evaluate the memory and execution time overhead of implementing our proposed method and find that it is feasible to incorporate our method into existing PLC firmware. In addition, our evaluation results show that a wide variety of payload attacks can be effectively detected by our proposed approach. The proposed firmware-level payload attack detection scheme complements existing bump-in-the-wire solutions (e.g., external temporal-logic-based model checkers) in that it can detect payload attacks that violate real-time requirements of ICS operations and does not require any additional apparatus.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yoginath, Srikanth B; Perumalla, Kalyan S
2013-01-01
Virtual machine (VM) technologies, especially those offered via Cloud platforms, present new dimensions with respect to performance and cost in executing parallel discrete event simulation (PDES) applications. Due to the introduction of overall cost as a metric, the choice of the highest-end computing configuration is no longer the most economical one. Moreover, runtime dynamics unique to VM platforms introduce new performance characteristics, and the variety of possible VM configurations give rise to a range of choices for hosting a PDES run. Here, an empirical study of these issues is undertaken to guide an understanding of the dynamics, trends and trade-offs in executing PDES on VM/Cloud platforms. Performance results and cost measures are obtained from actual execution of a range of scenarios in two PDES benchmark applications on the Amazon Cloud offerings and on a high-end VM host machine. The data reveals interesting insights into the new VM-PDES dynamics that come into play and also leads to counter-intuitive guidelines with respect to choosing the best and second-best configurations when overall cost of execution is considered. In particular, it is found that choosing the highest-end VM configuration guarantees neither the best runtime nor the least cost. Interestingly, choosing a (suitably scaled) low-end VM configuration provides the least overall cost without adversely affecting the total runtime.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Miller, Allan Ray
1987-05-01
Increases in high speed hardware have mandated studies in software techniques to exploit the parallel capabilities. This thesis examines the effects a run-time scheduler has on a multiprocessor. The model consists of directed, acyclic graphs, generated from serial FORTRAN benchmark programs by the parallel compiler Parafrase. A multitasked, multiprogrammed environment is created. Dependencies are generated by the compiler. Tasks are bidimensional, i.e., they may specify both time and processor requests. Processor requests may be folded into execution time by the scheduler. The graphs may arrive at arbitrary time intervals. The general case is NP-hard; thus, a variety of heuristics are examined by a simulator. Multiprogramming demonstrates a greater need for a run-time scheduler than does monoprogramming for a variety of reasons, e.g., greater stress on the processors, a larger number of independent control paths, more variety in the task parameters, etc. The dynamic critical path series of algorithms perform well. Dynamic critical volume did not add much. Unfortunately, dynamic critical path maximizes turnaround time as well as throughput. Two schedulers are presented which balance throughput and turnaround time. The first requires classification of jobs by type; the second requires selection of a ratio value which is dependent upon system parameters. 45 refs., 19 figs., 20 tabs.
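A simplified sketch of dynamic-critical-path list scheduling on a task DAG: a task's priority is the longest execution-time path from it to a sink, and ready tasks are dispatched to free processors in priority order. For brevity each task requests a single processor here, whereas the thesis also folds processor-count requests into execution time; the graph and times are illustrative.

```python
# Critical-path list scheduling of a task DAG on P processors: priority is a
# task's "bottom level" (longest path of execution time to a sink); ready
# tasks are dispatched to free processors in priority order.
import heapq

def bottom_levels(tasks, succ):
    level = {}
    def bl(t):
        if t not in level:
            level[t] = tasks[t] + max((bl(s) for s in succ.get(t, [])), default=0)
        return level[t]
    for t in tasks:
        bl(t)
    return level

def schedule(tasks, succ, P=2):
    pred_count = {t: 0 for t in tasks}
    for ss in succ.values():
        for s in ss:
            pred_count[s] += 1
    prio = bottom_levels(tasks, succ)
    ready = [(-prio[t], t) for t in tasks if pred_count[t] == 0]
    heapq.heapify(ready)
    running, order, time, free = [], [], 0, P      # running holds (finish_time, task)
    while ready or running:
        while ready and free > 0:                  # dispatch highest-priority ready tasks
            _, t = heapq.heappop(ready)
            heapq.heappush(running, (time + tasks[t], t))
            order.append((t, time))
            free -= 1
        time, t = heapq.heappop(running)           # advance to the next completion
        free += 1
        for s in succ.get(t, []):
            pred_count[s] -= 1
            if pred_count[s] == 0:
                heapq.heappush(ready, (-prio[s], s))
    return order, time                             # (task, start time) list and makespan

tasks = {"a": 2, "b": 3, "c": 1, "d": 2}           # task -> execution time
succ = {"a": ["c", "d"], "b": ["d"]}               # DAG edges
print(schedule(tasks, succ, P=2))
```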
González-Sánchez, Manuel; Ruiz-Muñoz, Maria; Ávila-Bolívar, Ana Belén; Cuesta-Vargas, Antonio I
2016-10-06
To analyse the effect of real-time kinematic feedback (KRTF) when learning two ankle joint mobilisation techniques, comparing the results with the traditional teaching method. Double-blind randomized trial. Faculty of Health Sciences. Undergraduate students with no experience in manual therapy. Each student practised intensely for 90 min (45 min for each mobilisation) according to the randomly assigned method (G1: traditional method group; G2: KRTF group). G1: an expert professor supervised the students' practice, with a professor-student ratio of 1:8. G2: students were placed in front of a station where, while they performed the manoeuvre, they received KRTF on a laptop. Outcome measures: total time of mobilisation, time to reach maximum amplitude, maximum angular displacement in the three axes, maximum and average velocity to reach the maximum angular displacement, and average velocity during the mobilisation. Among the pre-post intervention measurements, there were significant differences within both groups for all outcome variables; however, G2 (KRTF) achieved significantly greater improvements in kinematic parameters for the two mobilisations (a significant increase in displacement and velocity and a significant reduction in the mobilisation runtime) than G1. Ankle plantar flexion: G1's measurement stability (post-intervention) ranged between 0.491 and 0.687, while G2's ranged between 0.899 and 0.984. Ankle dorsal flexion mobilisation: G1's measurement stability (post-intervention) ranged from 0.543 to 0.684, while G2's ranged between 0.899 and 0.974. KRTF proved to be a more effective tool than the traditional teaching method in the teaching-learning process of two joint mobilisation techniques. NCT02504710.
Faster search by lackadaisical quantum walk
NASA Astrophysics Data System (ADS)
Wong, Thomas G.
2018-03-01
In the typical model, a discrete-time coined quantum walk searching the 2D grid for a marked vertex achieves a success probability of O(1/log N) in O(√(N log N)) steps, which with amplitude amplification yields an overall runtime of O(√N log N). We show that making the quantum walk lackadaisical or lazy by adding a self-loop of weight 4/N to each vertex speeds up the search, causing the success probability to reach a constant near 1 in O(√(N log N)) steps, thus yielding an O(√(log N)) improvement over the typical, loopless algorithm. This improved runtime matches the best known quantum algorithms for this search problem. Our results are based on numerical simulations since the algorithm is not an instance of the abstract search algorithm.
Ultrafast adiabatic quantum algorithm for the NP-complete exact cover problem
Wang, Hefeng; Wu, Lian-Ao
2016-01-01
An adiabatic quantum algorithm may lose quantumness such as quantum coherence entirely in its long runtime, and consequently the expected quantum speedup of the algorithm does not show up. Here we present a general ultrafast adiabatic quantum algorithm. We show that by applying a sequence of fast random or regular signals during evolution, the runtime can be reduced substantially, whereas advantages of the adiabatic algorithm remain intact. We also propose a randomized Trotter formula and show that the driving Hamiltonian and the proposed sequence of fast signals can be implemented simultaneously. We illustrate the algorithm by solving the NP-complete 3-bit exact cover problem (EC3), where NP stands for nondeterministic polynomial time, and put forward an approach to implementing the problem with trapped ions. PMID:26923834
Reversible Parallel Discrete-Event Execution of Large-scale Epidemic Outbreak Models
DOE Office of Scientific and Technical Information (OSTI.GOV)
Perumalla, Kalyan S; Seal, Sudip K
2010-01-01
The spatial scale, runtime speed and behavioral detail of epidemic outbreak simulations together require the use of large-scale parallel processing. In this paper, an optimistic parallel discrete event execution of a reaction-diffusion simulation model of epidemic outbreaks is presented, with an implementation over the µsik simulator. Rollback support is achieved with the development of a novel reversible model that combines reverse computation with a small amount of incremental state saving. Parallel speedup and other runtime performance metrics of the simulation are tested on a small (8,192-core) Blue Gene/P system, while scalability is demonstrated on 65,536 cores of a large Cray XT5 system. Scenarios representing large population sizes (up to several hundred million individuals in the largest case) are exercised.
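The rollback mechanism can be sketched as follows: each event's forward handler either performs a perfectly invertible update, undone by reverse computation, or saves a small piece of state (incremental state saving) so that a later rollback can restore it. The cell state and update rules below are illustrative, not the paper's epidemic model.

```python
# Sketch of reversible event execution for optimistic simulation: forward
# handlers record just enough information to undo themselves on rollback.
# The state and updates are illustrative only.
class Cell:
    def __init__(self, infected=0):
        self.infected = infected

def forward_infect(cell, k):
    cell.infected += k                 # perfectly invertible: nothing saved
    return ("infect", k)

def forward_clamp(cell, cap):
    saved = cell.infected              # not invertible: save the old value
    cell.infected = min(cell.infected, cap)
    return ("clamp", saved)

def reverse(cell, record):
    kind, val = record
    if kind == "infect":
        cell.infected -= val           # reverse computation
    elif kind == "clamp":
        cell.infected = val            # incremental state restore

c = Cell(5)
log = [forward_infect(c, 3), forward_clamp(c, 6)]
for rec in reversed(log):              # rollback in LIFO order
    reverse(c, rec)
print(c.infected)                      # back to 5
```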
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dubois, P.F.
1989-05-16
This paper discusses the Basis system. Basis is a program development system for scientific programs. It has been developed over the last five years at Lawrence Livermore National Laboratory (LLNL), where it is now used in about twenty major programming efforts. The Basis System includes two major components, a program development system and a run-time package. The run-time package provides the Basis Language interpreter, through which the user does input, output, plotting, and control of the program's subroutines and functions. Variables in the scientific packages are known to this interpreter, so that the user may arbitrarily print, plot, and calculate with any major program variables. Also provided are facilities for dynamic memory management, terminal logs, error recovery, text-file I/O, and the attachment of non-Basis-developed packages.
Real-Time MENTAT programming language and architecture
NASA Technical Reports Server (NTRS)
Grimshaw, Andrew S.; Silberman, Ami; Liu, Jane W. S.
1989-01-01
Real-time MENTAT, a programming environment designed to simplify the task of programming real-time applications in distributed and parallel environments, is described. It is based on the same data-driven computation model and object-oriented programming paradigm as MENTAT. It provides an easy-to-use mechanism to exploit parallelism, language constructs for the expression and enforcement of timing constraints, and run-time support for scheduling and executing real-time programs. The real-time MENTAT programming language is an extended C++. The extensions are added to facilitate automatic detection of data flow and generation of data flow graphs, to express the timing constraints of individual granules of computation, and to provide scheduling directives for the runtime system. A high-level view of the real-time MENTAT system architecture and programming language constructs is provided.
Novel Framework for Reduced Order Modeling of Aero-engine Components
NASA Astrophysics Data System (ADS)
Safi, Ali
The present study focuses on the popular dynamic reduction methods used in the design of complex assemblies (millions of degrees of freedom) where numerous iterations are involved to achieve the final design. Aerospace manufacturers such as Rolls-Royce and Pratt & Whitney are actively seeking techniques that reduce computational time while maintaining the accuracy of the models. This involves modal analysis of components with complex geometries to determine the dynamic behavior due to non-linearity and complicated loading conditions. In such a case, sub-structuring and dynamic reduction techniques prove to be an efficient tool to reduce design cycle time. Components whose designs are finalized can be dynamically reduced to mass and stiffness matrices at the boundary nodes in the assembly. These matrices conserve the dynamics of the component in the assembly, and thus avoid repeated calculations during the analysis runs for design modification of other components. This thesis presents a novel framework for modeling and meshing of any complex structure, in this case an aero-engine casing. In this study the effect of meshing techniques on the run time is highlighted. The modal analysis is carried out using an extremely fine mesh to ensure all minor details in the structure are captured correctly in the Finite Element (FE) model. This is used as the reference model against which the results of the reduced model are compared. The study also shows the conditions/criteria under which dynamic reduction can be implemented effectively, proving the accuracy of the Craig-Bampton (C.B.) method and the limitations of Static Condensation. The study highlights the longer runtime needed to produce the reduced matrices of components compared to the overall runtime of the complete unreduced model. Once the components are reduced, however, the assembly runs are significantly faster. Hence the decision to use Component Mode Synthesis (CMS) is to be taken judiciously, considering the number of iterations that may be required during the design cycle.
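Since the thesis contrasts Craig-Bampton reduction with static condensation, a minimal sketch of the latter (Guyan reduction, the static building block that Craig-Bampton augments with fixed-interface modes) may help fix ideas; the matrices and the boundary/interior split below are toy placeholders, not aero-engine data.

import numpy as np

def guyan_reduce(K, M, boundary, interior):
    """Condense stiffness and mass to the boundary DOFs: interior DOFs are
    assumed to follow the boundary statically through phi = -Kii^-1 Kib."""
    Kib = K[np.ix_(interior, boundary)]
    Kii = K[np.ix_(interior, interior)]
    phi = -np.linalg.solve(Kii, Kib)
    T = np.vstack([np.eye(len(boundary)), phi])   # maps boundary DOFs -> all DOFs
    order = boundary + interior                   # permute to [boundary, interior]
    Kp = K[np.ix_(order, order)]
    Mp = M[np.ix_(order, order)]
    return T.T @ Kp @ T, T.T @ Mp @ T

# Toy 4-DOF spring chain, reduced to the two end DOFs.
K = np.array([[ 2., -1.,  0.,  0.],
              [-1.,  2., -1.,  0.],
              [ 0., -1.,  2., -1.],
              [ 0.,  0., -1.,  2.]])
M = np.eye(4)
K_red, M_red = guyan_reduce(K, M, boundary=[0, 3], interior=[1, 2])
print(K_red.shape)   # (2, 2): boundary-level matrices reused across assembly runs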
Scalability Analysis of Gleipnir: A Memory Tracing and Profiling Tool, on Titan
DOE Office of Scientific and Technical Information (OSTI.GOV)
Janjusic, Tommy; Kartsaklis, Christos; Wang, Dali
2013-01-01
Application performance is hindered by a variety of factors but most notably driven by the well-known CPU-memory speed gap (also known as the memory wall). Understanding an application's memory behavior is key when trying to optimize performance. Understanding application performance properties is facilitated by various performance profiling tools. The scope of profiling tools varies in complexity, ease of deployment, profiling performance, and the detail of profiled information. Specifically, using profiling tools for performance analysis is a common task when optimizing and understanding scientific applications on complex and large-scale systems such as Cray's XK7. This paper describes the performance characteristics of using Gleipnir, a memory tracing tool, on the Titan Cray XK7 system when instrumenting large applications such as the Community Earth System Model. Gleipnir is a memory tracing tool built as a plug-in for the Valgrind instrumentation framework. The goal of Gleipnir is to provide fine-grained trace information. The generated traces are a stream of executed memory transactions mapped to internal structures per process, thread, function, and finally the data structure or variable. Our focus was to expose tool performance characteristics when using Gleipnir in combination with external tools, such as a cache simulator, Gl CSim, to characterize the tool's overall performance. In this paper we describe our experience with deploying Gleipnir on the Titan Cray XK7 system, report on the tool's ease of use, and analyze run-time performance characteristics under various workloads. While all performance aspects are important, we mainly focus on I/O characteristics due to the emphasis on the tool's output, which consists of trace files. Moreover, the tool depends on the run-time system to provide the necessary infrastructure to expose low-level system detail; therefore, we also discuss the theoretical benefits that could be achieved if such modules were present.
Simulated fault injection - A methodology to evaluate fault tolerant microprocessor architectures
NASA Technical Reports Server (NTRS)
Choi, Gwan S.; Iyer, Ravishankar K.; Carreno, Victor A.
1990-01-01
A simulation-based fault-injection method for validating fault-tolerant microprocessor architectures is described. The approach uses mixed-mode simulation (electrical/logic analysis), and injects transient errors in run-time to assess the resulting fault impact. As an example, a fault-tolerant architecture which models the digital aspects of a dual-channel real-time jet-engine controller is used. The level of effectiveness of the dual configuration with respect to single and multiple transients is measured. The results indicate 100 percent coverage of single transients. Approximately 12 percent of the multiple transients affect both channels; none result in controller failure since two additional levels of redundancy exist.
Extending Iris: The VAO SED Analysis Tool
NASA Astrophysics Data System (ADS)
Laurino, O.; Busko, I.; Cresitello-Dittmar, M.; D'Abrusco, R.; Doe, S.; Evans, J.; Pevunova, O.
2013-10-01
Iris is a tool developed by the Virtual Astronomical Observatory (VAO) for building and analyzing Spectral Energy Distributions (SEDs). Iris was designed to be extensible, so that new components and models can be developed by third parties and then included at runtime. Iris can be extended in different ways: new file readers allow users to integrate data in custom formats into Iris SEDs; new models can be fitted to the data, in the form of template libraries for template fitting, data tables, and arbitrary Python functions. The interoperability-centered design of Iris and the Virtual Observatory standards and protocols can enable new science functionalities involving SED data.
Technology for Space Station Evolution: the Data Management System
NASA Technical Reports Server (NTRS)
Abbott, L.
1990-01-01
Viewgraphs on the data management system (DMS) for the space station evolution are presented. Topics covered include DMS architecture and implementation approach; and an overview of the runtime object database.
Performance Analysis of Garbage Collection and Dynamic Reordering in a Lisp System. Ph.D. Thesis
NASA Technical Reports Server (NTRS)
Llames, Rene Lim
1991-01-01
Generation-based garbage collection and dynamic reordering of objects are two techniques for improving the efficiency of memory management in Lisp and similar dynamic language systems. An analysis of the effect of generation configuration is presented, focusing on the effects of the number of generations and generation capacities. Analytic timing and survival models are used to represent garbage collection runtime and to derive structural results on its behavior. The survival model provides bounds on the age of objects surviving a garbage collection at a particular level. Empirical results show that execution time is most sensitive to the capacity of the youngest generation. A technique called scanning for transport statistics, for evaluating the effectiveness of reordering independent of main memory size, is presented.
PiCO QL: A software library for runtime interactive queries on program data
NASA Astrophysics Data System (ADS)
Fragkoulis, Marios; Spinellis, Diomidis; Louridas, Panos
PiCO QL is an open-source C/C++ software library whose scientific scope is real-time interactive analysis of in-memory data through SQL queries. It exposes a relational view of a system's or application's data structures, which is queryable through SQL. While the application or system is executing, users can input queries through a web-based interface or issue web service requests. Queries execute on the live data structures through the respective relational views. PiCO QL makes a good candidate for ad-hoc data analysis in applications and for diagnostics in systems settings. Applications of PiCO QL include the Linux kernel, the Valgrind instrumentation framework, a GIS application, a virtual real-time observatory of stellar objects, and a source code analyser.
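As a concrete flavor of such a runtime query, a client might do something like the sketch below; the relational view name, columns, and web endpoint are assumptions made for illustration, not the actual PiCO QL schema or interface.

import urllib.parse
import urllib.request

# Hypothetical example: query a relational view of live in-memory structures
# through a PiCO QL-style web interface while the instrumented application runs.
query = """
SELECT name, rss_kb
FROM Process_VT            -- assumed view name over live process structs
WHERE rss_kb > 100000
ORDER BY rss_kb DESC;
"""

url = "http://localhost:8080/query?" + urllib.parse.urlencode({"q": query})
with urllib.request.urlopen(url) as resp:      # endpoint and port are assumptions
    print(resp.read().decode())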
NASA Astrophysics Data System (ADS)
Lin, Pei-Chun; Yu, Chun-Chang; Chen, Charlie Chung-Ping
2015-01-01
As one of the critical stages of a very large scale integration fabrication process, postexposure bake (PEB) plays a crucial role in determining the final three-dimensional (3-D) profiles and lessening the standing wave effects. However, the full 3-D chemically amplified resist simulation is not widely adopted during the postlayout optimization due to the long run-time and huge memory usage. An efficient simulation method is proposed to simulate the PEB while considering standing wave effects and resolution enhancement techniques, such as source mask optimization and subresolution assist features based on the Sylvester equation and Abbe-principal component analysis method. Simulation results show that our algorithm is 20× faster than the conventional Gaussian convolution method.
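The abstract references the Sylvester equation without stating it; as background we are confident of (rather than the authors' exact formulation), the general Sylvester equation and the separability of a Gaussian kernel, which turns a 2-D convolution into two matrix products, read

$$ A X + X B = C, \qquad G \ast U = G_y\, U\, G_x^{\mathsf T} \quad \text{when } G(x,y) = g_y(y)\, g_x(x), $$

where $G_x$ and $G_y$ are the 1-D (Toeplitz) convolution matrices built from $g_x$ and $g_y$; how the authors exploit this structure for PEB diffusion is detailed in the paper itself.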
Compiling global name-space programs for distributed execution
NASA Technical Reports Server (NTRS)
Koelbel, Charles; Mehrotra, Piyush
1990-01-01
Distributed memory machines do not provide hardware support for a global address space. Thus programmers are forced to partition the data across the memories of the architecture and use explicit message passing to communicate data between processors. The compiler support required to allow programmers to express their algorithms using a global name-space is examined. A general method is presented for the analysis of a high-level source program and its translation to a set of independently executing tasks communicating via messages. If the compiler has enough information, this translation can be carried out at compile-time. Otherwise, run-time code is generated to implement the required data movement. The analysis required in both situations is described, and the performance of the generated code on the Intel iPSC/2 is presented.
Comparative study of feature selection with ensemble learning using SOM variants
NASA Astrophysics Data System (ADS)
Filali, Ameni; Jlassi, Chiraz; Arous, Najet
2017-03-01
Ensemble learning has improved the stability and accuracy of clustering, but its runtime prohibits scaling up to real-world applications. This study addresses the problem of selecting a subset of the most pertinent features for every cluster from a dataset. The proposed method is another extension of the Random Forests approach, using self-organizing map (SOM) variants on unlabeled data, that estimates out-of-bag feature importance from a set of partitions. Every partition is created using a different bootstrap sample and a random subset of the features. We then show that the internal estimates used to measure variable pertinence in Random Forests are also applicable to feature selection in unsupervised learning. This approach aims at dimensionality reduction, visualization, and cluster characterization at the same time. We provide empirical results on nineteen benchmark data sets indicating that RFS can lead to significant improvements in clustering accuracy over several state-of-the-art unsupervised methods, with a very limited subset of features. The approach shows promise for very broad domains.
Bergh, Marianne Skov-Skov; Bogen, Inger Lise; Andersen, Jannike Mørch; Øiestad, Åse Marit Leere; Berg, Thomas
2018-01-01
A novel ion pair reversed phase ultra high performance liquid chromatography-tandem mass spectrometry (UHPLC-MS/MS) method for simultaneous determination of the stress hormones adrenaline, noradrenaline and corticosterone in rodent blood was developed and fully validated. Separations were performed on an Acquity HSS T3 column (2.1 mm i.d. × 100 mm, 1.8 μm) with gradient elution and a runtime of 5.5 min. The retention of adrenaline and noradrenaline was substantially increased by employing the ion pair reagent heptafluorobutyric acid (HFBA). Ion pair reagents are usually added to the mobile phase only, but we demonstrate for the first time that including HFBA in the sample reconstitution solvent as well has a major impact on the chromatography of these compounds. The stability of adrenaline and corticosterone in rodent blood was investigated using the surrogate analytes adrenaline-d3 and corticosterone-d8. The applicability of the described method was demonstrated by measuring the concentration of stress hormones in rodent blood samples. Copyright © 2017 Elsevier B.V. All rights reserved.
A Core Plug and Play Architecture for Reusable Flight Software Systems
NASA Technical Reports Server (NTRS)
Wilmot, Jonathan
2006-01-01
The Flight Software Branch, at Goddard Space Flight Center (GSFC), has been working on a run-time approach to facilitate a formal software reuse process. The reuse process is designed to enable rapid development and integration of high-quality software systems and to more accurately predict development costs and schedule. Previous reuse practices have been somewhat successful when the same teams are moved from project to project. But this typically requires taking the software system in an all-or-nothing approach where useful components cannot be easily extracted from the whole. As a result, the system is less flexible and scalable with limited applicability to new projects. This paper will focus on the rationale behind, and implementation of the run-time executive. This executive is the core for the component-based flight software commonality and reuse process adopted at Goddard.
Runtime support for parallelizing data mining algorithms
NASA Astrophysics Data System (ADS)
Jin, Ruoming; Agrawal, Gagan
2002-03-01
With recent technological advances, shared memory parallel machines have become more scalable, and offer large main memories and high bus bandwidths. They are emerging as good platforms for data warehousing and data mining. In this paper, we focus on shared memory parallelization of data mining algorithms. We have developed a series of techniques for parallelization of data mining algorithms, including full replication, full locking, fixed locking, optimized full locking, and cache-sensitive locking. Unlike previous work on shared memory parallelization of specific data mining algorithms, all of our techniques apply to a large number of common data mining algorithms. In addition, we propose a reduction-object based interface for specifying a data mining algorithm. We show how our runtime system can apply any of the techniques we have developed starting from a common specification of the algorithm.
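The reduction-object idea can be pictured with a small sketch (an illustrative interface, not the authors' actual API): the algorithm supplies only a per-record update and a merge, and the runtime chooses how to parallelize, for example full replication of the object per thread with a final merge.

from concurrent.futures import ThreadPoolExecutor

class CountReduction:
    """Reduction object: item frequency counts over transaction records."""
    def __init__(self):
        self.counts = {}

    def local_update(self, record):
        for item in record:
            self.counts[item] = self.counts.get(item, 0) + 1

    def merge(self, other):
        for item, c in other.counts.items():
            self.counts[item] = self.counts.get(item, 0) + c

def run_full_replication(chunks):
    """Full-replication strategy: a private object per worker, merged at the end."""
    def work(chunk):
        obj = CountReduction()
        for record in chunk:
            obj.local_update(record)
        return obj
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(work, chunks))
    total = CountReduction()
    for p in partials:
        total.merge(p)
    return total.counts

data = [["a", "b"], ["b", "c"], ["a", "c"], ["a"]]
print(run_full_replication([data[:2], data[2:]]))   # {'a': 3, 'b': 2, 'c': 2}

The locking variants mentioned in the abstract would instead share one object and differ only in how its updates are guarded.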
Programming and Runtime Support to Blaze FPGA Accelerator Deployment at Datacenter Scale
Huang, Muhuan; Wu, Di; Yu, Cody Hao; Fang, Zhenman; Interlandi, Matteo; Condie, Tyson; Cong, Jason
2017-01-01
With the end of CPU core scaling due to dark silicon limitations, customized accelerators on FPGAs have gained increased attention in modern datacenters due to their lower power, high performance and energy efficiency. Evidenced by Microsoft’s FPGA deployment in its Bing search engine and Intel’s $16.7 billion acquisition of Altera, integrating FPGAs into datacenters is considered one of the most promising approaches to sustain future datacenter growth. However, it is quite challenging for existing big data computing systems—like Apache Spark and Hadoop—to access the performance and energy benefits of FPGA accelerators. In this paper we design and implement Blaze to provide programming and runtime support for enabling easy and efficient deployments of FPGA accelerators in datacenters. In particular, Blaze abstracts FPGA accelerators as a service (FaaS) and provides a set of clean programming APIs for big data processing applications to easily utilize those accelerators. Our Blaze runtime implements an FaaS framework to efficiently share FPGA accelerators among multiple heterogeneous threads on a single node, and extends Hadoop YARN with accelerator-centric scheduling to efficiently share them among multiple computing tasks in the cluster. Experimental results using four representative big data applications demonstrate that Blaze greatly reduces the programming efforts to access FPGA accelerators in systems like Apache Spark and YARN, and improves the system throughput by 1.7× to 3× (and energy efficiency by 1.5× to 2.7×) compared to a conventional CPU-only cluster. PMID:28317049
Programming and Runtime Support to Blaze FPGA Accelerator Deployment at Datacenter Scale.
Huang, Muhuan; Wu, Di; Yu, Cody Hao; Fang, Zhenman; Interlandi, Matteo; Condie, Tyson; Cong, Jason
2016-10-01
With the end of CPU core scaling due to dark silicon limitations, customized accelerators on FPGAs have gained increased attention in modern datacenters due to their lower power, high performance and energy efficiency. Evidenced by Microsoft's FPGA deployment in its Bing search engine and Intel's $16.7 billion acquisition of Altera, integrating FPGAs into datacenters is considered one of the most promising approaches to sustain future datacenter growth. However, it is quite challenging for existing big data computing systems, like Apache Spark and Hadoop, to access the performance and energy benefits of FPGA accelerators. In this paper we design and implement Blaze to provide programming and runtime support for enabling easy and efficient deployments of FPGA accelerators in datacenters. In particular, Blaze abstracts FPGA accelerators as a service (FaaS) and provides a set of clean programming APIs for big data processing applications to easily utilize those accelerators. Our Blaze runtime implements an FaaS framework to efficiently share FPGA accelerators among multiple heterogeneous threads on a single node, and extends Hadoop YARN with accelerator-centric scheduling to efficiently share them among multiple computing tasks in the cluster. Experimental results using four representative big data applications demonstrate that Blaze greatly reduces the programming efforts to access FPGA accelerators in systems like Apache Spark and YARN, and improves the system throughput by 1.7× to 3× (and energy efficiency by 1.5× to 2.7×) compared to a conventional CPU-only cluster.
Conditional Random Fields for Fast, Large-Scale Genome-Wide Association Studies
Huang, Jim C.; Meek, Christopher; Kadie, Carl; Heckerman, David
2011-01-01
Understanding the role of genetic variation in human diseases remains an important problem to be solved in genomics. An important component of such variation consists of variations at single sites in DNA, or single nucleotide polymorphisms (SNPs). Typically, the problem of associating particular SNPs to phenotypes has been confounded by hidden factors such as the presence of population structure, family structure or cryptic relatedness in the sample of individuals being analyzed. Such confounding factors lead to a large number of spurious associations and missed associations. Various statistical methods have been proposed to account for such confounding factors, such as linear mixed-effect models (LMMs) or methods that adjust data based on a principal components analysis (PCA), but these methods either suffer from low power or cease to be tractable for larger numbers of individuals in the sample. Here we present a statistical model for conducting genome-wide association studies (GWAS) that accounts for such confounding factors. Our method's runtime scales quadratically in the number of individuals being studied, with only a modest loss in statistical power compared to LMM-based and PCA-based methods when testing on synthetic data generated from a generalized LMM. Applying our method to both real and synthetic human genotype/phenotype data, we demonstrate the ability of our model to correct for confounding factors while requiring significantly less runtime than LMMs. We have implemented methods for fitting these models, which are available at http://www.microsoft.com/science. PMID:21765897
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dentz, J.; Henderson, H.; Varshney, K.
2014-09-01
The ARIES Collaborative, a U.S. Department of Energy Building America research team, partnered with NeighborWorks America affiliate Homeowners' Rehab Inc. (HRI) of Cambridge, Massachusetts, to study improvements to the central hydronic heating system in one of the nonprofit's housing developments. The heating controls in the three-building, 42-unit Columbia Cambridge Alliance for Spanish Tenants housing development were upgraded. Fuel use in the development was excessive compared to similar properties. A poorly insulated thermal envelope contributed to high energy bills, but adding wall insulation was not cost-effective or practical. The more cost-effective option was improving heating system efficiency. Efficient operation of the heating system faced several obstacles, including inflexible boiler controls and failed thermostatic radiator valves. Boiler controls were replaced with systems that offer temperature setbacks and one that controls heat based on apartment temperature in addition to outdoor temperature. Utility bill analysis shows that post-retrofit weather-normalized heating energy use was reduced by 10%-31% (average of 19%). Indoor temperature cutoff reduced boiler runtime (and therefore heating fuel consumption) by 28% in the one building in which it was implemented. Nearly all savings were obtained at night, which had a lower indoor temperature cutoff (68°F) than day (73°F). This implies that the outdoor reset curve was appropriately adjusted for this building for daytime operation. Nighttime setback of heating system supply water temperature had no discernible impact on boiler runtime or gas bills.
DOE Office of Scientific and Technical Information (OSTI.GOV)
None
2014-11-01
The ARIES Collaborative, a U.S. Department of Energy Building America research team, partnered with NeighborWorks America affiliate Homeowners' Rehab Inc. (HRI) of Cambridge, Massachusetts, to study improvements to the central hydronic heating system in one of the nonprofit's housing developments. The heating controls in the three-building, 42-unit Columbia Cambridge Alliance for Spanish Tenants housing development were upgraded. Fuel use in the development was excessive compared to similar properties. A poorly insulated thermal envelope contributed to high energy bills, but adding wall insulation was not cost-effective or practical. The more cost-effective option was improving heating system efficiency, which faced several obstacles, including inflexible boiler controls and failed thermostatic radiator valves. Boiler controls were replaced with systems that offer temperature setbacks and one that controls heat based on apartment temperature in addition to outdoor temperature. Utility bill analysis shows that post-retrofit weather-normalized heating energy use was reduced by 10%-31% (average of 19%). Indoor temperature cutoff reduced boiler runtime (and therefore heating fuel consumption) by 28% in the one building in which it was implemented. Nearly all savings were obtained at night, which had a lower indoor temperature cutoff (68°F) than day (73°F). This implies that the outdoor reset curve was appropriately adjusted for this building for daytime operation. Nighttime setback of heating system supply water temperature had no discernible impact on boiler runtime or gas bills.
Analysis and synthesis of abstract data types through generalization from examples
NASA Technical Reports Server (NTRS)
Wild, Christian
1987-01-01
The discovery of general patterns of behavior from a set of input/output examples can be a useful technique in the automated analysis and synthesis of software systems. These generalized descriptions of the behavior form a set of assertions which can be used for validation, program synthesis, program testing, and run-time monitoring. Describing the behavior is characterized as a learning process in which the set of inputs is mapped into an appropriate transform space such that general patterns can be easily characterized. The learning algorithm must choose a transform function and define a subset of the transform space which is related to equivalence classes of behavior in the original domain. An algorithm for analyzing the behavior of abstract data types is presented and several examples are given. The use of the analysis for purposes of program synthesis is also discussed.
An efficient current-based logic cell model for crosstalk delay analysis
NASA Astrophysics Data System (ADS)
Nazarian, Shahin; Das, Debasish
2013-04-01
Logic cell modelling is an important component in the analysis and design of CMOS integrated circuits, mostly due to nonlinear behaviour of CMOS cells with respect to the voltage signal at their input and output pins. A current-based model for CMOS logic cells is presented, which can be used for effective crosstalk noise and delta delay analysis in CMOS VLSI circuits. Existing current source models are expensive and need a new set of Spice-based characterisation, which is not compatible with typical EDA tools. In this article we present Imodel, a simple nonlinear logic cell model that can be derived from the typical cell libraries such as NLDM, with accuracy much higher than NLDM-based cell delay models. In fact, our experiments show an average error of 3% compared to Spice. This level of accuracy comes with a maximum runtime penalty of 19% compared to NLDM-based cell delay models on medium-sized industrial designs.
In situ visualization and data analysis for turbidity currents simulation
NASA Astrophysics Data System (ADS)
Camata, Jose J.; Silva, Vítor; Valduriez, Patrick; Mattoso, Marta; Coutinho, Alvaro L. G. A.
2018-01-01
Turbidity currents are underflows responsible for sediment deposits that generate geological formations of interest for the oil and gas industry. LibMesh-sedimentation is an application built upon the libMesh library to simulate turbidity currents. In this work, we present the integration of libMesh-sedimentation with in situ visualization and in transit data analysis tools. DfAnalyzer is a solution based on provenance data to extract and relate strategic simulation data in transit from multiple data for online queries. We integrate libMesh-sedimentation and ParaView Catalyst to perform in situ data analysis and visualization. We present a parallel performance analysis for two turbidity currents simulations showing that the overhead for both in situ visualization and in transit data analysis is negligible. We show that our tools enable monitoring the sediments appearance at runtime and steer the simulation based on the solver convergence and visual information on the sediment deposits, thus enhancing the analytical power of turbidity currents simulations.
Ishikawa, Hiroshi; Kasahara, Kohei; Sato, Sumie; Shimakawa, Yasuhisa; Watanabe, Koichi
2014-05-16
Yeast contamination is a serious problem in the food industry and a major cause of food spoilage. Several yeasts, such as Filobasidiella neoformans, which cause cryptococcosis in humans, are also opportunistic pathogens, so a simple and rapid method for monitoring yeast contamination in food is essential. Here, we developed a simple and rapid method that utilizes loop-mediated isothermal amplification (LAMP) for the detection of F. neoformans. A set of five specific LAMP primers was designed that targeted the 5.8S-26S rDNA internal transcribed spacer 2 region of F. neoformans, and the primer set's specificity was confirmed. In a pure culture of F. neoformans, the LAMP assay had a lower sensitivity threshold of 10² cells/mL at a runtime of 60 min. In a probiotic dairy product artificially contaminated with F. neoformans, the LAMP assay also had a lower sensitivity threshold of 10² cells/mL, which was comparable to the sensitivity of a quantitative PCR (qPCR) assay. We also developed a simple two-step method for the extraction of DNA from a probiotic dairy product that can be performed within 15 min. This method involves initial protease treatment of the test sample at 45°C for 3 min followed by boiling at 100°C for 5 min under alkaline conditions. In a probiotic dairy product artificially contaminated with F. neoformans, analysis by means of our novel DNA extraction method followed by LAMP with our specific primer set had a lower sensitivity threshold of 10³ cells/mL at a runtime of 60 min. In contrast, use of our novel method of DNA extraction followed by qPCR assay had a lower sensitivity threshold of only 10⁵ cells/mL at a runtime of 3 to 4 h. Therefore, unlike the PCR assay, our LAMP assay can be used to quickly evaluate yeast contamination and is sensitive even for crude samples containing bacteria or background impurities. Our study provides a powerful tool for the primary screening of large numbers of food samples for yeast contamination. Copyright © 2014 Elsevier B.V. All rights reserved.
Compiled MPI: Cost-Effective Exascale Applications Development
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bronevetsky, G; Quinlan, D; Lumsdaine, A
2012-04-10
The complexity of petascale and exascale machines makes it increasingly difficult to develop applications that can take advantage of them. Future systems are expected to feature billion-way parallelism, complex heterogeneous compute nodes and poor availability of memory (Peter Kogge, 2008). This new challenge for application development is motivating a significant amount of research and development on new programming models and runtime systems designed to simplify large-scale application development. Unfortunately, DoE has significant multi-decadal investment in a large family of mission-critical scientific applications. Scaling these applications to exascale machines will require a significant investment that will dwarf the costs of hardware procurement. A key reason for the difficulty in transitioning today's applications to exascale hardware is their reliance on explicit programming techniques, such as the Message Passing Interface (MPI) programming model to enable parallelism. MPI provides a portable and high performance message-passing system that enables scalable performance on a wide variety of platforms. However, it also forces developers to lock the details of parallelization together with application logic, making it very difficult to adapt the application to significant changes in the underlying system. Further, MPI's explicit interface makes it difficult to separate the application's synchronization and communication structure, reducing the amount of support that can be provided by compiler and run-time tools. This is in contrast to the recent research on more implicit parallel programming models such as Chapel, OpenMP and OpenCL, which promise to provide significantly more flexibility at the cost of reimplementing significant portions of the application. We are developing CoMPI, a novel compiler-driven approach to enable existing MPI applications to scale to exascale systems with minimal modifications that can be made incrementally over the application's lifetime. It includes: (1) New set of source code annotations, inserted either manually or automatically, that will clarify the application's use of MPI to the compiler infrastructure, enabling greater accuracy where needed; (2) A compiler transformation framework that leverages these annotations to transform the original MPI source code to improve its performance and scalability; (3) Novel MPI runtime implementation techniques that will provide a rich set of functionality extensions to be used by applications that have been transformed by our compiler; and (4) A novel compiler analysis that leverages simple user annotations to automatically extract the application's communication structure and synthesize most complex code annotations.
Semantic Web Infrastructure Supporting NextFrAMES Modeling Platform
NASA Astrophysics Data System (ADS)
Lakhankar, T.; Fekete, B. M.; Vörösmarty, C. J.
2008-12-01
Emerging modeling frameworks offer new ways for modelers to develop model applications by offering a wide range of software components to handle common modeling tasks such as managing space and time, distributing computational tasks in a parallel processing environment, performing input/output, and providing diagnostic facilities. NextFrAMES, the next-generation update to the Framework for Aquatic Modeling of the Earth System (originally developed at the University of New Hampshire and currently hosted at The City College of New York), takes a step further by hiding most of these services from the modeler behind a platform-agnostic modeling platform that allows scientists to focus on the implementation of scientific concepts, in the form of a new modeling markup language and a minimalist application programming interface that provides the means to implement model processes. At the core of the NextFrAMES modeling platform is a run-time engine that interprets the modeling markup language, loads the module plugins, establishes the model I/O, and executes the model defined by the modeling XML and the accompanying plugins. The current implementation of the run-time engine is designed for single-processor or symmetric multiprocessing (SMP) systems, but future implementations of the run-time engine optimized for different hardware architectures are anticipated. The modeling XML and the accompanying plugins define the model structure and the computational processes in a highly abstract manner, which is not only suitable for the run-time engine but also has the potential to integrate into semantic web infrastructure, where intelligent parsers can extract information about the model configuration, such as input/output requirements, applicable space and time scales, and the underlying modeling processes. The NextFrAMES run-time engine itself is also designed to tap into web-enabled data services directly; therefore it can be incorporated into complex workflows to implement end-to-end applications from observation to the delivery of highly aggregated information. Our presentation will discuss the web services, ranging from OpenDAP and WaterOneFlow data services to metadata provided through catalog services, that could serve NextFrAMES modeling applications. We will also discuss the support infrastructure needed to streamline the integration of NextFrAMES into an end-to-end application to deliver highly processed information to end users. The end-to-end application will be demonstrated through examples from the State of the Global Water System effort, which builds on data services provided through WMO's Global Terrestrial Network for Hydrology to deliver water-resources-related information to policy makers for better water management. Key components of this E2E system are promoted as Community of Practice examples for the Global Observing System of Systems; therefore the State of the Global Water System can be viewed as a test case for the interoperability of the incorporated web service components.
PC graphics generation and management tool for real-time applications
NASA Technical Reports Server (NTRS)
Truong, Long V.
1992-01-01
A graphics tool was designed and developed for easy generation and management of personal computer graphics. It also provides methods and 'run-time' software for many common artificial intelligence (AI) or expert system (ES) applications.
Distributed memory compiler methods for irregular problems: Data copy reuse and runtime partitioning
NASA Technical Reports Server (NTRS)
Das, Raja; Ponnusamy, Ravi; Saltz, Joel; Mavriplis, Dimitri
1991-01-01
Outlined here are two methods which we believe will play an important role in any distributed memory compiler able to handle sparse and unstructured problems. We describe how to link runtime partitioners to distributed memory compilers. In our scheme, programmers can implicitly specify how data and loop iterations are to be distributed between processors. This insulates users from having to deal explicitly with potentially complex algorithms that carry out work and data partitioning. We also describe a viable mechanism for tracking and reusing copies of off-processor data. In many programs, several loops access the same off-processor memory locations. As long as it can be verified that the values assigned to off-processor memory locations remain unmodified, we show that we can effectively reuse stored off-processor data. We present experimental data from a 3-D unstructured Euler solver run on iPSC/860 to demonstrate the usefulness of our methods.
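Both methods fit the inspector/executor pattern; the schematic below (not the compiler's generated code; the fetch function and index ranges are placeholders) shows an inspector that records off-processor indices once and an executor that gathers them into cached copies, reusing the copies while the remote values are known to be unmodified.

class Schedule:
    def __init__(self, off_proc_indices):
        self.off_proc_indices = off_proc_indices
        self.copies = None     # cached off-processor values
        self.valid = False     # invalidated whenever remote data may have changed

def inspector(access_indices, local_lo, local_hi):
    """Build a communication schedule from the loop's index set."""
    off_proc = sorted({i for i in access_indices if not (local_lo <= i < local_hi)})
    return Schedule(off_proc)

def executor_gather(sched, fetch_remote):
    """Gather off-processor data only when the cached copies are stale."""
    if not sched.valid:
        sched.copies = {i: fetch_remote(i) for i in sched.off_proc_indices}
        sched.valid = True
    return sched.copies

# Toy usage: this "processor" owns global indices [0, 4).
global_array = list(range(10, 20))
sched = inspector(access_indices=[1, 3, 6, 8], local_lo=0, local_hi=4)
first = executor_gather(sched, lambda i: global_array[i])
again = executor_gather(sched, lambda i: global_array[i])   # reused, no re-fetch
print(first)   # {6: 16, 8: 18}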
NASA Astrophysics Data System (ADS)
Noor-E-Alam, Md.; Doucette, John
2015-08-01
Grid-based location problems (GBLPs) can be used to solve location problems in business, engineering, resource exploitation, and even in the field of medical sciences. To solve these decision problems, an integer linear programming (ILP) model is designed and developed to provide the optimal solution for GBLPs considering fixed cost criteria. Preliminary results show that the ILP model is efficient in solving small to moderate-sized problems. However, this ILP model becomes intractable in solving large-scale instances. Therefore, a decomposition heuristic is proposed to solve these large-scale GBLPs, which demonstrates significant reduction of solution runtimes. To benchmark the proposed heuristic, results are compared with the exact solution via ILP. The experimental results show that the proposed method significantly outperforms the exact method in runtime with minimal (and in most cases, no) loss of optimality.
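The abstract does not reproduce the model; as orientation only, a generic fixed-cost grid location formulation (not necessarily the authors' exact ILP) has the familiar facility-location form, with binary $y_j$ opening a facility in grid cell $j$ at fixed cost $f_j$ and $x_{ij}$ assigning demand point $i$ to cell $j$ at cost $c_{ij}$:

$$
\begin{aligned}
\min\ & \sum_j f_j y_j + \sum_i \sum_j c_{ij} x_{ij}\\
\text{s.t. } & \sum_j x_{ij} = 1 \quad \forall i,\\
& x_{ij} \le y_j \quad \forall i,j,\\
& x_{ij},\, y_j \in \{0,1\}.
\end{aligned}
$$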
Pybus -- A Python Software Bus
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lavrijsen, Wim T.L.P.
2004-10-14
A software bus, just like its hardware equivalent, allows for the discovery, installation, configuration, loading, unloading, and run-time replacement of software components, as well as channeling of inter-component communication. Python, a popular open-source programming language, encourages a modular design on software written in it, but it offers little or no component functionality. However, the language and its interpreter provide sufficient hooks to implement a thin, integral layer of component support. This functionality can be presented to the developer in the form of a module, making it very easy to use. This paper describes a Python module, PyBus, with which the concept of a "software bus" can be realized in Python. It demonstrates, within the context of the ATLAS software framework Athena, how PyBus can be used for the installation and (run-time) configuration of software, not necessarily Python modules, from a Python application in a way that is transparent to the end-user.
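A few lines of Python show how little machinery a thin software-bus layer needs on top of the interpreter's own import hooks; the class and method names below are invented for the sketch and are not the PyBus API.

import importlib

class Bus:
    """Toy software bus: discover, load, replace, and call components."""
    def __init__(self):
        self._components = {}

    def load(self, name, module_path):
        self._components[name] = importlib.import_module(module_path)
        return self._components[name]

    def replace(self, name, module_path):
        # Run-time replacement: re-import and reload the implementation.
        self._components[name] = importlib.reload(importlib.import_module(module_path))
        return self._components[name]

    def call(self, name, func, *args, **kwargs):
        # Channel inter-component calls through the bus.
        return getattr(self._components[name], func)(*args, **kwargs)

bus = Bus()
bus.load("jsonsvc", "json")                      # any importable module can act as a component
print(bus.call("jsonsvc", "dumps", {"ok": 1}))   # {"ok": 1}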
R2U2: Monitoring and Diagnosis of Security Threats for Unmanned Aerial Systems
NASA Technical Reports Server (NTRS)
Schumann, Johann; Moosbruger, Patrick; Rozier, Kristin Y.
2015-01-01
We present R2U2, a novel framework for runtime monitoring of security properties and diagnosing of security threats on-board Unmanned Aerial Systems (UAS). R2U2, implemented in FPGA hardware, is a real-time, REALIZABLE, RESPONSIVE, UNOBTRUSIVE Unit for security threat detection. R2U2 is designed to continuously monitor inputs from the GPS and the ground control station, sensor readings, actuator outputs, and flight software status. By simultaneously monitoring and performing statistical reasoning, attack patterns and post-attack discrepancies in the UAS behavior can be detected. R2U2 uses runtime observer pairs for linear and metric temporal logics for property monitoring and Bayesian networks for diagnosis of security threats. We discuss the design and implementation that now enables R2U2 to handle security threats and present simulation results of several attack scenarios on the NASA DragonEye UAS.
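To give a flavor of the properties such runtime observers can monitor, a metric temporal logic formula of the kind R2U2 targets might read as follows (an illustrative property, not one quoted from the paper): whenever commanded and observed altitude disagree by more than a threshold, an alarm must be raised within five time steps,

$$ \square \left( |alt_{\mathrm{cmd}} - alt_{\mathrm{obs}}| > \epsilon \;\rightarrow\; \Diamond_{[0,5]}\, \mathit{alarm} \right). $$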
NASA Astrophysics Data System (ADS)
Torres, Hilario; Iaccarino, Gianluca
2017-11-01
Soleil-X is a multi-physics solver being developed at Stanford University as a part of the Predictive Science Academic Alliance Program II. Our goal is to conduct high-fidelity simulations of particle-laden turbulent flows in a radiation environment for solar energy receiver applications, as well as to demonstrate our readiness to effectively utilize next-generation Exascale machines. The novel aspect of Soleil-X is that it is built upon the Legion runtime system to enable easy portability to different parallel distributed heterogeneous architectures while also being written entirely in high-level/high-productivity languages (Ebb and Regent). An overview of the Soleil-X software architecture will be given. Results from coupled fluid flow, Lagrangian point particle tracking, and thermal radiation simulations will be presented. Performance diagnostic tools and metrics corresponding to the same cases will also be discussed. US Department of Energy, National Nuclear Security Administration.
FLASH Interface; a GUI for managing runtime parameters in FLASH simulations
NASA Astrophysics Data System (ADS)
Walker, Christopher; Tzeferacos, Petros; Weide, Klaus; Lamb, Donald; Flocke, Norbert; Feister, Scott
2017-10-01
We present FLASH Interface, a novel graphical user interface (GUI) for managing runtime parameters in simulations performed with the FLASH code. FLASH Interface supports full text search of available parameters; provides descriptions of each parameter's role and function; allows for the filtering of parameters based on categories; performs input validation; and maintains all comments and non-parameter information already present in existing parameter files. The GUI can be used to edit existing parameter files or generate new ones. FLASH Interface is open source and was implemented with the Electron framework, making it available on Mac OSX, Windows, and Linux operating systems. The new interface lowers the entry barrier for new FLASH users and provides an easy-to-use tool for experienced FLASH simulators. U.S. Department of Energy (DOE), NNSA ASC/Alliances Center for Astrophysical Thermonuclear Flashes, U.S. DOE NNSA ASC through the Argonne Institute for Computing in Science, U.S. National Science Foundation.
A Mediator-Based Approach to Resolving Interface Heterogeneity of Web Services
NASA Astrophysics Data System (ADS)
Leitner, Philipp; Rosenberg, Florian; Michlmayr, Anton; Huber, Andreas; Dustdar, Schahram
In theory, service-oriented architectures are based on the idea of increasing flexibility in the selection of internal and external business partners using loosely-coupled services. However, in practice this flexibility is limited by the fact that partners need not only to provide the same service, but to do so via virtually the same interface, in order to actually be easily interchangeable. Invocation-level mediation may be used to overcome this issue: by using mediation, interface differences can be resolved transparently at runtime. In this chapter we discuss the basic ideas of mediation, with a focus on interface-level mediation. We show how interface mediation is integrated into our dynamic Web service invocation framework DAIOS, and present three different mediation strategies: one based on structural message similarity, one based on semantically annotated WSDL, and one embedded into the VRESCo SOA runtime, a larger research project with explicit support for service mediation.
Case Study: Mobile Photovoltaic System at Bechler Meadows Ranger Station, Yellowstone National Park
DOE Office of Scientific and Technical Information (OSTI.GOV)
Andy Walker
The mobile PV/generator hybrid system deployed at Bechler Meadows provides a number of advantages. It reduces on-site air emissions from the generator. Batteries allow the generator to operate only at its rated power, reducing run-time and fuel consumption. Energy provided by the solar array reduces fuel consumption and run-time of the generator. The generator is off for most hours, providing peace and quiet at the site. Maintenance trips from Mammoth Hot Springs to the remote site are reduced. The frequency of intrusive fuel deliveries to the pristine site is reduced. And the system gives rangers a chance to interpret Green Park values to the visiting public. As an added bonus, the system provides all these benefits at a lower cost than the base case of using only a propane-fueled generator, reducing life cycle cost by about 26%.
Declarative language design for interactive visualization.
Heer, Jeffrey; Bostock, Michael
2010-01-01
We investigate the design of declarative, domain-specific languages for constructing interactive visualizations. By separating specification from execution, declarative languages can simplify development, enable unobtrusive optimization, and support retargeting across platforms. We describe the design of the Protovis specification language and its implementation within an object-oriented, statically-typed programming language (Java). We demonstrate how to support rich visualizations without requiring a toolkit-specific data model and extend Protovis to enable declarative specification of animated transitions. To support cross-platform deployment, we introduce rendering and event-handling infrastructures decoupled from the runtime platform, letting designers retarget visualization specifications (e.g., from desktop to mobile phone) with reduced effort. We also explore optimizations such as runtime compilation of visualization specifications, parallelized execution, and hardware-accelerated rendering. We present benchmark studies measuring the performance gains provided by these optimizations and compare performance to existing Java-based visualization tools, demonstrating scalability improvements exceeding an order of magnitude.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Luszczek, Piotr R; Tomov, Stanimire Z; Dongarra, Jack J
We present an efficient and scalable programming model for the development of linear algebra in heterogeneous multi-coprocessor environments. The model incorporates some of the current best design and implementation practices for the heterogeneous acceleration of dense linear algebra (DLA). Examples are given for the basic algorithms for solving linear systems: the LU, QR, and Cholesky factorizations. To generate the extreme level of parallelism needed for the efficient use of coprocessors, the algorithms of interest are redesigned and then split into well-chosen computational tasks. Task execution is scheduled over the computational components of a hybrid system of multi-core CPUs and coprocessors using a lightweight runtime system. The use of lightweight runtime systems keeps scheduling overhead low, while enabling the expression of parallelism through otherwise sequential code. This simplifies the development effort and allows the exploration of the unique strengths of the various hardware components.
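As an illustration of the task-splitting the abstract describes, a tiled Cholesky factorization decomposes naturally into POTRF, TRSM, and SYRK/GEMM tile tasks; the sketch below executes them serially for clarity, whereas the kind of lightweight runtime discussed would dispatch each task to CPUs or coprocessors as its dependencies are satisfied (the matrix and tile size are placeholders).

import numpy as np

def tiled_cholesky(A, nb):
    """Right-looking tiled Cholesky; each tile operation is one schedulable task."""
    n = A.shape[0]
    L = np.tril(A.copy())
    for k in range(0, n, nb):
        kk = slice(k, k + nb)
        L[kk, kk] = np.linalg.cholesky(L[kk, kk])                    # POTRF task
        for i in range(k + nb, n, nb):
            ii = slice(i, i + nb)
            L[ii, kk] = np.linalg.solve(L[kk, kk], L[ii, kk].T).T    # TRSM task
        for i in range(k + nb, n, nb):
            ii = slice(i, i + nb)
            for j in range(k + nb, i + nb, nb):
                jj = slice(j, j + nb)
                L[ii, jj] -= L[ii, kk] @ L[jj, kk].T                 # SYRK/GEMM task
    return L

rng = np.random.default_rng(1)
B = rng.standard_normal((8, 8))
A = B @ B.T + 8 * np.eye(8)            # small SPD test matrix
L = tiled_cholesky(A, nb=4)
print(np.allclose(L @ L.T, A))         # True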
Expected Fitness Gains of Randomized Search Heuristics for the Traveling Salesperson Problem.
Nallaperuma, Samadhi; Neumann, Frank; Sudholt, Dirk
2017-01-01
Randomized search heuristics are frequently applied to NP-hard combinatorial optimization problems. The runtime analysis of randomized search heuristics has contributed tremendously to our theoretical understanding. Recently, randomized search heuristics have been examined regarding their achievable progress within a fixed-time budget. We follow this approach and present a fixed-budget analysis for an NP-hard combinatorial optimization problem. We consider the well-known Traveling Salesperson Problem (TSP) and analyze the fitness increase that randomized search heuristics are able to achieve within a given fixed-time budget. In particular, we analyze Manhattan and Euclidean TSP instances and Randomized Local Search (RLS), (1+1) EA and (1+λ) EA algorithms for the TSP in a smoothed complexity setting, and derive the lower bounds of the expected fitness gain for a specified number of generations.
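To make the fixed-budget viewpoint concrete, a bare-bones RLS with 2-opt moves on a random Euclidean instance reports how much the tour length (fitness) improves within a fixed number of iterations; the instance size, budget, and acceptance rule below are illustrative choices, not the paper's smoothed-analysis setting.

import numpy as np

rng = np.random.default_rng(42)
pts = rng.random((30, 2))                       # random Euclidean TSP instance

def tour_length(tour):
    d = pts[tour] - pts[np.roll(tour, -1)]
    return float(np.hypot(d[:, 0], d[:, 1]).sum())

def rls_2opt(budget):
    """RLS: one random 2-opt move per iteration, accepted if not worse."""
    tour = rng.permutation(len(pts))
    initial = best = tour_length(tour)
    for _ in range(budget):
        i, j = sorted(rng.choice(len(pts), size=2, replace=False))
        cand = tour.copy()
        seg = cand[i:j + 1].copy()
        cand[i:j + 1] = seg[::-1]               # reverse a segment (2-opt move)
        cand_len = tour_length(cand)
        if cand_len <= best:
            tour, best = cand, cand_len
    return initial, best

start_len, end_len = rls_2opt(budget=5000)
print(f"fitness gain within the budget: {start_len:.3f} -> {end_len:.3f}")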
GrigoraSNPs: Optimized Analysis of SNPs for DNA Forensics.
Ricke, Darrell O; Shcherbina, Anna; Michaleas, Adam; Fremont-Smith, Philip
2018-04-16
High-throughput sequencing (HTS) of single nucleotide polymorphisms (SNPs) enables additional DNA forensic capabilities not attainable using traditional STR panels. However, the inclusion of sets of loci selected for mixture analysis, extended kinship, phenotype, biogeographic ancestry prediction, etc., can result in large panel sizes that are difficult to analyze in a rapid fashion. GrigoraSNPs was developed to address the allele-calling bottleneck encountered when analyzing SNP panels with more than 5000 loci using HTS. GrigoraSNPs uses MapReduce parallel data processing on multiple computational threads plus a novel locus-identification hashing strategy that leverages target sequence tags. This tool optimizes the SNP calling module of the DNA analysis pipeline, with runtimes that scale linearly with the number of HTS reads. Results are compared with SNP analysis pipelines implemented with SAMtools and GATK. GrigoraSNPs removes a computational bottleneck for processing forensic samples with large HTS SNP panels. Published 2018. This article is a U.S. Government work and is in the public domain in the USA.
Advanced complex trait analysis.
Gray, A; Stewart, I; Tenesa, A
2012-12-01
The Genome-wide Complex Trait Analysis (GCTA) software package can quantify the contribution of genetic variation to phenotypic variation for complex traits. However, as those datasets of interest continue to increase in size, GCTA becomes increasingly computationally prohibitive. We present an adapted version, Advanced Complex Trait Analysis (ACTA), demonstrating dramatically improved performance. We restructure the genetic relationship matrix (GRM) estimation phase of the code and introduce the highly optimized parallel Basic Linear Algebra Subprograms (BLAS) library combined with manual parallelization and optimization. We introduce the Linear Algebra PACKage (LAPACK) library into the restricted maximum likelihood (REML) analysis stage. For a test case with 8999 individuals and 279,435 single nucleotide polymorphisms (SNPs), we reduce the total runtime, using a compute node with two multi-core Intel Nehalem CPUs, from ∼17 h to ∼11 min. The source code is fully available under the GNU Public License, along with Linux binaries. For more information see http://www.epcc.ed.ac.uk/software-products/acta. a.gray@ed.ac.uk Supplementary data are available at Bioinformatics online.
NASA Astrophysics Data System (ADS)
Gloe, Thomas; Borowka, Karsten; Winkler, Antje
2010-01-01
The analysis of lateral chromatic aberration forms another ingredient for a well-equipped toolbox of an image forensic investigator. Previous work proposed its application to forgery detection [1] and image source identification [2]. This paper takes a closer look at the current state-of-the-art method for analysing lateral chromatic aberration and presents a new approach to estimate lateral chromatic aberration in a runtime-efficient way. Employing a set of 11 different camera models including 43 devices, the characteristic of lateral chromatic aberration is investigated at large scale. The reported results point to general difficulties that have to be considered in real-world investigations.
Shape prior modeling using sparse representation and online dictionary learning.
Zhang, Shaoting; Zhan, Yiqiang; Zhou, Yan; Uzunbas, Mustafa; Metaxas, Dimitris N
2012-01-01
The recently proposed sparse shape composition (SSC) opens a new avenue for shape prior modeling. Instead of assuming any parametric model of shape statistics, SSC incorporates shape priors on-the-fly by approximating a shape instance (usually derived from appearance cues) with a sparse combination of shapes in a training repository. Theoretically, one can increase the modeling capability of SSC by including as many training shapes as possible in the repository. However, this strategy confronts two limitations in practice. First, since SSC involves an iterative sparse optimization at run-time, the more shape instances contained in the repository, the less run-time efficiency SSC has. Therefore, a compact and informative shape dictionary is preferred to a large shape repository. Second, in medical imaging applications, training shapes seldom come in one batch. It is very time consuming, and sometimes infeasible, to reconstruct the shape dictionary every time new training shapes appear. In this paper, we propose an online learning method to address these two limitations. Our method starts by constructing an initial shape dictionary using the K-SVD algorithm. When new training shapes come, instead of reconstructing the dictionary from the ground up, we update the existing one using a block-coordinate descent approach. Using the dynamically updated dictionary, sparse shape composition can be gracefully scaled up to model shape priors from a large number of training shapes without sacrificing run-time efficiency. Our method is validated on lung localization in X-ray and cardiac segmentation in MRI time series. Compared to the original SSC, it shows comparable performance while being significantly more efficient.
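A compact sketch of the online update follows, in the spirit of block-coordinate descent dictionary learning; the sparse codes here come from a tiny greedy OMP routine, and the data, sizes, and sparsity level are placeholder assumptions rather than the paper's K-SVD initialization or medical shape data.

import numpy as np

rng = np.random.default_rng(0)

def omp(D, x, k):
    """Tiny orthogonal matching pursuit: pick k atoms greedily, refit on support."""
    residual, support = x.copy(), []
    code = np.zeros(D.shape[1])
    for _ in range(k):
        support.append(int(np.argmax(np.abs(D.T @ residual))))
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    code[support] = coef
    return code

def online_dictionary_update(samples, n_atoms, k=3):
    dim = samples.shape[1]
    D = rng.standard_normal((dim, n_atoms))
    D /= np.linalg.norm(D, axis=0)
    A = np.zeros((n_atoms, n_atoms))     # running sufficient statistics
    B = np.zeros((dim, n_atoms))
    for x in samples:                    # training shapes arrive one at a time
        alpha = omp(D, x, k)
        A += np.outer(alpha, alpha)
        B += np.outer(x, alpha)
        for j in range(n_atoms):         # block-coordinate pass over the atoms
            if A[j, j] < 1e-12:
                continue
            u = (B[:, j] - D @ A[:, j]) / A[j, j] + D[:, j]
            D[:, j] = u / max(np.linalg.norm(u), 1.0)
    return D

shapes = rng.standard_normal((50, 20))   # placeholder "shape" vectors
D = online_dictionary_update(shapes, n_atoms=10)
print(D.shape)                           # (20, 10), updated without full retraining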
Petascale Simulation Initiative Tech Base: FY2007 Final Report
DOE Office of Scientific and Technical Information (OSTI.GOV)
May, J; Chen, R; Jefferson, D
The Petascale Simulation Initiative began as an LDRD project in the middle of Fiscal Year 2004. The goal of the project was to develop techniques to allow large-scale scientific simulation applications to better exploit the massive parallelism that will come with computers running at petaflops per second. One of the major products of this work was the design and prototype implementation of a programming model and a runtime system that lets applications extend data-parallel applications to use task parallelism. By adopting task parallelism, applications can use processing resources more flexibly, exploit multiple forms of parallelism, and support more sophisticated multiscale and multiphysics models. Our programming model was originally called the Symponents Architecture but is now known as Cooperative Parallelism, and the runtime software that supports it is called Coop. (However, we sometimes refer to the programming model as Coop for brevity.) We have documented the programming model and runtime system in a submitted conference paper [1]. This report focuses on the specific accomplishments of the Cooperative Parallelism project (as we now call it) under Tech Base funding in FY2007. Development and implementation of the model under LDRD funding alone proceeded to the point of demonstrating a large-scale materials modeling application using Coop on more than 1300 processors by the end of FY2006. Beginning in FY2007, the project received funding from both LDRD and the Computation Directorate Tech Base program. Later in the year, after the three-year term of the LDRD funding ended, the ASC program supported the project with additional funds. The goal of the Tech Base effort was to bring Coop from a prototype to a production-ready system that a variety of LLNL users could work with. Specifically, the major tasks that we planned for the project were: (1) Port SARS [former name of the Coop runtime system] to another LLNL platform, probably Thunder or Peloton (depending on when Peloton becomes available); (2) Improve SARS's robustness and ease-of-use, and develop user documentation; and (3) Work with LLNL code teams to help them determine how Symponents could benefit their applications. The original funding request was $296,000 for the year, and we eventually received $252,000. The remainder of this report describes our efforts and accomplishments for each of the goals listed above.
Fast and Exact Fiber Surfaces for Tetrahedral Meshes.
Klacansky, Pavol; Tierny, Julien; Carr, Hamish; Zhao Geng
2017-07-01
Isosurfaces are fundamental geometrical objects for the analysis and visualization of volumetric scalar fields. Recent work has generalized them to bivariate volumetric fields with fiber surfaces, the pre-image of polygons in range space. However, the existing algorithm for their computation is approximate, and is limited to closed polygons. Moreover, its runtime performance does not allow instantaneous updates of the fiber surfaces upon user edits of the polygons. Overall, these limitations prevent a reliable and interactive exploration of the space of fiber surfaces. This paper introduces the first algorithm for the exact computation of fiber surfaces in tetrahedral meshes. It assumes no restriction on the topology of the input polygon, handles degenerate cases and better captures sharp features induced by polygon bends. The algorithm also allows visualization of individual fibers on the output surface, better illustrating their relationship with data features in range space. To enable truly interactive exploration sessions, we further improve the runtime performance of this algorithm. In particular, we show that it is trivially parallelizable and that it scales nearly linearly with the number of cores. Further, we study acceleration data structures in both the geometrical domain and range space and we show how to generalize interval trees used in isosurface extraction to fiber surface extraction. Experiments demonstrate the superiority of our algorithm over previous work, both in terms of accuracy and running time, with speedups of up to two orders of magnitude. This improvement enables interactive edits of range polygons with instantaneous updates of the fiber surface for exploration purposes. A VTK-based reference implementation is provided as additional material to reproduce our results.
Constructing Neuronal Network Models in Massively Parallel Environments.
Ippen, Tammo; Eppler, Jochen M; Plesser, Hans E; Diesmann, Markus
2017-01-01
Recent advances in the development of data structures to represent spiking neuron network models enable us to exploit the complete memory of petascale computers for a single brain-scale network simulation. In this work, we investigate how well we can exploit the computing power of such supercomputers for the creation of neuronal networks. Using an established benchmark, we divide the runtime of simulation code into the phase of network construction and the phase during which the dynamical state is advanced in time. We find that on multi-core compute nodes network creation scales well with process-parallel code but exhibits a prohibitively large memory consumption. Thread-parallel network creation, in contrast, exhibits speedup only up to a small number of threads but has little overhead in terms of memory. We further observe that the algorithms creating instances of model neurons and their connections scale well for networks of ten thousand neurons, but do not show the same speedup for networks of millions of neurons. Our work uncovers that the lack of scaling of thread-parallel network creation is due to inadequate memory allocation strategies and demonstrates that thread-optimized memory allocators recover excellent scaling. An analysis of the loop order used for network construction reveals that more complex tests on the locality of operations significantly improve scaling and reduce runtime by allowing construction algorithms to step through large networks more efficiently than in existing code. The combination of these techniques increases performance by an order of magnitude and harnesses the increasingly parallel compute power of the compute nodes in high-performance clusters and supercomputers.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Not Available
2014-11-01
The ARIES Collaborative, a U.S. Department of Energy Building America research team, partnered with NeighborWorks America affiliate Homeowners' Rehab Inc. (HRI) of Cambridge, Massachusetts, to study improvements to the central hydronic heating system in one of the nonprofit's housing developments. The heating controls in the three-building, 42-unit Columbia Cambridge Alliance for Spanish Tenants housing development were upgraded. Fuel use in the development was excessive compared to similar properties. A poorly insulated thermal envelope contributed to high energy bills, but adding wall insulation was not cost-effective or practical. The more cost-effective option was improving heating system efficiency. Efficient operation of the heating system faced several obstacles, including inflexible boiler controls and failed thermostatic radiator valves. Boiler controls were replaced with systems that offer temperature setbacks and one that controls heat based on apartment temperature in addition to outdoor temperature. Utility bill analysis shows that post-retrofit weather-normalized heating energy use was reduced by 10%-31% (average of 19%). The indoor temperature cutoff reduced boiler runtime (and therefore heating fuel consumption) by 28% in the one building in which it was implemented. Nearly all savings were obtained at night, which had a lower indoor temperature cutoff (68 degrees F) than the day (73 degrees F). This implies that the outdoor reset curve was appropriately adjusted for this building for daytime operation. Nighttime setback of heating system supply water temperature had no discernible impact on boiler runtime or gas bills.
The Automated Instrumentation and Monitoring System (AIMS): Design and Architecture. 3.2
NASA Technical Reports Server (NTRS)
Yan, Jerry C.; Schmidt, Melisa; Schulbach, Cathy; Bailey, David (Technical Monitor)
1997-01-01
Whether a researcher is designing the 'next parallel programming paradigm', another 'scalable multiprocessor' or investigating resource allocation algorithms for multiprocessors, a facility that enables parallel program execution to be captured and displayed is invaluable. Careful analysis of such information can help computer and software architects to capture, and therefore exploit, behavioral variations among and within various parallel programs to take advantage of specific hardware characteristics. A software tool-set that facilitates performance evaluation of parallel applications on multiprocessors has been put together at NASA Ames Research Center under the sponsorship of NASA's High Performance Computing and Communications Program over the past five years. The Automated Instrumentation and Monitoring System (AIMS) has three major software components: a source code instrumentor which automatically inserts active event recorders into program source code before compilation; a run-time performance monitoring library which collects performance data; and a visualization tool-set which reconstructs program execution based on the data collected. Besides being used as a prototype for developing new techniques for instrumenting, monitoring and presenting parallel program execution, AIMS is also being incorporated into the run-time environments of various hardware testbeds to evaluate their impact on user productivity. Currently, the execution of FORTRAN and C programs on the Intel Paragon and PALM workstations can be automatically instrumented and monitored. Performance data thus collected can be displayed graphically on various workstations. The process of performance tuning with AIMS will be illustrated using various NAS Parallel Benchmarks. This report includes a description of the internal architecture of AIMS and a listing of the source code.
Data integrity systems for organ contours in radiation therapy planning.
Shah, Veeraj P; Lakshminarayanan, Pranav; Moore, Joseph; Tran, Phuoc T; Quon, Harry; Deville, Curtiland; McNutt, Todd R
2018-06-12
The purpose of this research is to develop effective data integrity models for contoured anatomy in a radiotherapy workflow for both real-time and retrospective analysis. Within this study, two classes of contour integrity models were developed: data driven models and contiguousness models. The data driven models aim to highlight contours which deviate from a gross set of contours from similar disease sites and encompass the following regions of interest (ROIs): bladder, femoral heads, spinal cord, and rectum. The contiguousness models, which individually analyze the geometry of contours to detect possible errors, are applied across many different ROIs and are divided into two metrics: Extent and Region Growing over volume. After analysis, we found that 70% of detected bladder contours were verified as suspicious. The spinal cord and rectum models verified that 73% and 80% of contours were suspicious, respectively. The contiguousness models were the most accurate models and the Region Growing model was the most accurate submodel. 100% of the detected noncontiguous contours were verified as suspicious, but in the cases of spinal cord, femoral heads, bladder, and rectum, the Region Growing model detected an additional two to five suspicious contours that the Extent model failed to detect. When conducting a blind review to detect false negatives, it was found that all the data driven models failed to detect all suspicious contours. The Region Growing contiguousness model produced zero false negatives in all regions of interest other than prostate. With regard to runtime, the contiguousness via Extent model took an average of 0.2 s per contour. On the other hand, the Region Growing method had a longer runtime, which depended on the number of voxels in the contour. Both contiguousness models have potential for real-time use in clinical radiotherapy while the data driven models are better suited for retrospective use. © 2018 The Authors. Journal of Applied Clinical Medical Physics published by Wiley Periodicals, Inc. on behalf of American Association of Physicists in Medicine.
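A region-growing contiguousness check of the kind described above can be sketched as a connected-component count over a binary contour mask. The version below assumes a single 2D slice stored as a flat row-major array (the model in the paper operates over volumes), and the function name and flagging rule are illustrative only, not the authors' code.

#include <cstddef>
#include <queue>
#include <utility>
#include <vector>

// Count 4-connected foreground regions in a binary 2D mask (width*height,
// row-major). A contour that yields more than one region is non-contiguous
// and can be flagged as suspicious.
int count_regions(const std::vector<unsigned char>& mask, int width, int height) {
    std::vector<unsigned char> seen(mask.size(), 0);
    int regions = 0;
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            std::size_t start = static_cast<std::size_t>(y) * width + x;
            if (!mask[start] || seen[start]) continue;
            ++regions;                          // new seed: grow its region
            std::queue<std::pair<int, int>> frontier;
            frontier.push({x, y});
            seen[start] = 1;
            while (!frontier.empty()) {
                auto [cx, cy] = frontier.front();
                frontier.pop();
                const int dx[4] = {1, -1, 0, 0};
                const int dy[4] = {0, 0, 1, -1};
                for (int d = 0; d < 4; ++d) {
                    int nx = cx + dx[d], ny = cy + dy[d];
                    if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;
                    std::size_t idx = static_cast<std::size_t>(ny) * width + nx;
                    if (mask[idx] && !seen[idx]) {
                        seen[idx] = 1;
                        frontier.push({nx, ny});
                    }
                }
            }
        }
    }
    return regions;   // > 1 indicates a non-contiguous (suspicious) contour
}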
High-throughput landslide modelling using computational grids
NASA Astrophysics Data System (ADS)
Wallace, M.; Metson, S.; Holcombe, L.; Anderson, M.; Newbold, D.; Brook, N.
2012-04-01
Landslides are an increasing problem in developing countries. Multiple landslides can be triggered by heavy rainfall resulting in loss of life, homes and critical infrastructure. Through computer simulation of individual slopes it is possible to predict the causes, timing and magnitude of landslides and estimate the potential physical impact. Geographical scientists at the University of Bristol have developed software that integrates a physically-based slope hydrology and stability model (CHASM) with an econometric model (QUESTA) in order to predict landslide risk over time. These models allow multiple scenarios to be evaluated for each slope, accounting for data uncertainties, different engineering interventions, risk management approaches and rainfall patterns. Individual scenarios can be computationally intensive, however each scenario is independent and so multiple scenarios can be executed in parallel. As more simulations are carried out the overhead involved in managing input and output data becomes significant. This is a greater problem if multiple slopes are considered concurrently, as is required both for landslide research and for effective disaster planning at national levels. There are two critical factors in this context: generated data volumes can be on the order of tens of terabytes, and greater numbers of simulations result in long total runtimes. Users of such models, in both the research community and in developing countries, need to develop a means for handling the generation and submission of landslide modelling experiments, and the storage and analysis of the resulting datasets. Additionally, governments in developing countries typically lack the necessary computing resources and infrastructure. Consequently, knowledge that could be gained by aggregating simulation results from many different scenarios across many different slopes remains hidden within the data. To address these data and workload management issues, University of Bristol particle physicists and geographical scientists are collaborating to develop methods for providing simple and effective access to landslide models and associated simulation data. Particle physicists have valuable experience in dealing with data complexity and management due to the scale of data generated by particle accelerators such as the Large Hadron Collider (LHC). The LHC generates tens of petabytes of data every year which is stored and analysed using the Worldwide LHC Computing Grid (WLCG). Tools and concepts from the WLCG are being used to drive the development of a Software-as-a-Service (SaaS) platform to provide access to hosted landslide simulation software and data. It contains advanced data management features and allows landslide simulations to be run on the WLCG, dramatically reducing simulation runtimes by parallel execution. The simulations are accessed using a web page through which users can enter and browse input data, submit jobs and visualise results. Replication of the data ensures a local copy can be accessed should a connection to the platform be unavailable. The platform does not know the details of the simulation software it runs, so it is therefore possible to use it to run alternative models at similar scales. This creates the opportunity for activities such as model sensitivity analysis and performance comparison at scales that are impractical using standalone software.
A hybrid incremental projection method for thermal-hydraulics applications
NASA Astrophysics Data System (ADS)
Christon, Mark A.; Bakosi, Jozsef; Nadiga, Balasubramanya T.; Berndt, Markus; Francois, Marianne M.; Stagg, Alan K.; Xia, Yidong; Luo, Hong
2016-07-01
A new second-order accurate, hybrid, incremental projection method for time-dependent incompressible viscous flow is introduced in this paper. The hybrid finite-element/finite-volume discretization circumvents the well-known Ladyzhenskaya-Babuška-Brezzi conditions for stability, and does not require special treatment to filter pressure modes by either Rhie-Chow interpolation or by using a Petrov-Galerkin finite element formulation. The use of a co-velocity with a high-resolution advection method and a linearly consistent edge-based treatment of viscous/diffusive terms yields a robust algorithm for a broad spectrum of incompressible flows. The high-resolution advection method is shown to deliver second-order spatial convergence on mixed element topology meshes, and the implicit advective treatment significantly increases the stable time-step size. The algorithm is robust and extensible, permitting the incorporation of features such as porous media flow, RANS and LES turbulence models, and semi-/fully-implicit time stepping. A series of verification and validation problems are used to illustrate the convergence properties of the algorithm. The temporal stability properties are demonstrated on a range of problems with 2 ≤ CFL ≤ 100. The new flow solver is built using the Hydra multiphysics toolkit. The Hydra toolkit is written in C++ and provides a rich suite of extensible and fully-parallel components that permit rapid application development, supports multiple discretization techniques, provides I/O interfaces, dynamic run-time load balancing and data migration, and interfaces to scalable popular linear solvers, e.g., in open-source packages such as HYPRE, PETSc, and Trilinos.
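As background for readers less familiar with projection schemes, a generic incremental pressure-correction step has the textbook form below (written in LaTeX for reference). This is the standard formulation only, not necessarily the exact hybrid finite-element/finite-volume discretization used in the work above.

% Generic incremental pressure-correction (projection) step, time level n -> n+1
\begin{align*}
  \frac{\mathbf{u}^{*} - \mathbf{u}^{n}}{\Delta t}
    &= -\left(\mathbf{u}\cdot\nabla\mathbf{u}\right)^{n+1/2}
       - \frac{1}{\rho}\nabla p^{n}
       + \nu\nabla^{2}\mathbf{u}^{*}
    && \text{(momentum predictor)} \\
  \nabla^{2}\delta p
    &= \frac{\rho}{\Delta t}\,\nabla\cdot\mathbf{u}^{*}
    && \text{(pressure-increment Poisson equation)} \\
  \mathbf{u}^{n+1}
    &= \mathbf{u}^{*} - \frac{\Delta t}{\rho}\nabla\delta p,
    \qquad p^{n+1} = p^{n} + \delta p
    && \text{(projection and pressure update)}
\end{align*}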
NASA Astrophysics Data System (ADS)
Witham, Shawn; Boylen, Brett; Owesen, Barr; Rocchia, Walter; Alexov, Emil
2011-03-01
Electrostatic forces and energies are two of the major components that contribute to the stability, function and interaction of biological macromolecules. The calculation of the electrostatic potential distribution in such systems, which are comprised of irregularly shaped objects immersed in a water phase, is not a trivial task. In addition, an accurate model requires any missing hydrogen atoms of the corresponding structural files (Protein Data Bank, or PDB, files) to be generated in silico and, if necessary, missing atoms or residues to be predicted as well. Here we report a comprehensive suite, an academic DelPhi webserver, which allows users to upload their structural file, calculate the components of the electrostatic energy, generate the corresponding potential (and/or concentration/dielectric constant) distribution map, and choose the appropriate force field. The webserver utilizes modern technology to take user input and construct an algorithm that suits the user's specific needs. The webserver uses Clemson University's Palmetto Supercomputer Cluster to handle the DelPhi calculations, which can range anywhere from small jobs with short computation times to extensive, computationally demanding runtimes. The work was supported by a grant from NIGMS, NIH, grant number 1R01GM093937-01.
Evaluation of bus transit reliability in the District of Columbia.
DOT National Transportation Integrated Search
2013-11-01
Several performance metrics can be used to assess the reliability of a transit system. These include on-time arrivals, travel-time adherence, run-time adherence, and customer satisfaction, among others. On-time arrival at bus stops is one of the pe...
Static Verification for Code Contracts
NASA Astrophysics Data System (ADS)
Fähndrich, Manuel
The Code Contracts project [3] at Microsoft Research enables programmers on the .NET platform to author specifications in existing languages such as C# and VisualBasic. To take advantage of these specifications, we provide tools for documentation generation, runtime contract checking, and static contract verification.
Dynamic Assembly, Assessment, Assurance, and Adaptation via Heterogeneous Software Connectors
2004-10-01
Versioning Connectors (MVC) Representative of runtime monitoring gauges are multiversioning gauges, which monitor and analyze different versions of...multiple versions of the same component must be merged by the connector before they are forwarded to their target components. The multiversioning
Authoritative Authoring: Software That Makes Multimedia Happen.
ERIC Educational Resources Information Center
Florio, Chris; Murie, Michael
1996-01-01
Compares seven mid- to high-end multimedia authoring software systems that combine graphics, sound, animation, video, and text for Windows and Macintosh platforms. A run-time project was created with each program using video, animation, graphics, sound, formatted text, hypertext, and buttons. (LRW)
Updates to In-Line Calculation of Photolysis Rates
How photolysis rates are calculated affects ozone and aerosol concentrations predicted by the CMAQ model and the model's run-time. The standard configuration of CMAQ uses the inline option that calculates photolysis rates by solving the radiative transfer equation for the needed ...
NASA Astrophysics Data System (ADS)
Iacobucci, Joseph V.
The research objective for this manuscript is to develop a Rapid Architecture Alternative Modeling (RAAM) methodology to enable traceable Pre-Milestone A decision making during the conceptual phase of design of a system of systems. Rather than following current trends that place an emphasis on adding more analysis, which tends to increase the complexity of the decision making problem, RAAM improves on current methods by reducing both runtime and model creation complexity. RAAM draws upon principles from computer science, system architecting, and domain-specific languages to enable the automatic generation and evaluation of architecture alternatives. For example, both mission dependent and mission independent metrics are considered. Mission dependent metrics are determined by the performance of systems accomplishing a task, such as Probability of Success. In contrast, mission independent metrics, such as acquisition cost, are solely determined and influenced by the other systems in the portfolio. RAAM also leverages advances in parallel computing to significantly reduce runtime by defining executable models that are readily amenable to parallelization. This allows the use of cloud computing infrastructures such as Amazon's Elastic Compute Cloud and the PASTEC cluster operated by the Georgia Institute of Technology Research Institute (GTRI). Also, the amount of data that can be generated when fully exploring the design space can quickly exceed the typical capacity of computational resources at the analyst's disposal. To counter this, specific algorithms and techniques are employed. Streaming algorithms and recursive architecture alternative evaluation algorithms are used that reduce computer memory requirements. Lastly, a domain-specific language is created to provide a reduction in the computational time of executing the system of systems models. A domain-specific language is a small, usually declarative language that offers expressive power focused on a particular problem domain by establishing an effective means to communicate the semantics from the RAAM framework. These techniques make it possible to include diverse multi-metric models within the RAAM framework in addition to system and operational level trades. A canonical example was used to explore the uses of the methodology. The canonical example contains all of the features of a full system of systems architecture analysis study but uses fewer tasks and systems. Using RAAM with the canonical example it was possible to consider both system and operational level trades in the same analysis. Once the methodology had been tested with the canonical example, a Suppression of Enemy Air Defenses (SEAD) capability model was developed. Due to the sensitive nature of analyses on that subject, notional data was developed. The notional data has similar trends and properties to realistic Suppression of Enemy Air Defenses data. RAAM was shown to be traceable and provided a mechanism for a unified treatment of a variety of metrics. The SEAD capability model demonstrated lower computer runtimes and reduced model creation complexity as compared to methods currently in use. To determine the usefulness of the implementation of the methodology on current computing hardware, RAAM was tested with system of systems architecture studies of different sizes. This was necessary since a system of systems may be called upon to accomplish thousands of tasks.
It has been clearly demonstrated that RAAM is able to enumerate and evaluate the types of large, complex design spaces usually encountered in capability based design, oftentimes providing the ability to efficiently search the entire decision space. The core algorithms for generation and evaluation of alternatives scale linearly with expected problem sizes. The SEAD capability model outputs prompted the discovery of a new issue, the data storage and manipulation requirements for an analysis. Two strategies were developed to counter large data sizes: the use of portfolio views and top 'n' analysis. This proved the usefulness of the RAAM framework and methodology during Pre-Milestone A capability based analysis. (Abstract shortened by UMI.)
The Construction of 3-d Neutral Density for Arbitrary Data Sets
NASA Astrophysics Data System (ADS)
Riha, S.; McDougall, T. J.; Barker, P. M.
2014-12-01
The Neutral Density variable allows inference of water pathways from thermodynamic properties in the global ocean, and is therefore an essential component of global ocean circulation analysis. The widely used algorithm for the computation of Neutral Density yields accurate results for data sets which are close to the observed climatological ocean. Long-term numerical climate simulations, however, often generate a significant drift from present-day climate, which renders the existing algorithm inaccurate. To remedy this problem, new algorithms which operate on arbitrary data have been developed, which may potentially be used to compute Neutral Density during runtime of a numerical model. We review existing approaches for the construction of Neutral Density in arbitrary data sets, detail their algorithmic structure, and present an analysis of the computational cost for implementations on a single-CPU computer. We discuss possible strategies for the implementation in state-of-the-art numerical models, with a focus on distributed computing environments.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Marquez, Andres; Manzano Franco, Joseph B.; Song, Shuaiwen
With Exascale performance and its challenges in mind, one ubiquitous concern among architects is energy efficiency. Petascale systems projected to Exascale systems are unsustainable at current power consumption rates. One major contributor to system-wide power consumption is the number of memory operations leading to data movement and management techniques applied by the runtime system. To address this problem, we present the concept of the Architected Composite Data Types (ACDT) framework. The framework is made aware of data composites, assigning them a specific layout, transformations and operators. Data manipulation overhead is amortized over a larger number of elements and program performance and power efficiency can be significantly improved. We developed the fundamentals of an ACDT framework on a massively multithreaded adaptive runtime system geared towards Exascale clusters. Showcasing the capability of ACDT, we exercised the framework with two representative processing kernels - Matrix Vector Multiply and the Cholesky Decomposition - applied to sparse matrices. As transformation modules, we applied optimized compress/decompress engines and configured invariant operators for maximum energy/performance efficiency. Additionally, we explored two different approaches based on transformation opaqueness in relation to the application. Under the first approach, the application is agnostic to compression and decompression activity. Such an approach entails minimal changes to the original application code, but leaves out potential application-specific optimizations. The second approach exposes the decompression process to the application, thereby exposing optimization opportunities that can only be exploited with application knowledge. The experimental results show that the two approaches have their strengths in HW and SW respectively, where the SW approach can yield performance and power improvements that are an order of magnitude better than ACDT-oblivious, hand-optimized implementations. We consider the ACDT runtime framework an important component of compute nodes that will lead towards power-efficient Exascale clusters.
A Case for Application Oblivious Energy-Efficient MPI Runtime
DOE Office of Scientific and Technical Information (OSTI.GOV)
Venkatesh, Akshay; Vishnu, Abhinav; Hamidouche, Khaled
Power has become the major impediment in designing large scale high-end systems. Message Passing Interface (MPI) is the de facto communication interface used as the back-end for designing applications, programming models and runtimes for these systems. Slack, the time spent by an MPI process in a single MPI call, provides a potential for energy and power savings, if an appropriate power reduction technique such as core-idling/Dynamic Voltage and Frequency Scaling (DVFS) can be applied without perturbing the application's execution time. Existing techniques that exploit slack for power savings assume that application behavior repeats across iterations/executions. However, an increasing use of adaptive, data-dependent workloads combined with system factors (OS noise, congestion) makes this assumption invalid. This paper proposes and implements Energy Aware MPI (EAM), an application-oblivious energy-efficient MPI runtime. EAM uses a combination of communication models of common MPI primitives (point-to-point, collective, progress, blocking/non-blocking) and an online observation of slack for maximizing energy efficiency. Each power lever incurs time overhead, which must be amortized over slack to minimize degradation. When predicted communication time exceeds a lever overhead, the lever is used as soon as possible to maximize energy efficiency. When mis-prediction occurs, the lever(s) are used automatically at specific intervals for amortization. We implement EAM using MVAPICH2 and evaluate it on ten applications using up to 4096 processes. Our performance evaluation on an InfiniBand cluster indicates that EAM can reduce energy consumption by 5-41% in comparison to the default approach, with negligible (less than 4% in all cases) performance loss.
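The slack-driven idea can be sketched as a thin wrapper around a blocking MPI call: if the predicted wait exceeds the cost of exercising a power lever, the lever is applied for the duration of the call. The sketch below is illustrative only; set_core_frequency is a hypothetical stand-in for a DVFS or core-idling interface, the slack and overhead parameters are assumed inputs, and the real EAM runtime intercepts communication inside the MPI library rather than wrapping calls like this.

#include <mpi.h>

// Hypothetical power lever: a real implementation would call a platform
// DVFS or core-idling interface; this placeholder only marks where the
// lever would be exercised.
void set_core_frequency(bool low) { (void)low; }

// Illustrative slack-aware blocking receive: if the predicted slack exceeds
// the lever overhead, drop to a low-power state for the duration of the call
// and restore the nominal state afterwards.
int slack_aware_recv(void* buf, int count, MPI_Datatype type, int src, int tag,
                     MPI_Comm comm, double predicted_slack_s,
                     double lever_overhead_s) {
    const bool use_lever = predicted_slack_s > lever_overhead_s;
    if (use_lever) set_core_frequency(true);   // enter low-power state
    int rc = MPI_Recv(buf, count, type, src, tag, comm, MPI_STATUS_IGNORE);
    if (use_lever) set_core_frequency(false);  // restore nominal frequency
    return rc;
}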
Library API for Z-Order Memory Layout
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bethel, E. Wes
This library provides a simple-to-use API for implementing an alternative to the traditional row-major in-memory layout, one based on a Morton-order space filling curve (SFC), specifically a Z-order variant of the Morton order curve. The library enables programmers, after a simple initialization step, to convert a multidimensional array from row-major to Z-order layout, then use a single, generic API call to access data at any arbitrary (i,j,k) location within the array, whether it be stored in row-major or Z-order format. The motivation for using an SFC in-memory layout is improved spatial locality, which results in increased use of local high speed cache memory. The basic idea is that with row-major order layouts, a data access to some location that is nearby in index space is likely far away in physical memory, resulting in poor spatial locality and slow runtime. On the other hand, with an SFC-based layout, accesses that are nearby in index space are much more likely to also be nearby in physical memory, resulting in much better spatial locality, and better runtime performance. Numerous studies over the years have shown significant runtime performance gains are realized by using an SFC-based memory layout compared to a row-major layout, sometimes by as much as 50%, which result from the better use of the memory and cache hierarchy attendant with an SFC-based layout (see, for example, [Beth2012]). This library implementation is intended for use with codes that work with structured, array-based data in 2 or 3 dimensions. It is not appropriate for use with unstructured or point-based data.
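The core of such a layout is the index mapping itself: interleaving the bits of the coordinates gives the Morton (Z-order) offset into a flat buffer. The sketch below illustrates this for a 2D array of doubles; it is a generic illustration of the technique, and the names (morton2d, ZOrderArray) do not come from the library's actual API.

#include <cstdint>
#include <vector>

// Interleave the bits of x and y to form a 2D Morton (Z-order) index.
// Handles 16-bit coordinates, producing a 32-bit index.
static inline std::uint32_t morton2d(std::uint16_t x, std::uint16_t y) {
    auto spread = [](std::uint32_t v) {          // insert a 0 bit between bits
        v = (v | (v << 8)) & 0x00FF00FFu;
        v = (v | (v << 4)) & 0x0F0F0F0Fu;
        v = (v | (v << 2)) & 0x33333333u;
        v = (v | (v << 1)) & 0x55555555u;
        return v;
    };
    return spread(x) | (spread(y) << 1);
}

// A square 2D array stored in Z-order: accesses that are nearby in (i,j)
// index space tend to land in nearby memory, improving cache locality.
// side must satisfy 1 <= side <= 65536.
struct ZOrderArray {
    std::size_t side;
    std::vector<double> data;
    explicit ZOrderArray(std::size_t n)
        : side(n),
          data(morton2d(static_cast<std::uint16_t>(n - 1),
                        static_cast<std::uint16_t>(n - 1)) + 1, 0.0) {}
    // i = row index, j = column index
    double& at(std::size_t i, std::size_t j) {
        return data[morton2d(static_cast<std::uint16_t>(j),
                             static_cast<std::uint16_t>(i))];
    }
};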
Towards real-time photon Monte Carlo dose calculation in the cloud
NASA Astrophysics Data System (ADS)
Ziegenhein, Peter; Kozin, Igor N.; Kamerling, Cornelis Ph; Oelfke, Uwe
2017-06-01
Near real-time application of Monte Carlo (MC) dose calculation in clinic and research is hindered by the long computational runtimes of established software. Currently, fast MC software solutions are available utilising accelerators such as graphical processing units (GPUs) or clusters based on central processing units (CPUs). Both platforms are expensive in terms of purchase costs and maintenance and, in the case of the GPU, provide only limited scalability. In this work we propose a cloud-based MC solution, which offers high scalability of accurate photon dose calculations. The MC simulations run on a private virtual supercomputer that is formed in the cloud. Computational resources can be provisioned dynamically at low cost without upfront investment in expensive hardware. A client-server software solution has been developed which controls the simulations and transports data to and from the cloud efficiently and securely. The client application integrates seamlessly into a treatment planning system. It runs the MC simulation workflow automatically and securely exchanges simulation data with the server-side application that controls the virtual supercomputer. Advanced encryption standards were used to add an additional security layer, which encrypts and decrypts patient data on-the-fly at the processor register level. We could show that our cloud-based MC framework enables near real-time dose computation. It delivers excellent linear scaling for high-resolution datasets with absolute runtimes of 1.1 seconds to 10.9 seconds for simulating a clinical prostate and liver case up to 1% statistical uncertainty. The computation runtimes include the transportation of data to and from the cloud as well as process scheduling and synchronisation overhead. Cloud-based MC simulations offer a fast, affordable and easily accessible alternative for near real-time accurate dose calculations to currently used GPU or cluster solutions.
Generation of large scale urban environments to support advanced sensor and seeker simulation
NASA Astrophysics Data System (ADS)
Giuliani, Joseph; Hershey, Daniel; McKeown, David, Jr.; Willis, Carla; Van, Tan
2009-05-01
One of the key aspects for the design of a next generation weapon system is the need to operate in cluttered and complex urban environments. Simulation systems rely on accurate representation of these environments and require automated software tools to construct the underlying 3D geometry and associated spectral and material properties that are then formatted for various objective seeker simulation systems. Under an Air Force Small Business Innovation Research (SBIR) contract, we have developed an automated process to generate 3D urban environments with user-defined properties. These environments can be composed from a wide variety of source materials, including vector source data, pre-existing 3D models, and digital elevation models, and rapidly organized into a geo-specific visual simulation database. This intermediate representation can be easily inspected in the visible spectrum for content and organization and interactively queried for accuracy. Once the database contains the required contents, it can then be exported into specific synthetic scene generation runtime formats, preserving the relationship between geometry and material properties. To date an exporter for the Irma simulation system developed and maintained by AFRL/Eglin has been created, and a second exporter to the Real Time Composite Hardbody and Missile Plume (CHAMP) simulation system for real-time use is currently being developed. This process supports significantly more complex target environments than previous approaches to database generation. In this paper we describe the capabilities for content creation for advanced seeker processing algorithm simulation and sensor stimulation, including the overall database compilation process and sample databases produced and exported for the Irma runtime system. We also discuss the addition of object dynamics and viewer dynamics within the visual simulation into the Irma runtime environment.
A Formal Methodology to Design and Deploy Dependable Wireless Sensor Networks
Testa, Alessandro; Cinque, Marcello; Coronato, Antonio; Augusto, Juan Carlos
2016-01-01
Wireless Sensor Networks (WSNs) are being increasingly adopted in critical applications, where verifying the correct operation of sensor nodes is a major concern. Undesired events may undermine the mission of the WSNs. Hence, their effects need to be properly assessed before deployment, to obtain a good level of expected performance, and during operation, in order to avoid dangerous unexpected results. In this paper, we propose a methodology that aims at assessing and improving the dependability level of WSNs by means of an event-based formal verification technique. The methodology includes a process to guide designers towards the realization of a dependable WSN and a tool (“ADVISES”) to simplify its adoption. The tool is applicable to homogeneous WSNs with static routing topologies. It allows the automatic generation of formal specifications used to check correctness properties and evaluate dependability metrics at design time and at runtime for WSNs where an acceptable percentage of faults can be defined. During the runtime, we can check the behavior of the WSN according to the results obtained at design time and we can detect sudden and unexpected failures, in order to trigger recovery procedures. The effectiveness of the methodology is shown in the context of two case studies, as proof-of-concept, aiming to illustrate how the tool is helpful to drive design choices and to check the correctness properties of the WSN at runtime. Although the method scales up to very large WSNs, the applicability of the methodology may be compromised by the state space explosion of the reasoning model, which must be faced by partitioning large topologies into sub-topologies. PMID:28025568
Optimized Hypervisor Scheduler for Parallel Discrete Event Simulations on Virtual Machine Platforms
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yoginath, Srikanth B; Perumalla, Kalyan S
2013-01-01
With the advent of virtual machine (VM)-based platforms for parallel computing, it is now possible to execute parallel discrete event simulations (PDES) over multiple virtual machines, in contrast to executing in native mode directly over hardware as has traditionally been done over the past decades. While mature VM-based parallel systems now offer new, compelling benefits such as serviceability, dynamic reconfigurability and overall cost effectiveness, the runtime performance of parallel applications can be significantly affected. In particular, most VM-based platforms are optimized for general workloads, but PDES execution exhibits unique dynamics significantly different from other workloads. Here we first present results from experiments that highlight the gross deterioration of the runtime performance of VM-based PDES simulations when executed using traditional VM schedulers, quantitatively showing the bad scaling properties of the scheduler as the number of VMs is increased. The mismatch is fundamental in nature in the sense that any fairness-based VM scheduler implementation would exhibit this mismatch with PDES runs. We also present a new scheduler optimized specifically for PDES applications, and describe its design and implementation. Experimental results obtained from running PDES benchmarks (PHOLD and vehicular traffic simulations) over VMs show over an order of magnitude improvement in the run time of the PDES-optimized scheduler relative to the regular VM scheduler, with over a 20-fold reduction in the run time of simulations using up to 64 VMs. The observations and results are timely in the context of emerging systems such as cloud platforms and VM-based high performance computing installations, highlighting to the community the need for PDES-specific support, and the feasibility of significantly reducing the runtime overhead for scalable PDES on VM platforms.
The Metropolis Monte Carlo method with CUDA enabled Graphic Processing Units
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hall, Clifford; School of Physics, Astronomy, and Computational Sciences, George Mason University, 4400 University Dr., Fairfax, VA 22030; Ji, Weixiao
2014-02-01
We present a CPU–GPU system for runtime acceleration of large molecular simulations using GPU computation and memory swaps. The memory architecture of the GPU can be used both as container for simulation data stored on the graphics card and as floating-point code target, providing an effective means for the manipulation of atomistic or molecular data on the GPU. To fully take advantage of this mechanism, efficient GPU realizations of algorithms used to perform atomistic and molecular simulations are essential. Our system implements a versatile molecular engine, including inter-molecule interactions and orientational variables for performing the Metropolis Monte Carlo (MMC) algorithm, which is one type of Markov chain Monte Carlo. By combining memory objects with floating-point code fragments we have implemented an MMC parallel engine that entirely avoids the communication time of molecular data at runtime. Our runtime acceleration system is a forerunner of a new class of CPU–GPU algorithms exploiting memory concepts combined with threading for avoiding bus bandwidth and communication. The testbed molecular system used here is a condensed phase system of oligopyrrole chains. A benchmark shows a size scaling speedup of 60 for systems with 210,000 pyrrole monomers. Our implementation can easily be combined with MPI to connect in parallel several CPU–GPU duets. Highlights: • We parallelize the Metropolis Monte Carlo (MMC) algorithm on one CPU–GPU duet. • The Adaptive Tempering Monte Carlo employs MMC and profits from this CPU–GPU implementation. • Our benchmark shows a size scaling-up speedup of 62 for systems with 225,000 particles. • The testbed involves a polymeric system of oligopyrroles in the condensed phase. • The CPU–GPU parallelization includes dipole–dipole and Mie–Jones classic potentials.
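The acceptance test at the heart of the MMC algorithm is small enough to sketch. The function below shows the generic Metropolis criterion on the host side, purely as an illustration of the step such an engine evaluates for each proposed move; it is not the CPU–GPU implementation described in the abstract, and the function name is invented.

#include <cmath>
#include <random>

// Generic Metropolis acceptance test: accept a trial move with probability
// min(1, exp(-dE / kT)), where dE is the energy change of the proposed move
// and kT is the thermal energy (Boltzmann constant times temperature).
bool metropolis_accept(double delta_energy, double kT, std::mt19937& rng) {
    if (delta_energy <= 0.0) return true;      // downhill moves always accepted
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    return uniform(rng) < std::exp(-delta_energy / kT);
}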
Virtual Observation System for Earth System Model: An Application to ACME Land Model Simulations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, Dali; Yuan, Fengming; Hernandez, Benjamin
Investigating and evaluating physical-chemical-biological processes within an Earth system model (ESM) can be very challenging due to the complexity of both model design and software implementation. A virtual observation system (VOS) is presented to enable interactive observation of these processes during system simulation. Based on advanced computing technologies, such as compiler-based software analysis, automatic code instrumentation, and high-performance data transport, the VOS provides run-time observation capability, in-situ data analytics for Earth system model simulation, and model behavior adjustment opportunities through simulation steering. A VOS for a terrestrial land model simulation within the Accelerated Climate Modeling for Energy model is also presented to demonstrate the implementation details and system innovations.
Symbolic Execution Enhanced System Testing
NASA Technical Reports Server (NTRS)
Davies, Misty D.; Pasareanu, Corina S.; Raman, Vishwanath
2012-01-01
We describe a testing technique that uses information computed by symbolic execution of a program unit to guide the generation of inputs to the system containing the unit, in such a way that the unit's, and hence the system's, coverage is increased. The symbolic execution computes unit constraints at run-time, along program paths obtained by system simulations. We use machine learning techniques, treatment learning and function fitting, to approximate the system input constraints that will lead to the satisfaction of the unit constraints. Execution of system input predictions either uncovers new code regions in the unit under analysis or provides information that can be used to improve the approximation. We have implemented the technique and we have demonstrated its effectiveness on several examples, including one from the aerospace domain.
Analog Input Data Acquisition Software
NASA Technical Reports Server (NTRS)
Arens, Ellen
2009-01-01
DAQ Master Software allows users to easily set up a system to monitor up to five analog input channels and save the data after acquisition. This program was written in LabVIEW 8.0, and requires the LabVIEW runtime engine 8.0 to run the executable.
Runtime Simulation for Post-Disaster Data Fusion Visualization
2006-10-01
Center for Multisource Information Fusion (CMIF), The State University of New York at Buffalo, Buffalo, NY 14260 USA
Planning And Reasoning For A Telerobot
NASA Technical Reports Server (NTRS)
Peters, Stephen F.; Mittman, David S.; Collins, Carol E.; O'Meara Callahan, Jacquelyn S.; Rokey, Mark J.
1992-01-01
Document discusses research and development of the Telerobot Interactive Planning System (TIPS). The goal in developing TIPS is to enable it to accept instructions from an operator, then command the run-time controller to execute operations that carry out those instructions. Challenges in transferring the technology from a testbed to an operational system are also discussed.
A Fault Oblivious Extreme-Scale Execution Environment
DOE Office of Scientific and Technical Information (OSTI.GOV)
McKie, Jim
The FOX project, funded under the ASCR X-stack I program, developed systems software and runtime libraries for a new approach to the data and work distribution for massively parallel, fault oblivious application execution. Our work was motivated by the premise that exascale computing systems will provide a thousand-fold increase in parallelism and a proportional increase in failure rate relative to today’s machines. To deliver the capability of exascale hardware, the systems software must provide the infrastructure to support existing applications while simultaneously enabling efficient execution of new programming models that naturally express dynamic, adaptive, irregular computation; coupled simulations; and massive data analysis in a highly unreliable hardware environment with billions of threads of execution. Our OS research has prototyped new methods to provide efficient resource sharing, synchronization, and protection in a many-core compute node. We have experimented with alternative task/dataflow programming models and shown scalability in some cases to hundreds of thousands of cores. Much of our software is in active development through open source projects. Concepts from FOX are being pursued in next generation exascale operating systems. Our OS work focused on adaptive, application tailored OS services optimized for multi → many core processors. We developed a new operating system NIX that supports role-based allocation of cores to processes which was released to open source. We contributed to the IBM FusedOS project, which promoted the concept of latency-optimized and throughput-optimized cores. We built a task queue library based on distributed, fault tolerant key-value store and identified scaling issues. A second fault tolerant task parallel library was developed, based on the Linda tuple space model, that used low level interconnect primitives for optimized communication. We designed fault tolerance mechanisms for task parallel computations employing work stealing for load balancing that scaled to the largest existing supercomputers. Finally, we implemented the Elastic Building Blocks runtime, a library to manage object-oriented distributed software components. To support the research, we won two INCITE awards for time on Intrepid (BG/P) and Mira (BG/Q). Much of our work has had impact in the OS and runtime community through the ASCR Exascale OS/R workshop and report, leading to the research agenda of the Exascale OS/R program. Our project was, however, also affected by attrition of multiple PIs. While the PIs continued to participate and offer guidance as time permitted, losing these key individuals was unfortunate both for the project and for the DOE HPC community.
Testing MODFLOW-LGR for simulating flow around buried Quaternary valleys - synthetic test cases
NASA Astrophysics Data System (ADS)
Vilhelmsen, T. N.; Christensen, S.
2009-12-01
In this study the Local Grid Refinement (LGR) method developed for MODFLOW-2005 (Mehl and Hill, 2005) is utilized to describe groundwater flow in areas containing buried Quaternary valley structures. The tests are conducted as a comparative analysis of simulations run with a globally refined model, a locally refined model, and a globally coarse model. The models vary from simple one-layer models to more complex ones with up to 25 model layers. The comparisons of accuracy are conducted within the locally refined area and focus on water budgets, simulated heads, and simulated particle traces. Simulations made with the globally refined model are used as the reference (regarded as "true" values). As expected, for all test cases the application of local grid refinement resulted in more accurate results than when using the globally coarse model. A significant advantage of utilizing MODFLOW-LGR was that it allows increased numbers of model layers to better resolve complex geology within local areas. This resulted in more accurate simulations than when using either a globally coarse model grid or a locally refined model with lower geological resolution. Improved accuracy in the latter case could not be expected beforehand because the difference in geological resolution between the coarse parent model and the refined child model contradicts the assumptions of the Darcy weighted interpolation used in MODFLOW-LGR. With respect to model runtimes, it was sometimes found that the runtime for the locally refined model was much longer than for the globally refined model. This was the case even when the closure criteria were relaxed compared to the globally refined model. These results are contradictory to those presented by Mehl and Hill (2005). Furthermore, in the complex cases it took some testing (model runs) to identify the closure criteria and the damping factor that secured convergence, accurate solutions, and reasonable runtimes. For our cases this is judged to be a serious disadvantage of applying MODFLOW-LGR. Another disadvantage in the studied cases was that the MODFLOW-LGR results proved to be somewhat dependent on the correction method used at the parent-child model interface. This indicates that when applying MODFLOW-LGR there is a need for thorough and case-specific considerations regarding choice of correction method. References: Mehl, S., and Hill, M. C. (2005). MODFLOW-2005, the U.S. Geological Survey modular ground-water model - Documentation of shared node Local Grid Refinement (LGR) and the Boundary Flow and Head (BFH) Package. U.S. Geological Survey Techniques and Methods 6-A12.
Efficient processing of two-dimensional arrays with C or C++
Donato, David I.
2017-07-20
Because fast and efficient serial processing of raster-graphic images and other two-dimensional arrays is a requirement in land-change modeling and other applications, the effects of 10 factors on the runtimes for processing two-dimensional arrays with C and C++ are evaluated in a comparative factorial study. This study’s factors include the choice among three C or C++ source-code techniques for array processing; the choice of Microsoft Windows 7 or a Linux operating system; the choice of 4-byte or 8-byte array elements and indexes; and the choice of 32-bit or 64-bit memory addressing. This study demonstrates how programmer choices can reduce runtimes by 75 percent or more, even after compiler optimizations. Ten points of practical advice for faster processing of two-dimensional arrays are offered to C and C++ programmers. Further study and the development of a C and C++ software test suite are recommended. Key words: array processing, C, C++, compiler, computational speed, land-change modeling, raster-graphic image, two-dimensional array, software efficiency
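One of the programmer choices examined in such studies is the order of loop nesting relative to the array's storage layout. The sketch below illustrates the underlying memory-access effect in Python with NumPy rather than in C or C++ (the array size and timing harness are illustrative assumptions, not material from the study): with a row-major array, keeping the last index in the inner loop touches memory contiguously, whereas swapping the loops produces strided accesses.

    # Illustrative sketch (not from the cited study): contiguous vs. strided
    # traversal of a row-major (C-ordered) two-dimensional array.
    import time
    import numpy as np

    a = np.zeros((2000, 2000))

    start = time.perf_counter()
    for i in range(a.shape[0]):          # inner loop over the contiguous axis
        for j in range(a.shape[1]):
            a[i, j] += 1.0
    contiguous = time.perf_counter() - start

    start = time.perf_counter()
    for j in range(a.shape[1]):          # inner loop over the strided axis
        for i in range(a.shape[0]):
            a[i, j] += 1.0
    strided = time.perf_counter() - start

    print(f"contiguous: {contiguous:.2f} s   strided: {strided:.2f} s")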
Jeagle: a JAVA Runtime Verification Tool
NASA Technical Reports Server (NTRS)
D'Amorim, Marcelo; Havelund, Klaus
2005-01-01
We introduce the temporal logic Jeagle and its supporting tool for runtime verification of Java programs. A monitor for a Jeagle formula checks whether a finite trace of program events satisfies the formula. Jeagle is a programming-oriented extension of the powerful rule-based Eagle logic, which has been shown to be capable of defining and implementing a range of finite-trace monitoring logics, including future and past time temporal logic, real-time and metric temporal logics, interval logics, forms of quantified temporal logics, and so on. Monitoring is achieved on a state-by-state basis, avoiding any need to store the input trace. Jeagle extends Eagle with constructs for capturing parameterized program events such as method calls and method returns. Parameters can be the objects that methods are called upon, arguments to methods, and return values, and Jeagle allows one to refer to these in formulas. The tool performs automated program instrumentation using AspectJ. We show the transformational semantics of Jeagle.
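As a rough illustration of state-by-state monitoring of parameterized events, independent of Jeagle's actual syntax and API, the following Python sketch checks that every observed method call is eventually matched by a return, keyed by object and method name, without retaining the full trace. The event encoding and names are hypothetical.

    # Minimal sketch of state-by-state trace monitoring (not the Jeagle/Eagle API):
    # every (object, method) call must eventually be matched by a return.
    class CallReturnMonitor:
        def __init__(self):
            self.pending = {}                      # (obj, method) -> open call count

        def on_event(self, kind, obj, method):
            key = (obj, method)
            if kind == "call":
                self.pending[key] = self.pending.get(key, 0) + 1
            elif kind == "return":
                if self.pending.get(key, 0) == 0:
                    raise AssertionError(f"return without call: {key}")
                self.pending[key] -= 1

        def at_end(self):
            unmatched = {k: n for k, n in self.pending.items() if n > 0}
            if unmatched:
                raise AssertionError(f"calls never returned: {unmatched}")

    monitor = CallReturnMonitor()
    trace = [("call", "acct1", "withdraw"), ("return", "acct1", "withdraw")]
    for event in trace:                            # processed one state at a time
        monitor.on_event(*event)
    monitor.at_end()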
Behavior-aware cache hierarchy optimization for low-power multi-core embedded systems
NASA Astrophysics Data System (ADS)
Zhao, Huatao; Luo, Xiao; Zhu, Chen; Watanabe, Takahiro; Zhu, Tianbo
2017-07-01
In modern embedded systems, the increasing number of cores requires efficient cache hierarchies to ensure data throughput, but such cache hierarchies are restricted by their large size and by interference accesses, which lead to both performance degradation and wasted energy. In this paper, we first propose a behavior-aware cache hierarchy (BACH) which can optimally allocate multi-level cache resources to many cores and greatly improves the efficiency of the cache hierarchy, resulting in low energy consumption. BACH takes full advantage of the explored application behaviors and runtime cache resource demands as the basis for cache allocation, so that the cache hierarchy can be optimally configured to meet the runtime demand. BACH was implemented on the GEM5 simulator. The experimental results show that the energy consumption of a three-level cache hierarchy can be reduced by 5.29% up to 27.94% compared with other key approaches, while the performance of the multi-core system even improves slightly once hardware overhead is taken into account.
Cloudweaver: Adaptive and Data-Driven Workload Manager for Generic Clouds
NASA Astrophysics Data System (ADS)
Li, Rui; Chen, Lei; Li, Wen-Syan
Cloud computing denotes the latest trend in application development for parallel computing on massive data volumes. It relies on clouds of servers to handle tasks that used to be managed by an individual server. With cloud computing, software vendors can provide business intelligence and data analytics services for internet-scale data sets. Many open source projects, such as Hadoop, offer various software components that are essential for building a cloud infrastructure. Currently, Hadoop (and many other systems) requires users to configure cloud infrastructures via programs and APIs, and such configuration is fixed during runtime. In this chapter, we propose a workload manager (WLM), called CloudWeaver, which provides automated configuration of a cloud infrastructure for runtime execution. The workload management is data-driven and can adapt to the dynamic nature of operator throughput during different execution phases. CloudWeaver works for a single job and for a workload consisting of multiple jobs running concurrently, and aims at maximum throughput using a minimum set of processors.
NASA Technical Reports Server (NTRS)
Meyer, Donald; Uchenik, Igor
2007-01-01
The PPC750 Performance Monitor (Perfmon) is a computer program that helps the user to assess the performance characteristics of application programs running under the Wind River VxWorks real-time operating system on a PPC750 computer. Perfmon generates a user-friendly interface and collects performance data by use of performance registers provided by the PPC750 architecture. It processes and presents run-time statistics on a per-task basis over a repeating time interval (typically, several seconds or minutes) specified by the user. When the Perfmon software module is loaded with the user's software modules, it is available for use through Perfmon commands, without any modification of the user's code and at negligible performance penalty. Per-task run-time performance data made available by Perfmon include percentage time, number of instructions executed per unit time, dispatch ratio, stack high water mark, and level-1 instruction and data cache miss rates. The performance data are written to a file specified by the user or to the serial port of the computer.
Multitasking runtime systems for the Cedar Multiprocessor
DOE Office of Scientific and Technical Information (OSTI.GOV)
Guzzi, M.D.
1986-07-01
The programming of a MIMD machine is more complex than for SISD and SIMD machines. The multiple computational resources of the machine must be made available to the programming language compiler and to the programmer so that multitasking programs may be written. This thesis will explore the additional complexity of programming a MIMD machine, the Cedar Multiprocessor specifically, and the multitasking runtime system necessary to provide multitasking resources to the user. First, the problem will be well defined: the Cedar machine, its operating system, the programming language, and multitasking concepts will be described. Second, a solution to the problem, called macrotasking, will be proposed. This solution provides multitasking facilities to the programmer at a very coarse level with many visible machine dependencies. Third, an alternate solution, called microtasking, will be proposed. This solution provides multitasking facilities of a much finer grain. This solution does not depend so rigidly on the specific architecture of the machine. Finally, the two solutions will be compared for effectiveness. 12 refs., 16 figs.
A performance comparison of the IBM RS/6000 and the Astronautics ZS-1
DOE Office of Scientific and Technical Information (OSTI.GOV)
Smith, W.M.; Abraham, S.G.; Davidson, E.S.
1991-01-01
Concurrent uniprocessor architectures, of which vector and superscalar are two examples, are designed to capitalize on fine-grain parallelism. The authors have developed a performance evaluation method for comparing and improving these architectures, and in this article they present the methodology and a detailed case study of two machines. The runtime of many programs is dominated by time spent in loop constructs - for example, Fortran Do-loops. Loops generally comprise two logical processes: The access process generates addresses for memory operations while the execute process operates on floating-point data. Memory access patterns typically can be generated independently of the data in the execute process. This independence allows the access process to slip ahead, thereby hiding memory latency. The IBM 360/91 was designed in 1967 to achieve slip dynamically, at runtime. One CPU unit executes integer operations while another handles floating-point operations. Other machines, including the VAX 9000 and the IBM RS/6000, use a similar approach.
Certification Strategies using Run-Time Safety Assurance for Part 23 Autopilot Systems
NASA Technical Reports Server (NTRS)
Hook, Loyd R.; Clark, Matthew; Sizoo, David; Skoog, Mark A.; Brady, James
2016-01-01
Part 23 aircraft operation, and in particular general aviation, is relatively unsafe when compared to other common forms of vehicle travel. Technologies currently exist that could improve safety statistics for these aircraft; however, the high burden and cost of performing the requisite safety-critical certification processes for these systems limits their proliferation. For this reason, many entities, including the Federal Aviation Administration, NASA, and the US Air Force, are considering new options for certification of technologies that will improve aircraft safety. Of particular interest are low-cost autopilot systems for general aviation aircraft, as these systems have the potential to positively and significantly affect safety statistics. This paper proposes new systems and techniques, leveraging run-time verification, for the assurance of general aviation autopilot systems, which would be used to supplement the current certification process and provide a viable path for near-term low-cost implementation. In addition, preliminary experimentation and the building of an assurance case for a system based on these principles are discussed.
Reversible polymorphism-aware phylogenetic models and their application to tree inference.
Schrempf, Dominik; Minh, Bui Quang; De Maio, Nicola; von Haeseler, Arndt; Kosiol, Carolin
2016-10-21
We present a reversible Polymorphism-Aware Phylogenetic Model (revPoMo) for species tree estimation from genome-wide data. revPoMo enables the reconstruction of large-scale species trees for many within-species samples. It expands the alphabet of DNA substitution models to include polymorphic states, thereby naturally accounting for incomplete lineage sorting. We implemented revPoMo in the maximum likelihood software IQ-TREE. A simulation study and an application to great ape data show that the runtimes of our approach and standard substitution models are comparable, but that revPoMo has much better accuracy in estimating trees, divergence times and mutation rates. A further advantage of revPoMo is that an increase of sample size per species improves estimations but does not increase runtime. Therefore, revPoMo is a valuable tool with several applications, from speciation dating to species tree reconstruction. Copyright © 2016 The Authors. Published by Elsevier Ltd. All rights reserved.
Self-Powered Multiparameter Health Sensor.
Tobola, Andreas; Leutheuser, Heike; Pollak, Markus; Spies, Peter; Hofmann, Christian; Weigand, Christian; Eskofier, Bjoern M; Fischer, Georg
2018-01-01
Wearable health sensors are about to change our health system. While several technological improvements have been presented to enhance performance and energy-efficiency, battery runtime is still a critical concern for practical use of wearable biomedical sensor systems. The runtime limitation is directly related to the battery size, which is another concern regarding practicality and customer acceptance. We introduced ULPSEK-Ultra-Low-Power Sensor Evaluation Kit-for evaluation of biomedical sensors and monitoring applications (http://ulpsek.com). ULPSEK includes a multiparameter sensor measuring and processing electrocardiogram, respiration, motion, body temperature, and photoplethysmography. Instead of a battery, ULPSEK is powered using an efficient body heat harvester. The harvester produced 171 μW on average, which was sufficient to power the sensor below 25 °C ambient temperature. We present design issues regarding the power supply and the power distribution network of the ULPSEK sensor platform. Owing to the security aspect of self-powered health sensors, we suggest a hybrid solution consisting of a battery charged by a harvester.
NASA Astrophysics Data System (ADS)
Freund, Richard F.; Braun, Tracy D.; Kussow, Matthew; Godfrey, Michael; Koyama, Terry
2001-07-01
SPANR (Schedule, Plan, Assess Networked Resources) is (i) a pre-run, off-line planning mechanism and (ii) a runtime, just-in-time scheduling mechanism. It is designed to support primarily commercial applications in that it optimizes throughput rather than individual jobs (unless they have highest priority); thus it is a tool for a commercial production manager to maximize total work. First the SPANR Planner is presented, showing its ability to do predictive 'what-if' planning. It can answer such questions as (i) what is the overall effect of acquiring new hardware, or (ii) what would be the effect of a different scheduler. The ability of the SPANR Planner to formulate tree-trimming strategies in advance is useful in several commercial applications, such as electronic design or pharmaceutical simulations. The SPANR Planner is demonstrated using a variety of benchmarks. The SPANR Runtime Scheduler (RS) is briefly presented. The SPANR RS can provide benefit for several commercial applications, such as airframe design and financial applications. Finally a design is shown whereby SPANR can provide scheduling advice to most resource management systems.
Loxdale, H D; Rhodes, J A; Fox, J S
1985-07-01
A study of variation in three peptidases (PEP-3 to -5) in a parthenogenetic S. avenae field population at Rothamsted using serial one-dimensional polyacrylamide gel electrophoresis (involving changes of gel concentration and electrophoretic run-time) increased the overall number of "allozymes" (mobility variants) detected from 10 under standard conditions (6% gels, 2 h run-time) to 22, as well as revealing putative heterozygous banding patterns under some test conditions. However, an examination of another enzyme, 6-phosphogluconate dehydrogenase (6-PGD) in a sample collected at Rothamsted the following year failed, using a combination of serial methods (changes of gel concentration) and isoelectric focusing, to increase the total number of 6-PGD bands separated (seven, none of which appeared to be allelic in origin). Nevertheless, some major bands were split into several bands, whilst other infrequent bands were either gained or lost. The findings are briefly discussed.
Luenser, Arne; Kussmann, Jörg; Ochsenfeld, Christian
2016-09-28
We present a (sub)linear-scaling algorithm to determine indirect nuclear spin-spin coupling constants at the Hartree-Fock and Kohn-Sham density functional levels of theory. Employing efficient integral algorithms and sparse algebra routines, an overall (sub)linear scaling behavior can be obtained for systems with a non-vanishing HOMO-LUMO gap. Calculations on systems with over 1000 atoms and 20 000 basis functions illustrate the performance and accuracy of our reference implementation. Specifically, we demonstrate that linear algebra dominates the runtime of conventional algorithms for 10 000 basis functions and above. Attainable speedups of our method exceed 6 × in total runtime and 10 × in the linear algebra steps for the tested systems. Furthermore, a convergence study of spin-spin couplings of an aminopyrazole peptide upon inclusion of the water environment is presented: using the new method it is shown that large solvent spheres are necessary to converge spin-spin coupling values.
NASA Astrophysics Data System (ADS)
Farag, Mohammed; Fleckenstein, Matthias; Habibi, Saeid
2017-02-01
Model-order reduction and minimization of the CPU run-time while maintaining model accuracy are critical requirements for real-time implementation of lithium-ion electrochemical battery models. In this paper, an isothermal, continuous, piecewise-linear, electrode-average model is developed by using an optimal knot placement technique. The proposed model reduces the univariate nonlinear dependence of the electrode's open-circuit potential on the state of charge to continuous piecewise-linear regions. The parameterization experiments were chosen to provide a trade-off between extensive experimental characterization techniques and purely identifying all parameters using optimization techniques. The model is then parameterized in each continuous, piecewise-linear region. Applying the proposed technique cuts down the CPU run-time by around 20%, compared to the reduced-order, electrode-average model. Finally, model validation against real-time driving profiles (FTP-72, WLTP) demonstrates the ability of the model to predict the cell voltage accurately with less than 2% error.
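As a small illustration of the continuous piecewise-linear idea (the knot positions and voltages below are hypothetical, not the values identified in the paper), the open-circuit potential can be evaluated by linear interpolation between optimally placed knots:

    # Sketch: continuous piecewise-linear open-circuit potential as a function of
    # state of charge, evaluated by interpolating between knots.  Knot values are
    # illustrative only.
    import numpy as np

    soc_knots = np.array([0.0, 0.1, 0.3, 0.6, 0.9, 1.0])   # state of charge
    ocv_knots = np.array([3.0, 3.4, 3.6, 3.8, 4.0, 4.2])   # volts (hypothetical)

    def ocv(soc):
        """Piecewise-linear OCV(SOC) via linear interpolation between knots."""
        return np.interp(soc, soc_knots, ocv_knots)

    print(ocv(0.45))    # evaluated on the 0.3-0.6 segment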
Jiang, Juanjuan; Tian, Lei; Huang, Yiling; Yan, Yan; Li, Yishi
2016-08-01
A liquid chromatography-tandem mass spectrometry (LC-MS) method to quantify tolvaptan and its two main metabolites, applied to a human study, was developed and validated for the first time as a measure of compliance in clinical research. Because of the structural similarity of tolvaptan and its multiple metabolites, the method was optimized to obtain a chromatographic and MS separation from the endogenous interferences and isotope ions as well as high analysis throughput. Tolvaptan, its two main metabolites and the internal standard were extracted from human serum (0.1 mL) using solid-phase extraction and separated on a Waters Nova-Pak C18 column (150 × 3.9 mm, 5 μm) using isocratic elution with a mobile phase composed of acetonitrile, water and formic acid (65:35:0.25, v/v/v). The total run-time was shortened to 3.5 min. The mass transitions monitored for quantitation under positive electrospray ionisation were m/z 449-252 for tolvaptan, m/z 479-252 for metabolite DM-4103, m/z 481-252 for metabolite DM-4107 and m/z 463-266 for the internal standard (IS). The limit of quantification in plasma for all three analytes was 1 ng/mL. The method was validated over a linear range from 1 to 500 ng/mL for all three analytes with acceptable inter- and intra-assay precision and accuracy. The stability of the analytes was determined to be suitable for routine laboratory practices. The method was successfully applied to samples taken from research volunteers who ingested a 15 mg tolvaptan tablet. Copyright © 2016. Published by Elsevier B.V.
On the Run-Time Optimization of the Boolean Logic of a Program.
ERIC Educational Resources Information Center
Cadolino, C.; Guazzo, M.
1982-01-01
Considers problem of optimal scheduling of Boolean expression (each Boolean variable represents binary outcome of program module) on single-processor system. Optimization discussed consists of finding operand arrangement that minimizes average execution costs representing consumption of resources (elapsed time, main memory, number of…
Airlift Operation Modeling Using Discrete Event Simulation (DES)
2009-12-01
A run-time control architecture for the JPL telerobot
NASA Technical Reports Server (NTRS)
Balaram, J.; Lokshin, A.; Kreutz, K.; Beahan, J.
1987-01-01
An architecture for implementing the process-level decision making of a hierarchically structured telerobot currently being implemented at the Jet Propulsion Laboratory (JPL) is described. Constraints on the architecture design, architecture partitioning concepts, and a detailed description of the existing and proposed implementations are provided.
Lost in Interaction in IMS Learning Design Runtime Environments
ERIC Educational Resources Information Center
Derntl, Michael; Neumann, Susanne; Oberhuemer, Petra
2014-01-01
Educators are exploiting the advantages of advanced web-based collaboration technologies and massive online interactions. Interactions between learners and human or nonhuman resources therefore play an increasingly important pedagogical role, and the way these interactions are expressed in the user interface of virtual learning environments is…
HAL/S-FC compiler system functional specification
NASA Technical Reports Server (NTRS)
1974-01-01
Compiler organization is discussed, including overall compiler structure, internal data transfer, compiler development, and code optimization. The user, system, and SDL interfaces are described, along with compiler system requirements. The run-time software support package, as well as restrictions and dependencies of the HAL/S-FC system, are also considered.
Multi-Scale Peak and Trough Detection Optimised for Periodic and Quasi-Periodic Neuroscience Data.
Bishop, Steven M; Ercole, Ari
2018-01-01
The reliable detection of peaks and troughs in physiological signals is essential to many investigative techniques in medicine and computational biology. Analysis of the intracranial pressure (ICP) waveform is a particular challenge due to multi-scale features, a changing morphology over time and signal-to-noise limitations. Here we present an efficient peak and trough detection algorithm that extends the scalogram approach of Scholkmann et al. and results in greatly improved algorithm runtime performance. Our improved algorithm (modified Scholkmann) was developed and analysed in MATLAB R2015b. Synthesised waveforms (periodic, quasi-periodic and chirp sinusoids) were degraded with white Gaussian noise to achieve signal-to-noise ratios down to 5 dB and were used to compare the performance of the original Scholkmann and modified Scholkmann algorithms. The modified Scholkmann algorithm has false-positive (0%) and false-negative (0%) detection rates identical to the original Scholkmann when applied to our test suite. Actual compute time for a 200-run Monte Carlo simulation over a multicomponent noisy test signal was 40.96 ± 0.020 s (mean ± 95%CI) for the original Scholkmann and 1.81 ± 0.003 s (mean ± 95%CI) for the modified Scholkmann, demonstrating the expected improvement in runtime complexity from quadratic to linear. The accurate interpretation of waveform data to identify peaks and troughs is crucial in signal parameterisation, feature extraction and waveform identification tasks. Modification of a standard scalogram technique has produced a robust algorithm with linear computational complexity that is particularly suited to the challenges presented by large, noisy physiological datasets. The algorithm is optimised through a single parameter and can identify sub-waveform features with minimal additional overhead, and is easily adapted to run in real time on commodity hardware.
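The following much-simplified Python sketch conveys the flavor of multi-scale peak detection (it is not the modified Scholkmann algorithm, and the window range is an arbitrary assumption): a sample is accepted as a peak only if it exceeds its neighbours at every scale considered.

    # Simplified multi-scale peak detection sketch (illustrative only; not the
    # modified Scholkmann algorithm evaluated in the paper).
    import numpy as np

    def find_peaks(x, max_scale=10):
        peaks = []
        for i in range(max_scale, len(x) - max_scale):
            # a peak must dominate its neighbours at every scale 1..max_scale
            if all(x[i] > x[i - k] and x[i] > x[i + k]
                   for k in range(1, max_scale + 1)):
                peaks.append(i)
        return peaks

    t = np.linspace(0, 4 * np.pi, 400)
    signal = np.sin(t) + 0.05 * np.random.randn(t.size)
    print(find_peaks(signal))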
HOPE: A Python just-in-time compiler for astrophysical computations
NASA Astrophysics Data System (ADS)
Akeret, J.; Gamper, L.; Amara, A.; Refregier, A.
2015-04-01
The Python programming language is becoming increasingly popular for scientific applications due to its simplicity, versatility, and the broad range of its libraries. A drawback of this dynamic language, however, is its low runtime performance which limits its applicability for large simulations and for the analysis of large data sets, as is common in astrophysics and cosmology. While various frameworks have been developed to address this limitation, most focus on covering the complete language set, and either force the user to alter the code or are not able to reach the full speed of an optimised native compiled language. In order to combine the ease of Python and the speed of C++, we developed HOPE, a specialised Python just-in-time (JIT) compiler designed for numerical astrophysical applications. HOPE focuses on a subset of the language and is able to translate Python code into C++ while performing numerical optimisation on mathematical expressions at runtime. To enable the JIT compilation, the user only needs to add a decorator to the function definition. We assess the performance of HOPE by performing a series of benchmarks and compare its execution speed with that of plain Python, C++ and the other existing frameworks. We find that HOPE improves the performance compared to plain Python by a factor of 2 to 120, achieves speeds comparable to that of C++, and often exceeds the speed of the existing solutions. We discuss the differences between HOPE and the other frameworks, as well as future extensions of its capabilities. The fully documented HOPE package is available at http://hope.phys.ethz.ch and is published under the GPLv3 license on PyPI and GitHub.
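Usage is reportedly as simple as decorating the target function. The sketch below assumes the decorator is exposed as hope.jit (see the package documentation at http://hope.phys.ethz.ch for the authoritative interface); the decorated function is a made-up example restricted to simple array arithmetic.

    # Hedged usage sketch: the decorator name hope.jit is assumed, and the
    # function body is an arbitrary example of simple NumPy arithmetic.
    import numpy as np
    import hope

    @hope.jit
    def add_scaled(a, b, scale):
        return a + scale * b

    x = np.arange(1000, dtype=np.float64)
    y = np.ones(1000, dtype=np.float64)
    print(add_scaled(x, y, 2.0)[:5])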
GATE Monte Carlo simulation in a cloud computing environment
NASA Astrophysics Data System (ADS)
Rowedder, Blake Austin
The GEANT4-based GATE is a unique and powerful Monte Carlo (MC) platform, which provides a single code library allowing the simulation of specific medical physics applications, e.g. PET, SPECT, CT, radiotherapy, and hadron therapy. However, this rigorous yet flexible platform is used only sparingly in the clinic due to its lengthy calculation time. By accessing the powerful computational resources of a cloud computing environment, GATE's runtime can be significantly reduced to clinically feasible levels without the sizable investment in a local high-performance cluster. This study investigated a reliable and efficient execution of GATE MC simulations using a commercial cloud computing service. Amazon's Elastic Compute Cloud was used to launch several nodes equipped with GATE. Job data was initially broken up on the local computer, then uploaded to the worker nodes on the cloud. The results were automatically downloaded and aggregated on the local computer for display and analysis. Five simulations were repeated for every cluster size between 1 and 20 nodes. Ultimately, increasing cluster size resulted in a decrease in calculation time that could be expressed with an inverse power model. Comparing the benchmark results to the published values and error margins indicated that the simulation results were not affected by the cluster size, and thus that the integrity of a calculation is preserved in a cloud computing environment. The runtime of a 53-minute simulation was decreased to 3.11 minutes when run on a 20-node cluster. The ability to improve the speed of simulation suggests that fast MC simulations are viable for imaging and radiotherapy applications. With high-performance computing continuing to fall in price and become more accessible, implementing Monte Carlo techniques with cloud computing for clinical applications will continue to become more attractive.
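The inverse power model mentioned above relates runtime to cluster size as T(n) ≈ a·n^(-b). A brief sketch of fitting such a model is given below; the node counts and timings are hypothetical placeholders, not the study's measurements.

    # Sketch: fit an inverse power model T(n) = a * n**(-b) to runtime versus
    # cluster size.  The data points are illustrative, not the paper's results.
    import numpy as np
    from scipy.optimize import curve_fit

    nodes = np.array([1.0, 2.0, 5.0, 10.0, 20.0])
    runtime_min = np.array([53.0, 27.5, 11.8, 6.2, 3.1])     # hypothetical

    def inverse_power(n, a, b):
        return a * n ** (-b)

    (a, b), _ = curve_fit(inverse_power, nodes, runtime_min)
    print(f"T(n) ~ {a:.1f} * n^(-{b:.2f})")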
Modeling variably saturated subsurface solute transport with MODFLOW-UZF and MT3DMS
Morway, Eric D.; Niswonger, Richard G.; Langevin, Christian D.; Bailey, Ryan T.; Healy, Richard W.
2013-01-01
The MT3DMS groundwater solute transport model was modified to simulate solute transport in the unsaturated zone by incorporating the unsaturated-zone flow (UZF1) package developed for MODFLOW. The modified MT3DMS code uses a volume-averaged approach in which Lagrangian-based UZF1 fluid fluxes and storage changes are mapped onto a fixed grid. Referred to as UZF-MT3DMS, the linked model was tested against published benchmarks solved analytically as well as against other published codes, most frequently the U.S. Geological Survey's Variably-Saturated Two-Dimensional Flow and Transport Model. Results from a suite of test cases demonstrate that the modified code accurately simulates solute advection, dispersion, and reaction in the unsaturated zone. Two- and three-dimensional simulations also were investigated to ensure unsaturated-saturated zone interaction was simulated correctly. Because the UZF1 solution is analytical, large-scale flow and transport investigations can be performed free from the computational and data burdens required by numerical solutions to Richards' equation. Results demonstrate that significant simulation runtime savings can be achieved with UZF-MT3DMS, an important development when hundreds or thousands of model runs are required during parameter estimation and uncertainty analysis. Three-dimensional variably saturated flow and transport simulations revealed UZF-MT3DMS to have runtimes that are less than one tenth of the time required by models that rely on Richards' equation. Given its accuracy and efficiency, and the wide-spread use of both MODFLOW and MT3DMS, the added capability of unsaturated-zone transport in this familiar modeling framework stands to benefit a broad user-ship.
Issues Involved in Developing Ada Real-Time Systems
1989-02-15
expensive modifications to the compiler or Ada runtime system to fit a particular application. Whether we can solve the problems of programming real-time systems in...lock in solutions to problems that are not yet well understood in standards as rigorous as the Ada language. Moreover, real-time systems typically have…
Channels: Runtime System Infrastructure for Security-typed Languages
2008-10-01
NASA Technical Reports Server (NTRS)
Smith, R.
1975-01-01
SAIL, a high level ALGOL language for the PDP-10, is extended to operate under the TENEX time sharing system without executing DEC system calls. A large set of TENEX oriented runtime routines are added to allow complete access to TENEX. The emphasis is on compatibility of programs across time sharing systems and integrity of the language.
Notional Machines and Introductory Programming Education
ERIC Educational Resources Information Center
Sorva, Juha
2013-01-01
This article brings together, summarizes, and comments on several threads of research that have contributed to our understanding of the challenges that novice programmers face when learning about the runtime dynamics of programs and the role of the computer in program execution. More specifically, the review covers the literature on programming…
Mobile Authoring of Open Educational Resources as Reusable Learning Objects
ERIC Educational Resources Information Center
Kinshuk; Jesse, Ryan
2013-01-01
E-learning technologies have allowed authoring and playback of standardized reusable learning objects (RLO) for several years. Effective mobile learning requires similar functionality at both design time and runtime. Mobile devices can play RLO using applications like SMILE, mobile access to a learning management system (LMS), or other systems…
On polynomial selection for the general number field sieve
NASA Astrophysics Data System (ADS)
Kleinjung, Thorsten
2006-12-01
The general number field sieve (GNFS) is the asymptotically fastest algorithm for factoring large integers. Its runtime depends on a good choice of a polynomial pair. In this article we present an improvement of the polynomial selection method of Montgomery and Murphy which has been used in recent GNFS records.
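For reference, the asymptotic runtime alluded to here is the standard heuristic complexity of the GNFS for factoring an integer N, usually written in L-notation:

    % Heuristic asymptotic runtime of the general number field sieve for an integer N
    L_N\left[\tfrac{1}{3},\ \left(\tfrac{64}{9}\right)^{1/3}\right]
      = \exp\Bigl(\bigl((64/9)^{1/3} + o(1)\bigr)\,(\ln N)^{1/3}(\ln\ln N)^{2/3}\Bigr)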
Multichannel feedforward control schemes with coupling compensation for active sound profiling
NASA Astrophysics Data System (ADS)
Mosquera-Sánchez, Jaime A.; Desmet, Wim; de Oliveira, Leopoldo P. R.
2017-05-01
Active sound profiling encompasses a number of control techniques that enable the equalization, rather than the mere reduction, of acoustic noise. Challenges may arise when trying to achieve distinct targeted sound profiles simultaneously at multiple locations, e.g., within a vehicle cabin. This paper introduces distributed multichannel control schemes for independently tailoring the structure-borne sound reaching a number of locations within a cavity. The proposed techniques address the cross interactions amongst feedforward active sound profiling units, which compensate for interferences of the primary sound at each location of interest by exchanging run-time data amongst the control units, while attaining the desired control targets. Computational complexity, convergence, and stability of the proposed multichannel schemes are examined in light of the physical system on which they are implemented. The tuning performance of the proposed algorithms is benchmarked against the centralized and purely decentralized control schemes through computer simulations on a simplified numerical model, which has also been subjected to plant magnitude variations. Provided that the representation of the plant is accurate enough, the proposed multichannel control schemes are shown to be the only ones that properly deliver targeted active sound profiling at each error sensor location. Experimental results in a 1:3-scaled vehicle mock-up further demonstrate that the proposed schemes are able to attain reductions of more than 60 dB upon periodic disturbances at a number of positions, while resolving cross-channel interferences. Moreover, when the sensor/actuator placement is found to be deficient at a given frequency, the inclusion of a regularization parameter in the cost function does not hinder the proper operation of the proposed compensation schemes while assuring their stability, at the expense of some loss of control performance.
Fast Exact Search in Hamming Space With Multi-Index Hashing.
Norouzi, Mohammad; Punjani, Ali; Fleet, David J
2014-06-01
There is growing interest in representing image data and feature descriptors using compact binary codes for fast near neighbor search. Although binary codes are motivated by their use as direct indices (addresses) into a hash table, codes longer than 32 bits are not being used as such, as it was thought to be ineffective. We introduce a rigorous way to build multiple hash tables on binary code substrings that enables exact k-nearest neighbor search in Hamming space. The approach is storage efficient and straight-forward to implement. Theoretical analysis shows that the algorithm exhibits sub-linear run-time behavior for uniformly distributed codes. Empirical results show dramatic speedups over a linear scan baseline for datasets of up to one billion codes of 64, 128, or 256 bits.
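A compact Python sketch of the multi-index idea follows (simplified, and not the authors' implementation): each binary code is split into m disjoint substrings, each substring is indexed in its own table, and, by the pigeonhole principle, any code within Hamming radius r < m of the query must agree exactly with the query on at least one substring. Exact matches in the substring tables therefore yield a candidate set that is then verified with a full Hamming-distance check.

    # Sketch of multi-index hashing for exact Hamming-range search (simplified).
    from collections import defaultdict

    def substrings(code, m):
        step = len(code) // m
        return [code[i * step:(i + 1) * step] for i in range(m)]

    def build_index(codes, m):
        tables = [defaultdict(list) for _ in range(m)]
        for idx, code in enumerate(codes):
            for table, sub in zip(tables, substrings(code, m)):
                table[sub].append(idx)
        return tables

    def search(query, codes, tables, m, r):
        # valid for r < m: at least one substring of a match is identical
        candidates = set()
        for table, sub in zip(tables, substrings(query, m)):
            candidates.update(table.get(sub, []))
        hamming = lambda a, b: sum(x != y for x, y in zip(a, b))
        return [i for i in candidates if hamming(codes[i], query) <= r]

    codes = ["0110101011001010", "0110101011001110", "1111000011110000"]
    tables = build_index(codes, m=4)
    print(search("0110101011001011", codes, tables, m=4, r=2))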
Type-Separated Bytecode - Its Construction and Evaluation
NASA Astrophysics Data System (ADS)
Adler, Philipp; Amme, Wolfram
Many constrained systems still use interpreters to run mobile applications written in Java, because interpreters demand only few resources. On the other hand, it is difficult to apply optimizations at application runtime. Annotations could be used to achieve a simpler and faster code analysis, which would allow optimizations even for interpreters on constrained devices. Unfortunately, there is no viable way of transporting such annotations to, and verifying them at, the code consumer. In this paper we present type-separated bytecode as an intermediate representation that allows annotations to be transported safely as type extensions. We have implemented several versions of this system and show that it is possible to obtain performance comparable to Java Bytecode, even though we use a type-separated system with annotations.
Automated multi-dimensional purification of tagged proteins.
Sigrell, Jill A; Eklund, Pär; Galin, Markus; Hedkvist, Lotta; Liljedahl, Pia; Johansson, Christine Markeland; Pless, Thomas; Torstenson, Karin
2003-01-01
The capacity for high throughput purification (HTP) is essential in fields such as structural genomics where large numbers of protein samples are routinely characterized in, for example, studies of structural determination, functionality and drug development. Proteins required for such analysis must be pure and homogenous and available in relatively large amounts. AKTA 3D system is a powerful automated protein purification system, which minimizes preparation, run-time and repetitive manual tasks. It has the capacity to purify up to 6 different His6- or GST-tagged proteins per day and can produce 1-50 mg protein per run at >90% purity. The success of automated protein purification increases with careful experimental planning. Protocol, columns and buffers need to be chosen with the final application area for the purified protein in mind.
Three-dimensional murine airway segmentation in micro-CT images
NASA Astrophysics Data System (ADS)
Shi, Lijun; Thiesse, Jacqueline; McLennan, Geoffrey; Hoffman, Eric A.; Reinhardt, Joseph M.
2007-03-01
Thoracic imaging for small animals has emerged as an important tool for monitoring pulmonary disease progression and therapy response in genetically engineered animals. Micro-CT is becoming the standard thoracic imaging modality in small animal imaging because it can produce high-resolution images of the lung parenchyma, vasculature, and airways. Segmentation, measurement, and visualization of the airway tree is an important step in pulmonary image analysis. However, manual analysis of the airway tree in micro-CT images can be extremely time-consuming since a typical dataset is usually on the order of several gigabytes in size. Automated and semi-automated tools for micro-CT airway analysis are desirable. In this paper, we propose an automatic airway segmentation method for in vivo micro-CT images of the murine lung and validate our method by comparing the automatic results to manual tracing. Our method is based primarily on grayscale morphology. The results show good visual matches between manually segmented and automatically segmented trees. The average true positive volume fraction compared to manual analysis is 91.61%. The overall runtime for the automatic method is on the order of 30 minutes per volume compared to several hours to a few days for manual analysis.
Use of a Modern Polymerization Pilot-Plant for Undergraduate Control Projects.
ERIC Educational Resources Information Center
Mendoza-Bustos, S. A.; And Others
1991-01-01
Described is a project where students gain experience in handling large volumes of hazardous materials, process start up and shut down, equipment failures, operational variations, scaling up, equipment cleaning, and run-time scheduling while working in a modern pilot plant. Included are the system design, experimental procedures, and results. (KR)
Three-Dimensional Near Infrared Imaging of Pathophysiological Changes Within the Breast
2008-03-01
StO2: oxygenation saturation (in %); H2O: water content (in %); a: scattering amplitude; b: scattering power. The deviation in run-time that occurs in practice is likely due to the cost of memory management.
The Challenge of Content Creation to Facilitate Personalized E-Learning Experiences
ERIC Educational Resources Information Center
Turker, Ali; Gorgun, Ilhami; Conlan, Owen
2006-01-01
The runtime creation of pedagogically coherent learning content for an individual learner's needs and preferences is a considerable challenge. By selecting and combining appropriate learning assets into a new learning object such needs and preferences may be accounted for. However, to assure coherence, these objects should be consumed within…
Remote mission specialist - A study in real-time, adaptive planning
NASA Technical Reports Server (NTRS)
Rokey, Mark J.
1990-01-01
A high-level planning architecture for robotic operations is presented. The remote mission specialist integrates high-level directives with low-level primitives executable by a run-time controller for command of autonomous servicing activities. The planner has been designed to address such issues as adaptive plan generation, real-time performance, and operator intervention.
An Ontology and a Software Framework for Competency Modeling and Management
ERIC Educational Resources Information Center
Paquette, Gilbert
2007-01-01
The importance given to competency management is well justified. Acquiring new competencies is the central goal of any education or knowledge management process. Thus, it must be embedded in any software framework as an instructional engineering tool, to inform the runtime environment of the knowledge that is processed by actors, and their…
AADL and Model-based Engineering
2014-10-20
Presentation excerpt (AADL and Model-Based Engineering, Feiler, Oct 20, 2014, Carnegie Mellon University): We rely on software for safe aircraft operation; embedded software systems span the compute platform, the runtime architecture, and the application software. Why do system-level failures still occur despite fault tolerance techniques being deployed in systems?
Regulatory Conformance Checking: Logic and Logical Form
ERIC Educational Resources Information Center
Dinesh, Nikhil
2010-01-01
We consider the problem of checking whether an organization conforms to a body of regulation. Conformance is studied in a runtime verification setting. The regulation is translated to a logic, from which we synthesize monitors. The monitors are evaluated as the state of an organization evolves over time, raising an alarm if a violation is…
Raising the Degree of Service-Orientation of a SOA-based Software System: A Case Study
2009-12-01
"…protocols, as well as executable processes that can be compiled into runtime scripts" [2]. The Business Process Modeling Notation (BPMN) provides a… (Cited: OMG, Business Process Modeling Notation (BPMN) 1.2, Jan. 2009, http://www.omg.org/spec/BPMN/1.2/.)
Evaluating COCA--What Do Teachers Think?
ERIC Educational Resources Information Center
Major, Nigel
COCA, which consists of both authoring tools and a runtime shell, is a system intended to provide teachers with genuine access to intelligent tutoring system (ITS) technology and to give them control over domain material and teaching strategies. To evaluate the effectiveness of COCA, 10 subjects (five university teachers and five school teachers)…
NCEP Products Inventory: North American Ensemble Forecast System (NAEFS) products (updated 02/27/2014). CC is the model cycle runtime (i.e. 00, 06, 12, 18).
1991-09-01
In addition, support for Saltz was provided by NSF under Grant ASC-8819374. 1. Introduction. Over the past few years, we have developed methods needed to…
Crosscutting Runtime Adaptations of LD Execution
ERIC Educational Resources Information Center
Zarraonandia, Telmo; Dodero, Juan Manuel; Fernandez, Camino
2006-01-01
In this paper, the authors describe a mechanism for the introduction of small variations in the original learning design process defined in a particular Unit of Learning (UoL). The objective is to increase the UoL reusability by offering the designers an alternative to introduce slight variations on the original design instead of creating a new…
Temporal Decompostion of a Distribution System Quasi-Static Time-Series Simulation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mather, Barry A; Hunsberger, Randolph J
This paper documents the first phase of an investigation into reducing runtimes of complex OpenDSS models through parallelization. As the method seems promising, future work will quantify, and further mitigate, errors arising from this process. In this initial report, we demonstrate how, through the use of temporal decomposition, the run times of a complex distribution-system-level quasi-static time-series simulation can be reduced roughly in proportion to the level of parallelization. Using this method, the monolithic model runtime of 51 hours was reduced to a minimum of about 90 minutes. As expected, this comes at the expense of control and voltage errors at the time-slice boundaries. All evaluations were performed using a real distribution circuit model with the addition of 50 PV systems, representing a mock complex PV impact study. We are able to reduce induced transition errors through the addition of controls initialization, though small errors persist. The time savings with parallelization are so significant that we feel additional investigation to reduce control errors is warranted.
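A minimal sketch of the temporal decomposition idea is shown below. The simulate_step() function is a hypothetical stand-in for an OpenDSS power-flow step, not part of any real OpenDSS API: the time horizon is split into chunks solved in parallel, and each chunk is preceded by a short controls-initialization window, the mechanism the paper uses to reduce errors at time-slice boundaries.

    # Sketch of temporal decomposition of a quasi-static time-series simulation:
    # split the horizon into chunks, solve them in parallel, and warm up controls
    # at each chunk boundary before keeping results.
    from concurrent.futures import ProcessPoolExecutor

    def simulate_step(t):
        return {"t": t, "voltage_pu": 1.0}          # placeholder power-flow result

    def run_chunk(start, stop, warmup=4):
        for t in range(max(0, start - warmup), start):
            simulate_step(t)                         # initialize controls, discard
        return [simulate_step(t) for t in range(start, stop)]

    def run_parallel(n_steps, n_chunks):
        size = n_steps // n_chunks
        bounds = [(i * size, n_steps if i == n_chunks - 1 else (i + 1) * size)
                  for i in range(n_chunks)]
        with ProcessPoolExecutor() as pool:
            parts = pool.map(run_chunk, *zip(*bounds))
        return [r for part in parts for r in part]

    if __name__ == "__main__":
        results = run_parallel(n_steps=96, n_chunks=4)
        print(len(results))   # 96 time steps reassembled in order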
DOE Office of Scientific and Technical Information (OSTI.GOV)
Barbara Chapman
OpenMP was not well recognized at the beginning of the project, around year 2003, because of its limited use in DoE production applications and the immature hardware support for an efficient implementation. Yet in recent years, it has gradually been adopted both in HPC applications, mostly in the form of MPI+OpenMP hybrid code, and in mid-scale desktop applications for scientific and experimental studies. We have observed this trend and worked diligently to improve our OpenMP compiler and runtimes, as well as to work with the OpenMP standard organization to make sure OpenMP evolves in a direction close to DoE missions. In the Center for Programming Models for Scalable Parallel Computing project, the HPCTools team at the University of Houston (UH), directed by Dr. Barbara Chapman, has been working with project partners, external collaborators and hardware vendors to increase the scalability and applicability of OpenMP for multi-core (and future many-core) platforms and for distributed memory systems by exploring different programming models, language extensions, compiler optimizations, as well as runtime library support.
Adaptive Impact-Driven Detection of Silent Data Corruption for HPC Applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Di, Sheng; Cappello, Franck
For exascale HPC applications, silent data corruption (SDC) is one of the most dangerous problems because there is no indication that errors have occurred during the execution. We propose an adaptive impact-driven method that can detect SDCs dynamically. The key contributions are threefold. (1) We carefully characterize 18 real-world HPC applications and discuss the runtime data features, as well as the impact of SDCs on their execution results. (2) We propose an impact-driven detection model that does not blindly improve the prediction accuracy, but instead detects only influential SDCs to guarantee user-acceptable execution results. (3) Our solution can adapt to dynamic prediction errors based on local runtime data and can automatically tune detection ranges to guarantee low false alarms. Experiments show that our detector can detect 80-99.99% of SDCs with a false alarm rate of less than 1% of iterations for most cases. The memory cost and detection overhead are reduced to 15% and 6.3%, respectively, for a large majority of applications.
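In the same spirit (but not the paper's detector), a prediction-based range check can be sketched as follows: each new value of a monitored variable is compared against a linear extrapolation of its recent history, and only deviations far outside an adaptively maintained error bound are flagged. The predictor, initial bound, and slack factor below are illustrative assumptions.

    # Sketch of an adaptive, prediction-based silent-data-corruption check
    # (illustrative; not the detector proposed in the paper).
    class SDCDetector:
        def __init__(self, slack=4.0, initial_err=0.1):
            self.history = []          # last two accepted values
            self.err = initial_err     # running mean absolute prediction error
            self.slack = slack         # tolerance multiplier for flagging

        def check(self, value):
            if len(self.history) < 2:
                self.history.append(value)
                return False
            predicted = 2 * self.history[-1] - self.history[-2]   # linear extrapolation
            deviation = abs(value - predicted)
            suspect = deviation > self.slack * self.err
            if not suspect:            # adapt the bound on accepted data only
                self.err = 0.9 * self.err + 0.1 * deviation
                self.history = [self.history[-1], value]
            return suspect

    detector = SDCDetector()
    print([detector.check(v) for v in [1.0, 1.1, 1.2, 1.31, 1.4]])   # all False
    print(detector.check(42.0))                                      # large jump flagged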
Advanced Software V&V for Civil Aviation and Autonomy
NASA Technical Reports Server (NTRS)
Brat, Guillaume P.
2017-01-01
With the advances in high-computing platform (e.g., advanced graphical processing units or multi-core processors), computationally-intensive software techniques such as the ones used in artificial intelligence or formal methods have provided us with an opportunity to further increase safety in the aviation industry. Some of these techniques have facilitated building safety at design time, like in aircraft engines or software verification and validation, and others can introduce safety benefits during operations as long as we adapt our processes. In this talk, I will present how NASA is taking advantage of these new software techniques to build in safety at design time through advanced software verification and validation, which can be applied earlier and earlier in the design life cycle and thus help also reduce the cost of aviation assurance. I will then show how run-time techniques (such as runtime assurance or data analytics) offer us a chance to catch even more complex problems, even in the face of changing and unpredictable environments. These new techniques will be extremely useful as our aviation systems become more complex and more autonomous.
An ant colony optimization based feature selection for web page classification.
Saraç, Esra; Özel, Selma Ayşe
2014-01-01
The increased popularity of the web has led to the inclusion of a huge amount of information on the web, and as a result of this explosive information growth, automated web page classification systems are needed to improve search engines' performance. Web pages have a large number of features such as HTML/XML tags, URLs, hyperlinks, and text contents that should be considered during an automated classification process. The aim of this study is to reduce the number of features used, in order to improve the runtime and accuracy of the classification of web pages. In this study, we used an ant colony optimization (ACO) algorithm to select the best features, and then we applied the well-known C4.5, naive Bayes, and k nearest neighbor classifiers to assign class labels to web pages. We used the WebKB and Conference datasets in our experiments, and we showed that using the ACO for feature selection improves both accuracy and runtime performance of classification. We also showed that the proposed ACO based algorithm can select better features with respect to the well-known information gain and chi square feature selection methods.
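A very small ant-colony-style feature selection sketch follows (a simplified illustration, not the paper's algorithm): ants sample feature subsets with probability proportional to per-feature pheromone, the best subset found so far reinforces its features, and pheromone evaporates. The evaluate() stub stands in for training a real classifier such as C4.5, naive Bayes or k-NN.

    # Simplified ant-colony-style feature selection (illustrative only).
    import random

    def evaluate(subset):
        # Hypothetical score: pretend features 0, 3 and 7 are the informative ones.
        informative = {0, 3, 7}
        return len(informative & subset) - 0.05 * len(subset)

    def aco_select(n_features=10, n_ants=20, n_iters=30, subset_size=4,
                   evaporation=0.1, seed=1):
        random.seed(seed)
        pheromone = [1.0] * n_features
        best_subset, best_score = set(), float("-inf")
        for _ in range(n_iters):
            for _ in range(n_ants):
                weights = pheromone[:]
                subset = set()
                while len(subset) < subset_size:
                    f = random.choices(range(n_features), weights=weights)[0]
                    weights[f] = 0.0                 # do not pick a feature twice
                    subset.add(f)
                score = evaluate(subset)
                if score > best_score:
                    best_subset, best_score = subset, score
            pheromone = [(1 - evaporation) * p + (1.0 if i in best_subset else 0.0)
                         for i, p in enumerate(pheromone)]
        return sorted(best_subset)

    print(aco_select())   # expected to converge on features 0, 3 and 7 plus one extra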
BLESS 2: accurate, memory-efficient and fast error correction method.
Heo, Yun; Ramachandran, Anand; Hwu, Wen-Mei; Ma, Jian; Chen, Deming
2016-08-01
The most important features of error correction tools for sequencing data are accuracy, memory efficiency and fast runtime. The previous version of BLESS was highly memory-efficient and accurate, but it was too slow to handle reads from large genomes. We have developed a new version of BLESS to improve runtime and accuracy while maintaining a small memory usage. The new version, called BLESS 2, has an error correction algorithm that is more accurate than BLESS, and the algorithm has been parallelized using hybrid MPI and OpenMP programming. BLESS 2 was compared with five top-performing tools, and it was found to be the fastest when it was executed on two computing nodes using MPI, with each node containing twelve cores. Also, BLESS 2 showed at least 11% higher gain while retaining the memory efficiency of the previous version for large genomes. Freely available at https://sourceforge.net/projects/bless-ec dchen@illinois.edu Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Programmable computing with a single magnetoresistive element
NASA Astrophysics Data System (ADS)
Ney, A.; Pampuch, C.; Koch, R.; Ploog, K. H.
2003-10-01
The development of transistor-based integrated circuits for modern computing is a story of great success. However, the proved concept for enhancing computational power by continuous miniaturization is approaching its fundamental limits. Alternative approaches consider logic elements that are reconfigurable at run-time to overcome the rigid architecture of the present hardware systems. Implementation of parallel algorithms on such `chameleon' processors has the potential to yield a dramatic increase of computational speed, competitive with that of supercomputers. Owing to their functional flexibility, `chameleon' processors can be readily optimized with respect to any computer application. In conventional microprocessors, information must be transferred to a memory to prevent it from getting lost, because electrically processed information is volatile. Therefore the computational performance can be improved if the logic gate is additionally capable of storing the output. Here we describe a simple hardware concept for a programmable logic element that is based on a single magnetic random access memory (MRAM) cell. It combines the inherent advantage of a non-volatile output with flexible functionality which can be selected at run-time to operate as an AND, OR, NAND or NOR gate.
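The behavioral idea, stripped of all device physics, can be sketched as follows: a single element whose logic function is selected at run time and whose last output is retained until the next operation. This is purely an illustration of the programming-model concept, not a model of the MRAM cell.

    # Behavioral sketch of a run-time reconfigurable logic element: the gate's
    # function is selected while running and its last output is retained,
    # mimicking the non-volatile output of the MRAM-based element.
    class ReconfigurableGate:
        FUNCTIONS = {
            "AND":  lambda a, b: a & b,
            "OR":   lambda a, b: a | b,
            "NAND": lambda a, b: 1 - (a & b),
            "NOR":  lambda a, b: 1 - (a | b),
        }

        def __init__(self, function="AND"):
            self.function = function
            self.stored_output = 0           # non-volatile output state

        def reconfigure(self, function):
            self.function = function         # change the logic function at run time

        def apply(self, a, b):
            self.stored_output = self.FUNCTIONS[self.function](a, b)
            return self.stored_output

    g = ReconfigurableGate("AND")
    print(g.apply(1, 1))    # 1
    g.reconfigure("NOR")
    print(g.apply(0, 0))    # 1
    print(g.stored_output)  # output persists until the next operation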
Vasbinder, E; Van der Weken, G; Vander Heyden, Y; Baeyens, W R G; Debunne, A; Remon, J P; García-Campaña, A M
2004-01-01
An ion-pair high-performance liquid chromatographic method was developed for the simultaneous determination of p-aminosalicylic acid (PAS) and its degradation product m-aminophenol (MAP) in a newly developed multiparticulate drug delivery system. Owing to the concentration differences of PAS and MAP, acetanilide and sulfanilic acid were used as internal standards, respectively. The separation was performed on a Chromolith SpeedROD RP-18e column, a new packing material consisting of monolithic rods of highly porous silica. The mobile phase consisted of 20 mM phosphate buffer, 20 mM tetrabutylammonium hydrogen sulphate and 16% (v/v) methanol adjusted to pH 6.8, at a flow-rate of 1.0 mL/min, resulting in a run-time of about 6 min. Detection was by UV at 233 nm. The method was validated and proved to be useful for stability testing of the new dosage form. Separation efficiency was compared between the new packing material Chromolith SpeedROD RP-18e and the conventional reversed-phase cartridge LiChroCART 125-4 (5 μm). A robustness test was carried out on both columns and different separation parameters (retention, resolution, run time, temperature) were determined. Copyright 2004 John Wiley & Sons, Ltd.
Kravtsova, Oxana Yu; Paramonov, Sergey A; Vasilevich, Natalya I; Kazyulkin, Denis N; Vlasova, Ekaterina; Engsig, Michael
2013-12-01
A specific, sensitive, rapid and reproducible method for the determination of flomoxef in human plasma using high-performance liquid chromatography-tandem mass spectrometry was developed and validated. Flomoxef was detected using an electrospray ionization method operated in negative-ion mode. Chromatographic separation was performed in gradient elution mode on a Luna® C18(2) column (3 μm, 20 × 4.0 mm) at a flow rate of 1 mL/min and a runtime of 3.5 min. The mobile phase consisted of acetonitrile and water containing 0.1% formic acid as additive. Extraction of flomoxef from plasma and precipitation of plasma proteins was performed with acetonitrile, with an absolute recovery of 86.4 ± 1.6%. The calibration curve was linear with a correlation coefficient of 0.999 over the concentration range 10-5000 ng/mL, and the lower limit of quantification was 10 ng/mL. The intra- and inter-day precisions were <11.8%, while the accuracy ranged from 99.6 to 109.0%. A stability study of flomoxef revealed that it could be successfully analyzed at 4 °C over 24 h, but that it was unstable in solutions at room temperature during short-term storage (4 h) and over several freeze-thaw cycles. Copyright © 2013 John Wiley & Sons, Ltd.
Scalability of Several Asynchronous Many-Task Models for In Situ Statistical Analysis.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pebay, Philippe Pierre; Bennett, Janine Camille; Kolla, Hemanth
This report is a sequel to [PB16], in which we provided a first progress report on research and development towards a scalable, asynchronous many-task, in situ statistical analysis engine using the Legion runtime system. This earlier work included a prototype implementation of a proposed solution, using a proxy mini-application as a surrogate for a full-scale scientific simulation code. The first scalability studies were conducted with the above on modestly-sized experimental clusters. In contrast, in the current work we have integrated our in situ analysis engines with a full-size scientific application (S3D, using the Legion-SPMD model), and have conducted numerical tests on the largest computational platform currently available for DOE science applications. We also provide details regarding the design and development of a light-weight asynchronous collectives library. We describe how this library is utilized within our SPMD-Legion S3D workflow, and compare the data aggregation technique deployed herein to the approach taken within our previous work.
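As a rough illustration of what an asynchronous collective provides (not the light-weight collectives library described above), the following Python sketch builds a non-blocking reduction from futures: the caller obtains a handle immediately and the reduction completes as the per-rank contributions arrive. All names and the thread-based setup are illustrative assumptions.

```python
# Toy illustration of an asynchronous (non-blocking) reduction: each "rank"
# contributes a partial value, and the caller obtains a future that completes
# when the reduction finishes. This sketches the general idea of an
# asynchronous collective; it is not the library described in the report.

from concurrent.futures import ThreadPoolExecutor
import functools
import operator

def async_reduce(executor, partials, op=operator.add):
    """Return a future for the reduction of `partials` (a list of futures)."""
    def wait_and_reduce():
        values = [p.result() for p in partials]   # completes as inputs arrive
        return functools.reduce(op, values)
    return executor.submit(wait_and_reduce)

with ThreadPoolExecutor(max_workers=8) as pool:
    # Each task stands in for a per-rank contribution computed in situ.
    contributions = [pool.submit(lambda r=r: r * r) for r in range(8)]
    total = async_reduce(pool, contributions)     # does not block the caller
    # ... the caller could launch further analysis work here ...
    print("sum of squares:", total.result())      # 140
```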
DOE Office of Scientific and Technical Information (OSTI.GOV)
Guan, Qiang
At exascale, the challenge becomes to develop applications that run at scale and use exascale platforms reliably, efficiently, and flexibly. Workflows become much more complex because they must seamlessly integrate simulation and data analytics. They must include down-sampling, post-processing, feature extraction, and visualization. Power and data transfer limitations require these analysis tasks to be run in-situ or in-transit. We expect successful workflows will comprise multiple linked simulations along with tens of analysis routines. Users will have limited development time at scale and, therefore, must have rich tools to develop, debug, test, and deploy applications. At this scale, successful workflows will compose linked computations from an assortment of reliable, well-defined computation elements, ones that can come and go as required, based on the needs of the workflow over time. We propose a novel framework that utilizes both virtual machines (VMs) and software containers to create a workflow system that establishes a uniform build and execution environment (BEE) beyond the capabilities of current systems. In this environment, applications will run reliably and repeatably across heterogeneous hardware and software. Containers, both commercial (Docker and Rocket) and open-source (LXC and LXD), define a runtime that isolates all software dependencies from the machine operating system. Workflows may contain multiple containers that run different operating systems, different software, and even different versions of the same software. We will run containers in open-source virtual machines (KVM) and emulators (QEMU) so that workflows run on any machine entirely in user-space. On this platform of containers and virtual machines, we will deliver workflow software that provides services, including repeatable execution, provenance, checkpointing, and future proofing. We will capture provenance about how containers were launched and how they interact to annotate workflows for repeatable and partial re-execution. We will coordinate the physical snapshots of virtual machines with parallel programming constructs, such as barriers, to automate checkpoint and restart. We will also integrate with HPC-specific container runtimes to gain access to accelerators and other specialized hardware to preserve native performance. Containers will link development to continuous integration. When application developers check code in, it will automatically be tested on a suite of different software and hardware architectures.
Differential correlation for sequencing data.
Siska, Charlotte; Kechris, Katerina
2017-01-19
Several methods have been developed to identify differential correlation (DC) between pairs of molecular features from -omics studies. Most DC methods have only been tested with microarrays and other platforms producing continuous and Gaussian-like data. Sequencing data is in the form of counts, often modeled with a negative binomial distribution making it difficult to apply standard correlation metrics. We have developed an R package for identifying DC called Discordant which uses mixture models for correlations between features and the Expectation Maximization (EM) algorithm for fitting parameters of the mixture model. Several correlation metrics for sequencing data are provided and tested using simulations. Other extensions in the Discordant package include additional modeling for different types of differential correlation, and faster implementation, using a subsampling routine to reduce run-time and address the assumption of independence between molecular feature pairs. With simulations and breast cancer miRNA-Seq and RNA-Seq data, we find that Spearman's correlation has the best performance among the tested correlation methods for identifying differential correlation. Application of Spearman's correlation in the Discordant method demonstrated the most power in ROC curves and sensitivity/specificity plots, and improved ability to identify experimentally validated breast cancer miRNA. We also considered including additional types of differential correlation, which showed a slight reduction in power due to the additional parameters that need to be estimated, but more versatility in applications. Finally, subsampling within the EM algorithm considerably decreased run-time with negligible effect on performance. A new method and R package called Discordant is presented for identifying differential correlation with sequencing data. Based on comparisons with different correlation metrics, this study suggests Spearman's correlation is appropriate for sequencing data, but other correlation metrics are available to the user depending on the application and data type. The Discordant method can also be extended to investigate additional DC types and subsampling with the EM algorithm is now available for reduced run-time. These extensions to the R package make Discordant more robust and versatile for multiple -omics studies.
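The following Python sketch illustrates the basic differential-correlation signal the entry discusses, not the Discordant R package itself: Spearman correlation of a feature pair is computed in two groups of simulated negative-binomial counts and the difference is taken as a crude DC indicator. The simulation parameters are arbitrary placeholders.

```python
# Illustrative sketch (not the Discordant R package): compare the Spearman
# correlation of a feature pair between two groups of simulated
# negative-binomial counts, a crude stand-in for differential correlation.

import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

def nb_counts(mean, size, dispersion=1.0):
    # Negative binomial parameterized by mean and dispersion.
    p = dispersion / (dispersion + mean)
    return rng.negative_binomial(dispersion, p, size=size)

n = 200
# Group 1: features x and y share a latent signal, so counts are correlated.
latent = rng.gamma(shape=2.0, scale=5.0, size=n)
x1 = nb_counts(latent, n)
y1 = nb_counts(latent, n)
# Group 2: independent counts, so little correlation.
x2 = nb_counts(10.0, n)
y2 = nb_counts(10.0, n)

rho1, _ = spearmanr(x1, y1)
rho2, _ = spearmanr(x2, y2)
print(f"group 1 rho={rho1:.2f}, group 2 rho={rho2:.2f}, delta={rho1 - rho2:.2f}")
```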
Statthaler, Karina; Schwarz, Andreas; Steyrl, David; Kobler, Reinmar; Höller, Maria Katharina; Brandstetter, Julia; Hehenberger, Lea; Bigga, Marvin; Müller-Putz, Gernot
2017-12-28
In this work, we share our experiences from the first CYBATHLON worldwide, an event organized by the Eidgenössische Technische Hochschule Zürich (ETH Zürich), which took place in Zurich in October 2016. It is a championship for severely motor impaired people using assistive prototype devices to compete against each other. Our team, the Graz BCI Racing Team MIRAGE91 from Graz University of Technology, participated in the discipline "Brain-Computer Interface Race". A brain-computer interface (BCI) is a device facilitating control of applications via the user's thoughts. Prominent applications include assistive technology such as wheelchairs, neuroprostheses or communication devices. In the CYBATHLON BCI Race, pilots compete in a BCI-controlled computer game. We report on setting up our team, the BCI customization to our pilot including long-term training and the final BCI system. Furthermore, we describe CYBATHLON participation and analyze our CYBATHLON result. We found that our pilot was compliant over the whole time and that we could significantly reduce the average runtime between start and finish from initially 178 s to 143 s. After the release of the final championship specifications with shorter track length, the average runtime converged to 120 s. We successfully participated in the qualification race at CYBATHLON 2016, but performed notably worse than during training, with a runtime of 196 s. We speculate that feature shifts due to nonstationarities in the electroencephalogram (EEG), as well as arousal, are possible reasons for the unexpected result. Potential counteracting measures are discussed. The CYBATHLON 2016 was a great opportunity for our student team. We consolidated our theoretical knowledge and turned it into practice, allowing our pilot to play a computer game. However, further research is required to make BCI technology invariant to non-task related changes of the EEG.
NASA Astrophysics Data System (ADS)
Safari, A.; Sharifi, M. A.; Amjadiparvar, B.
2010-05-01
The GRACE mission has substantiated the low-low satellite-to-satellite tracking (LL-SST) concept. The LL-SST configuration can be combined with the previously realized high-low SST concept in the CHAMP mission to provide a much higher accuracy. The line of sight (LOS) acceleration difference between the GRACE satellite pair is the most commonly used observable for mapping the global gravity field of the Earth in terms of spherical harmonic coefficients. In this paper, mathematical formulae for LOS acceleration difference observations have been derived and the corresponding linear system of equations has been set up for spherical harmonics up to degree and order 120. The total number of unknowns is 14641. Such a linear equation system can be solved with iterative solvers or direct solvers. However, the runtime of direct methods or that of iterative solvers without a suitable preconditioner increases tremendously. This is the reason why we need a more sophisticated method to solve linear systems with a large number of unknowns. The multiplicative variant of the Schwarz alternating algorithm is a domain decomposition method, which allows the normal matrix of the system to be split into several smaller overlapping submatrices. In each iteration step the multiplicative variant of the Schwarz alternating algorithm solves linear systems with the matrices obtained from the splitting successively. It reduces both runtime and memory requirements drastically. In this paper we propose the Multiplicative Schwarz Alternating Algorithm (MSAA) for solving the large linear system of gravity field recovery. The proposed algorithm has been tested on the International Association of Geodesy (IAG)-simulated data of the GRACE mission. The achieved results indicate the validity and efficiency of the proposed algorithm in solving the linear system of equations from accuracy and runtime points of view. Keywords: Gravity field recovery, Multiplicative Schwarz Alternating Algorithm, Low-Low Satellite-to-Satellite Tracking
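The following Python sketch shows the multiplicative Schwarz alternating idea on a small, generic symmetric positive-definite system with overlapping index blocks. It is a toy illustration of the iteration structure only; the block sizes, matrix and sweep count are arbitrary and bear no relation to the degree/order-120 gravity-recovery system described above.

```python
# Toy sketch of the multiplicative Schwarz alternating iteration: solve A x = b
# by cycling through overlapping index blocks and applying a local correction
# from each block's subsystem in turn.

import numpy as np

def multiplicative_schwarz(A, b, blocks, sweeps=50):
    x = np.zeros_like(b)
    for _ in range(sweeps):
        for idx in blocks:                         # overlapping index sets
            r = b - A @ x                          # current global residual
            Ai = A[np.ix_(idx, idx)]               # local submatrix
            x[idx] += np.linalg.solve(Ai, r[idx])  # local correction
    return x

rng = np.random.default_rng(1)
n = 60
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)                        # symmetric positive definite
b = rng.standard_normal(n)

# Three overlapping blocks covering indices 0..59 with a few shared columns.
blocks = [np.arange(0, 25), np.arange(20, 45), np.arange(40, 60)]
x = multiplicative_schwarz(A, b, blocks)
print("residual norm:", np.linalg.norm(b - A @ x))
```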
Enforcement of entailment constraints in distributed service-based business processes.
Hummer, Waldemar; Gaubatz, Patrick; Strembeck, Mark; Zdun, Uwe; Dustdar, Schahram
2013-11-01
A distributed business process is executed in a distributed computing environment. The service-oriented architecture (SOA) paradigm is a popular option for the integration of software services and execution of distributed business processes. Entailment constraints, such as mutual exclusion and binding constraints, are important means to control process execution. Mutually exclusive tasks result from the division of powerful rights and responsibilities to prevent fraud and abuse. In contrast, binding constraints define that a subject who performed one task must also perform the corresponding bound task(s). We aim to provide a model-driven approach for the specification and enforcement of task-based entailment constraints in distributed service-based business processes. Based on a generic metamodel, we define a domain-specific language (DSL) that maps the different modeling-level artifacts to the implementation level. The DSL integrates elements from role-based access control (RBAC) with the tasks that are performed in a business process. Process definitions are annotated using the DSL, and our software platform uses automated model transformations to produce executable WS-BPEL specifications which enforce the entailment constraints. We evaluate the impact of constraint enforcement on runtime performance for five selected service-based processes from existing literature. Our evaluation demonstrates that the approach correctly enforces task-based entailment constraints at runtime. The performance experiments illustrate that the runtime enforcement operates with an overhead that scales well up to the order of several tens of thousands of logged invocations. Using our DSL annotations, the user-defined process definition remains declarative and clean of security enforcement code. Our approach decouples the concerns of (non-technical) domain experts from technical details of entailment constraint enforcement. The developed framework integrates seamlessly with WS-BPEL and the Web services technology stack. Our prototype implementation shows the feasibility of the approach, and the evaluation points to future work and further performance optimizations.
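A minimal sketch of the runtime enforcement idea follows: a monitor records which subject performed each task and rejects invocations that would violate a mutual-exclusion or binding constraint. The task and subject names are hypothetical, and this greatly simplifies the paper's DSL/WS-BPEL machinery.

```python
# Minimal sketch of runtime enforcement of task-based entailment constraints:
# a mutual-exclusion constraint forbids one subject from performing both tasks
# in a pair, and a binding constraint forces the same subject to perform both.

class EntailmentMonitor:
    def __init__(self, mutual_exclusions, bindings):
        self.mutual_exclusions = mutual_exclusions  # set of frozenset({t1, t2})
        self.bindings = bindings                    # set of frozenset({t1, t2})
        self.performed = {}                         # task -> subject

    def authorize(self, subject, task):
        for pair in self.mutual_exclusions:
            if task in pair:
                other = next(t for t in pair if t != task)
                if self.performed.get(other) == subject:
                    raise PermissionError(
                        f"{subject} already performed '{other}' (mutual exclusion)")
        for pair in self.bindings:
            if task in pair:
                other = next(t for t in pair if t != task)
                done_by = self.performed.get(other)
                if done_by is not None and done_by != subject:
                    raise PermissionError(
                        f"'{task}' is bound to '{other}', performed by {done_by}")
        self.performed[task] = subject   # record the permitted invocation

monitor = EntailmentMonitor(
    mutual_exclusions={frozenset({"approve_order", "create_order"})},
    bindings={frozenset({"pack_order", "ship_order"})},
)
monitor.authorize("alice", "create_order")
monitor.authorize("bob", "approve_order")     # ok: a different subject approves
monitor.authorize("carol", "pack_order")
try:
    monitor.authorize("dave", "ship_order")   # violates the binding constraint
except PermissionError as e:
    print("blocked:", e)
```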
NASA Orbital Debris Engineering Model ORDEM2008 (Beta Version)
NASA Technical Reports Server (NTRS)
Stansbery, Eugene G.; Krisko, Paula H.
2009-01-01
This is an interim document intended to accompany the beta-release of the ORDEM2008 model. As such it provides the user with a guide for its use, a list of its capabilities, a brief summary of model development, and appendices included to educate the user as to typical runtimes for different orbit configurations. More detailed documentation will be delivered with the final product. ORDEM2008 supersedes NASA's previous model - ORDEM2000. The availability of new sensor and in situ data, the re-analysis of older data, and the development of new analytical techniques have enabled the construction of this more comprehensive and sophisticated model. Integrated with the software is an upgraded graphical user interface (GUI), which uses project-oriented organization and provides the user with graphical representations of numerous output data products. These range from the conventional average debris size vs. flux magnitude for chosen analysis orbits, to the more complex color-contoured two-dimensional (2-D) directional flux diagrams in terms of local spacecraft pitch and yaw.
Addressing multi-label imbalance problem of surgical tool detection using CNN.
Sahu, Manish; Mukhopadhyay, Anirban; Szengel, Angelika; Zachow, Stefan
2017-06-01
A fully automated surgical tool detection framework is proposed for endoscopic video streams. State-of-the-art surgical tool detection methods rely on supervised one-vs-all or multi-class classification techniques, completely ignoring the co-occurrence relationship of the tools and the associated class imbalance. In this paper, we formulate tool detection as a multi-label classification task where tool co-occurrences are treated as separate classes. In addition, imbalance on tool co-occurrences is analyzed and stratification techniques are employed to address the imbalance during convolutional neural network (CNN) training. Moreover, temporal smoothing is introduced as an online post-processing step to enhance runtime prediction. Quantitative analysis is performed on the M2CAI16 tool detection dataset to highlight the importance of stratification, temporal smoothing and the overall framework for tool detection. The analysis on tool imbalance, backed by the empirical results, indicates the need and superiority of the proposed framework over state-of-the-art techniques.
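A short sketch of the temporal-smoothing step mentioned above is given below: per-frame multi-label probabilities are averaged over a sliding window and then thresholded. This is a simplified stand-in for the paper's online post-processing, with hypothetical window and threshold values.

```python
# Sketch of online temporal smoothing for per-frame, multi-label tool
# probabilities: a sliding moving average over the last `window` frames,
# followed by thresholding to decide which tools are present.

from collections import deque
import numpy as np

class TemporalSmoother:
    def __init__(self, n_labels, window=5, threshold=0.5):
        self.history = deque(maxlen=window)
        self.n_labels = n_labels
        self.threshold = threshold

    def update(self, frame_probs):
        probs = np.asarray(frame_probs, dtype=float)
        assert probs.shape == (self.n_labels,)
        self.history.append(probs)
        smoothed = np.mean(self.history, axis=0)   # average over recent frames
        return smoothed, smoothed >= self.threshold

smoother = TemporalSmoother(n_labels=3, window=3)
stream = [[0.9, 0.1, 0.2], [0.2, 0.1, 0.3], [0.8, 0.2, 0.9]]  # noisy CNN outputs
for t, frame in enumerate(stream):
    smoothed, present = smoother.update(frame)
    print(t, np.round(smoothed, 2), present)
```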
PANORAMA: An approach to performance modeling and diagnosis of extreme-scale workflows
DOE Office of Scientific and Technical Information (OSTI.GOV)
Deelman, Ewa; Carothers, Christopher; Mandal, Anirban
2015-07-14
Here we report that computational science is well established as the third pillar of scientific discovery and is on par with experimentation and theory. However, as we move closer toward the ability to execute exascale calculations and process the ensuing extreme-scale amounts of data produced by both experiments and computations alike, the complexity of managing the compute and data analysis tasks has grown beyond the capabilities of domain scientists. Therefore, workflow management systems are absolutely necessary to ensure current and future scientific discoveries. A key research question for these workflow management systems concerns the performance optimization of complex calculation and data analysis tasks. The central contribution of this article is a description of the PANORAMA approach for modeling and diagnosing the run-time performance of complex scientific workflows. This approach integrates extreme-scale systems testbed experimentation, structured analytical modeling, and parallel systems simulation into a comprehensive workflow framework called Pegasus for understanding and improving the overall performance of complex scientific workflows.
Adaptive subdomain modeling: A multi-analysis technique for ocean circulation models
NASA Astrophysics Data System (ADS)
Altuntas, Alper; Baugh, John
2017-07-01
Many coastal and ocean processes of interest operate over large temporal and geographical scales and require a substantial amount of computational resources, particularly when engineering design and failure scenarios are also considered. This study presents an adaptive multi-analysis technique that improves the efficiency of these computations when multiple alternatives are being simulated. The technique, called adaptive subdomain modeling, concurrently analyzes any number of child domains, with each instance corresponding to a unique design or failure scenario, in addition to a full-scale parent domain providing the boundary conditions for its children. To contain the altered hydrodynamics originating from the modifications, the spatial extent of each child domain is adaptively adjusted during runtime depending on the response of the model. The technique is incorporated in ADCIRC++, a re-implementation of the popular ADCIRC ocean circulation model with an updated software architecture designed to facilitate this adaptive behavior and to utilize concurrent executions of multiple domains. The results of our case studies confirm that the method substantially reduces computational effort while maintaining accuracy.
ERIC Educational Resources Information Center
Boulehouache, Soufiane; Maamri, Ramdane; Sahnoun, Zaidi
2015-01-01
The Pedagogical Agents (PAs) for Mobile Learning (m-learning) must be able not only to adapt the teaching to the learner knowledge level and profile but also to ensure the pedagogical efficiency within unpredictable changing runtime contexts. Therefore, to deal with this issue, this paper proposes a Context-aware Self-Adaptive Fractal Component…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jain, Atul K.
The overall objective of this DOE-funded project is to combine scientific and computational challenges in climate modeling by expanding our understanding of the biogeophysical-biogeochemical processes and their interactions in the northern high latitudes (NHLs) using an earth system modeling (ESM) approach, and by adopting an adaptive parallel runtime system in an ESM to achieve efficient and scalable climate simulations through improved load balancing algorithms.
A Hybrid Constraint Representation and Reasoning Framework
NASA Technical Reports Server (NTRS)
Golden, Keith; Pang, Wan-Lin
2003-01-01
This paper introduces JNET, a novel constraint representation and reasoning framework that supports procedural constraints and constraint attachments, providing a flexible way of integrating the constraint reasoner with a run-time software environment. Attachments in JNET are constraints over arbitrary Java objects, which are defined using Java code, at runtime, with no changes to the JNET source code.
A Review of Generic Program Visualization Systems for Introductory Programming Education
ERIC Educational Resources Information Center
Sorva, Juha; Karavirta, Ville; Malmi, Lauri
2013-01-01
This article is a survey of program visualization systems intended for teaching beginners about the runtime behavior of computer programs. Our focus is on generic systems that are capable of illustrating many kinds of programs and behaviors. We inclusively describe such systems from the last three decades and review findings from their empirical…
Prototyping distributed simulation networks
NASA Technical Reports Server (NTRS)
Doubleday, Dennis L.
1990-01-01
Durra is a declarative language designed to support application-level programming. The use of Durra is illustrated to describe a simple distributed application: a simulation of a collection of networked vehicle simulators. It is shown how the language is used to describe the application, its components and structure, and how the runtime executive provides for the execution of the application.
Final report: Compiled MPI. Cost-Effective Exascale Application Development
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gropp, William Douglas
2015-12-21
This is the final report on Compiled MPI: Cost-Effective Exascale Application Development, and summarizes the results under this project. The project investigated runtime environments that improve the performance of MPI (Message-Passing Interface) programs; work at Illinois in the last period of this project looked at optimizing data accesses expressed with MPI datatypes.
Runtime Systems for Extreme Scale Platforms
2013-12-01
Existing Whole-House Solutions Case Study: Build San Antonio Green, San Antonio, Texas
DOE Office of Scientific and Technical Information (OSTI.GOV)
none,
2013-06-01
PNNL, FSEC, and CalcsPlus provided technical assistance to Build San Antonio Green on three deep energy retrofits. For this gut rehab they replaced the old roof with a steeper roof and replaced drywall while adding insulation, new HVAC, sealed ducts, transfer grilles, outside air run-time ventilation, new lighting and water heater.
Sensor-Free or Sensor-Full: A Comparison of Data Modalities in Multi-Channel Affect Detection
ERIC Educational Resources Information Center
Paquette, Luc; Rowe, Jonathan; Baker, Ryan; Mott, Bradford; Lester, James; DeFalco, Jeanine; Brawner, Keith; Sottilare, Robert; Georgoulas, Vasiliki
2016-01-01
Computational models that automatically detect learners' affective states are powerful tools for investigating the interplay of affect and learning. Over the past decade, affect detectors--which recognize learners' affective states at run-time using behavior logs and sensor data--have advanced substantially across a range of K-12 and postsecondary…
Massively Scalable Near Duplicate Detection in Streams of Documents using MDSH
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bogen, Paul Logasa; Symons, Christopher T; McKenzie, Amber T
2013-01-01
In a world where large-scale text collections are not only becoming ubiquitous but also are growing at increasing rates, near duplicate documents are becoming a growing concern that has the potential to hinder many different information filtering tasks. While others have tried to address this problem, prior techniques have only been used on limited collection sizes and static cases. We will briefly describe the problem in the context of Open Source Intelligence (OSINT) along with our additional constraints for performance. In this work we propose two variations on Multi-dimensional Spectral Hash (MDSH) tailored for working on extremely large, growing sets of text documents. We analyze the memory and runtime characteristics of our techniques and provide an informal analysis of the quality of the near-duplicate clusters produced by our techniques.
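For illustration of the general hashing-based near-duplicate idea (not the MDSH variants proposed in the report), the Python sketch below computes MinHash signatures over word shingles and compares documents by signature instead of by full text; all documents and parameters are made up.

```python
# Compact sketch of hashing-based near-duplicate detection. It uses MinHash
# signatures over word shingles, a simpler scheme than the multi-dimensional
# spectral hash (MDSH) variants described above; the intent is only to show
# how compact signatures let near-duplicate documents be found by comparing
# (or bucketing on) their signatures rather than the full text.

import hashlib
from itertools import combinations

def shingles(text, k=3):
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def minhash_signature(shingle_set, num_hashes=64):
    # One minimum per seeded hash function; matching fraction approximates
    # the Jaccard similarity of the shingle sets.
    return tuple(
        min(int(hashlib.md5(f"{seed}:{s}".encode()).hexdigest(), 16)
            for s in shingle_set)
        for seed in range(num_hashes))

def estimated_similarity(sig_a, sig_b):
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

docs = {
    "a": "the quick brown fox jumps over the lazy dog near the old barn",
    "b": "the quick brown fox jumps over the lazy cat near the old barn",
    "c": "an entirely different report about orbital debris engineering models",
}
sigs = {doc_id: minhash_signature(shingles(text)) for doc_id, text in docs.items()}
for x, y in combinations(sigs, 2):
    print(x, y, round(estimated_similarity(sigs[x], sigs[y]), 2))
# In a streaming setting, bands of the signature would be used as hash-table
# keys so that likely near-duplicates land in the same bucket.
```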
Integrated Network Decompositions and Dynamic Programming for Graph Optimization (INDDGO)
DOE Office of Scientific and Technical Information (OSTI.GOV)
The INDDGO software package offers a set of tools for finding exact solutions to graph optimization problems via tree decompositions and dynamic programming algorithms. Currently the framework offers serial and parallel (distributed memory) algorithms for finding tree decompositions and solving the maximum weighted independent set problem. The parallel dynamic programming algorithm is implemented on top of the MADNESS task-based runtime.
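The sketch below shows the flavor of such a dynamic program in the simplest setting: maximum weighted independent set on a tree (the treewidth-1 special case), with a two-state table per vertex. The INDDGO framework operates on general tree decompositions and in parallel, which this toy example does not attempt to reproduce.

```python
# Dynamic-programming sketch for the maximum weighted independent set on a
# tree (the treewidth-1 special case of what INDDGO solves on general tree
# decompositions). dp[v] = (best excluding v, best including v) for v's subtree.

def max_weight_independent_set(adj, weights, root=0):
    dp = {}

    def solve(v, parent):
        exclude, include = 0, weights[v]
        for u in adj[v]:
            if u == parent:
                continue
            solve(u, v)
            ex_u, in_u = dp[u]
            exclude += max(ex_u, in_u)   # child is free to be in or out
            include += ex_u              # child must be out if v is in
        dp[v] = (exclude, include)

    solve(root, None)
    return max(dp[root])

#        0(3)
#       /    \
#     1(2)   2(5)
#     /
#   3(4)
adj = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
weights = {0: 3, 1: 2, 2: 5, 3: 4}
print(max_weight_independent_set(adj, weights))   # 9 (nodes 2 and 3)
```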
Using reconfigurable hardware to accelerate multiple sequence alignment with ClustalW.
Oliver, Tim; Schmidt, Bertil; Nathan, Darran; Clemens, Ralf; Maskell, Douglas
2005-08-15
Aligning hundreds of sequences using progressive alignment tools such as ClustalW requires several hours on state-of-the-art workstations. We present a new approach to compute multiple sequence alignments in far shorter time using reconfigurable hardware. This results in an implementation of ClustalW with significant runtime savings on a standard off-the-shelf FPGA.
2016-04-30
Data Retention Specifications Daniel Smullen, Research Assistant, Carnegie Mellon University Travis Breaux, Assistant Professor, Carnegie Mellon... Carnegie Mellon University Travis Breaux, Assistant Professor, Carnegie Mellon University Cybersecurity Figure of Merit CAPT Brian Erickson, USN, SPAWAR...Integration With Data Retention Specifications Daniel Smullen—is a Research Assistant enrolled in the software engineering PhD program at Carnegie Mellon
Language Abstractions for Software-Defined Networks
2012-01-01
Core Flight System (cFS) a Low Cost Solution for SmallSats
NASA Technical Reports Server (NTRS)
McComas, David; Strege, Susanne; Wilmot, Jonathan
2015-01-01
The cFS is a FSW product line that uses a layered architecture and compile-time configuration parameters which make it portable and scalable for a wide range of platforms. The software layers that define the application run-time environment are now under a NASA-wide configuration control board with the goal of sustaining an open-source application ecosystem.
Real-time Scheduling for GPUS with Applications in Advanced Automotive Systems
2015-01-01
Architecture of GPU tasklet scheduling infrastructure ...throughput. This disparity is even greater when we consider mobile CPUs, such as those designed by ARM. For instance, the ARM Cortex-A15 series processor as...stub library that replaces the GPGPU runtime within each virtual machine. The stub library communicates API calls to a GPGPU backend user-space daemon
An Interface Transformation Strategy for AF-IPPS
2012-12-01
Representational State Transfer (REST) and Java Enterprise Edition (Java EE) to implement a reusable "translation service." For SOAP and REST protocols, XML and...of best-of-breed open source software. The product baseline is summarized in the following table: Product Function Description Java Language...Compiler & Runtime JBoss Application Server Applications, Messaging, Translation Java EE Application Server Ruby on Rails Applications Ruby Web
Runtime Assurance Framework Development for Highly Adaptive Flight Control Systems
2015-12-01
performing a surveillance mission. The demonstration platform consisted of RTA systems for the inner-loop control, outer-loop guidance, ownship flight...For the inner-loop, the concept of employing multiple transition controllers in the reversionary control system was studied. For all feedback levels...RTA Protection Applied to Inner-Loop Control Systems...General Description of Morphing Wing
Integrated Environment for Development and Assurance
2015-01-26
We Rely on Software for Safe Aircraft Operation. Embedded software systems introduce a new class of...Latency jitter affects...Why do system level failures still occur despite fault tolerance techniques being deployed in systems? Embedded software system as major source of
Architecting Service-Oriented Systems
2011-08-01
Service orientation is an approach to software systems development that has become a popular way to implement distributed, loosely coupled...runtime. The later you defer binding the more flexibility service providers and service consumers have to develop their software systems independently...Enterprise Service Bus: An Enterprise Service Bus (ESB) is a software pattern that can be part of a SOA infrastructure and acts as an intermediary
An Ant Colony Optimization Based Feature Selection for Web Page Classification
2014-01-01
The increased popularity of the web has caused the inclusion of huge amount of information to the web, and as a result of this explosive information growth, automated web page classification systems are needed to improve search engines' performance. Web pages have a large number of features such as HTML/XML tags, URLs, hyperlinks, and text contents that should be considered during an automated classification process. The aim of this study is to reduce the number of features to be used to improve runtime and accuracy of the classification of web pages. In this study, we used an ant colony optimization (ACO) algorithm to select the best features, and then we applied the well-known C4.5, naive Bayes, and k nearest neighbor classifiers to assign class labels to web pages. We used the WebKB and Conference datasets in our experiments, and we showed that using the ACO for feature selection improves both accuracy and runtime performance of classification. We also showed that the proposed ACO based algorithm can select better features with respect to the well-known information gain and chi square feature selection methods. PMID:25136678
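A minimal sketch of the ant-colony feature-selection loop described above follows: each ant samples a feature subset with pheromone-biased probabilities, a scoring function evaluates the subset, and pheromone is reinforced on the features of the best subset found so far. The scorer here is a toy stand-in; the paper's evaluation with C4.5, naive Bayes and kNN on WebKB/Conference data is not reproduced.

```python
# Minimal sketch of ant-colony-optimization (ACO) feature selection. The
# `score` callback is a hypothetical stand-in for cross-validated classifier
# accuracy on the selected features.

import random

def aco_feature_selection(n_features, score, n_ants=10, n_iters=20,
                          subset_size=5, evaporation=0.1, seed=0):
    rng = random.Random(seed)
    pheromone = [1.0] * n_features
    best_subset, best_score = None, float("-inf")

    for _ in range(n_iters):
        for _ in range(n_ants):
            # Each ant samples features without replacement, biased by pheromone.
            subset, candidates = set(), list(range(n_features))
            while len(subset) < subset_size:
                weights = [pheromone[f] for f in candidates]
                f = rng.choices(candidates, weights=weights, k=1)[0]
                subset.add(f)
                candidates.remove(f)
            s = score(subset)
            if s > best_score:
                best_subset, best_score = subset, s
        # Evaporate, then deposit pheromone on the best subset so far.
        pheromone = [(1 - evaporation) * p for p in pheromone]
        for f in best_subset:
            pheromone[f] += best_score
    return best_subset, best_score

# Toy scorer: features 0-4 are the "informative" ones.
toy_score = lambda subset: sum(1.0 for f in subset if f < 5) + random.random() * 0.01
print(aco_feature_selection(n_features=30, score=toy_score))
```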
NASA Astrophysics Data System (ADS)
Chen, Yi-Chieh; Li, Tsung-Han; Lin, Hung-Yu; Chen, Kao-Tun; Wu, Chun-Sheng; Lai, Ya-Chieh; Hurat, Philippe
2018-03-01
As process technology improves and integrated circuit (IC) design complexity increases, the failure rate caused by optical effects in semiconductor manufacturing grows. To enhance chip quality, optical proximity correction (OPC) plays an indispensable role in the manufacturing industry. However, OPC, which includes model creation, correction, simulation and verification, is a bottleneck from design to manufacture because of its multiple iterations and the advanced mathematical description of physical behavior it requires. This paper therefore presents a pattern-based design technology co-optimization (PB-DTCO) flow that cooperates with OPC to find patterns that will negatively affect yield and fix them automatically in advance, reducing the run-time of OPC operation. The PB-DTCO flow can generate plenty of test patterns for model creation and yield improvement, classify candidate patterns systematically, and quickly build up banks of matched pattern/optimization pairs. Those banks can be used for hotspot fixing and layout optimization, and can also be referenced for the next technology node. The combination of the PB-DTCO flow with OPC therefore not only helps reduce time-to-market but is also flexible and can easily be adapted to diverse OPC flows.
MSAProbs-MPI: parallel multiple sequence aligner for distributed-memory systems.
González-Domínguez, Jorge; Liu, Yongchao; Touriño, Juan; Schmidt, Bertil
2016-12-15
MSAProbs is a state-of-the-art protein multiple sequence alignment tool based on hidden Markov models. It can achieve high alignment accuracy at the expense of relatively long runtimes for large-scale input datasets. In this work we present MSAProbs-MPI, a distributed-memory parallel version of the multithreaded MSAProbs tool that is able to reduce runtimes by exploiting the compute capabilities of common multicore CPU clusters. Our performance evaluation on a cluster with 32 nodes (each containing two Intel Haswell processors) shows reductions in execution time of over one order of magnitude for typical input datasets. Furthermore, MSAProbs-MPI using eight nodes is faster than the GPU-accelerated QuickProbs running on a Tesla K20. Another strong point is that MSAProbs-MPI can deal with large datasets for which MSAProbs and QuickProbs might fail due to time and memory constraints, respectively. Source code in C++ and MPI running on Linux systems as well as a reference manual are available at http://msaprobs.sourceforge.net. Contact: jgonzalezd@udc.es. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Run-time implementation issues for real-time embedded Ada
NASA Technical Reports Server (NTRS)
Maule, Ruth A.
1986-01-01
A motivating factor in the development of Ada as the department of defense standard language was the high cost of embedded system software development. It was with embedded system requirements in mind that many of the features of the language were incorporated. Yet it is the designers of embedded systems that seem to comprise the majority of the Ada community dissatisfied with the language. There are a variety of reasons for this dissatisfaction, but many seem to be related in some way to the Ada run-time support system. Some of the areas in which the inconsistencies were found to have the greatest impact on performance from the standpoint of real-time systems are presented. In particular, a large part of the duties of the tasking supervisor are subject to the design decisions of the implementer. These include scheduling, rendezvous, delay processing, and task activation and termination. Some of the more general issues presented include time and space efficiencies, generic expansions, memory management, pragmas, and tracing features. As validated compilers become available for bare computer targets, it is important for a designer to be aware that, at least for many real-time issues, all validated Ada compilers are not created equal.
A scalable approach to solving dense linear algebra problems on hybrid CPU-GPU systems
Song, Fengguang; Dongarra, Jack
2014-10-01
Aiming to fully exploit the computing power of all CPUs and all graphics processing units (GPUs) on hybrid CPU-GPU systems to solve dense linear algebra problems, in this paper we design a class of heterogeneous tile algorithms to maximize the degree of parallelism, to minimize the communication volume, and to accommodate the heterogeneity between CPUs and GPUs. The new heterogeneous tile algorithms are executed upon our decentralized dynamic scheduling runtime system, which schedules a task graph dynamically and transfers data between compute nodes automatically. The runtime system uses a new distributed task assignment protocol to solve data dependencies between tasks without any coordination between processing units. By overlapping computation and communication through dynamic scheduling, we are able to attain scalable performance for the double-precision Cholesky factorization and QR factorization. Finally, our approach demonstrates a performance comparable to Intel MKL on shared-memory multicore systems and better performance than both vendor (e.g., Intel MKL) and open source libraries (e.g., StarPU) in the following three environments: heterogeneous clusters with GPUs, conventional clusters without GPUs, and shared-memory systems with multiple GPUs.
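To illustrate the structure of a tile algorithm of the kind described above, the Python sketch below performs a tiled (right-looking) Cholesky factorization with NumPy, making the POTRF/TRSM/SYRK/GEMM tile tasks explicit. Here the tasks simply run in dependency order on the CPU; the dynamic scheduling runtime and CPU-GPU heterogeneity of the paper are not modeled.

```python
# Tiled (right-looking) Cholesky factorization: the per-tile tasks are the
# units a dynamic scheduling runtime would dispatch once their dependencies
# resolve. Executed sequentially here for illustration.

import numpy as np

def tiled_cholesky(A, tile):
    n = A.shape[0]
    nt = n // tile
    # Copy the matrix into an nt x nt grid of tiles.
    T = [[np.array(A[i*tile:(i+1)*tile, j*tile:(j+1)*tile]) for j in range(nt)]
         for i in range(nt)]
    for k in range(nt):
        T[k][k] = np.linalg.cholesky(T[k][k])                  # POTRF task
        for i in range(k + 1, nt):
            # TRSM task: T[i][k] <- T[i][k] * T[k][k]^{-T} (triangular solve).
            T[i][k] = np.linalg.solve(T[k][k], T[i][k].T).T
        for i in range(k + 1, nt):
            for j in range(k + 1, i + 1):
                T[i][j] -= T[i][k] @ T[j][k].T                 # SYRK/GEMM task
    # Assemble the lower-triangular factor L from the lower tiles.
    L = np.zeros_like(A)
    for i in range(nt):
        for j in range(i + 1):
            block = np.tril(T[i][j]) if i == j else T[i][j]
            L[i*tile:(i+1)*tile, j*tile:(j+1)*tile] = block
    return L

rng = np.random.default_rng(0)
M = rng.standard_normal((8, 8))
A = M @ M.T + 8 * np.eye(8)           # symmetric positive definite test matrix
L = tiled_cholesky(A, tile=4)
print(np.allclose(L @ L.T, A))         # True
```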
Thread scheduling for GPU-based OPC simulation on multi-thread
NASA Astrophysics Data System (ADS)
Lee, Heejun; Kim, Sangwook; Hong, Jisuk; Lee, Sooryong; Han, Hwansoo
2018-03-01
As shrink-driven semiconductor product development continues, the accuracy required of model-based optical proximity correction (MBOPC), and its difficulty, keep increasing. OPC simulation time, the most time-consuming part of MBOPC, is rising rapidly due to the high pattern density in a layout and the complexity of the OPC model. To reduce OPC simulation time, we apply graphics processing units (GPUs) to MBOPC, since the OPC process parallelizes well. We address issues that typically arise during GPU-based OPC simulation in a multi-threaded system, such as "out of memory" and "GPU idle time". To overcome these problems, we propose a thread scheduling method that manages OPC jobs in multiple threads so that simulation jobs from the threads are executed alternately on the GPU while correction jobs run at the same time on the CPU cores. We observed that GPU peak memory usage decreases by up to 35%, and MBOPC runtime also decreases by 4%. In cases where out-of-memory issues occur in a multi-threaded environment, the thread scheduler improved MBOPC runtime by up to 23%.
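The following Python sketch shows the scheduling idea in miniature: several worker threads run CPU-side correction work concurrently while simulation jobs are serialized through a single lock so only one thread occupies the GPU (and its memory) at a time. The do_simulation/do_correction functions are hypothetical stand-ins for the OPC kernels, not the authors' implementation.

```python
# Toy sketch of the thread-scheduling idea: GPU simulation jobs are serialized
# via a lock, while CPU correction jobs from different threads overlap freely.

import threading
import time

gpu_lock = threading.Lock()

def do_simulation_on_gpu(job):
    # Stand-in for a GPU lithography simulation call.
    time.sleep(0.01)
    return f"sim({job})"

def do_correction_on_cpu(result):
    # Stand-in for CPU-side edge correction using the simulation result.
    time.sleep(0.02)
    return f"corr({result})"

def worker(jobs, results):
    for job in jobs:
        with gpu_lock:                              # at most one job on the GPU
            sim = do_simulation_on_gpu(job)
        results.append(do_correction_on_cpu(sim))   # corrections overlap

results = []
threads = [threading.Thread(target=worker,
                            args=([f"t{t}-job{j}" for j in range(3)], results))
           for t in range(4)]
for th in threads:
    th.start()
for th in threads:
    th.join()
print(len(results), "jobs corrected")
```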
Runtime verification of embedded real-time systems.
Reinbacher, Thomas; Függer, Matthias; Brauer, Jörg
We present a runtime verification framework that allows on-line monitoring of past-time Metric Temporal Logic (ptMTL) specifications in a discrete time setting. We design observer algorithms for the time-bounded modalities of ptMTL, which take advantage of the highly parallel nature of hardware designs. The algorithms can be translated into efficient hardware blocks, which are designed for reconfigurability, thus, facilitate applications of the framework in both a prototyping and a post-deployment phase of embedded real-time systems. We provide formal correctness proofs for all presented observer algorithms and analyze their time and space complexity. For example, for the most general operator considered, the time-bounded Since operator, we obtain a time complexity that is doubly logarithmic both in the point in time the operator is executed and the operator's time bounds. This result is promising with respect to a self-contained, non-interfering monitoring approach that evaluates real-time specifications in parallel to the system-under-test. We implement our framework on a Field Programmable Gate Array platform and use extensive simulation and logic synthesis runs to assess the benefits of the approach in terms of resource usage and operating frequency.
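As a small software illustration of a time-bounded past-time observer (a simplification, not the paper's hardware ptMTL Since observer), the sketch below monitors "the proposition held at some point within the last k steps" with constant work per step by remembering the most recent satisfying time stamp.

```python
# Discrete-time observer for one bounded past-time operator: "once within the
# last k steps" (a bounded past 'eventually'). Each step is O(1) because only
# the time of the most recent satisfaction is stored.

class OnceWithin:
    def __init__(self, bound):
        self.bound = bound          # k: how many past steps to look back
        self.last_true = None       # time of the most recent satisfaction

    def step(self, t, prop_holds):
        if prop_holds:
            self.last_true = t
        # Verdict: did the proposition hold at some step in [t - bound, t]?
        return self.last_true is not None and t - self.last_true <= self.bound

# Monitor "an acknowledgment was seen within the last 3 steps".
obs = OnceWithin(bound=3)
trace = [False, True, False, False, False, False, True]
for t, ack in enumerate(trace):
    print(t, obs.step(t, ack))
# Steps 1-4 report True (the ack at t=1 is within 3 steps), step 5 reports
# False, and step 6 reports True again.
```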
Prototyping Tool for Web-Based Multiuser Online Role-Playing Game
NASA Astrophysics Data System (ADS)
Okamoto, Shusuke; Kamada, Masaru; Yonekura, Tatsuhiro
This letter proposes a prototyping tool for Web-based Multiuser Online Role-Playing Game (MORPG). The design goal is to make this tool simple and powerful. The tool is comprised of a GUI editor, a translator and a runtime environment. The GUI editor is used to edit state-transition diagrams, each of which defines the behavior of the fictional characters. The state-transition diagrams are translated into C program codes, which play the role of a game engine in the RPG system. The runtime environment includes PHP, JavaScript with Ajax and HTML. So the prototype system can be played on the usual Web browser, such as Firefox, Safari and IE. When a player clicks or presses a key, the Web browser sends the event to the Web server so that its consequence is reflected on the screens that the other players are looking at. Prospected users of this tool include programming novices and schoolchildren. Knowledge of or skill in any specific programming language is not required to create state-transition diagrams. Its structure is not only suitable for defining character behavior but also intuitive enough for novices to understand. Therefore, the users can easily create a Web-based MORPG system with the tool.
NEQAIR96,Nonequilibrium and Equilibrium Radiative Transport and Spectra Program: User's Manual
NASA Technical Reports Server (NTRS)
Whiting, Ellis E.; Park, Chul; Liu, Yen; Arnold, James O.; Paterson, John A.
1996-01-01
This document is the User's Manual for a new version of the NEQAIR computer program, NEQAIR96. The program is a line-by-line and a line-of-sight code. It calculates the emission and absorption spectra for atomic and diatomic molecules and the transport of radiation through a nonuniform gas mixture to a surface. The program has been rewritten to make it easy to use, run faster, and include many run-time options that tailor a calculation to the user's requirements. The accuracy and capability have also been improved by including the rotational Hamiltonian matrix formalism for calculating rotational energy levels and Hoenl-London factors for dipole and spin-allowed singlet, doublet, triplet, and quartet transitions. Three sample cases are also included to help the user become familiar with the steps taken to produce a spectrum. A new user interface is included that uses check location, to select run-time options and to enter selected run data, making NEQAIR96 easier to use than the older versions of the code. The ease of its use and the speed of its algorithms make NEQAIR96 a valuable educational code as well as a practical spectroscopic prediction and diagnostic code.
Generalized concurrence in boson sampling.
Chin, Seungbeom; Huh, Joonsuk
2018-04-17
A fundamental question in linear optical quantum computing is to understand the origin of the quantum supremacy in the physical system. It is found that the multimode linear optical transition amplitudes are calculated through the permanents of transition operator matrices, which is a hard problem for classical simulations (boson sampling problem). We can understand this problem by considering a quantum measure that directly determines the runtime for computing the transition amplitudes. In this paper, we suggest a quantum measure named "Fock state concurrence sum" C_S, which is the summation over all the members of "the generalized Fock state concurrence" (a measure analogous to the generalized concurrences of entanglement and coherence). By introducing generalized algorithms for computing the transition amplitudes of the Fock state boson sampling with an arbitrary number of photons per mode, we show that the minimal classical runtime for all the known algorithms directly depends on C_S. Therefore, we can state that the Fock state concurrence sum C_S behaves as a collective measure that controls the computational complexity of Fock state BS. We expect that our observation on the role of the Fock state concurrence in the generalized algorithm for permanents would provide a unified viewpoint to interpret the quantum computing power of linear optics.
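Since the transition amplitudes reduce to matrix permanents, a worked example of the standard (not the paper's generalized) algorithm may help: the Python sketch below evaluates Ryser's inclusion-exclusion formula, which costs O(2^n n^2) as written (O(2^n n) with Gray-code updates).

```python
# Textbook Ryser formula for the matrix permanent:
#   perm(A) = (-1)^n * sum over nonempty column subsets S of
#             (-1)^{|S|} * prod_i ( sum_{j in S} a_ij ).
# This is the standard classical algorithm, not the generalized Fock-state
# boson-sampling algorithms introduced in the paper.

from itertools import combinations

def permanent(A):
    n = len(A)
    total = 0
    for r in range(1, n + 1):
        for cols in combinations(range(n), r):          # column subset S
            prod = 1
            for i in range(n):
                prod *= sum(A[i][j] for j in cols)      # row sums over S
            total += (-1) ** r * prod
    return (-1) ** n * total

A = [[1, 2], [3, 4]]
print(permanent(A))        # 1*4 + 2*3 = 10
B = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
print(permanent(B))        # 3! = 6
```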
Strategies for global optimization in photonics design.
Vukovic, Ana; Sewell, Phillip; Benson, Trevor M
2010-10-01
This paper reports on two important issues that arise in the context of the global optimization of photonic components where large problem spaces must be investigated. The first is the implementation of a fast simulation method and associated matrix solver for assessing particular designs and the second, the strategies that a designer can adopt to control the size of the problem design space to reduce runtimes without compromising the convergence of the global optimization tool. For this study an analytical simulation method based on Mie scattering and a fast matrix solver exploiting the fast multipole method are combined with genetic algorithms (GAs). The impact of the approximations of the simulation method on the accuracy and runtime of individual design assessments and the consequent effects on the GA are also examined. An investigation of optimization strategies for controlling the design space size is conducted on two illustrative examples, namely, 60° and 90° waveguide bends based on photonic microstructures, and their effectiveness is analyzed in terms of a GA's ability to converge to the best solution within an acceptable timeframe. Finally, the paper describes some particular optimized solutions found in the course of this work.
A Component-Based Approach for Securing Indoor Home Care Applications.
Agirre, Aitor; Armentia, Aintzane; Estévez, Elisabet; Marcos, Marga
2017-12-26
eHealth systems have adopted recent advances on sensing technologies together with advances in information and communication technologies (ICT) in order to provide people-centered services that improve the quality of life of an increasingly elderly population. As these eHealth services are founded on the acquisition and processing of sensitive data (e.g., personal details, diagnosis, treatments and medical history), any security threat would damage the public's confidence in them. This paper proposes a solution for the design and runtime management of indoor eHealth applications with security requirements. The proposal allows applications definition customized to patient particularities, including the early detection of health deterioration and suitable reaction (events) as well as security needs. At runtime, security support is twofold. A secured component-based platform supervises applications execution and provides events management, whilst the security of the communications among application components is also guaranteed. Additionally, the proposed event management scheme adopts the fog computing paradigm to enable local event related data storage and processing, thus saving communication bandwidth when communicating with the cloud. As a proof of concept, this proposal has been validated through the monitoring of the health status in diabetic patients at a nursing home.
Li, Jia; Xia, Yunni; Luo, Xin
2014-01-01
OWL-S, one of the most important Semantic Web service ontologies proposed to date, provides a core ontological framework and guidelines for describing the properties and capabilities of their web services in an unambiguous, computer interpretable form. Predicting the reliability of composite service processes specified in OWL-S allows service users to decide whether the process meets the quantitative quality requirement. In this study, we consider the runtime quality of services to be fluctuating and introduce a dynamic framework to predict the runtime reliability of services specified in OWL-S, employing the Non-Markovian stochastic Petri net (NMSPN) and the time series model. The framework includes the following steps: obtaining the historical response times series of individual service components; fitting these series with an autoregressive moving-average (ARMA) model and predicting the future firing rates of service components; mapping the OWL-S process into a NMSPN model; employing the predicted firing rates as the model input of NMSPN and calculating the normal completion probability as the reliability estimate. In the case study, a comparison between the static model and our approach based on experimental data is presented and it is shown that our approach achieves higher prediction accuracy.
Support for User Interfaces for Distributed Systems
NASA Technical Reports Server (NTRS)
Eychaner, Glenn; Niessner, Albert
2005-01-01
An extensible Java(TradeMark) software framework supports the construction and operation of graphical user interfaces (GUIs) for distributed computing systems typified by ground control systems that send commands to, and receive telemetric data from, spacecraft. Heretofore, such GUIs have been custom built for each new system at considerable expense. In contrast, the present framework affords generic capabilities that can be shared by different distributed systems. Dynamic class loading, reflection, and other run-time capabilities of the Java language and JavaBeans component architecture enable the creation of a GUI for each new distributed computing system with a minimum of custom effort. By use of this framework, GUI components in control panels and menus can send commands to a particular distributed system with a minimum of system-specific code. The framework receives, decodes, processes, and displays telemetry data; custom telemetry data handling can be added for a particular system. The framework supports saving and later restoration of users' configurations of control panels and telemetry displays with a minimum of effort in writing system-specific code. GUIs constructed within this framework can be deployed in any operating system with a Java run-time environment, without recompilation or code changes.
Transformation as a Design Process and Runtime Architecture for High Integrity Software
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bespalko, S.J.; Winter, V.L.
1999-04-05
We have discussed two aspects of creating high integrity software that greatly benefit from the availability of transformation technology, which in this case is manifested by the requirement for a sophisticated backtracking parser. First, because of the potential for correctly manipulating programs via small changes, an automated non-procedural transformation system can be a valuable tool for constructing high assurance software. Second, modeling the processing of translating data into information as a, perhaps, context-dependent grammar leads to an efficient, compact implementation. From a practical perspective, the transformation process should begin in the domain language in which a problem is initially expressed. Thus in order for a transformation system to be practical it must be flexible with respect to domain-specific languages. We have argued that transformation applied to specification results in a highly reliable system. We also attempted to briefly demonstrate that transformation technology applied to the runtime environment will result in a safe and secure system. We thus believe that the sophisticated multi-lookahead backtracking parsing technology is central to the task of being in a position to demonstrate the existence of HIS.
Source and listener directivity for interactive wave-based sound propagation.
Mehra, Ravish; Antani, Lakulish; Kim, Sujeong; Manocha, Dinesh
2014-04-01
We present an approach to model dynamic, data-driven source and listener directivity for interactive wave-based sound propagation in virtual environments and computer games. Our directional source representation is expressed as a linear combination of elementary spherical harmonic (SH) sources. In the preprocessing stage, we precompute and encode the propagated sound fields due to each SH source. At runtime, we perform the SH decomposition of the varying source directivity interactively and compute the total sound field at the listener position as a weighted sum of precomputed SH sound fields. We propose a novel plane-wave decomposition approach based on higher-order derivatives of the sound field that enables dynamic HRTF-based listener directivity at runtime. We provide a generic framework to incorporate our source and listener directivity in any offline or online frequency-domain wave-based sound propagation algorithm. We have integrated our sound propagation system in Valve's Source game engine and use it to demonstrate realistic acoustic effects such as sound amplification, diffraction low-passing, scattering, localization, externalization, and spatial sound, generated by wave-based propagation of directional sources and listener in complex scenarios. We also present results from our preliminary user study.
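A numeric sketch of the runtime combination step described above follows: the field of a directional source is reconstructed as a weighted sum of fields precomputed for each elementary spherical-harmonic source, with weights given by the source's current SH coefficients. The precomputed fields and coefficient vectors below are random placeholders, not real acoustic data or the paper's propagation solver.

```python
# Runtime combination step for SH-based source directivity: total field =
# sum of precomputed per-SH-basis fields, weighted by the source's current
# SH coefficients. All arrays here are random placeholders.

import numpy as np

rng = np.random.default_rng(0)

num_sh = 16                      # e.g. SH order 3 -> (3+1)^2 basis functions
grid_points = 1024               # listener-side evaluation points

# Offline/preprocessing stage: one propagated field per elementary SH source.
precomputed_fields = rng.standard_normal((num_sh, grid_points))

def field_at_runtime(sh_coefficients):
    # Runtime stage: weighted sum over the precomputed SH fields.
    c = np.asarray(sh_coefficients)
    return c @ precomputed_fields          # shape: (grid_points,)

# Example: source directivity that changes from frame to frame.
coeffs_frame0 = rng.standard_normal(num_sh)
coeffs_frame1 = 0.5 * coeffs_frame0 + 0.5 * rng.standard_normal(num_sh)
p0 = field_at_runtime(coeffs_frame0)
p1 = field_at_runtime(coeffs_frame1)
print(p0.shape, float(np.linalg.norm(p0 - p1)))
```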
Improved HDRG decoders for qudit and non-Abelian quantum error correction
NASA Astrophysics Data System (ADS)
Hutter, Adrian; Loss, Daniel; Wootton, James R.
2015-03-01
Hard-decision renormalization group (HDRG) decoders are an important class of decoding algorithms for topological quantum error correction. Due to their versatility, they have been used to decode systems with fractal logical operators, color codes, qudit topological codes, and non-Abelian systems. In this work, we develop a method of performing HDRG decoding which combines strengths of existing decoders and further improves upon them. In particular, we increase the minimal number of errors necessary for a logical error in a system of linear size L from Θ(L^{2/3}) to Ω(L^{1-ε}) for any ε > 0. We apply our algorithm to decoding D(Z_d) quantum double models and a non-Abelian anyon model with Fibonacci-like fusion rules, and show that it indeed significantly outperforms previous HDRG decoders. Furthermore, we provide the first study of continuous error correction with imperfect syndrome measurements for the D(Z_d) quantum double models. The parallelized runtime of our algorithm is poly(log L) for the perfect measurement case. In the continuous case with imperfect syndrome measurements, the averaged runtime is O(1) for Abelian systems, while continuous error correction for non-Abelian anyons stays an open problem.
Dynamic Load Balancing for Grid Partitioning on a SP-2 Multiprocessor: A Framework
NASA Technical Reports Server (NTRS)
Sohn, Andrew; Simon, Horst; Lasinski, T. A. (Technical Monitor)
1994-01-01
Computational requirements of full scale computational fluid dynamics change as computation progresses on a parallel machine. The change in computational intensity causes workload imbalance of processors, which in turn requires a large amount of data movement at runtime. If parallel CFD is to be successful on a parallel or massively parallel machine, balancing of the runtime load is indispensable. Here a framework is presented for dynamic load balancing for CFD applications, called Jove. One processor is designated as the decision maker, Jove, while the others are assigned to computational fluid dynamics. Processors running CFD send flags to Jove in a predetermined number of iterations to initiate load balancing. Jove starts working on load balancing while other processors continue working with the current data and load distribution. Jove goes through several steps to decide if the new data should be taken, including preliminary evaluation, partitioning, processor reassignment, cost evaluation, and decision. Jove running on a single IBM SP2 node has been completely implemented. Preliminary experimental results show that the Jove approach to dynamic load balancing can be effective for full scale grid partitioning on the target machine IBM SP2.
A scalable approach to solving dense linear algebra problems on hybrid CPU-GPU systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Song, Fengguang; Dongarra, Jack
Aiming to fully exploit the computing power of all CPUs and all graphics processing units (GPUs) on hybrid CPU-GPU systems to solve dense linear algebra problems, in this paper we design a class of heterogeneous tile algorithms to maximize the degree of parallelism, to minimize the communication volume, and to accommodate the heterogeneity between CPUs and GPUs. The new heterogeneous tile algorithms are executed upon our decentralized dynamic scheduling runtime system, which schedules a task graph dynamically and transfers data between compute nodes automatically. The runtime system uses a new distributed task assignment protocol to solve data dependencies between tasks without any coordination between processing units. By overlapping computation and communication through dynamic scheduling, we are able to attain scalable performance for the double-precision Cholesky factorization and QR factorization. Finally, our approach demonstrates a performance comparable to Intel MKL on shared-memory multicore systems and better performance than both vendor (e.g., Intel MKL) and open source libraries (e.g., StarPU) in the following three environments: heterogeneous clusters with GPUs, conventional clusters without GPUs, and shared-memory systems with multiple GPUs.
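The core mechanism behind such runtime systems is dependency-driven task execution: a task becomes runnable as soon as its predecessors in the task graph have finished. The toy, single-process sketch below illustrates that idea with a tiny Cholesky-like tile dependency pattern; it is a generic illustration, not the authors' distributed assignment protocol.

```python
from collections import defaultdict, deque

def run_task_graph(tasks, deps):
    """Execute a task DAG as dependencies become satisfied.

    tasks: task id -> callable; deps: task id -> set of prerequisite task ids.
    """
    remaining = {t: len(deps.get(t, ())) for t in tasks}
    children = defaultdict(list)
    for t, prereqs in deps.items():
        for p in prereqs:
            children[p].append(t)

    ready = deque(t for t, n in remaining.items() if n == 0)
    order = []
    while ready:
        t = ready.popleft()
        tasks[t]()                 # a real runtime would dispatch this to a CPU or GPU worker
        order.append(t)
        for child in children[t]:
            remaining[child] -= 1
            if remaining[child] == 0:
                ready.append(child)
    return order

# Example: a tiny tile-Cholesky-like dependency chain (illustrative only).
tasks = {name: (lambda n=name: print("run", n)) for name in
         ["POTRF(0,0)", "TRSM(1,0)", "SYRK(1,1)", "POTRF(1,1)"]}
deps = {"TRSM(1,0)": {"POTRF(0,0)"},
        "SYRK(1,1)": {"TRSM(1,0)"},
        "POTRF(1,1)": {"SYRK(1,1)"}}
run_task_graph(tasks, deps)
```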
Dynamic Distribution and Layouting of Model-Based User Interfaces in Smart Environments
NASA Astrophysics Data System (ADS)
Roscher, Dirk; Lehmann, Grzegorz; Schwartze, Veit; Blumendorf, Marco; Albayrak, Sahin
The developments in computer technology over the last decade have changed the ways computers are used. The emerging smart environments make it possible to build ubiquitous applications that assist users during their everyday life, at any time, in any context. But the variety of contexts-of-use (user, platform and environment) makes the development of such ubiquitous applications for smart environments, and especially of their user interfaces, a challenging and time-consuming task. We propose a model-based approach, which allows adapting the user interface at runtime to numerous (also unknown) contexts-of-use. Based on a user interface modelling language defining the fundamentals and constraints of the user interface, a runtime architecture exploits this description to adapt the user interface to the current context-of-use. The architecture provides automatic distribution and layout algorithms for adapting the applications also to contexts unforeseen at design time. Designers do not specify predefined adaptations for each specific situation, but adaptation constraints and guidelines. Furthermore, users are provided with a meta user interface to influence the adaptations according to their needs. A smart home energy management system serves as a running example to illustrate the approach.
Parallel high-precision orbit propagation using the modified Picard-Chebyshev method
NASA Astrophysics Data System (ADS)
Koblick, Darin C.
2012-03-01
The modified Picard-Chebyshev method, when run in parallel, is thought to be more accurate and faster than the most efficient sequential numerical integration techniques when applied to orbit propagation problems. Previous experiments have shown that the modified Picard-Chebyshev method can have up to a one-order-of-magnitude speedup over the 12
Visual Data-Analytics of Large-Scale Parallel Discrete-Event Simulations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ross, Caitlin; Carothers, Christopher D.; Mubarak, Misbah
Parallel discrete-event simulation (PDES) is an important tool in the codesign of extreme-scale systems because PDES provides a cost-effective way to evaluate designs of high-performance computing systems. Optimistic synchronization algorithms for PDES, such as Time Warp, allow events to be processed without global synchronization among the processing elements. A rollback mechanism is provided when events are processed out of timestamp order. Although optimistic synchronization protocols enable the scalability of large-scale PDES, the performance of the simulations must be tuned to reduce the number of rollbacks and provide an improved simulation runtime. To enable efficient large-scale optimistic simulations, one has to gain insight into the factors that affect the rollback behavior and simulation performance. We developed a tool for ROSS model developers that gives them detailed metrics on the performance of their large-scale optimistic simulations at varying levels of simulation granularity. Model developers can use this information for parameter tuning of optimistic simulations in order to achieve better runtime and fewer rollbacks. In this work, we instrument the ROSS optimistic PDES framework to gather detailed statistics about the simulation engine. We have also developed an interactive visualization interface that uses the data collected by the ROSS instrumentation to understand the underlying behavior of the simulation engine. The interface connects real time to virtual time in the simulation and provides the ability to view simulation data at different granularities. We demonstrate the usefulness of our framework by performing a visual analysis of the dragonfly network topology model provided by the CODES simulation framework built on top of ROSS. The instrumentation needs to minimize overhead in order to accurately collect data about the simulation performance. To ensure that the instrumentation does not introduce unnecessary overhead, we perform a scaling study that compares instrumented ROSS simulations with their noninstrumented counterparts in order to determine the amount of perturbation when running at different simulation scales.
NASA Astrophysics Data System (ADS)
Evans, M. N.; Selmer, K. J.; Breeden, B. T.; Lopatka, A. S.; Plummer, R. E.
2016-09-01
We describe an algorithm to correct for scale compression, runtime drift, and amplitude effects in carbonate and cellulose oxygen and carbon isotopic analyses made on two online continuous flow isotope ratio mass spectrometry (CF-IRMS) systems using gas chromatographic (GC) separation. We validate the algorithm by correcting measurements of samples of known isotopic composition which are not used to estimate the corrections. For carbonate δ13C (δ18O) data, median precision of validation estimates for two reference materials and two calibrated working standards is 0.05‰ (0.07‰); median bias is 0.04‰ (0.02‰) over a range of 49.2‰ (24.3‰). For α-cellulose δ13C (δ18O) data, median precision of validation estimates for one reference material and five working standards is 0.11‰ (0.27‰); median bias is 0.13‰ (-0.10‰) over a range of 16.1‰ (19.1‰). These results are within the 5th-95th percentile range of subsequent routine runtime validation exercises in which one working standard is used to calibrate the other. Analysis of the relative importance of correction steps suggests that drift and scale-compression corrections are most reliable and valuable. If validation precisions are not already small, routine cross-validated precision estimates are improved by up to 50% (80%). The results suggest that correction for systematic error may enable these particular CF-IRMS systems to produce δ13C and δ18O carbonate and cellulose isotopic analyses with higher validated precision, accuracy, and throughput than is typically reported for these systems. The correction scheme may be used in support of replication-intensive research projects in paleoclimatology and other data-intensive applications within the geosciences.
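Of the correction steps named above, runtime drift is the simplest to illustrate: residuals of standards run throughout the sequence define a drift curve that is subtracted from every sample. The sketch below assumes a simple linear drift model and invented variable names; the paper's actual algorithm and its cross-validation scheme are more involved.

```python
import numpy as np

def correct_linear_drift(run_positions, measured_delta, std_positions, std_measured, std_true):
    """Remove runtime drift from isotope-ratio measurements (delta values, permil).

    A straight line is fit to the residuals (measured - true) of standards run
    throughout the sequence and subtracted from every sample; positions are
    analysis indices in run order. This is a simplified stand-in for the
    drift-correction step only.
    """
    residuals = np.asarray(std_measured) - np.asarray(std_true)
    slope, intercept = np.polyfit(std_positions, residuals, deg=1)
    drift = slope * np.asarray(run_positions) + intercept
    return np.asarray(measured_delta) - drift

# Example: standards at positions 0, 10, 20 show a slow upward drift.
corrected = correct_linear_drift(
    run_positions=[3, 7, 15],
    measured_delta=[-24.90, -24.82, -24.70],
    std_positions=[0, 10, 20],
    std_measured=[-30.00, -29.90, -29.80],
    std_true=[-30.00, -30.00, -30.00],
)
```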
Quinoa - Adaptive Computational Fluid Dynamics, 0.2
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bakosi, Jozsef; Gonzalez, Francisco; Rogers, Brandon
Quinoa is a set of computational tools that enables research and numerical analysis in fluid dynamics. At this time it remains a test-bed to experiment with various algorithms using fully asynchronous runtime systems. Currently, Quinoa consists of the following tools: (1) Walker, a numerical integrator for systems of stochastic differential equations in time. It is a mathematical tool to analyze and design the behavior of stochastic differential equations. It allows the estimation of arbitrary coupled statistics and probability density functions and is currently used for the design of statistical moment approximations for multiple mixing materials in variable-density turbulence. (2) Inciter, an overdecomposition-aware finite element field solver for partial differential equations using 3D unstructured grids. Inciter is used to research asynchronous mesh-based algorithms and to experiment with coupling asynchronous to bulk-synchronous parallel code. Two planned new features of Inciter, compared to the previous release (LA-CC-16-015), to be implemented in 2017, are (a) a simple Navier-Stokes solver for ideal single-material compressible gases, and (b) solution-adaptive mesh refinement (AMR), which enables dynamically concentrating compute resources to regions with interesting physics. Using the NS-AMR problem we plan to explore how to scale such high-load-imbalance simulations, representative of large production multiphysics codes, to very large problems on very large computers using an asynchronous runtime system. (3) RNGTest, a test harness to subject random number generators to stringent statistical tests enabling quantitative ranking with respect to their quality and computational cost. (4) UnitTest, a unit test harness, running hundreds of tests per second, capable of testing serial, synchronous, and asynchronous functions. (5) MeshConv, a mesh file converter that can be used to convert 3D tetrahedron meshes from and to either of the following formats: Gmsh (http://www.geuz.org/gmsh), Netgen (http://sourceforge.net/apps/mediawiki/netgen-mesher), ExodusII (http://sourceforge.net/projects/exodusii), HyperMesh (http://www.altairhyperworks.com/product/HyperMesh).
Runtime Verification of C Programs
NASA Technical Reports Server (NTRS)
Havelund, Klaus
2008-01-01
We present in this paper a framework, RMOR, for monitoring the execution of C programs against state machines, expressed in a textual (nongraphical) format in files separate from the program. The state machine language has been inspired by a graphical state machine language RCAT recently developed at the Jet Propulsion Laboratory, as an alternative to using Linear Temporal Logic (LTL) for requirements capture. Transitions between states are labeled with abstract event names and Boolean expressions over such. The abstract events are connected to code fragments using an aspect-oriented pointcut language similar to ASPECTJ's or ASPECTC's pointcut language. The system is implemented in the C analysis and transformation package CIL, and is programmed in OCAML, the implementation language of CIL. The work is closely related to the notion of stateful aspects within aspect-oriented programming, where pointcut languages are extended with temporal assertions over the execution trace.
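RMOR's specification language and CIL-based instrumentation are specific to that tool; as a generic illustration of the underlying idea, the sketch below drives a requirement state machine from abstract events observed during execution. All names here are invented for illustration and do not reflect RMOR's syntax or API.

```python
class StateMachineMonitor:
    """Drive a requirement state machine from abstract events emitted at runtime."""

    def __init__(self, transitions, initial, error_states):
        self.transitions = transitions      # (state, event) -> next state
        self.state = initial
        self.error_states = error_states

    def emit(self, event):
        # Unknown events leave the state unchanged; reaching an error state fails.
        self.state = self.transitions.get((self.state, event), self.state)
        if self.state in self.error_states:
            raise AssertionError(f"requirement violated on event {event!r}")

# Requirement: a resource must be acquired before it is used.
monitor = StateMachineMonitor(
    transitions={("idle", "acquire"): "held",
                 ("held", "use"): "held",
                 ("held", "release"): "idle",
                 ("idle", "use"): "error"},
    initial="idle",
    error_states={"error"},
)

for event in ["acquire", "use", "release"]:   # events would come from instrumented code
    monitor.emit(event)
```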
NASA Technical Reports Server (NTRS)
Hollis, Brian R.; Griffith, Wayland C.; Yanta, William J.
1991-01-01
A fine-wire thermocouple probe was used to determine freestream stagnation temperatures in hypersonic flows. Data were gathered in an N2 blowdown wind tunnel with runtimes of 1-5 s. Tests were made at supply pressures between 30 and 1400 atm and supply temperatures between 700 and 1900 K, with Mach numbers of 14 to 16. An iterative procedure requiring thermocouple data, pitot pressure measurements, and supply conditions was used to determine test cell stagnation temperatures. Probe conduction and radiation losses, as well as real gas behavior of N2, were accounted for during analysis. Temperature measurement error was found to be 5 to 10 percent. A correlation was drawn between thermocouple diameter Reynolds number and temperature recovery ratio. Transient probe behavior was studied and was found to be adequate in temperature gradients up to 1000 K/s.
Learning from Bees: An Approach for Influence Maximization on Viral Campaigns
Sankar, C. Prem; S., Asharaf
2016-01-01
Maximisation of influence propagation is a key ingredient of any viral marketing or socio-political campaign. However, it is an NP-hard problem, and various approximate algorithms have been suggested to address the issue, though not largely successfully. In this paper, we propose a bio-inspired approach to selecting the initial set of nodes, which is significant for rapid convergence towards a sub-optimal solution in minimal runtime. The performance of the algorithm is evaluated using the re-tweet network of the hashtag #KissofLove on Twitter, associated with the non-violent protest against moral policing that spread to many parts of India. Compared with existing centrality-based node ranking processes, the proposed method shows significant improvement in influence propagation. The proposed algorithm is one of the few bio-inspired algorithms in network theory. We also report the results of an exploratory analysis of the Kiss of Love campaign network. PMID:27992472
ICESat-2 laser Nd:YVO4 amplifier
NASA Astrophysics Data System (ADS)
Sawruk, Nicholas W.; Burns, Patrick M.; Edwards, Ryan E.; Litvinovitch, Viatcheslav; Martin, Nigel; Witt, Greg; Fakhoury, Elias; Iskander, John; Pronko, Mark S.; Troupaki, Elisavet; Bay, Michael M.; He, Charles C.; Wang, Liqin L.; Cavanaugh, John F.; Farrokh, Babak; Salem, Jonathan A.; Baker, Eric
2018-02-01
We report on the cause and corrective actions of three amplifier crystal fractures in the space-qualified laser systems used in NASA Goddard Space Flight Center's (GSFC) Ice, Cloud, and Land Elevation Satellite-2 (ICESat-2). The ICESat-2 lasers each contain three end-pumped Nd:YVO4 amplifier stages. The crystals are clamped between two gold-plated copper heat spreaders with an indium foil thermal interface material, and the crystal fractures occurred after multiple years of storage and over a year of operational run-time. The primary contributors are high compressive loading of the Nd:YVO4 crystals at the beginning of life, a time-dependent crystal stress caused by an intermetallic reaction of the gold plating and indium, and slow crack growth resulting in a reduction in crystal strength over time. An updated crystal mounting scheme was designed, analyzed, fabricated and tested. The fracture slab failure analysis, finite-element modeling and corrective actions are presented.
Safe and Efficient Support for Embedded Multi-Processors in Ada
NASA Astrophysics Data System (ADS)
Ruiz, Jose F.
2010-08-01
New software demands increasing processing power, and multi-processor platforms are spreading as the answer to achieve the required performance. Embedded real-time systems are also subject to this trend, but in the case of real-time mission-critical systems, the properties of reliability, predictability and analyzability are also paramount. The Ada 2005 language defined a subset of its tasking model, the Ravenscar profile, that provides the basis for the implementation of deterministic and time analyzable applications on top of a streamlined run-time system. This Ravenscar tasking profile, originally designed for single processors, has proven remarkably useful for modelling verifiable real-time single-processor systems. This paper proposes a simple extension to the Ravenscar profile to support multi-processor systems using a fully partitioned approach. The implementation of this scheme is simple, and it can be used to develop applications amenable to schedulability analysis.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Begoli, Edmon; Dunning, Ted; Frasure, Charlie
We present a service platform for schema-less exploration of data and discovery of patient-related statistics from healthcare data sets. The architecture of this platform is motivated by the need for fast, schema-less, and flexible approaches to SQL-based exploration and discovery of information embedded in the common, heterogeneously structured healthcare data sets and supporting components (electronic health records, practice management systems, etc.). The motivating use cases described in the paper are clinical trial candidate discovery and treatment effectiveness analysis. Following the use cases, we discuss the key features and software architecture of the platform, the underlying core components (Apache Parquet, Drill, the web services server), and the runtime profiles and performance characteristics of the platform. We conclude by showing dramatic speedup with some approaches, and the performance tradeoffs and limitations of others.
Binary Associative Memories as a Benchmark for Spiking Neuromorphic Hardware
Stöckel, Andreas; Jenzen, Christoph; Thies, Michael; Rückert, Ulrich
2017-01-01
Large-scale neuromorphic hardware platforms, specialized computer systems for energy-efficient simulation of spiking neural networks, are being developed around the world, for example as part of the European Human Brain Project (HBP). Due to conceptual differences, a universal performance analysis of these systems in terms of runtime, accuracy and energy efficiency is non-trivial, yet indispensable for further hard- and software development. In this paper we describe a scalable benchmark based on a spiking neural network implementation of the binary neural associative memory. We treat neuromorphic hardware and software simulators as black-boxes and execute exactly the same network description across all devices. Experiments on the HBP platforms under varying configurations of the associative memory show that the presented method allows us to test the quality of the neuron model implementation and to explain significant deviations from the expected reference output. PMID:28878642
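The benchmark is built on the binary neural associative memory (a Willshaw-style memory), whose non-spiking reference computation is easy to state: store sparse binary pattern pairs with clipped Hebbian learning and recall by thresholding. The sketch below is an illustrative reconstruction of that reference, not the authors' spiking implementation.

```python
import numpy as np

class BinaryAssociativeMemory:
    """Willshaw-style binary associative memory: store sparse binary pattern
    pairs with clipped Hebbian learning, recall outputs by thresholding."""

    def __init__(self, n_in, n_out):
        self.weights = np.zeros((n_in, n_out), dtype=int)

    def store(self, x, y):
        # A synapse is switched on (once) whenever both its ends are active.
        self.weights = np.maximum(self.weights, np.outer(x, y))

    def recall(self, x):
        activation = x @ self.weights                       # active inputs seen by each output
        return (activation >= int(x.sum())).astype(int)     # simple Willshaw threshold

# Store one sparse association and recall it from the exact cue.
mem = BinaryAssociativeMemory(n_in=16, n_out=16)
x = np.zeros(16, dtype=int); x[[1, 5, 9]] = 1
y = np.zeros(16, dtype=int); y[[2, 3]] = 1
mem.store(x, y)
assert np.array_equal(mem.recall(x), y)
```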
RootJS: Node.js Bindings for ROOT 6
NASA Astrophysics Data System (ADS)
Beffart, Theo; Früh, Maximilian; Haas, Christoph; Rajgopal, Sachin; Schwabe, Jonas; Wolff, Christoph; Szuba, Marek
2017-10-01
We present rootJS, an interface making it possible to seamlessly integrate ROOT 6 into applications written for Node.js, the JavaScript runtime platform increasingly commonly used to create high-performance Web applications. ROOT features can be called both directly from Node.js code and by JIT-compiling C++ macros. All rootJS methods are invoked asynchronously and support callback functions, allowing non-blocking operation of Node.js applications using them. Last but not least, our bindings have been designed to be platform-independent and should therefore work on all systems supporting both ROOT 6 and Node.js. Thanks to rootJS it is now possible to create ROOT-aware Web applications taking full advantage of the high performance and extensive capabilities of Node.js. Examples include platforms for the quality assurance of acquired, reconstructed or simulated data, book-keeping and e-log systems, and even Web browser-based data visualisation and analysis.
Object-based media and stream-based computing
NASA Astrophysics Data System (ADS)
Bove, V. Michael, Jr.
1998-03-01
Object-based media refers to the representation of audiovisual information as a collection of objects - the result of scene-analysis algorithms - and a script describing how they are to be rendered for display. Such multimedia presentations can adapt to viewing circumstances as well as to viewer preferences and behavior, and can provide a richer link between content creator and consumer. With faster networks and processors, such ideas become applicable to live interpersonal communications as well, creating a more natural and productive alternative to traditional videoconferencing. This paper outlines examples of object-based media algorithms and applications developed by my group, and presents new hardware architectures and software methods that we have developed to meet the computational requirements of object-based and other advanced media representations. In particular we describe stream-based processing, which enables automatic run-time parallelization of multidimensional signal processing tasks even given heterogeneous computational resources.
Run-time parallelization and scheduling of loops
NASA Technical Reports Server (NTRS)
Saltz, Joel H.; Mirchandaney, Ravi; Crowley, Kay
1990-01-01
Run time methods are studied to automatically parallelize and schedule iterations of a do loop in certain cases, where compile-time information is inadequate. The methods presented involve execution time preprocessing of the loop. At compile-time, these methods set up the framework for performing a loop dependency analysis. At run time, wave fronts of concurrently executable loop iterations are identified. Using this wavefront information, loop iterations are reordered for increased parallelism. Symbolic transformation rules are used to produce: inspector procedures that perform execution time preprocessing and executors or transformed versions of source code loop structures. These transformed loop structures carry out the calculations planned in the inspector procedures. Performance results are presented from experiments conducted on the Encore Multimax. These results illustrate that run time reordering of loop indices can have a significant impact on performance. Furthermore, the overheads associated with this type of reordering are amortized when the loop is executed several times with the same dependency structure.
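The inspector/executor idea described above is concrete enough to sketch: at run time an inspector scans the loop's index arrays, builds the dependence wavefronts, and an executor then runs each wavefront's iterations concurrently. The following is a generic illustration under a simplified flow-dependence rule, not the paper's generated code.

```python
def inspector(reads, writes):
    """Group loop iterations into wavefronts that can run concurrently.

    reads[i] / writes[i] list the array locations iteration i reads / writes.
    An iteration is placed one level below the latest earlier iteration that
    wrote any location it touches (a simplified dependence rule).
    """
    last_writer = {}            # location -> wavefront in which it was last written
    wavefronts = []
    for i, (r, w) in enumerate(zip(reads, writes)):
        depth = max((last_writer.get(loc, -1) for loc in set(r) | set(w)), default=-1) + 1
        if depth == len(wavefronts):
            wavefronts.append([])
        wavefronts[depth].append(i)
        for loc in w:
            last_writer[loc] = depth
    return wavefronts

def executor(wavefronts, body):
    """Run each wavefront in order; iterations inside one wavefront are independent."""
    for front in wavefronts:
        for i in front:         # these iterations could be executed in parallel
            body(i)

# Example: iteration i reads x[idx[i]] and writes x[i].
idx = [0, 0, 1, 3, 2]
fronts = inspector(reads=[[j] for j in idx], writes=[[i] for i in range(5)])
executor(fronts, body=lambda i: None)
```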
UniGene Tabulator: a full parser for the UniGene format.
Lenzi, Luca; Frabetti, Flavia; Facchin, Federica; Casadei, Raffaella; Vitale, Lorenza; Canaider, Silvia; Carinci, Paolo; Zannotti, Maria; Strippoli, Pierluigi
2006-10-15
UniGene Tabulator 1.0 provides a solution for full parsing of UniGene flat file format; it implements a structured graphical representation of each data field present in UniGene following import into a common database managing system usable in a personal computer. This database includes related tables for sequence, protein similarity, sequence-tagged site (STS) and transcript map interval (TXMAP) data, plus a summary table where each record represents a UniGene cluster. UniGene Tabulator enables full local management of UniGene data, allowing parsing, querying, indexing, retrieving, exporting and analysis of UniGene data in a relational database form, usable on Macintosh (OS X 10.3.9 or later) and Windows (2000, with service pack 4, XP, with service pack 2 or later) operating systems-based computers. The current release, including both the FileMaker runtime applications, is freely available at http://apollo11.isto.unibo.it/software/
Barreiros, Willian; Teodoro, George; Kurc, Tahsin; Kong, Jun; Melo, Alba C. M. A.; Saltz, Joel
2017-01-01
We investigate efficient sensitivity analysis (SA) of algorithms that segment and classify image features in a large dataset of high-resolution images. Algorithm SA is the process of evaluating variations of methods and parameter values to quantify differences in the output. An SA can be very compute-demanding because it requires re-processing the input dataset several times with different parameters to assess variations in output. In this work, we introduce strategies to efficiently speed up SA via runtime optimizations targeting distributed hybrid systems and reuse of computations from runs with different parameters. We evaluate our approach using a cancer image analysis workflow on a hybrid cluster with 256 nodes, each with an Intel Phi and a dual-socket CPU. The SA attained a parallel efficiency of over 90% on 256 nodes. The cooperative execution using the CPUs and the Phi available in each node with smart task assignment strategies resulted in an additional speedup of about 2×. Finally, multi-level computation reuse led to an additional speedup of up to 2.46× on the parallel version. The level of performance attained with the proposed optimizations will allow the use of SA in large-scale studies. PMID:29081725
Bolis, A; Cantwell, C D; Kirby, R M; Sherwin, S J
2014-01-01
We investigate the relative performance of a second-order Adams–Bashforth scheme and second-order and fourth-order Runge–Kutta schemes when time stepping a 2D linear advection problem discretised using a spectral/hp element technique for a range of different mesh sizes and polynomial orders. Numerical experiments explore the effects of short (two wavelengths) and long (32 wavelengths) time integration for sets of uniform and non-uniform meshes. The choice of time-integration scheme and discretisation together fixes a CFL limit that imposes a restriction on the maximum time step, which can be taken to ensure numerical stability. The number of steps, together with the order of the scheme, affects not only the runtime but also the accuracy of the solution. Through numerical experiments, we systematically highlight the relative effects of spatial resolution and choice of time integration on performance and provide general guidelines on how best to achieve the minimal execution time in order to obtain a prescribed solution accuracy. The significant role played by higher polynomial orders in reducing CPU time while preserving accuracy becomes more evident, especially for uniform meshes, compared with what has been typically considered when studying this type of problem. PMID:25892840
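For reference, a second-order Adams–Bashforth step advances the solution as u^{n+1} = u^n + Δt(3/2 f^n − 1/2 f^{n−1}), reusing the previous right-hand-side evaluation. The sketch below applies it to a 1D advection toy problem with a simple upwind spatial difference; it is a generic illustration only and is not the spectral/hp discretisation used in the paper.

```python
import numpy as np

def ab2_advect(u0, c, dx, dt, num_steps):
    """March du/dt = -c du/dx with 2nd-order Adams-Bashforth in time and a
    first-order upwind difference in space (assumes c > 0, periodic domain)."""
    def rhs(u):
        return -c * (u - np.roll(u, 1)) / dx

    f_prev = rhs(u0)
    u = u0 + dt * f_prev                     # bootstrap the first step with forward Euler
    for _ in range(num_steps - 1):
        f = rhs(u)
        u, f_prev = u + dt * (1.5 * f - 0.5 * f_prev), f
    return u

# One advected sine wave; the CFL number c*dt/dx is kept well below the stability limit.
x = np.linspace(0.0, 1.0, 128, endpoint=False)
u_final = ab2_advect(np.sin(2 * np.pi * x), c=1.0, dx=x[1] - x[0], dt=1e-3, num_steps=500)
```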
Behavioral and Temporal Pattern Detection Within Financial Data With Hidden Information
2012-02-01
...probabilistic pattern detector to monitor the pattern. Subject terms: runtime verification, hidden data, hidden Markov models, formal specifications. ...sequences in many other fields besides financial systems [L, TV, LC, LZ]. Rather, the technique suggested in this paper is positioned as a hybrid... ...operation of the pattern detector. Section 7 describes the operation of the probabilistic pattern-matching monitor, and Section 8 describes three...
ART/Ada design project, phase 1. Task 1 report: Overall design
NASA Technical Reports Server (NTRS)
Allen, Bradley P.
1988-01-01
The design methodology for the ART/Ada project is introduced, and the selected design for ART/Ada is described in detail. The following topics are included: object-oriented design, reusable software, documentation techniques, impact of Ada, design approach, and differences between ART-IM 1.5 and the ART/Ada 1.0 prototype. Also, the Ada generator and the ART/Ada runtime system are discussed.
A Case Study in Software Adaptation
2002-01-01
Giuseppe Valetto, Telecom Italia Lab, Turin, Italy. ...configuration of the service; monitoring of database connectivity from within the service; monitoring of crashes and shutdowns of IM servers; monitoring of... ...of the IM server all share a relational database and a common runtime state repository, which make up the backend tier, and allow replicas to...
A PLUG-AND-PLAY ARCHITECTURE FOR PROBABILISTIC PROGRAMMING
2017-04-01
...programs that use discrete numerical distributions, but even then, the space of possible outcomes may be uncountable (as a solution can be infinite... ...also identify conditions guaranteeing that all possible outcomes are finite (and then the probability space is discrete). ...The PlogiQL... ...and not determined at runtime. Nevertheless, the PRAiSE team plans to extend their solution to support numerical (continuous or discrete...
Traveler Phase 1A Joint Review
NASA Technical Reports Server (NTRS)
St. John, Clint; Scofield, Jan; Skoog, Mark; Flock, Alex; Williams, Ethan; Guirguis, Luke; Loudon, Kevin; Sutherland, Jeffrey; Lehmann, Richard; Garland, Michael;
2017-01-01
The briefing contains the preliminary findings and suggestions for improvement of methods used in development and evaluation of a multi-monitor runtime assurance architecture for autonomous flight vehicles. Initial system design, implementation, verification, and flight testing have been conducted. As yet, detailed data review is incomplete, and flight testing has been limited to initial monitor force fights. Detailed monitor flight evaluations have yet to be performed.
RELIABILITY, AVAILABILITY, AND SERVICEABILITY FOR PETASCALE HIGH-END COMPUTING AND BEYOND
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chokchai "Box" Leangsuksun
2011-05-31
Our project is a multi-institutional research effort that adopts the interplay of reliability, availability, and serviceability (RAS) aspects for solving resilience issues in high-end scientific computing in the next generation of supercomputers. Results lie in the following tracks: failure prediction in a large-scale HPC system; investigation of reliability issues and mitigation techniques, including in GPGPU-based HPC systems; and HPC resilience runtime and tools.
Tool Integration Framework for Bio-Informatics
2007-04-01
Java NetBeans [11] based Integrated Development Environment (IDE) for developing modules and packaging computational tools. The framework is extremely... ...integrate an Eclipse front-end for desktop integration. Eclipse was chosen over NetBeans owing to higher acceptance and better infrastructure... ...5.0. This version of Dashboard ran with NetBeans IDE 3.6, requiring Java Runtime 1.4, on a machine with Windows XP. The toolchain is executed by...
DOE Office of Scientific and Technical Information (OSTI.GOV)
Werner, Mike
Why this utility? After years of upgrading the Java Runtime Environment (JRE) or the Java Software Development Kit (JDK/SDK), a Windows computer becomes littered with so many old versions that the machine may become a security risk due to exploits targeted at those older versions. This utility helps mitigate those vulnerabilities by searching for, and removing, versions 1.3.x through 1.7.x of the Java JRE and/or JDK/SDK.
Rule-Based Runtime Verification
NASA Technical Reports Server (NTRS)
Barringer, Howard; Goldberg, Allen; Havelund, Klaus; Sen, Koushik
2003-01-01
We present a rule-based framework for defining and implementing finite trace monitoring logics, including future and past time temporal logic, extended regular expressions, real-time logics, interval logics, forms of quantified temporal logics, and so on. Our logic, EAGLE, is implemented as a Java library and involves novel techniques for rule definition, manipulation and execution. Monitoring is done on a state-by-state basis, without storing the execution trace.
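EAGLE itself is a Java rule library with its own formula syntax; as a generic illustration of state-by-state monitoring without storing the execution trace, the sketch below checks a single future-time property ("every request is eventually acknowledged") one state at a time. All names are illustrative and do not reflect EAGLE's API.

```python
class ResponseMonitor:
    """Check 'always(request -> eventually ack)' state by state, keeping only
    the set of pending requests instead of the whole execution trace."""

    def __init__(self):
        self.pending = set()

    def step(self, state):
        # state is a dict of observed propositions for the current step.
        if "request" in state:
            self.pending.add(state["request"])
        if "ack" in state:
            self.pending.discard(state["ack"])

    def end_of_trace(self):
        # At the end of monitoring, every request must have been acknowledged.
        return len(self.pending) == 0

monitor = ResponseMonitor()
for state in [{"request": "r1"}, {}, {"ack": "r1"}]:
    monitor.step(state)
assert monitor.end_of_trace()
```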
Generic and Automated Runtime Program Repair
2012-09-01
Software bugs are ubiquitous, and fixing them remains a difficult, time-consuming, and manual...
1989-03-24
...lack of intimate knowledge of how the runtime links to the compiler-generated code. Furthermore, the runtime must meet a rigorous set of tests to ensure... ...projects, and is not provided. Along with the library, a set of tests should be provided to verify the accuracy of the library after changes have been...
List-mode PET image reconstruction for motion correction using the Intel XEON PHI co-processor
NASA Astrophysics Data System (ADS)
Ryder, W. J.; Angelis, G. I.; Bashar, R.; Gillam, J. E.; Fulton, R.; Meikle, S.
2014-03-01
List-mode image reconstruction with motion correction is computationally expensive, as it requires projection of hundreds of millions of rays through a 3D array. To decrease reconstruction time it is possible to use symmetric multiprocessing computers or graphics processing units. The former can have high financial costs, while the latter can require refactoring of algorithms. The Xeon Phi is a new co-processor card with a Many Integrated Core architecture that can run 4 multiple-instruction, multiple-data threads per core, with each thread having a 512-bit single-instruction, multiple-data vector register. Thus, it is possible to run in the region of 220 threads simultaneously. The aim of this study was to investigate whether the Xeon Phi co-processor card is a viable alternative to an x86 Linux server for accelerating list-mode PET image reconstruction for motion correction. An existing list-mode image reconstruction algorithm with motion correction was ported to run on the Xeon Phi co-processor with the multi-threading implemented using pthreads. There were no differences between images reconstructed using the Phi co-processor card and images reconstructed using the same algorithm run on a Linux server. However, it was found that the reconstruction runtimes were 3 times greater for the Phi than the server. A new version of the image reconstruction algorithm was developed in C++ using OpenMP for multi-threading and the Phi runtimes decreased to 1.67 times that of the host Linux server. Data transfer from the host to the co-processor card was found to be a rate-limiting step; this needs to be carefully considered in order to maximize runtime speeds. When considering the purchase price of a Linux workstation with a Xeon Phi co-processor card and a top-of-the-range Linux server, the former is a cost-effective computation resource for list-mode image reconstruction. A multi-Phi workstation could be a viable alternative to cluster computers at a lower cost for medical imaging applications.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mckie, Jim
2012-01-09
This report documents the results of work done over a 6 year period under the FAST-OS programs. The first effort was called Right-Weight Kernels (RWK) and was concerned with improving measurements of OS noise so it could be treated quantitatively, and with evaluating the use of two operating systems, Linux and Plan 9, on HPC systems and determining how these operating systems needed to be extended or changed for HPC, while still retaining their general-purpose nature. The second program, HARE, explored the creation of alternative runtime models, building on RWK. All of the HARE work was done on Plan 9. The HARE researchers were mindful of the very good Linux and LWK work being done at other labs and saw no need to recreate it. Even given this limited funding, the two efforts had outsized impact: (1) helped Cray decide to use Linux, instead of a custom kernel, and provided the tools needed to make Linux perform well; (2) created a successor operating system to Plan 9, NIX, which has been taken in by Bell Labs for further development; (3) created a standard system measurement tool, Fixed Time Quantum (FTQ), which is widely used for measuring operating systems' impact on applications; (4) spurred the use of the 9p protocol in several organizations, including IBM; (5) built software in use at many companies, including IBM, Cray, and Google; (6) spurred the creation of alternative runtimes for use on HPC systems; and (7) demonstrated that, with proper modifications, a general-purpose operating system can provide communications up to 3 times as effective as user-level libraries. Open source was a key part of this work. The code developed for this project is in wide use and available at many places. The core Blue Gene code is available at https://bitbucket.org/ericvh/hare. We describe details of these impacts in the following sections. The rest of this report is organized as follows: First, we describe commercial impact; next, we describe the FTQ benchmark and its impact in more detail; operating systems and runtime research follows; we discuss infrastructure software; and close with a description of the new NIX operating system, future work, and conclusions.
New Approaches For Asteroid Spin State and Shape Modeling From Delay-Doppler Radar Images
NASA Astrophysics Data System (ADS)
Raissi, Chedy; Lamee, Mehdi; Mosiane, Olorato; Vassallo, Corinne; Busch, Michael W.; Greenberg, Adam; Benner, Lance A. M.; Naidu, Shantanu P.; Duong, Nicholas
2016-10-01
Delay-Doppler radar imaging is a powerful technique to characterize the trajectories, shapes, and spin states of near-Earth asteroids; and has yielded detailed models of dozens of objects. Reconstructing objects' shapes and spins from delay-Doppler data is a computationally intensive inversion problem. Since the 1990s, delay-Doppler data has been analyzed using the SHAPE software. SHAPE performs sequential single-parameter fitting, and requires considerable computer runtime and human intervention (Hudson 1993, Magri et al. 2007). Recently, multiple-parameter fitting algorithms have been shown to more efficiently invert delay-Doppler datasets (Greenberg & Margot 2015) - decreasing runtime while improving accuracy. However, extensive human oversight of the shape modeling process is still required. We have explored two new techniques to better automate delay-Doppler shape modeling: Bayesian optimization and a machine-learning neural network. One of the most time-intensive steps of the shape modeling process is to perform a grid search to constrain the target's spin state. We have implemented a Bayesian optimization routine that uses SHAPE to autonomously search the space of spin-state parameters. To test the efficacy of this technique, we compared it to results with human-guided SHAPE for asteroids 1992 UY4, 2000 RS11, and 2008 EV5. Bayesian optimization yielded similar spin state constraints within a factor of 3 less computer runtime. The shape modeling process could be further accelerated using a deep neural network to replace iterative fitting. We have implemented a neural network with a variational autoencoder (VAE), using a subset of known asteroid shapes and a large set of synthetic radar images as inputs to train the network. Conditioning the VAE in this manner allows the user to give the network a set of radar images and get a 3D shape model as an output. Additional development will be required to train a network to reliably render shapes from delay-Doppler images. This work was supported by NASA Ames, NVIDIA, Autodesk and the SETI Institute as part of the NASA Frontier Development Lab program.
Analyzing Power Supply and Demand on the ISS
NASA Technical Reports Server (NTRS)
Thomas, Justin; Pham, Tho; Halyard, Raymond; Conwell, Steve
2006-01-01
Station Power and Energy Evaluation Determiner (SPEED) is a Java application program for analyzing the supply and demand aspects of the electrical power system of the International Space Station (ISS). SPEED can be executed on any computer that supports version 1.4 or a subsequent version of the Java Runtime Environment. SPEED includes an analysis module, denoted the Simplified Battery Solar Array Model, which is a simplified engineering model of the ISS primary power system. This simplified model makes it possible to perform analyses quickly. SPEED also includes a user-friendly graphical-interface module, an input file system, a parameter-configuration module, an analysis-configuration-management subsystem, and an output subsystem. SPEED responds to input information on trajectory, shadowing, attitude, and pointing in either a state-of-charge mode or a power-availability mode. In the state-of-charge mode, SPEED calculates battery state-of-charge profiles, given a time-varying power-load profile. In the power-availability mode, SPEED determines the time-varying total available solar array and/or battery power output, given a minimum allowable battery state of charge.
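In the state-of-charge mode described above, the essential computation is an energy balance: battery charge is integrated over time from the difference between solar-array power and the load profile. The following is a minimal, hypothetical sketch of such a balance; it is not SPEED's Simplified Battery Solar Array Model, which also accounts for trajectory, shadowing, attitude, and pointing.

```python
def state_of_charge_profile(load_w, solar_w, dt_s, capacity_wh, soc0=1.0,
                            charge_eff=0.9, discharge_eff=0.9):
    """Integrate a simple battery energy balance over a time-varying load.

    load_w / solar_w: per-step load and solar-array power (W); dt_s: step (s);
    capacity_wh: battery capacity (Wh); returns state of charge in [0, 1].
    """
    soc, profile = soc0, []
    for load, solar in zip(load_w, solar_w):
        net_w = solar - load
        delta_wh = net_w * dt_s / 3600.0
        # Charging is less than perfectly efficient; discharging draws extra energy.
        delta_wh *= charge_eff if net_w >= 0 else 1.0 / discharge_eff
        soc = min(1.0, max(0.0, soc + delta_wh / capacity_wh))
        profile.append(soc)
    return profile

# One orbit sketched as 60 minutes of sun followed by 30 minutes of eclipse.
soc = state_of_charge_profile(load_w=[5000.0] * 90,
                              solar_w=[9000.0] * 60 + [0.0] * 30,
                              dt_s=60.0, capacity_wh=8000.0, soc0=0.8)
```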
The NEST Dry-Run Mode: Efficient Dynamic Analysis of Neuronal Network Simulation Code.
Kunkel, Susanne; Schenck, Wolfram
2017-01-01
NEST is a simulator for spiking neuronal networks that commits to a general purpose approach: It allows for high flexibility in the design of network models, and its applications range from small-scale simulations on laptops to brain-scale simulations on supercomputers. Hence, developers need to test their code for various use cases and ensure that changes to code do not impair scalability. However, running a full set of benchmarks on a supercomputer takes up precious compute-time resources and can entail long queuing times. Here, we present the NEST dry-run mode, which enables comprehensive dynamic code analysis without requiring access to high-performance computing facilities. A dry-run simulation is carried out by a single process, which performs all simulation steps except communication as if it was part of a parallel environment with many processes. We show that measurements of memory usage and runtime of neuronal network simulations closely match the corresponding dry-run data. Furthermore, we demonstrate the successful application of the dry-run mode in the areas of profiling and performance modeling.
The Filament Sensor for Near Real-Time Detection of Cytoskeletal Fiber Structures
Eltzner, Benjamin; Wollnik, Carina; Gottschlich, Carsten; Huckemann, Stephan; Rehfeldt, Florian
2015-01-01
A reliable extraction of filament data from microscopic images is of high interest in the analysis of acto-myosin structures as early morphological markers in mechanically guided differentiation of human mesenchymal stem cells and the understanding of the underlying fiber arrangement processes. In this paper, we propose the filament sensor (FS), a fast and robust processing sequence which detects and records location, orientation, length, and width for each single filament of an image, and thus allows for the above described analysis. The extraction of these features has previously not been possible with existing methods. We evaluate the performance of the proposed FS in terms of accuracy and speed in comparison to three existing methods with respect to their limited output. Further, we provide a benchmark dataset of real cell images along with filaments manually marked by a human expert as well as simulated benchmark images. The FS clearly outperforms existing methods in terms of computational runtime and filament extraction accuracy. The implementation of the FS and the benchmark database are available as open source. PMID:25996921
Andreassen, Trine Naalsund; Havnen, Hilde; Spigset, Olav; Falch, Berit Margrethe Hasle; Skråstad, Ragnhild Bergene
2018-01-01
Phosphatidylethanol (PEth) is an alcohol biomarker formed in the presence of ethanol in the body. Both due to its specificity and because it has a detection window of up to several weeks after alcohol intake, its application potential is broader than for other ethanol biomarkers. The aim of this study was to develop and validate a robust method for PEth in whole blood with fast and efficient sample extraction and a short analytical runtime, suitable for high throughput routine purposes. A validated ultra-performance liquid chromatography tandem mass spectrometry (UPLC®-MS/MS) method for quantification of PEth 16:0/18:1 in the range 0.05-4.00 μM (R2 ≥ 0.999) is presented. PEth 16:0/18:1 and the internal standard (IS) PEth-d5 (0.55 μM) were extracted from whole blood (150 μL) by simple protein precipitation with 2-propanol (450 μL). Chromatography was achieved using a BEH-phenyl (2.1 × 30 mm, 1.7 μm) column and a gradient elution combining ammonium formate (5 mM, pH 10.1) and acetonitrile at a flow rate of 0.5 mL/min. Runtime was 2.3 min. The mass spectrometer was monitored in negative mode with multiple reaction monitoring (MRM). The m/z 701.7 > 255.2 and 701.7 > 281.3 transitions were monitored for PEth 16:0/18:1 and the m/z 706.7 > 255.3 for PEth-d5. Limit of quantification was 0.03 μM (coefficient of variation, CV = 6.7%, accuracy = 99.3%). Within-assay and between-assay imprecision were 0.4-3.3% (CV ≤ 7.1%). Recoveries were 95-102% (CV ≤ 4.9%). Matrix effects after IS correction ranged from 107% to 112%. PEth 16:0/18:1 in patient samples was stable for several days at 30°C. Repeated freezing (-80°C) and thawing did not affect the concentration. After thawing and analysis, patient samples were stable at 4-8°C for at least 4 weeks. Results from a proficiency test program, showing |Z| values ≤1.2, confirm the validity of the method. Analysis of the first 3,169 samples sent to our laboratory for routine use has demonstrated its properties as a robust method suitable for high throughput purposes.
ImatraNMR: Novel software for batch integration and analysis of quantitative NMR spectra
NASA Astrophysics Data System (ADS)
Mäkelä, A. V.; Heikkilä, O.; Kilpeläinen, I.; Heikkinen, S.
2011-08-01
Quantitative NMR spectroscopy is a useful and important tool for analysis of various mixtures. Recently, in addition to traditional quantitative 1D 1H and 13C NMR methods, a variety of pulse sequences aimed at quantitative or semiquantitative analysis have been developed. To obtain actual usable results from quantitative spectra, they must be processed and analyzed with suitable software. Currently, there are many processing packages available from spectrometer manufacturers and third party developers, and most of them are capable of analyzing and integrating quantitative spectra. However, they are mainly aimed at processing single or a few spectra, and are slow and difficult to use when large numbers of spectra and signals are being analyzed, even when using pre-saved integration areas or custom scripting features. In this article, we present novel software, ImatraNMR, designed for batch analysis of quantitative spectra. In addition to the capability of analyzing a large number of spectra, it provides results in text and CSV formats, allowing further data analysis using spreadsheet programs or general analysis programs, such as Matlab. The software is written in Java, and thus it should run on any platform capable of providing Java Runtime Environment version 1.6 or newer; however, currently it has only been tested with Windows and Linux (Ubuntu 10.04). The software is free for non-commercial use, and is provided with source code upon request.
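Batch integration of fixed regions across many 1D spectra, with the results exported as CSV for spreadsheet or Matlab analysis, is the core workflow described above. The generic sketch below uses numpy and invented file and region names; it is not ImatraNMR itself.

```python
import csv
import numpy as np

def integrate_regions(ppm, intensity, regions):
    """Integrate a 1D spectrum over named ppm regions via the trapezoidal rule."""
    results = {}
    for name, (lo, hi) in regions.items():
        mask = (ppm >= lo) & (ppm <= hi)
        results[name] = float(np.trapz(intensity[mask], ppm[mask]))
    return results

def batch_integrate(spectra, regions, out_csv):
    """Integrate many spectra and write one CSV row per spectrum."""
    with open(out_csv, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=["spectrum", *regions])
        writer.writeheader()
        for name, (ppm, intensity) in spectra.items():
            writer.writerow({"spectrum": name, **integrate_regions(ppm, intensity, regions)})

# Two synthetic spectra (ascending ppm axis) and two integration regions.
ppm = np.linspace(0.0, 10.0, 2048)
spectra = {f"sample_{i}": (ppm, (i + 1) * np.exp(-((ppm - 3.5) ** 2) / 0.01)) for i in range(2)}
batch_integrate(spectra, {"peak_A": (3.3, 3.7), "peak_B": (7.0, 7.4)}, "integrals.csv")
```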
Application of Bounded Linear Stability Analysis Method for Metrics-Driven Adaptive Control
NASA Technical Reports Server (NTRS)
Bakhtiari-Nejad, Maryam; Nguyen, Nhan T.; Krishnakumar, Kalmanje
2009-01-01
This paper presents the application of Bounded Linear Stability Analysis (BLSA) method for metrics-driven adaptive control. The bounded linear stability analysis method is used for analyzing stability of adaptive control models, without linearizing the adaptive laws. Metrics-driven adaptive control introduces a notion that adaptation should be driven by some stability metrics to achieve robustness. By the application of bounded linear stability analysis method the adaptive gain is adjusted during the adaptation in order to meet certain phase margin requirements. Analysis of metrics-driven adaptive control is evaluated for a second order system that represents a pitch attitude control of a generic transport aircraft. The analysis shows that the system with the metrics-conforming variable adaptive gain becomes more robust to unmodeled dynamics or time delay. The effect of analysis time-window for BLSA is also evaluated in order to meet the stability margin criteria.
Innovative Active Networking Services
2004-03-01
...implementation of the ML programming language and runtime system. OCaml offers a programming environment that can be formally analyzed; 3. University... ...a language such as Java or OCaml. A typical PLANet (PLAN Active Network) node would look as in Figure 1. ...language. Hence we will be discussing it alone. OCaml provides several of the design goals required for a service-level language. Some of...
Real-time Cooperative Behavior for Tactical Mobile Robot Teams
2001-02-01
...control of multirobot missions. In particular he used videogame scenarios to develop these skills, which might account for the intuition that those... ...to develop the following innovative research results for tactical mobile robot teams: 1. A suite of new fault-tolerant reactive behaviors, 2. A... ...depicts the overall system architecture developed for this effort. It contains 3 major subsystems: Executive, Premission, and Runtime. The executive...
Run-Time Support for Rapid Prototyping
1988-12-01
...prototyping. One such system is the Computer-Aided Prototyping System (CAPS). It combines rapid prototyping with automatic program generation. Some of the... ...a design database, and a design management system [Ref. 3: p. 66]. By using both rapid prototyping and automatic program generation, CAPS will be... ...Most prototyping systems perform these functions. CAPS is different in that it combines rapid prototyping with a variant of automatic program...
2016-03-01
...calculated dependency graph, which is used by the game logic to populate the game interface with various "clues". The math runtime module is shown in... ...variable dependency data described in Section 3.3. In the game narrative, the "energy signature" table (see Figure 25) that delivers this... ...introduction of data dependency clues (see "energy signatures" above), which replaced a presentation of the FSAs that was included in the Phase One game. We...
Data-Adaptable Modeling and Optimization for Runtime Adaptable Systems
2016-06-08
...execution scenarios; e. enables model-guided optimization algorithms that outperform the state of the art; f. understands the overhead of system... ...the Data-Adaptable System Model (DASM), which facilitates design by enabling the designer to: 1) specify both an application's task flow as well as... ...systems. The MILAN [3] framework specializes in the design, simulation, and synthesis of System on Chip (SoC) applications using model-based techniques.
An Overview of ARL’s Multimodal Signatures Database and Web Interface
2007-12-01
...ActiveX components, which hindered distribution due to license agreements and run-time license software to use such components. g. Proprietary... ...The database consists of multimodal signature data files in the HDF5 format. Generally, each signature file contains all the ancillary... ...only contains information in the database, Web interface, and signature files that is releasable to the public. The Web interface consists of static...